data-cleaning-and-preprocessing

Here are 550 public repositories matching this topic...

CyberMatic-AmAn / cleaneasy

CleanEasy is a powerful, user-friendly Python library designed to simplify data cleaning and preprocessing for data scientists and analysts.

data-science data pipeline python3 data-analysis data-analysis-python data-cleaning-pipeline data-cleaning-and-preprocessing

Updated Jul 5, 2025
Python

venkat-0706 / Sugarcane-Production

This exploratory data analysis (EDA) project focuses on examining sugarcane production data. Through this analysis, we seek to gain valuable insights into factors influencing sugarcane production, develop predictive models for future yields, and ultimately support efforts to optimize production efficiency and sustainability.

data-science machine-learning data-mining exploratory-data-analysis jupyter-notebook data-visualization data-analysis data-cleaning-and-preprocessing

Updated Aug 31, 2024
Jupyter Notebook

venkat-0706 / Black-Friday

Black Friday Sales Analysis explores customer demographics, purchasing behaviors, and product trends to uncover insights and patterns driving sales during Black Friday events.

python data-mining numpy pandas data-visualization data-analysis predictive-analytics matplotlib-pyplot customer-behavior-analysis data-cleaning-and-preprocessing

Updated Dec 15, 2024
Jupyter Notebook

Mindful-AI-Assistants / SP2024-Election-Analysis

📊 An analysis of voting patterns in São Paulo's 2024 elections, focusing on voter behavior, absenteeism, and geographic trends.

python data-science maps geolocation power-bi dashboards data-analysis dataset-creation beautifulsoup geolocator datavisualization geolocalization web-scraping-python data-cleaning-and-preprocessing election-sp-brazil-2024 oneness-consciousness

Updated May 4, 2026
HTML

SaloniJhalani / Food-Delivery-Time-Prediction-Model

Leveraging advanced data cleaning techniques and feature engineering, a robust food delivery prediction model was developed using regression algorithms.

python machine-learning regression feature-engineering streamlit data-cleaning-and-preprocessing

Updated Jun 27, 2023
Jupyter Notebook

Pratiikpy / Data-science-cheatsheet

Welcome to my data science repository! Here you will find a collection of resources and examples for exploring, analyzing, and manipulating data using Python. The repository includes code templates, case studies, and exercises to help you learn and practice data science concepts and techniques. The topics covered include data exploration, data visu

data-cleaning-and-preprocessing data-visualization-matplotlib-seaborn data-manipulation-pandas data-reporting-jupyter-notebooks web-scraping-beautiful-soup advanced-data-manipulation-transformation collaborating-git-github

Updated Jan 2, 2023

DablewCodes / Data-Cleaning-Samples

A repository where I keep all of my data cleaning samples/portfolio items.

excel datasets data-cleaning excel-data-analytics data-cleaning-and-preprocessing

Updated Apr 11, 2023

aliiimaher / Laptop-Price-Prediction

This is an AI model for predicting laptop prices, trained on about 1,200 data samples.

ai linear-regression linear-algebra price-prediction-model data-cleaning-and-preprocessing

Updated Jul 29, 2024
Python

Rama-Mwenda / Indian_Ecosystem_Analysis

A project analyzing the Indian startup ecosystem between 2018 and 2021.

data-visualization data-analysis-python business-understanding power-bi-dashboard data-cleaning-and-preprocessing

Updated Mar 25, 2024
Jupyter Notebook

Opikadash / world-bank-powerbi-dashboard

Developed a 3-page Power BI dashboard (global and Asian overview) using Python scripts to load and clean World Bank data (1960–2020), reducing data processing time by 25%. Containerized the database in Docker, enabling scalable access, and visualized trends (e.g., 3% annual GDP growth in Asia), enhancing stakeholder insights.

database data-visualization python3 data-analysis sqlite3 data-modelling data-cleaning-and-preprocessing powerbi-dashboards power-query-interactive-dashboard-visualization

Updated Mar 17, 2026
Python

PatilNi3 / PROJECT_POWER_BI

Global Superstore BI Dashboard

database dashboard report powerbi data-cleaning-and-preprocessing

Updated Nov 12, 2022

MAGICS-LAB / SMUTF

[Information System] SMUTF: Schema Matching Using Generative Tags and Hybrid Features

data-science feature-extraction lightgbm schema-matching data-cleaning hxl tag-generation lightgbm-models lightgbm-classifier t5-model llm data-cleaning-and-preprocessing hdxsm

Updated May 4, 2025
Python

Daniel-Andarge / AiML-ethiopian-medical-biz-datawarehouse

The Ethiopian Medical Business Data Warehouse & Analytics Platform is a comprehensive data solution tailored to enhance the efficiency and efficacy of Ethiopia's healthcare and medical sectors.

python sqlalchemy sql etl postgresql ci-cd pytest data-modeling image-detection dpt data-warehousing etl-pipeline jupiter-notebook fastapi yolov5 data-cleaning-and-preprocessing

Updated Apr 2, 2025
Jupyter Notebook

ericsun153 / Illuminating_US_Outage_Landscape

Power Outage Data Analysis in USA

markdown-editor eda hypothesis-testing pandas-library permutation-test plotly-express missing-data-imputation data-cleaning-and-preprocessing

Updated May 19, 2023
Jupyter Notebook

dhvani-k / F1_Race_Winner_Prediction

EDA and Prediction of F1 Race Winners

python data machine-learning formula1 eda neural-networks xgboost decision-trees f1 data-cleaning decision-tree-classifier random-forest-classifier svc-model vizualization fastapi streamlit guassian-naive-bayes data-cleaning-and-preprocessing

Updated Jun 7, 2023
Jupyter Notebook

VivekAgrawl / medical-appointments-analysis

This project involves analyzing real-world medical appointment data through Time Series Analysis. The tasks include dataset cleaning, comprehensive analysis, and extracting insights using Python and MySQL.

python sql data-analysis data-cleaning-and-preprocessing

Updated Aug 17, 2023
Python

Akashborse3 / Gear-Box-Fault-Diagnosis-Using-Machine-Learning-and-Deep-Learning

Designed and implemented machine learning and deep learning models to diagnose gearbox faults. Preprocessed sensor data, engineered features, and trained models using techniques such as SVM, random forests, LSTM, and naive Bayes. Evaluated model performance and optimized hyperparameters to achieve high diagnostic accuracy.

python data-science machine-learning-algorithms data-analysis data-cleaning-and-preprocessing

Updated Jun 25, 2024
Jupyter Notebook

pouyasattari / HR-Dataset-Analysis

A comprehensive data analysis project using SQL for data cleaning and preprocessing and Tableau for visualization, focusing on key HR KPIs. Features interactive dashboards and detailed insights.

sql data-analysis tableau kpis data-cleaning-and-preprocessing

Updated Jun 29, 2024

jim60105 / image-dataset-prep-tools

Scripts for cleaning, converting, and managing image datasets for ML training. (Zsh/Python)

python zsh ml data-cleaning data-cleaning-and-preprocessing

Updated Jun 14, 2026
Shell

omari-kd / TransBorder-Freight-Data-Analysis

This project analyses transportation data from the Bureau of Transportation Statistics (BTS) to uncover insights into cross-border freight's efficiency, safety and environmental impacts across road, rail, air and water modes.

data-science data-visualization data-analysis powerbi data-analysis-in-r data-cleaning-and-preprocessing

Updated May 27, 2025
R
