Diabetes Data Analysis using Apache Spark and Power BI
Sam Saji (202101052), Aamir Khan (2022012005), Hardik Tiwari (202101061)
BDA Mini Project
Department of Computer Engineering
XIE
University of Mumbai
2024-25
Name of the project/Thesis
Problem Statement:
• Diabetes data management is hindered because existing systems cannot
efficiently process and analyze large, diverse healthcare data from multiple
sources in real time.
• The lack of integration of advanced analytics, such as machine learning,
limits predictive capabilities and delays personalized treatment interventions,
compromising effective patient care.
• A scalable and intelligent solution is needed to handle high-velocity data
streams, cohesively analyze structured and unstructured data, and deliver
insights for improved diabetes management outcomes.
DEPT. OF COMPUTER ENGINEERING XAVIER INSTITUTE OF ENGINEERING 2
Literature Survey:
Traditional Healthcare Databases (SQL-based)
Hadoop-Based Solutions
Proprietary Healthcare Analytics Platforms
Wearable Device Data Integration Systems
Limitations of Existing System:
Scalability Issues
High Cost and Limited Flexibility
Lack of User-Friendly Visualization Tools
Functions/ Features:
Data Ingestion and Storage
Real-Time Data Processing
Data Visualization
Simplified Architecture
Visually Appealing Dashboards with a User-Friendly UI
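As a rough illustration of the real-time processing feature, the sketch below keeps a per-patient rolling mean over a time window as readings arrive. It is a dependency-free, plain-Python stand-in for a streaming job, not the project's implementation; the `Reading` schema, the 300-second window, and the sample values are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Reading:
    """One glucose measurement (hypothetical schema, not from the project)."""
    patient_id: str
    timestamp: int          # seconds since the start of the stream
    glucose_mg_dl: float


def rolling_mean(
    readings: Iterable[Reading], window_s: int
) -> Iterator[Tuple[str, int, float]]:
    """Yield (patient_id, timestamp, mean glucose over the last window_s seconds).

    Assumes readings arrive in timestamp order and window_s is positive.
    """
    windows: Dict[str, Deque[Reading]] = {}
    for r in readings:
        window = windows.setdefault(r.patient_id, deque())
        window.append(r)
        # Drop readings that have fallen out of the time window.
        while window[0].timestamp <= r.timestamp - window_s:
            window.popleft()
        mean = sum(x.glucose_mg_dl for x in window) / len(window)
        yield r.patient_id, r.timestamp, mean


# Synthetic stream, for illustration only.
stream = [
    Reading("p1", 0, 100.0),
    Reading("p2", 10, 150.0),
    Reading("p1", 60, 120.0),
    Reading("p1", 400, 200.0),
]
results = list(rolling_mean(stream, window_s=300))
for patient_id, ts, mean in results:
    print(patient_id, ts, mean)
```

Each output row is the kind of up-to-date value a dashboard tile could display; in a distributed setting the same per-key window logic would be expressed with the streaming engine's own windowing operators.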
Technology Stack:
• Apache Spark
• Python
• Power BI
• Anvil
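One way these tools can hand data to each other is through a small summary table: the processing layer aggregates patient records and writes a CSV file, which Power BI can import as a data source. The sketch below shows that hand-off in plain Python as a stand-in for the Spark aggregation step. The column names, age groups, and records are synthetic assumptions, not the project's dataset.

```python
import csv
import io
from collections import defaultdict

# Synthetic records for illustration only (hypothetical columns).
records = [
    {"age_group": "30-44", "glucose_mg_dl": 95.0, "diabetic": 0},
    {"age_group": "30-44", "glucose_mg_dl": 145.0, "diabetic": 1},
    {"age_group": "45-59", "glucose_mg_dl": 160.0, "diabetic": 1},
    {"age_group": "45-59", "glucose_mg_dl": 180.0, "diabetic": 1},
]


def summarise(rows):
    """Group rows by age_group and compute per-group summary statistics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["age_group"]].append(row)
    summary = []
    for group in sorted(groups):
        members = groups[group]
        n = len(members)
        summary.append({
            "age_group": group,
            "patients": n,
            "mean_glucose_mg_dl": round(sum(m["glucose_mg_dl"] for m in members) / n, 1),
            "diabetic_share": round(sum(m["diabetic"] for m in members) / n, 2),
        })
    return summary


def to_csv(summary):
    """Serialise the summary rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(summary[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(summary)
    return buffer.getvalue()


summary = summarise(records)
csv_text = to_csv(summary)
print(csv_text)
```

Writing the text to a file (for example with `open(path, "w")`) gives a dashboard-ready table with one row per age group.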
Diagram:
Conclusion:
• The proposed system leverages big data analytics and machine learning to
deliver insights that improve diabetes management.
• It integrates Apache Spark, Python, and Power BI for distributed processing,
machine learning, and data visualization, providing a scalable platform for
healthcare data analysis.
• Predictive modeling helps healthcare providers foresee potential
complications and monitor patient trends in real time for personalized care.
• Real-time processing and visualization give clinicians up-to-date
information for faster, better-informed decision-making.
• The system's modular and scalable architecture ensures adaptability to future
healthcare data needs and technology advancements, with potential to expand
into other healthcare areas.
Output:
References:
• Apache Spark for Analysis of Electronic Health Records: A Case
Study of Diabetes Management
• Big Data Analytics for Diabetes Prediction on Apache Spark
Thank You!!!