Disaster Casualties Prediction with ML
The project Disaster Casualties Prediction Using ML Tools for Enhanced Resource
Planning addresses the pressing need for effective disaster management through
advanced machine learning approaches. The system forecasts critical disaster parameters,
namely casualty counts, affected population, and disaster duration, using models such as
XGBoost, Random Forest Regressor, SVR, and LightGBM [1,2]. Based on these forecasts,
the deployment of resources such as medical kits, food, and shelter can be planned
efficiently so that emergencies are addressed promptly.
The system processes large volumes of data from varied sources
such as EM-DAT [4], Kaggle [5], OpenWeatherAPI [6], and OpenCageAPI [7], supplemented
by synthetic data produced with CTGAN [8,10]. This supports precise modeling of a broad set
of disaster situations such as earthquakes, epidemics, and floods. Automated resource allocation,
inventory management, processing of SOS requests, and real-time alerts are some of the main
features [2,9].
The web-based application provides role-based access to authorities, response teams, and
general users, encouraging coordination and communication during a disaster [3]. Real-time
predictions, supported by weather and geographical location data, allow the authorities to
make informed decisions at the right time, minimize the damage, and ensure efficient
deployment of resources [6,7]. The system is a data-driven, scalable solution for the disaster
management challenges of the current era.
Table of Contents
Abstract
List of Tables
List of Figures
Table of Contents
1. Introduction
1.1 Problem Statement
1.2 Objectives
1.3 Significance of the Study
2. Literature Review
3. Methodology
3.1 Description
3.1.1 Product Perspective
3.1.2 Product Features
3.1.3 User Classes and Characteristics
3.7.3 Handling Missing Data Using KNN Imputation
3.7.4 Data Synthesis Using CTGAN
3.7.5 Model Training and Validation
3.7.6 Deployment
4. Results
4.1 Overall Product Diagram
1. Introduction
Natural disasters continue to be one of humanity's greatest challenges, causing enormous losses of
life, infrastructure, and economic value worldwide. India's geographical
diversity and population size make it highly vulnerable to a variety of disasters such as earthquakes, floods,
cyclones, and epidemics. Statistics from EM-DAT (Emergency Events Database) show that India
experienced over 780 significant disaster events between 1900 and 2020, affecting millions of
people and causing heavy economic losses.
Conventional disaster management efforts have been mostly reactive, focusing
on relief after disasters rather than on anticipation, warning mechanisms, and
resource management. This reactive stance has, in the majority of cases, resulted in greater casualties and
devastation than would have been incurred with better predictive methods and resource
allocation protocols.
The advancement of information technology, particularly in the areas of machine learning, web
technology, and data analysis, provides unprecedented opportunities to transform disaster
management from a largely reactive framework to a much more proactive and data-driven process. By
using historical disaster records, environmental factors, and advanced predictive models, it has now
become possible to develop systems that can predict disaster impacts with greater accuracy, thus
making it easier to implement better preparation and response strategies.
1.1 Problem Statement
Limited Predictive Capabilities: Existing systems generally cannot predict casualties, affected
population sizes, or disaster durations with reasonable accuracy from initial parameters.
Data Fragmentation: Disaster data is usually fragmented across various platforms and formats,
which makes it challenging and time-consuming to analyze in entirety.
Resource Inefficiencies: In the absence of realistic impact projections, emergency resources are often
improperly allocated, resulting in shortages in the most affected areas and surpluses in less-affected
areas.
Communication Gaps: During disasters, communication between governments and victims is often
inadequate, which complicates evacuation and relief efforts.
Data Scarcity: There is not enough historical data for certain types of disasters and locations, making
it difficult to utilize traditional machine learning techniques.
This project aims to solve these problems by developing an integrated disaster
management system that combines state-of-the-art machine learning algorithms for prediction with a
large-scale web-based platform for information sharing, emergency reporting, and coordination of
resources.
1.2 Objectives
The main objectives of this project are:
To create reliable predictive models for various types of disasters (earthquakes, floods, and epidemics)
that predict casualty numbers, affected population, and disaster duration.
To counteract data scarcity with synthetic data generation methods
that improve model performance without sacrificing statistical fidelity.
To design and deploy a complete web-based system incorporating prediction models, emergency
reporting, alert systems, and resource management modules.
To create a robust and efficient API framework that facilitates seamless data exchange among
system components and, potentially, with third parties.
To offer a simple, accessible interface through which administrative users and the general
public can access relevant information and services during a disaster.
1.3 Significance of the Study
This study contributes to disaster management in a number of ways:
Improved Forecast Accuracy: Using specific machine learning algorithms for different types of
disasters, the system provides more accurate forecasts of potential effects, thus enabling increased
preparedness and strategic response planning.
Novel Data Synthesis: The use of Conditional Tabular Generative
Adversarial Networks (CTGAN) offers a novel way of addressing data limitations in
disaster management research.
Integrated Platform Development: Unlike most of the existing systems that focus on specific areas
of disaster management, the project offers an integrated platform that addresses the whole gamut of
disaster management, from prediction to recovery.
Practical Applicability: The system has been designed with an emphasis on practical deployment,
incorporating practical attributes like mechanisms for reporting emergencies, alarm systems, and
administrative control.
2. Literature Review
The discipline of disaster management has evolved considerably in the last decades. Early research by
Quarantelli[11] established the fundamental framework of disaster response and preparedness,
emphasizing the need for coordination among the stakeholders. Alexander[12] expanded this
theoretical framework with the four-phase cycle of disaster management, including mitigation,
preparedness, response, and recovery, which has been the traditional model in disaster management
literature.
The introduction of information technology prompted academic research into the role of
technology in disaster management. Comfort et al.[13] were the first to examine the role of
information systems in the coordination of disaster responses. Their work established the importance of
timely information dissemination in reducing response time during crises, a factor that has become a
hallmark of modern disaster management paradigms.
The application of machine learning methods to the field of disaster management is a major milestone
for the discipline. Nayak and Dutta[14] showed the application of predictive analytics to historical
disaster data to contribute to the prediction of possible effects. In their research, regression models
were employed to predict flood damage, considering meteorological and geographical factors, and
these attained accuracy levels ranging from 78% to 85%.
In earthquake forecasting, Asim et al. [15] applied Random Forest and XGBoost to seismic data and
achieved a Mean Absolute Error (MAE) of at most 0.32 in the estimation of magnitude. Our work
likewise uses XGBoost, but for the prediction of earthquake casualties, where we achieved an MAE
of 1.52 in the estimation of deaths. Because the prediction targets differ, the two error values are
not directly comparable.
In epidemic forecasting, Santillana et al. [16] were able to successfully apply machine learning
algorithms to predict influenza outbreaks based on search engine query data and social media trends.
These models possessed an appreciable lead time benefit of 2 to 3 weeks over traditional surveillance
systems. This study had a significant impact on our epidemic prediction strategy, with environmental
parameters like temperature and humidity being included as predictors.
Careem et al.[17] evaluated FastAPI as a framework for an emergency management system, taking
advantage of its performance benefits over conventional frameworks such as Flask and Django. Their
tests showed response times two to three times faster than those of comparable frameworks, and for this
reason we selected FastAPI for our backend services.
The scarcity of disaster data has been investigated in a number of research papers. Rao et
al.[18] demonstrated the use of Generative Adversarial Networks (GANs) for synthetic
data generation to augment sparse disaster datasets. Their results indicated that models trained on real and
synthetic data performed better than models trained on real data alone, with an improvement in
prediction accuracy of 12-18%.
Our use of the Conditional Tabular Generative Adversarial Network (CTGAN) for generating
synthetic disaster data is based on the research of Xu et al.[19], who developed the technique
specifically for tabular data generation. In their experiments, CTGAN preserved statistical
properties more effectively than traditional oversampling methods, thereby retaining the
column correlations and distributions that are essential to accurate modeling of the impact of disasters.
Earlier research has shown the capability of integrated disaster management systems within the web-
based environment. Kumar et al.[20] have suggested an end-to-end disaster management system that
combined GIS, sensor networks, and mobile apps. They proved that integrated platforms reduced the
time taken to access information by 65% in simulated emergency scenarios as opposed to
conventional approaches.
Similarly, Hassan and Chen[21] proposed an emergency management web-based system with
predictive capabilities for floods and earthquakes. Their system showed a 30% increase in the
effectiveness of resource allocation during field tests. Their research supports our motivation for
developing an end-to-end platform combining predictive models, emergency reports, and alert
systems.
The importance of feature engineering in disaster impact prediction has been highlighted by many
scholars. Mohammadi et al.[22] showed that interaction features, such as the magnitude-depth interaction
for earthquakes, improved prediction accuracy by 22% compared to models using raw features.
That observation informed our feature engineering for earthquakes, where we
incorporated similar interaction features.
In flood forecasting, researchers Wang et al.[23] showed that the incorporation of temporal features
like disaster duration highly enhanced forecasting models. They found that duration was significantly
correlated with economic loss (r=0.67) and casualty counts (r=0.42). This supports our incorporation
of the "Days Difference" feature of start and end disaster event dates.
Missing values in disaster datasets have also been studied by many scholars. Kabir et
al.[24] compared different imputation methods for disaster data and found that K-Nearest Neighbors
(KNN) imputation was better at preserving statistical relationships than mean or median
imputation, with an improvement of 17% in the accuracy of the resulting models. This finding
supports our choice of KNN imputation to handle missing values in the EM-DAT dataset.
In addition, Wang and Chen [25] carried out a focused study on missing geospatial information in
disaster reports and described methods of inferring location from surrounding context data. Their
approach achieved an 89% success rate in location prediction, which supports our use of
external APIs like OpenCage to complete missing location information.
Security aspects of disaster management systems have been studied by various researchers. Zhang et
al.[26] evaluated various authentication mechanisms for emergency response systems and promoted
the use of JWT-based authentication because of its reasonable trade-off between security and
performance at high loads. Their test of performance showed that JWT authentication maintained
response times under 150 milliseconds even when handling 1000 requests per second, thus justifying
our implementation choice.
Although previous research has touched on numerous areas of disaster management systems, a
gap remains when it comes to integrating machine learning prediction models into
comprehensive web-based platforms for both administrative and public stakeholders. This research
addresses that gap.
The system that we developed leverages the theoretical underpinnings developed in the past literature
and meets the challenges of practical implementation using contemporary software engineering and
machine learning methods.
3. Methodology
3.1 Description
3.1.1 Product Perspective
The DMRAS system applies advanced technologies, such as machine learning and real-time data,
to infrastructure owned by central government agencies. It uses impact prediction, resource
distribution, and relief coordination algorithms to manage pre-disaster and post-disaster tasks. Furthermore, it
maintains communication with Central Authorities, Regional Administrators, Response
Teams, and Public Users. The system has data connectors to several external data sources such as
OpenWeatherAPI and EM-DAT.
1. Impact Prediction: Combines weather, geolocation, and disaster data
to create disaster forecasts.
2. Distribution: Responsible for monitoring and managing vital resources.
3. Communications: Serves as the means of alerting all relevant stakeholders.
4. Response Coordination: Prompts action from designated groups or individuals so that
the various authorities and teams can work simultaneously.
5. User SOS Requests: Lets users request help, rescue, shelter,
and health services through the web interface.
3.1.2 Product Features
1. Disaster Tracking and Management: Monitors ongoing disasters to predict their impact
(deaths, affected individuals, homeless populations).
4. User Management: Role-based access for different user types and maintaining public user
data for personalized support.
5. SOS Requests and Emergency Response: Allows users to request rescue, shelter, or
medical aid via the web interface. Requests are forwarded to response teams and regional
admins for immediate action.
These features enable efficient, transparent, and effective disaster response and resource management.
3.1.3 User Classes
CentralAuthority
Responsibilities: Allocate disaster recovery assignments for admins. Supervise and control resource allocation requests. Send out emergency announcements.
Collaborators: RegionalAdmin, ResourceAllocator, NotificationService.
Table 1: Central Authority class.
RegionalAdmin
Responsibilities: Control and manage sub-regional disaster management. Supervise and control resource allocation requests for their region. Forward division responses and supervise teams. Provide on-site assistance and carry out rescue missions.
Collaborators: CentralAuthority, ResourceAllocator, ResponseTeam.
Table 2: Regional Admin class.
ResponseTeam
Responsibilities: Provide on-site assistance and carry out rescue missions. Report the status of the devastated regions and the tasks that have been done. Follow orders.
Collaborators: RegionalAdmin, MonitoringSystem, ResourceAllocator.
Table 3: Response Team class.
ResourceAllocator
Responsibilities: Assign resources according to the predictive needs analysis. Track active resources and those in different locations.
Collaborators: Resource Predictor, Inventory Service, Resource Monitor, Location Service.
Table 4: Resource Allocation class.
ResourceMonitor
Responsibilities: Monitor the region for ready-to-use emergency resources. Adjust the inventory when resources are added or removed.
Collaborators: ResourceAllocator, Resource Monitor.
Table 5: Resource monitor class.
InventoryService
Responsibilities: Monitor the region for ready-to-use emergency resources. Adjust the inventory when resources are added or removed.
Collaborators: ResourceAllocator, Resource Monitor.
MonitoringSystem
Responsibilities: Monitor the affected regions in real time. Control the information shown on prisoners, scale of deaths, and damages. Provide live updates to authorities.
Collaborators: Sensor Device, Data Dashboard, Authority, Device Controller.
Table 7: Monitoring System class.
NotificationService
Responsibilities: Receive SOS signal from user. Forward SOS signal data to emergency services.
Collaborators: User, Emergency Services, NotificationService.
ResourcePredictor
Responsibilities: Receive SOS signal from user. Forward SOS signal data to emergency services.
Collaborators: User, Emergency Services, NotificationService.
SOSService
Responsibilities: Receive SOS signal from user. Forward SOS signal data to emergency services.
Collaborators: User, Emergency Services, NotificationService.
DamageAssessment
Responsibilities: Predict damage of potential disaster. Employ past event history to guide predictions.
Collaborators: ML model, Historical data, Authority.
ResourceAvailability
Responsibilities: Show resources available for residents.
Collaborators: Public Interface.
PublicUser
Responsibilities: Access alerts, resources, and evacuation plans. Send SOS message if necessary. Display location-based hospital and resource details.
Collaborators: AlertSystem, EvacuationPlanner, SOSService, HospitalLocator.
Table 13: Public User class.
AlertSystem
Responsibilities: Send public alerts. Customize alerts by location and severity. Ensure delivery to all communication avenues.
Collaborators: AlertSystem, EvacuationPlanner, SOSService, HospitalLocator.
Table 14: Alert System class.
3.2 System Features
This section presents the primary system features, their functionality, priority, and relationship
to different system components.
Description: It employs historical data and machine learning algorithms to forecast the probable
damage from a calamity (e.g., number of fatalities, infrastructure loss). It notifies Authorities in
case the forecasted damage crosses specified thresholds.
Priority: Medium - Enables authorities to take preventive measures on the basis of damage
forecasts.
REQ-1: The system should predict likely damage from disaster data and machine learning
models.
REQ-2: The system should alert Authorities if forecasted damage surpasses a threshold.
REQ-3: The system should present Authorities with actionable advice based on the forecasted
damage.
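The threshold check in REQ-2 can be sketched as follows. This is a minimal illustration only: the `DamageForecast` record, the field names, and the threshold values are assumptions, since the report does not specify them.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the report does not state the actual values.
ALERT_THRESHOLDS = {"deaths": 50, "affected": 10_000}


@dataclass
class DamageForecast:
    region: str
    deaths: float
    affected: float


def threshold_breaches(forecast: DamageForecast) -> list[str]:
    """Return the names of forecast fields that exceed their alert threshold."""
    return [
        name
        for name, limit in ALERT_THRESHOLDS.items()
        if getattr(forecast, name) > limit
    ]


# Illustrative forecast values, not model output.
forecast = DamageForecast(region="Region A", deaths=72.0, affected=8_500.0)
breaches = threshold_breaches(forecast)
if breaches:
    print(f"ALERT for {forecast.region}: {', '.join(breaches)} above threshold")
```

Keeping the thresholds in one mapping means authorities could tune them without touching the prediction code.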
Description and Priority
Description: The system forecasts the quantity of resources needed (e.g., shelter, food, medical
kits) in case of a disaster and allocates them automatically. RegionalAdmins approve or reject
the allocation.
Priority: High - Prompt allocation and monitoring of resources are essential to successful
disaster response.
Description: The system tracks the inventory of the critical resources at all times. When the
inventory falls below critical levels, the system alerts RegionalAdmins to procure more
resources so that disaster response operations are not hindered by stockouts.
Priority: High - Sufficient resources are essential for a successful, sustained
disaster response.
Functional Requirements
REQ-1: The system should monitor current levels of resources (e.g., food, shelter, medical
supplies).
REQ-2: The system must automatically update the inventory whenever resources are assigned
or stocked.
REQ-3: The system should alert RegionalAdmins when the inventory levels go below set levels.
REQ-4: The system must enable RegionalAdmins to make further resource requests when they
are notified of low stock.
REQ-5: The system should present a graphical view of resource status and stock levels.
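A minimal sketch of the inventory behaviour in REQ-1 to REQ-3, assuming an in-memory store. The `Inventory` class, item names, and reorder levels are hypothetical; the actual system keeps this data in its SQL database.

```python
# Hypothetical reorder levels per resource; real values would come from the database.
REORDER_LEVELS = {"food_packets": 500, "medical_kits": 100, "tents": 50}


class Inventory:
    def __init__(self, stock: dict[str, int]):
        self.stock = dict(stock)

    def adjust(self, item: str, delta: int) -> None:
        """Apply an allocation (negative delta) or a restock (positive delta)."""
        new_level = self.stock.get(item, 0) + delta
        if new_level < 0:
            raise ValueError(f"cannot allocate more {item} than is in stock")
        self.stock[item] = new_level

    def low_stock(self) -> list[str]:
        """Items whose level is below the reorder level (REQ-3)."""
        return sorted(
            item
            for item, level in self.stock.items()
            if level < REORDER_LEVELS.get(item, 0)
        )


inventory = Inventory({"food_packets": 800, "medical_kits": 120, "tents": 60})
inventory.adjust("food_packets", -400)  # allocation to a region
inventory.adjust("tents", 20)           # restock
alerts = inventory.low_stock()
print(alerts)
```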
Description: The system issues advisories and notifications to stakeholders regarding disasters,
deployment of resources, and evacuation protocols.
Priority: High - Timely communication is essential to public safety and disaster
management.
Stimulus: A new disaster occurs or the situation of an existing disaster changes.
Response: The system sends alerts to RegionalAdmins, ResponseTeams, and PublicUsers.
3.2.5 Seeking Rescue, Medical Care, and Shelter via Web Interface
PublicUsers can also request specialized disaster response services such as rescue, medical aid, and
shelter via the web interface.
Description: PublicUsers can submit rescue, medical assistance, and shelter
requests in disaster situations via an easy-to-use web interface.
Priority: High - Ensures affected individuals can access crucial relief during a disaster.
Stimulus/Response Sequences
Stimulus: PublicUser triggers a rescue, medical aid, or shelter request via the web interface.
Response: The request is routed to the correct ResponseTeams or authorities, who are notified.
REQ-1: PublicUsers must be provided an interface to request shelter, medical assistance, and
rescue.
REQ-2: The system must notify ResponseTeams or RegionalAdmins whenever a request is
made.
REQ-3: The system should track the status of the request and provide progress notifications to
PublicUsers.
Description: This feature controls user roles and permissions so that
sensitive disaster-related information is restricted to approved users only.
CentralAuthority:
RegionalAdmins:
ResponseTeams:
PublicUsers: Obtain help and receive disaster warnings through the web interface.
Priority: Medium — Gives secure access to the system.
Response: The system creates the user profile and grants role-based access.
Description: PublicUsers can send an SOS message through the web interface to request
emergency services such as rescue, medical, or protection.
Priority: High — Facilitates timely support for PublicUsers in emergency situations, enhancing
response time and coordination.
REQ-1: PublicUsers must be able to send SOS messages via the web interface.
REQ-2: The system must be capable of receiving and forwarding SOS requests to the
respective ResponseTeam or RegionalAdmin.
REQ-3: The system must append the user's location and the type of SOS request to the message
being sent to ResponseTeams.
REQ-4: The system must keep a record of SOS request status and inform PublicUsers
whenever their requests are being processed.
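The SOS requirements above can be sketched as a small data model. The class names, status values, and message format are assumptions made for illustration and are not taken from the deployed system.

```python
from dataclasses import dataclass, field
from enum import Enum


class SOSType(Enum):
    RESCUE = "rescue"
    MEDICAL = "medical"
    SHELTER = "shelter"


class Status(Enum):
    RECEIVED = "received"
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"


@dataclass
class SOSRequest:
    user_id: int
    sos_type: SOSType
    latitude: float
    longitude: float
    status: Status = Status.RECEIVED
    history: list[Status] = field(default_factory=lambda: [Status.RECEIVED])

    def update(self, new_status: Status) -> None:
        """Record a status change so the user can be notified (REQ-4)."""
        self.status = new_status
        self.history.append(new_status)


def forward_message(request: SOSRequest) -> str:
    """Build the message for the response team, with type and location (REQ-3)."""
    return (
        f"SOS {request.sos_type.value} from user {request.user_id} "
        f"at ({request.latitude:.4f}, {request.longitude:.4f})"
    )


# Illustrative request; the user id and coordinates are made up.
request = SOSRequest(user_id=42, sos_type=SOSType.MEDICAL,
                     latitude=26.1445, longitude=91.7362)
message = forward_message(request)
request.update(Status.IN_PROGRESS)
```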
3.3 External Interface Requirements
Access Method: Web-based user interface accessed using desktops, laptops, tablets, and
smartphones.
Access Roles:
CentralAuthority: Complete control over the system and access to everything.
RegionalAdmins: Disaster data access at the regional level, resource authorization, and
SOS requests.
ResponseTeams: Access to task assignments, resource status, and SOS requests.
PublicUsers: Can request SOS help, see alerts, and track resource availability.
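One way to express these access roles is a role-to-permission mapping. The permission names below are illustrative paraphrases of the roles listed above, not identifiers from the actual system.

```python
# Permission names are hypothetical; "*" stands for complete control.
ROLE_PERMISSIONS = {
    "CentralAuthority": {"*"},
    "RegionalAdmin": {"view_regional_data", "authorize_resources", "view_sos"},
    "ResponseTeam": {"view_tasks", "view_resource_status", "view_sos"},
    "PublicUser": {"send_sos", "view_alerts", "view_resource_availability"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Return True if the role grants the permission; unknown roles get nothing."""
    granted = ROLE_PERMISSIONS.get(role, set())
    return "*" in granted or permission in granted
```

Denying by default for unknown roles keeps a misconfigured account from gaining access by accident.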
Machine Learning Models: Methods like XGBoost, Random Forest Regressor, LightGBM, and
SVR are employed to forecast disaster damage and impacted people.
Database Integration:
SQL Database — Store disaster event logs, user information, inventory information, allocation
information, and SOS requests.
Disaster Management Database Design
Sequence Diagram
3.5 Initial Data Preparation
Data preparation is a critical step in most machine learning projects. To
predict disaster fatalities and optimize resource distribution, raw disaster data must be converted
into a format suitable for analysis. This section describes the cleaning, processing, and
enrichment of EM-DAT (Emergency Events Database) data for machine learning use in regression,
classification, and clustering.
The dataset has 783 records and 46 columns, each describing a different facet of disasters in
India, including event type, location, affected population, and economic
damage. The subsections below describe the data preparation steps and the
reasons behind them.
The dataset was obtained from EM-DAT, which is renowned for providing world disaster statistics with
high accuracy. It is an exhaustive record of disasters happening in India between 1900 and 2020,
including various types of disasters, their impacts, and their metadata. Observing trends, predicting
casualties, and facilitating the allocation of resources in the event of any future occurrences of disasters
are the primary purposes of employing this dataset.
The raw data needed to be preprocessed and cleaned extensively to overcome these issues and make it
more informative.
3.5.2 Data Cleaning and Preprocessing
Reconstruction Costs: Over 90% of entries were missing, making the column unreliable for analysis.
AID Contribution: This column also had too many null values.
Dropping such columns reduced noise and computational burden.
Why One-Hot Encoding? One-hot encoding was employed since it avoids the introduction of a spurious
ordinal relationship among categories.
Why KNN Imputation? Simpler techniques like mean or median imputation do not preserve
relationships between features. KNN leverages patterns in the data and makes more accurate and
contextually relevant imputations.
Disaster-Specific Datasets
Given the dataset's diversity, certain attributes were irrelevant to specific disaster types. For
instance, River Basin is significant for floods but not for earthquakes, and Magnitude is highly
significant for earthquakes but not for droughts. To address this, the dataset was partitioned into
subsets, each representing a particular type of disaster. This focused approach eliminated
noise and ensured that only applicable features were used in the subsequent analysis.
Feature Engineering
This operation is an outer join in which df and df1 are the two dataframes merged
on the 'Year' column. The resulting merged_df contains all the records of both
dataframes, with NaN in place of values missing from either dataframe.
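A minimal sketch of such an outer join with pandas. The names `df`, `df1`, `merged_df`, and the 'Year' key come from the text; the other columns and all values are toy data chosen for illustration.

```python
import pandas as pd

# Toy stand-ins for the two source dataframes (values are illustrative).
df = pd.DataFrame({"Year": [2001, 2005], "Total Deaths": [100, 200]})
df1 = pd.DataFrame({"Year": [2005, 2015], "EQ Primary": [7.6, 7.8]})

# Outer join keeps every Year from both frames; unmatched cells become NaN.
merged_df = pd.merge(df, df1, on="Year", how="outer")
print(merged_df)
```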
To fill missing values in columns such as 'Depth' and 'EQ Primary', we combined the values from both
datasets. Each missing value is filled with the non-NaN value from the other
dataset, where one exists.
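One way to combine overlapping columns after a merge is pandas' `Series.combine_first`. The sketch below assumes the merge left the default `_x` and `_y` suffixes on the overlapping columns; that naming, and the toy values, are assumptions rather than details from the report.

```python
import numpy as np
import pandas as pd

# Toy merged frame with overlapping columns from both sources.
merged_df = pd.DataFrame({
    "Depth_x": [10.0, np.nan, np.nan],
    "Depth_y": [12.0, 33.0, np.nan],
    "EQ Primary_x": [np.nan, 6.1, 5.4],
    "EQ Primary_y": [7.0, np.nan, np.nan],
})

for col in ["Depth", "EQ Primary"]:
    # Prefer the left value; fall back to the right one where the left is NaN.
    merged_df[col] = merged_df[f"{col}_x"].combine_first(merged_df[f"{col}_y"])
    merged_df = merged_df.drop(columns=[f"{col}_x", f"{col}_y"])
```

A value stays NaN only when both sources lack it, which leaves those rows for the later imputation step.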
Handling Coordinates:
For the rows where latitude and longitude were missing, we split the 'Coordinates' column.
The coordinate string is split into latitude and longitude, and both parts are then
converted to float to enable numerical operations.
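A sketch of the coordinate split, assuming the 'Coordinates' column holds comma-separated "lat, lon" strings. The report does not show the exact format, so the separator and the sample values are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Latitude": [28.6, np.nan],
    "Longitude": [77.2, np.nan],
    "Coordinates": [None, "23.25, 69.66"],
})

# Split "lat, lon" into two string columns, then convert each to float.
parts = df["Coordinates"].str.split(",", expand=True)
lat = parts[0].str.strip().astype(float)
lon = parts[1].str.strip().astype(float)

# Fill only the rows where latitude/longitude were missing.
df["Latitude"] = df["Latitude"].fillna(lat)
df["Longitude"] = df["Longitude"].fillna(lon)
```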
These conditional assignments use logical tests to preserve values from the source datasets whenever
possible.
This step calculates the percentage of missing values in every column and removes columns
with more than 25% missing values.
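The 25% rule can be sketched in pandas as follows. The column names and values are toy data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2001, 2005, 2011, 2015],
    "Magnitude": [7.7, 7.6, np.nan, 7.8],                    # 25% missing: kept
    "Reconstruction Costs": [np.nan, np.nan, np.nan, 1.0],   # 75% missing: dropped
})

missing_pct = df.isna().mean() * 100           # percentage of NaN per column
to_drop = missing_pct[missing_pct > 25].index  # strictly greater than 25%
df = df.drop(columns=to_drop)
```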
These commands print the unique values in the corresponding columns, from which we can see
their structure and distribution.
Logarithmic Magnitude
This transformation reduces the skew of the distribution and avoids problems with zero values.
Magnitude-Depth Interaction
This feature captures the combined influence of magnitude and depth on earthquake impact.
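The logarithmic magnitude and the magnitude-depth interaction might be computed as below. Both the use of log(1 + x) and the simple product are assumed forms, since the original formulas are not reproduced here; the new column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Magnitude": [0.0, 5.5, 7.8], "Depth": [10.0, 35.0, 70.0]})

# log(1 + x) is defined at zero, unlike log(x).
df["Log_Magnitude"] = np.log1p(df["Magnitude"])

# Simple product as the interaction term (assumed form).
df["Mag_Depth_Interaction"] = df["Magnitude"] * df["Depth"]
```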
Depth Categories
Depth was placed in bins for more convenient interpretation:
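A sketch of the binning with `pandas.cut`. The bin edges (in km) and labels follow a common seismological convention and are an assumption; the report's own edges are not shown.

```python
import pandas as pd

df = pd.DataFrame({"Depth": [10.0, 70.0, 150.0, 450.0]})

# Assumed bin edges in km; intervals are (0, 70], (70, 300], (300, 700].
df["Depth_Category"] = pd.cut(
    df["Depth"],
    bins=[0, 70, 300, 700],
    labels=["Shallow", "Intermediate", "Deep"],
    include_lowest=True,  # so a depth of exactly 0 falls in the first bin
)
```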
Epicenter Distance
The distance from the disaster epicenter was calculated as:
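A common choice for distances between two latitude/longitude points is the haversine great-circle formula, sketched below. Whether the original work used haversine is an assumption, and the coordinates in the usage line are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Distance from an epicenter to a reference location (illustrative coordinates).
distance = haversine_km(23.25, 69.66, 23.02, 72.57)
```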
One-Hot Encoding:
It creates binary indicator variables for 'Depth_Category', enhancing the ability of the model to process
categorical data.
The synthetically generated data was validated by comparing its distributions with those of the
original data, and the two showed significant overlap.
Figure 4: Plot of Original Data Distribution vs Synthetic Data (Earthquake)
This step uses the K-Nearest Neighbors approach to fill in missing values from the closest
neighbors in feature space.
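A minimal sketch using scikit-learn's `KNNImputer`. The value `n_neighbors=2` and the toy matrix are illustrative; the report does not state the value of k.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],   # the second feature is missing here
    [3.0, 6.0],
    [8.0, 16.0],
])

# n_neighbors=2 is an illustrative choice, not the report's setting.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

The missing entry is replaced by the mean of that feature over the two rows closest in the observed feature, here rows 1 and 3.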
This step standardizes numerical features to have a mean of 0 and a standard deviation of 1.
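Standardization to zero mean and unit standard deviation corresponds to the usual z-score, where the symbols x, \mu, and \sigma (a feature value, the feature's mean, and its standard deviation) are notation introduced here:

```latex
z = \frac{x - \mu}{\sigma}
```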
This step splits the dataset into training and test sets (80% train, 20% test) to check model
performance.
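An 80/20 split can be sketched with scikit-learn's `train_test_split`. The toy arrays and the fixed `random_state` are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features
y = np.arange(10)

# test_size=0.2 gives the 80/20 split; random_state is an assumed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```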
Model Training: We tried various models, such as Random Forest, Support Vector Regressor
(SVR), and XGBoost:
Equation for Random Forest:
Equation for XGBoost:
Here, ntest is the number of test examples and |ytest[i] - modelpred[i]| is the absolute error for each
test example.
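The quantities described above match the standard definition of the Mean Absolute Error, written here with the same symbols:

```latex
\mathrm{MAE} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left| y_{\text{test}}[i] - \mathrm{model}_{\text{pred}}[i] \right|
```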
This step selects the model with the minimum MAE (Mean Absolute Error), that is, the model with
the best predictive accuracy on the test set.
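Selecting the model with the lowest MAE can be sketched in plain Python. The predictions below are made-up numbers used only to show the selection logic; they are not results from this project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_test = [3.0, 0.0, 12.0, 5.0]

# Hypothetical predictions from each candidate model on the same test set.
predictions = {
    "RandomForest": [4.0, 1.0, 10.0, 5.0],
    "SVR": [6.0, 2.0, 7.0, 4.0],
    "XGBoost": [3.5, 0.5, 11.0, 5.0],
}

mae_by_model = {
    name: mean_absolute_error(y_test, pred) for name, pred in predictions.items()
}
best_model = min(mae_by_model, key=mae_by_model.get)
```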
Figure 5: Plot of Model Performance (Earthquake)
3.7 Working with Epidemic
Disasters, including epidemics, have far-reaching effects on human populations, spanning health,
resources, and infrastructure. Predictive models for disaster management are useful because they allow for
early intervention, efficient allocation of resources, and mitigation of adverse effects. The aim of this
part of the project is thus to develop machine learning models that predict casualties and disaster duration from
various influencing factors.
After missing data handling, the date columns were combined using pandas' datetime functionality to form 'Start
Date' and 'End Date'. This conversion enabled the calculation of 'Days Difference', the number of days between the
start date and the end date, which gives the duration of each disaster. The formula used to
calculate 'Days Difference' was:
This approach presented a distinct time frame, which made it simple to examine the effect and recovery
time for every disaster.
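The 'Days Difference' calculation can be sketched as follows. The 'Start Year/Month/Day' and 'End Year/Month/Day' column names are assumed; only 'Start Date', 'End Date', and 'Days Difference' appear in the text, and the dates are toy values.

```python
import pandas as pd

df = pd.DataFrame({
    "Start Year": [2009, 2018], "Start Month": [6, 8], "Start Day": [1, 8],
    "End Year": [2009, 2018], "End Month": [6, 8], "End Day": [30, 20],
})


def combine(prefix: str) -> pd.Series:
    """Assemble a datetime column from '<prefix> Year/Month/Day' columns."""
    parts = df[[f"{prefix} Year", f"{prefix} Month", f"{prefix} Day"]].copy()
    parts.columns = ["year", "month", "day"]  # names pd.to_datetime expects
    return pd.to_datetime(parts)


df["Start Date"] = combine("Start")
df["End Date"] = combine("End")
df["Days Difference"] = (df["End Date"] - df["Start Date"]).dt.days
```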
The weather data obtained were merged with the master dataset so that each disaster record holds its
corresponding weather data. Categorical data (weather conditions such as 'haze', 'rain', 'clear sky', etc.)
were one-hot encoded. This transformed the text categories into numerical values that machine learning
models can process. For a record i and a category c, the encoded value is:
\[ x_{i,c} = \begin{cases} 1 & \text{if record } i \text{ has category } c \\ 0 & \text{otherwise} \end{cases} \]
One-hot encoding adds one column per category, so the first category of each variable was dropped
(drop='first') to limit the growth in dimensionality while preserving the categorical information
necessary for correct prediction.
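The encoding step can be illustrated with a dependency-free sketch. The function `one_hot_encode` and the `weather_` column prefix are hypothetical; the project itself uses scikit-learn's `OneHotEncoder`, as listed in the annexure.

```python
def one_hot_encode(values, drop_first=False):
    """Turn a list of category labels into rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one category removes a redundant column: a row of all
        # zeros then stands for the dropped category.
        categories = categories[1:]
    return [{f"weather_{c}": int(v == c) for c in categories} for v in values]


weather = ["haze", "rain", "clear sky", "rain"]
encoded = one_hot_encode(weather)
encoded_reduced = one_hot_encode(weather, drop_first=True)
```

In the reduced form, a 'clear sky' record is represented by zeros in both remaining columns.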
3.7.3 Handling Missing Data Using KNN Imputation:
Missing values, including those left after outliers were masked, were filled using K-Nearest Neighbors
(KNN) imputation. The KNN imputer estimates a missing value by finding the 'k' closest records in feature
space and averaging their values in the affected column. This step preserved the inter-feature
relationships in records where values were missing.
The KNN imputer was applied to the complete dataset, filling missing values in a way that preserved the
overall statistical properties of the data. This step was critical in ensuring that the models learned from
complete and consistent data.
For a record i with a missing value in feature j, the imputed value is the mean over its 'k' nearest
neighbors N_k(i) that have feature j observed (the median can be used in place of the mean):
\[ \hat{x}_{ij} = \frac{1}{k} \sum_{n \in N_k(i)} x_{nj} \]
After imputation, the data were validated and converted into the correct data types, particularly ensuring
categorical data were encoded and numerical data were formatted correctly.
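The KNN imputation described above can be sketched in plain Python. This is a simplified illustration, not the scikit-learn `KNNImputer` used in the annexure: it measures plain Euclidean distance over the features that both rows have observed, and the function name `knn_impute` and the sample rows are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Compare only features observed in both rows.
                shared = [
                    c for c in range(len(row))
                    if c != j and row[c] is not None and other[c] is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],  # missing value to be imputed
]
imputed = knn_impute(rows, k=2)
```

The last row is closest to the first two rows, so its missing value becomes the mean of 10.0 and 20.0; the distant third row does not contribute.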
By training CTGAN on the raw data, we created a new dataset with the same statistical properties,
which enabled us to train the models more robustly. The synthetic dataset was then used to test how well
the models predict results on new data.
CTGAN generates data through adversarial training. A generator maps samples from a latent space to
synthetic records, while a discriminator learns to tell them apart from real records; training the two
against each other pushes the generated data toward the statistical distribution of the original data.
Adding this variability to the training data reduces the risk of overfitting.
3.7.5 Model Training and Validation:
The data were split into training (80%) and test (20%) sets to check the performance of the model on
unseen data. Holding out a test set exposes over-fitting to the training data and provides a better
estimate of performance in actual use cases.
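A minimal sketch of the 80/20 split, using only the standard library. The helper `split_train_test`, the fixed seed, and the placeholder records are assumptions for illustration; the project's own code may rely on a library routine instead.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # placeholder for 100 disaster records
train, test = split_train_test(records)
```

Shuffling before the split keeps the test set from being a systematic slice of the data, for example only the most recent events.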
Various regression models, namely Lasso, Ridge, XGB (Extreme Gradient Boosting), and Random Forest,
were employed to predict Total Deaths, Total Affected, and Duration (Days Difference). Each model was
evaluated with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE):
Total Deaths: The best result was achieved using the Lasso regression model with an MAE of 76.36 and
an RMSE of 89.29.
Total Affected: The best result was obtained using the Ridge regression model with an MAE of 5807.44
and an RMSE of 6473.76.
Duration (Days Difference): The best result was achieved using the Random Forest regression model
with an MAE of 11.05 and an RMSE of 12.77.
These figures were the basis for comparing the models; lower MAE and RMSE values indicate better
model performance.
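The difference between the two metrics can be seen in a short standard-library sketch. The values are hypothetical and chosen so that a single large error dominates.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Three exact predictions and one that is off by 8.
y_true = [10, 20, 30, 40]
y_pred = [10, 20, 30, 48]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

Because RMSE squares each error before averaging, it weights the single large miss more heavily than MAE does, which is why the two metrics are reported together.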
3.7.6 Deployment:
An API was created using FastAPI for real-time prediction. The API takes user inputs like temperature,
humidity, classification, subtype, event, and weather condition. It gives predicted values for total deaths,
total affected, and the duration of the disaster.
This integration of the API enables disaster management officials and public health agencies to utilize
the predictive models directly in their decision-making, giving actionable insights to disaster
preparedness as well as response.
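The request and response shape of such an endpoint can be sketched without the web framework. The field names, the helper functions, and the stub models below are hypothetical placeholders; in the deployed system this logic would sit behind a FastAPI route and call the trained regressors.

```python
import json

# Input fields named in the text, with the type each is coerced to (assumed).
REQUIRED_FIELDS = {
    "temperature": float,
    "humidity": float,
    "classification": str,
    "subtype": str,
    "event": str,
    "weather_condition": str,
}


def validate_payload(payload):
    """Reject requests with missing fields and coerce values to their types."""
    clean = {}
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        clean[field] = expected_type(payload[field])
    return clean


def predict_epidemic(payload, models):
    """Run one model per target and collect the predictions in a dict."""
    features = validate_payload(payload)
    return {target: float(model(features)) for target, model in models.items()}


# Stand-ins for the trained models; they return a constant placeholder value.
stub_models = {
    "total_deaths": lambda features: 0.0,
    "total_affected": lambda features: 0.0,
    "duration_days": lambda features: 0.0,
}

request_body = {
    "temperature": 31.5,
    "humidity": 70,
    "classification": "Biological",
    "subtype": "Viral disease",
    "event": "Outbreak",
    "weather_condition": "haze",
}
response_body = json.dumps(predict_epidemic(request_body, stub_models))
```

Validating the payload before prediction keeps malformed requests from reaching the models.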
Date Handling:
Missing end-date components are filled from the start date using the following rules:
If End Month is missing, default it to Start Month.
If End Day is null, set it to 30.
If End Year is missing, then set it to Start Year.
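These rules can be sketched as a small function over a single record. The function name `fill_end_date` is hypothetical, and the sketch assumes the start year and month are present. One step goes beyond the stated rules and is marked as an assumption: the default day of 30 is capped at the last day of the month, because 30 February is not a valid date.

```python
import calendar


def fill_end_date(record):
    """Fill missing end-date parts of one record from its start date."""
    filled = dict(record)
    if filled.get("End Month") is None:
        filled["End Month"] = filled["Start Month"]
    if filled.get("End Year") is None:
        filled["End Year"] = filled["Start Year"]
    if filled.get("End Day") is None:
        last_day = calendar.monthrange(filled["End Year"], filled["End Month"])[1]
        # Assumption beyond the stated rule: cap the default of 30 at the
        # month's last day so that February still yields a valid date.
        filled["End Day"] = min(30, last_day)
    return filled


example = {
    "Start Year": 2019, "Start Month": 2, "Start Day": 10,
    "End Year": None, "End Month": None, "End Day": None,
}
filled_example = fill_end_date(example)
```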
Convert these dates to datetime format and take the difference between the end date and the start date:
\[ \text{Days Difference} = \text{End Date} - \text{Start Date} \quad \text{(in days)} \]
To do one-hot encoding, convert categorical variables into distinct binary columns. For example, taking
a categorical column like Disaster Subtype and converting it into distinct columns.
Outliers are detected with the interquartile range, IQR = Q3 - Q1, where Q1 and Q3 are the first and
third quartiles respectively. Values outside the resulting range are replaced by NaN.
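A minimal sketch of this outlier step with the standard library follows. The multiplier of 1.5 is the conventional choice and is an assumption here, since the report's own bounds are not shown; the function name `mask_outliers` and the sample values are hypothetical.

```python
import statistics


def mask_outliers(values, factor=1.5):
    """Replace values outside [Q1 - factor*IQR, Q3 + factor*IQR] with NaN."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v if lower <= v <= upper else float("nan") for v in values]


# Hypothetical death counts with one extreme record.
deaths = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cleaned = mask_outliers(deaths)
```

Masking with NaN rather than deleting the record keeps the row available for the imputation step.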
Figure 8: Plot of Original Data Distribution vs Synthetic Data (Flood)
Train these models on the preprocessed data set, ensuring you validate their performance using
metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
[Link] Evaluation:
Calculate the mean absolute error of the predicted results against actual values:
Utilize these metrics to compare the performance of different models and data (synthetic vs.
original).
Total Affected: XGBRegressor achieved an MAE of 5.26 and an RMSE of 40.13, and
RandomForestRegressor (n_estimators=100) achieved an MAE of 5.07 and an RMSE of 5.77.
Days Difference: XGBRegressor and RandomForestRegressor gave MAE values of 11.29
and 11.05, respectively.
[Link] Processing:
It transforms the input using OneHotEncoders and Scalers
Predicts various outcomes based on differing models
[Link] Output:
A JSON object with estimated: Death, Injured, Affected, and Homeless
[Link] Processing:
Assigns categorical variables and scales numerical ones
Predicts several outcomes using XGBoost and Random Forest models
[Link] Output:
A JSON object with: Death, Homeless, Affected, Total Day
[Link] Processing:
It encodes categorical data and accepts numerical inputs
Makes predictions using pre-trained models
[Link] Output:
A JSON object with: Death, Affected, Total Day
3.10 Backend Endpoints and API Architecture
The backend of the Disaster Management System is built with FastAPI, a modern, high-performance web
framework for building web APIs in Python. It provides several groups of functionality: user
management, disaster prediction models, emergency services, alerts, and administrative control.
3.10.3 Authentication and Authorization
Users are authenticated using JWT tokens.
After authentication, a token is issued with an expiration time (default 60 minutes).
Protected routes require a valid token, which is also used to check the user's role (user, admin,
super_admin).
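The token flow above can be illustrated with a standard-library sketch of an HS256-signed token. This shows the mechanism only. The backend in the annexure verifies tokens through a JWT library, and the secret, function names, and claim names used here (`sub`, `role`, `exp`) are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = "change-me"  # hypothetical; load from the environment in practice
ACCESS_TOKEN_EXPIRE_MINUTES = 60


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    digest = hmac.new(SECRET_KEY.encode(), signing_input.encode(), hashlib.sha256)
    return _b64url(digest.digest())


def create_token(email, role, now=None):
    """Issue a signed token that expires after the configured lifetime."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": email, "role": role,
               "exp": int(now) + ACCESS_TOKEN_EXPIRE_MINUTES * 60}
    signing_input = (_b64url(json.dumps(header).encode()) + "."
                     + _b64url(json.dumps(payload).encode()))
    return signing_input + "." + _sign(signing_input)


def verify_token(token, allowed_roles, now=None):
    """Check signature, expiry, and role; return the payload if all pass."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature = token.split(".")
    except ValueError:
        raise PermissionError("malformed token")
    if not hmac.compare_digest(signature, _sign(header_b64 + "." + payload_b64)):
        raise PermissionError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < now:
        raise PermissionError("token expired")
    if payload["role"] not in allowed_roles:
        raise PermissionError("insufficient role")
    return payload


token = create_token("admin@example.org", "admin", now=1000)
claims = verify_token(token, {"admin", "super_admin"}, now=1000)
```

The signature is checked before the payload is trusted, so a client cannot edit the role or expiry fields without invalidating the token.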
4. Result
4.1 Overall Product Diagram
4.2 System Architecture
4.2.1 Multi-Tier Architecture
The application uses a three-tier architecture:
Presentation Tier: Role-based user interfaces for User, Admin, and Super Admin
Application Tier: Business logic that manages request processing, resource management,
and forecasting procedures
Data Tier: Database systems that store user information, resource stockpiles, and
disaster histories
This model provides clear escalation channels and lines of responsibility and ensures effective
operation in crisis situations.
4.3 Landing Page
The landing page serves as the portal to the Disaster Management System. It is user experience-focused
and provides a simple and visually appealing interface that familiarizes users with the application's
primary features.
4.3.2 Purpose
The landing page aims not only to give users an overview of what the application offers
but also to guide them toward its more in-depth functionality
for disaster forecasting, response, and recovery.
4.4 Secure Login System
The web service also includes a role-based and secure login service to verify users, admins, and super
admins. The authentication service includes
4.4.2 Sign-In (Login) Functionality
The Sign-In page serves as an entry point to role-based dashboards (User, Admin, Super Admin).
[Link] Dashboard:
User Dashboard
Admin Dashboard
Super Admin Dashboard
4.5 User Module Features
4.5.1 Dashboard
Requests Flow Diagram
[Link] Report Emergency
Allows users to report losses following a disaster or claim compensation assistance.
Fields:
- Description of damage
- Help required (e.g., money, rebuilding)
- Post evidence image
- Auto-location retrieval
Verification Step: The admin assigns a field team to verify the report on site.
Warning: Users are notified of the legal consequences of providing false or
misleading information.
4.5.2 My Request Section
Displays all the requests made (Food, Shelter, Medical, Emergency Reports).
Each request can have one of three statuses:
Pending: Default on submission
Approved: After being approved by the admin
Rejected: When denied (reason shown)
Each request has:
- Timestamp
- Request Type
- Location
- Admin action log
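The request lifecycle above can be sketched as a small state model. The class and attribute names are hypothetical and the persistence layer is omitted; the status strings mirror the 'approved' and 'rejected' values used in the backend annexure.

```python
from datetime import datetime, timezone
from enum import Enum


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


class Request:
    def __init__(self, request_type, location):
        self.timestamp = datetime.now(timezone.utc)
        self.request_type = request_type
        self.location = location
        self.status = RequestStatus.PENDING  # default on submission
        self.rejection_reason = None
        self.action_log = []

    def decide(self, admin, approve, reason=None):
        """Approve or reject a pending request and record the admin action."""
        if self.status is not RequestStatus.PENDING:
            raise ValueError("request has already been decided")
        if not approve and not reason:
            raise ValueError("a rejection must state its reason")
        self.status = RequestStatus.APPROVED if approve else RequestStatus.REJECTED
        self.rejection_reason = None if approve else reason
        self.action_log.append((datetime.now(timezone.utc), admin, self.status.value))


food = Request("Food", "District A")
food.decide(admin="admin@example.org", approve=True)

shelter = Request("Shelter", "District A")
shelter.decide(admin="admin@example.org", approve=False, reason="Duplicate request")
```

Allowing a decision only from the pending state keeps the admin action log consistent with the status shown to the user.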
4.6 Admin Module
4.6.1 Admin Dashboard
[Link] Super Admin Fund Request
Admins can request funds by specifying:
- Required amount
- Purpose (e.g., buying materials, rebuilding houses)
This section predicts disaster casualties. Three types of disaster are currently
supported: earthquake, flood, and epidemic.
[Link] Earthquake
For earthquakes, the inputs are magnitude, depth, and location coordinates (the current
location can be fetched automatically), along with the state and a field for the OFDA/BHA response.
The prediction returns the estimated number of deaths, injured, affected, and homeless, and the total
affected population.
[Link] Flood
For floods, the input fields are longitude, latitude, disaster subtype, the classification
assigned by the admin, location, origin (where the flood started), and any associated event.
The prediction returns total deaths, total affected population, number of homeless, and the
number of days the flood is expected to last.
Inputs: Latitude and Longitude, Disaster Subtype, Origin, and Associated Event
Outputs: Estimated deaths, Number of days the flood will last, Number of homeless, Total
affected
[Link] Epidemic
The epidemic predictor requires temperature, humidity, location coordinates, disaster type,
syndrome, and the current weather condition. The prediction returns total deaths, total
affected population, and the expected duration of the epidemic.
Integration: After prediction, the results are sent to the dashboard, where the disaster is flagged
as a concern and the section showing the number of resources required is updated accordingly.
4.7 Super Admin Module
4.7.1 Dashboard
System-wide overview of all the districts and cities under the Super Admin's jurisdiction.
Displays:
- Resources available by city
- Fund Utilization
  - Projected
  - Allocated
  - Available
Outstanding Admin Requests
- Regional admins request resources or funds
- Approve or reject with audit trail
4.7.2 Admin Management
The Super Admin can create and remove Admins.
Admins are assigned by email ID.
Every admin is associated with a location.
5. CONCLUSION AND FUTURE SCOPE
The project Disaster Casualties Prediction Using ML Tools for Enhanced Resource
Planning highlights the capability of machine learning to revolutionize disaster management
practices. Utilizing the synergy among advanced models and multi-source data sets, the system
provides accurate predictions for casualties, affected population, and resources needed to
enable authorities to make strategic resource allocation decisions.
The system's modular architecture ensures harmonious coordination among the stakeholders,
ranging from central authorities to ground response teams. With its web interface, the project
provides real-time data visualization and actionable insights to enable users to make informed
decisions in the event of emergencies. The application of synthetic data generation techniques
and role-based access control further enhances its reliability and usability, rendering it a
scalable solution in disaster preparedness and response. However, disaster management is an
evolving and dynamic process with numerous avenues for further growth.
The future could see the system's advancement towards including real-time data from IoT
sensors, satellites, and social media to improve responsiveness and accuracy. Making the
system accessible globally, with regional data and support for additional languages, would
make it an asset for global disaster management practices. Incorporating ethical AI practices
and explainable machine learning models would also lead to transparent and equitable
decision-making, particularly in multicultural and vulnerable societies. Developing mobile
applications for field responders and affected populations, together with advanced visualization
technologies such as geospatial mapping, can further enhance usability and accessibility.
Additionally, simulation testing and more advanced resource optimization models may make the
system more robust and responsive to complex, cascading disasters. Overall, this project is
a critical component of modernizing disaster management using technology. By addressing the
issues of the present and capitalizing on the promise of the future, it has the potential to be an
all-inclusive and globally scalable tool for disaster mitigation and recovery.
6. REFERENCES
1. Ochoa, K.S., & Comes, T. (2021). A Machine learning approach for rapid disaster response based on multi-
modal data. ArXiv, abs/2108.00887.
2. Linardos, V., Drakaki, M., Tzionas, P., & Karnavas, Y.L. (2022). Machine Learning in Disaster Management:
Recent Developments in Methods and Applications. Machine Learning and Knowledge Extraction, 4(2), 446-473.
3. Gin, J.L., Levine, C.A., Canavan, D., & Dobalian, A. (2022). Including Homeless Populations in Disaster
Preparedness, Planning, and Response: A Toolkit for Practitioners. J Public Health Manag Pract, 28(1), E62-E72.
8. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. (2014). Generative Adversarial Networks. Communications
of the ACM.
9. Andrei, M., & Feitosa, H. (2020). KNN-Based Imputation Techniques in Machine Learning Models for Disaster
Data Prediction. Journal of Data Science.
10. Dattatreya, A., & Lemos, A. (2023). Synthetic Data Generation for Disaster Prediction Models: A
Comparative Study. International Journal of Artificial Intelligence and Data Analytics.
11. Quarantelli, E.L. (1988) Disaster Crisis Management: A Summary of Research Findings. Journal of
Management Studies, 25, 373-385.
13. Comfort, Louise & Ko, Kilkon & Zagorecki, Adam. (2004). Coordination in Rapidly Evolving Disaster
Response Systems: The Role of Information. American Behavioral Scientist - AMER BEHAV SCI. 48. 295-313.
10.1177/0002764204268987.
14. Nayak, M. A., & Dutta, S. (2017). Prediction of flood using machine learning techniques. International Journal
of Computer Applications, 165(11), 13-19.
15. Asim, Khawaja & Martínez-Álvarez, Francisco & Basit, Abdul & Iqbal, Talat. (2017). Earthquake magnitude
prediction in Hindukush region using machine learning techniques. Natural Hazards. 85. 471-486.
10.1007/s11069-016-2579-3.
16. Santillana, Mauricio & Nguyen, Andre & Dredze, Mark & Paul, Michael & Brownstein, John. (2015).
Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. PLoS
Computational Biology. 11. 10.1371/journal.pcbi.1004513.
17. Careem, M., De Silva, C., & De Zoysa, K. (2019). Performance evaluation of web frameworks for emergency
management systems. Journal of Emergency Management, 17(4), 321-335.
18. Figueira, A., & Vaz, B. (2022). Survey on Synthetic Data Generation, Evaluation Methods and GANs.
Mathematics, 10(15), 2733.
19. Xu, Lei & Skoularidou, Maria & Cuesta-Infante, Alfredo & Veeramachaneni, Kalyan. (2019). Modeling
Tabular data using Conditional GAN. 10.48550/arXiv.1907.00503.
20. Kumar, S., Barbier, G., Abbasi, M. A., & Liu, H. (2017). TweetTracker: An integrated system for real-time
analysis of Twitter feeds during disasters. Journal of Homeland Security and Emergency Management, 14(1), 1-25.
21. Hassan, M. M., & Chen, F. (2020). An integrated web-based decision support system for disaster management.
Disaster Prevention and Management, 29(3), 372-394.
22. Mohammadi, A., Yazdani, N., & Askarnejad, A. (2018). Feature engineering in earthquake casualty estimation:
The importance of interaction effects. Natural Hazards, 92(3), 1545-1569.
23. Wang, Zhaoli & Lai, Chengguang & Chen, Xiaohong & Yang, Bing & Zhao, Shiwei & Bai, Xiaoyan. (2015).
Flood hazard risk assessment model based on random forest. Journal of Hydrology. 527. 1130-1141.
10.1016/j.jhydrol.2015.06.008.
24. Kabir, E., Guikema, S. D., & Kane, B. (2021). Statistical methods for handling missing data in disaster
damage assessment. Risk Analysis, 41(6), 970-985.
25. Wang, J., & Chen, Y. (2019). A hybrid approach for missing geospatial data imputation in disaster datasets.
Computers, Environment and Urban Systems, 78, 101397.
26. Zhang, L., Wu, Y., & Lei, G. (2018). Authentication mechanisms for emergency response information systems
under high-load conditions. IEEE Transactions on Information Forensics and Security, 13(12), 3089-3103.
7. ANNEXURE
A. MODEL CREATION
ONE HOT ENCODING
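from sklearn.preprocessing import OneHotEncoder
# scikit-learn >= 1.2 names this argument sparse_output (older releases: sparse)
ohe_loc = OneHotEncoder(sparse_output=False, drop='first')
encoded = ohe_loc.fit_transform(df[cat_cols]).astype(int)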
KNN IMPUTATION
from sklearn.impute import KNNImputer
knn_imputer = KNNImputer(n_neighbors=5)
df[columns_for_knn] = knn_imputer.fit_transform(df[columns_for_knn])
LOG TRANSFORMATION
df['Log_Magnitude'] = np.log1p(df['Magnitude'])
df['Magnitude_Depth_Interaction'] = df['Magnitude'] * df['Depth']
BIN CREATION
df['Depth_Category'] = pd.cut(df['Depth'], bins=[0, 10, 30, 50, 100, np.inf],
                              labels=['Very Shallow', 'Shallow', 'Intermediate', 'Deep', 'Very Deep'])
EPICENTER
df['Epicenter_Distance'] = np.sqrt(df['Latitude_x']**2 + df['Longitude_x']**2)
DATE FILLING
def date_filling(df):
    # Fill missing end-date parts from the start date (day defaults to 30)
    df['End Month'] = df['End Month'].fillna(df['Start Month'])
    df['End Day'] = df['End Day'].fillna(30)
    df['End Year'] = df['End Year'].fillna(df['Start Year'])
    date_columns = ['Start Year', 'Start Month', 'Start Day', 'End Year', 'End Month', 'End Day']
    # Any remaining gaps default to 1 before casting to integers
    df[date_columns] = df[date_columns].fillna(1)
    df[date_columns] = df[date_columns].astype(int)
    return df
df = date_filling(df)
TOTAL DAYS
df['Start Date'] = pd.to_datetime(df[['Start Year', 'Start Month', 'Start Day']].astype(str).agg('-'.join, axis=1),
                                  format='%Y-%m-%d')
df['End Date'] = pd.to_datetime(df[['End Year', 'End Month', 'End Day']].astype(str).agg('-'.join, axis=1),
                                format='%Y-%m-%d')
df['Days Difference'] = (df['End Date'] - df['Start Date']).dt.days
    if data['results']:
        lat = data['results'][0]['geometry']['lat']
        lng = data['results'][0]['geometry']['lng']
        return lat, lng
    else:
        return None, None
df['LatitudeAPI'], df['LongitudeAPI'] = zip(*df['Location'].apply(get_lat_long))
WEATHER INFORMATION
def get_weather(location):
    base_url = f"[Link]
    params = {
        'q': location,
        'appid': API_KEY,
        'units': 'metric'  # temperature in degrees Celsius
    }
    try:
        response = requests.get(base_url, params=params)
        if response.status_code == 200:
            weather_data = response.json()
            temperature = weather_data['main']['temp']
            humidity = weather_data['main']['humidity']
            weather_condition = weather_data['weather'][0]['description']
            return {'temperature': temperature, 'humidity': humidity, 'weather': weather_condition}
        else:
            return {'temperature': None, 'humidity': None, 'weather': None}
    except Exception as e:
        print(f"Error fetching weather data for {location}: {e}")
        return {'temperature': None, 'humidity': None, 'weather': None}
SCALING
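from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training-set mean and standard deviation on the test set
X_test_scaled = scaler.transform(X_test)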
OUTLIER REMOVAL
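Q1 = df['Total Deaths'].quantile(0.25)
Q3 = df['Total Deaths'].quantile(0.75)
IQR = Q3 - Q1
# The bounds are not defined in the original listing; the conventional 1.5 * IQR fences are assumed here
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df_no_outliers = df[(df['Total Deaths'] >= lower_bound) & (df['Total Deaths'] <= upper_bound)]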
LINEAR REGRESSION
from sklearn.linear_model import LinearRegression
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
LASSO REGRESSION
lasso = Lasso()
param_grid_lasso = {
'alpha': [0.1, 1.0, 10.0, 100.0],
}
RIDGE REGRESSION
ridge = Ridge()
param_grid_ridge = {
'alpha': [0.1, 1.0, 10.0, 100.0],
}
random_search = RandomizedSearchCV(
rf_model, param_distributions, n_iter=10, cv=3, scoring='neg_mean_squared_error',
random_state=42
)
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}
grid_search = GridSearchCV(
    estimator=xgb.XGBRegressor(objective='reg:squarederror', random_state=42),
    param_grid=param_grid, scoring='neg_mean_squared_error', cv=3, verbose=1)
SVR
svr_model = SVR(kernel='rbf')
svr_model.fit(X_train, y_train)
LIGHT GBM
lgb_affected = lgb.LGBMRegressor(objective='regression', n_estimators=100, random_state=42)
MLP
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=X_train.shape[1]))  # Input layer
model.add(Dense(32, activation='relu'))  # Hidden layer
model.add(Dense(1))  # Output layer
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1)
DROPOUT
model.add(Dropout(0.2))
EARLY STOPPING
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
MODEL PICKLING
import pickle
with open('xgb_death.pkl', 'wb') as file:
    pickle.dump(xgb_death, file)
B. BACKEND
DATABASE SESSION INITIALIZATION
Base.metadata.create_all(bind=engine)
db = SessionLocal()
admin = db.query(User).filter(User.email == "admin@[Link]").first()
DATABASE CONNECTION
engine = create_engine(URL_DB, connect_args={'check_same_thread': False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base() # This is the base class for all the models
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
AUTHENTICATION
SECRET_KEY = os.getenv("SECRET_KEY")
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 60
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/token")
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
VERIFY TOKEN
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
APPROVE REQUEST
request = [Link]([Link]).filter_by(id=request_id).first()
[Link] = "approved"
[Link]()
[Link](request)
REJECT REQUEST
request = [Link]([Link]).filter_by(id=request_id).first()
[Link] = "rejected"
[Link]()
[Link](request)
SEND ALERT
db_alert = [Link](
title=[Link],
message=[Link],
type=[Link],
created_by=current_user.id
)
[Link](db_alert)
[Link]()
[Link](db_alert)
GET ALERT
alerts =
[Link]([Link]).order_by( [Link].created_at.desc()).offset(skip).limit(limit).all()
LIST ADMIN
admins = [Link]([Link]).filter([Link] == [Link]).all()
ADD ADMIN
user = [Link]([Link]).filter([Link] == email).first()
[Link] = [Link]
[Link]()
[Link](user)
DELETE ADMIN
user = [Link]([Link]).filter([Link] == email).first()
[Link] = [Link]
[Link]()
[Link](user)
REPORT EMERGENCIES
report = [Link](
description=description,
latitude=latitude,
longitude=longitude,
image_url=image_path,
user_id=[Link],
)
[Link](report)
[Link]()
[Link](report)
MIDDLEWARE
app.add_middleware(
CORSMiddleware,
allow_origins=["[Link]
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)