Optimize Concrete Mix with ML Models
BY
(CVE/20/5523)
MAY 2025
CERTIFICATION
I certify that this project was carried out by ADEBAYO Adeoba Eleazar (CVE/20/5523)
under the supervision of DR. S. L. Akingbonmire of the Civil Engineering Department,
Federal University of Technology, Akure, Ondo State, Nigeria in partial fulfilment of the
requirement for the award of Bachelor of Engineering (B.Eng.) in Civil Engineering.
________________________ ________________________
(Supervisor)
________________________ ________________________
(Head of Department)
DEDICATION
This project is dedicated to Almighty God.
ACKNOWLEDGEMENTS
CHAPTER ONE
1. INTRODUCTION
Concrete is among the most important and fundamental materials in modern civil engineering and
construction owing to its strength, durability, and versatility. Its applications include
high-rise buildings, bridges, dams, and pavements, among others. Designing a concrete mixture,
however, is a complex process: it requires correct proportioning of water, cement, and
aggregates, as well as supplementary cementitious materials and admixtures. Traditionally, mix
design has been a skill-based practice that integrates theoretical knowledge, established
empirical standards, and practical experience gained through extensive trial-and-error
laboratory experiments (Barbhuiya and Sharif, 2024).
Although these conventional approaches are tried and tested, they are inherently inefficient.
The iterative process of preparing trial batches, curing them, and testing their properties is
time-consuming and expensive, and it fails to capture the non-linear interactions among the
many constituent materials. This can produce over-engineered or underperforming mix designs,
leading either to wasted material or to a risk to structural integrity. Variability in raw
materials makes the task harder still, since it is difficult to obtain consistent results.
In recent years, the drive towards more sustainable and cost-effective construction has
demanded a paradigm shift. The industry is actively seeking smart, data-driven solutions that
can streamline the design process, reduce waste, and improve predictive accuracy. This research
addresses this need by exploring the application of artificial intelligence (AI), specifically
machine learning (ML). Machine learning can identify complex patterns in large datasets and
model the complex relationships within concrete mixes, and thus offers a more efficient,
accurate, and sustainable way to predict concrete properties before a single trial batch has
been cast (Altunci, 2024). This study sets out a clear case for this contemporary practice.
1.2 Problem Statement
The traditional approach to concrete mix design, which relies heavily on empirical equations
and trial-and-error laboratory experiments, is fundamentally constrained by the complex,
non-linear relationship between the constituents of a concrete mix and its final physical and
mechanical properties (Barbhuiya and Sharif, 2024). Constituents such as cement, water, and
aggregates interact in complicated ways, making it very hard to predict the performance of a
mix reliably without extensive physical experiments.
This dependence on a reactive, test-based method poses serious problems for the civil
engineering and construction industries. First, it leads to a manual and time-consuming
procedure, as engineers must run many trial batches to reach the required specifications,
which extends project timelines. Secondly, the absence of an effective predictive tool tends to
cause over-design, whereby excess material is used to ensure strength and safety margins,
resulting in unnecessary material waste and increased costs.
It also hinders the incorporation of sustainable materials, such as supplementary cementitious
materials or recycled aggregates, since their influence on concrete properties is difficult to
model using traditional methods (Altunci, 2024). Finally, the lack of a modern, data-driven
predictive tool keeps the industry in a resource-intensive pattern, consuming resources that
could otherwise be used to improve the construction process and meet the increasing demand for
faster, cheaper, and more environmentally friendly construction. This study aims to address
this underlying issue by offering a more effective and dependable alternative.
The primary aim of this study is to develop a predictive modelling approach for optimizing
concrete mix design using machine learning algorithms.
ii. to prepare and analyse a detailed dataset of concrete mix designs and their
corresponding properties.
iii. to train and evaluate several machine learning models, including ensemble methods, and
justify the model selection based on R², MAE, and MSE for the predicted concrete properties.
This research is justified by its potential to transform the practice of concrete mix design.
The study addresses the inefficiencies of conventional testing by moving from traditional
empirical approaches to a data-driven one. It demonstrates a way to evaluate a wide range of
mix proportions quickly and cost-effectively, thereby reducing the need for large-scale
physical testing and the associated material waste. The final product of this research, a
working web application, is a practical resource for civil engineers and students that speeds
up design iterations and supports the development of more sustainable and optimized concrete.
The scope of this research is the prediction of four major concrete properties, namely slump,
setting time, compressive strength, and flexural strength. The predictive models are trained on
a large dataset comprising seven input variables: cement, water, fly ash, superplasticizer,
coarse aggregate, fine aggregate, and age. The technical scope of the study includes the Python
programming language, scikit-learn for machine learning, and FastAPI for the backend, with a
simple web interface built using HTML, CSS, and JavaScript.
The significance of this study lies in its practical and academic contributions. It offers a
proof of concept for applying machine learning to civil engineering and bridges the gap between
the theory of data science and practical materials science. The tool can be used as an
educational resource that helps students understand how complex the relationships between the
constituents of a concrete mix are. For the industry, it provides a practical way to improve
material efficiency, lower design costs, and shorten project schedules, ultimately resulting in
more sustainable and cost-effective construction.
CHAPTER TWO
2. LITERATURE REVIEW
Traditional methods of concrete mix design, such as those prescribed by the American
Concrete Institute (ACI) or British Standards (BS), have been the foundation of the
construction industry for decades. These methods rely on established procedures and empirical
tables to calculate the proportions of cement, water, fine aggregate, and coarse aggregate
needed to attain a desired strength and workability. Although these standardized practices
provide a consistent baseline on which much of construction practice is built, they have major
limitations.
Their main limitation is that they are inflexible and cannot fully account for the
complex, non-linear relationships that govern the interaction between the constituents of
concrete. The strength and other properties of concrete are influenced by many interacting
factors, including the chemical composition of the cement, the size and shape of the
aggregates, the use of different admixtures, and the curing environment (Barbhuiya and Sharif,
2024). Traditional procedures, which rely on simplified models and established tables, may not
reflect these interactions, particularly when non-traditional or sustainable materials such as
supplementary cementitious materials (SCMs) or recycled aggregates are involved (Amar, 2025).
Because of this limitation, finding an optimal mix for particular project requirements becomes
a long and resource-intensive trial-and-error process. Engineers must prepare, cast, and test
many physical batches, which inevitably takes time and incurs material waste and cost. The
inability to predict performance accurately without physical testing is a significant
bottleneck that hinders innovation and the large-scale use of sustainable materials. This is
why more advanced, data-driven solutions are increasingly being explored to overcome these
limitations.
Fresh concrete properties are the properties of the mixture just after it is prepared and
before it begins to harden. These properties are essential on a construction site since they
determine how easily the concrete can be transported, placed, compacted, and finished without
compromising its homogeneity.
Slump
Slump is the most widely used indicator of the workability and consistency of concrete. It is
measured using a standardized procedure known as the slump test. In this test, a steel cone is
filled with concrete and then lifted away, allowing the concrete to settle, or slump. The
resulting drop in height is recorded in millimetres (mm). The greater the slump, the more fluid
and workable the mix tends to be, which is usually preferred for intricate formwork or heavily
reinforced sections. A low slump, on the other hand, indicates a stiff, less workable mix.
Setting time defines the period during which concrete remains in a workable, plastic state. It
is a critical factor for project scheduling as it determines the time available for
transportation, placement, and initial finishing before the concrete loses its plasticity.
Setting time depends on many factors, including the type of cement, the water-to-cement ratio,
and any chemical admixtures used, such as retarders or accelerators (ACI Committee 238, 2008).
A longer setting time gives greater flexibility on site, whereas a shorter one can speed up
construction provided it is managed properly.
Compressive strength is the most important and widely specified property of hardened concrete.
It is the capacity of the concrete to resist forces that would crush or compress it.
Compressive strength depends directly on the internal structure and porosity of the concrete;
the less porous the cement paste, the greater the compressive strength. It is normally
determined by loading hardened concrete cylinders or cubes to failure in a compression testing
machine. The standard strength is typically measured after 28 days of curing, although testing
may be conducted at earlier or later ages to track strength gain over time (Altunci, 2024).
Flexural strength (or modulus of rupture) is the ability of concrete to withstand
bending forces. Concrete is very strong in compression but weak in tension. Flexural strength,
which quantifies the tensile strength of concrete subjected to bending, is therefore a critical
property for structures that experience bending, such as pavements, highway slabs, and beams.
It is normally determined by a beam test, in which a concrete beam is loaded at its centre
until it fails (Stel'makh et al., 2022).
The properties of concrete, both fresh and hardened, are not fixed. The final performance of
the mix depends directly on the proportions of its constituent materials and the duration
of its curing.
The proportions of the ingredients are the primary factor that defines the behaviour of
concrete. The most crucial of these is the water-to-cement ratio (w/c). Compressive strength is
inversely related to this ratio: a lower w/c ratio gives a denser, less porous hardened cement
paste and hence higher strength, provided compaction and curing are done well (Barbhuiya and
Sharif, 2024).
Behaviour also depends on the type and quantity of the aggregates (fine and coarse). The
surface texture, grading, and shape of the aggregates influence the workability of fresh
concrete and its hardened strength. A well-graded aggregate mix can reduce the amount of water
required and enhance strength.
Curing Age is a very important parameter that affects the concrete properties. The strength of
concrete develops with time due to a chemical reaction called hydration, where cement reacts
with water to form a hardened paste that binds the aggregates. The rate of this strength gain is
rapid in the first few days and weeks, but the process continues for months and even years.
Concrete strength is therefore not constant but time-dependent, and standard tests are usually
performed at 7 or 28 days to determine the design strength of concrete (Altunci, 2024). This
time dependence makes age an essential input variable for any model that aims to predict
concrete performance accurately.
The predictive ability of machine learning algorithms in civil engineering has been
extensively demonstrated. A variety of models, from simple linear models to more complex
ensemble models, have proved effective in modelling the non-linear interactions in
concrete mix designs (Barbhuiya and Sharif, 2024). Several of these models are trained and
evaluated in this study.
By focusing on these regression algorithms, the study aims to move beyond traditional
empirical methods and develop a stronger, more dependable system for predicting
concrete properties, ultimately demonstrating a significant improvement in efficiency and
sustainability within the construction industry.
These ensemble techniques are very effective, as they are able to represent the complicated,
non-linear interactions between the components of a concrete mix and its final properties,
which is a major benefit compared to the old-fashioned simplistic models.
Artificial Neural Networks (ANNs) are a powerful class of machine learning models
inspired by the structure and function of the human brain. They are suited to tasks that
involve finding complex, non-linear relationships within a dataset, which makes them a strong
candidate for predicting concrete properties. A typical ANN is organised into three types of
layers:
i. Input Layer: This layer receives the raw data, such as the proportions of cement,
water, and aggregates. Each node in this layer corresponds to a single input feature.
ii. Hidden Layers: These layers are where the learning occurs. Each neuron in a hidden
layer processes the information from the previous layer, applies a mathematical function, and
passes the result to the next layer. The network can have one or many hidden layers, with
deeper networks capable of learning more complex patterns.
iii. Output Layer: This is the last layer, which provides the prediction of the network. The
output layer usually contains a single neuron, which gives the final continuous value in a
regression problem such as predicting concrete strength.
The connections between neurons carry adjustable weights that the network learns during
training. The model's predictions are compared with the actual values, and the weights are
adjusted through a process known as backpropagation to reduce the error between them. This
iterative process allows the network to learn the complex non-linear mappings between the
input mix design and the final concrete properties (Muliauwan et al., 2020).
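The layer structure and weight updates described above can be illustrated with a minimal NumPy sketch. It is not the network used in this project: the data are synthetic, and the hidden-layer size, tanh activation, learning rate, and epoch count are all assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 samples with 7 inputs scaled to [0, 1].
X = rng.uniform(0.0, 1.0, size=(200, 7))
# Arbitrary non-linear target, used only to give the network something to learn.
y = (X[:, 0] - X[:, 1] + 0.5 * X[:, 6] ** 2).reshape(-1, 1)

n_hidden = 8                                     # assumed hidden-layer width
W1 = rng.normal(0.0, 0.5, size=(7, n_hidden))    # input -> hidden weights
b1 = np.zeros((1, n_hidden))
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))    # hidden -> output weights
b2 = np.zeros((1, 1))

learning_rate = 0.02
losses = []
for epoch in range(1000):
    # Forward pass: input layer -> hidden layer (tanh) -> single output neuron.
    hidden = np.tanh(X @ W1 + b1)
    y_hat = hidden @ W2 + b2
    error = y_hat - y
    losses.append(float(np.mean(error ** 2)))    # mean squared error

    # Backward pass: gradients of the MSE with respect to every weight.
    grad_out = 2.0 * error / len(X)
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0, keepdims=True)
    grad_hidden = (grad_out @ W2.T) * (1.0 - hidden ** 2)   # tanh derivative
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0, keepdims=True)

    # Gradient-descent update of the weights.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
```

Comparing `losses[0]` with `losses[-1]` shows the prediction error shrinking as the weights are adjusted.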
The highly non-linear relationships between a concrete mix's components and its final
strength make ANNs a suitable selection for this study. Unlike simpler models like linear
regression, ANNs can capture the subtle, high-dimensional interactions between various
ingredients, such as the effect of a specific type of superplasticizer or the subtle effect of fly
ash (Benaicha, 2024). This ability to model complex systems allows for more accurate and
reliable predictions, moving beyond the limitations of traditional, empirical formulas.
DIAGRAM
The application of machine learning (ML) has become a transformative approach in civil
engineering, particularly for rapid and precise prediction of concrete properties. By moving
beyond traditional, time-consuming, and resource-intensive laboratory tests, ML models allow
for the non-destructive and cost-effective assessment of a wide range of concrete mix designs
(Altuncı, 2024; Roy et al., 2025). This predictive capability is vital for optimizing mix
proportions, minimizing material wastage, and shortening project schedules.
The present research is concerned with predicting the fresh and hardened properties of
concrete, namely slump, setting time, compressive strength, and flexural strength, using
ML. The essence of the method is that ML algorithms can learn the complex, non-linear
relationships between the constituents of concrete (e.g., cement, water, fly ash) and
its final performance.
Researchers have successfully deployed a variety of algorithms for this purpose. For instance,
ensemble methods like Random Forest and Gradient Boosting are widely recognized for their
ability to handle complex datasets and provide highly accurate predictions. A number of
studies have validated these models, with published results often showing high coefficients of
determination (R² scores) between predicted and actual values (Prayogo et al., 2020; Stel'makh
et al., 2022). High accuracy on data not used for training indicates that the models can
generalize beyond their training data and provide reliable forecasts for new mix designs.
For example, a review of recent literature shows that models for predicting compressive
strength frequently achieve an R² greater than 0.90, meaning that they explain over 90%
of the variance in the data (Benaicha, 2024; Miao et al., 2024). The predictive power of these
models allows for the rapid assessment of mix designs and facilitates the development of
sustainable concrete by enabling engineers to quickly evaluate the impact of new or recycled
materials on performance (Amin et al., 2022). This capability is a significant step toward
smarter, more efficient, and more sustainable construction practices.
The application of machine learning extends beyond prediction to the more advanced domain
of optimization, offering a direct path toward achieving sustainability goals in construction.
By integrating ML models with optimization algorithms, engineers can systematically design
concrete mixes with specific, desirable properties that are difficult to achieve through
traditional methods.
This combined approach allows for the intelligent creation of concrete that is not only strong
and durable but also meets crucial environmental targets, such as a low carbon footprint.
Rather than implementing a complex and expensive trial-and-error approach to discovering
the appropriate balance between sustainable materials, the ML-powered system can search
through a huge design space virtually and find the best ratios of supplementary cementitious
materials (SCMs), recycled aggregates, and other eco-friendly materials.
For example, a study might use an ML model to predict compressive strength, slump, and
cost, while an optimization algorithm simultaneously searches for the mix proportion that
minimizes cost and carbon emissions while maintaining a target strength (Mahjoubi et al.,
2025). This synergy of prediction and optimization streamlines the design process, making it
faster and more efficient to develop and implement new, green concrete formulations. This
not only aids in the development of more sustainable and eco-friendly concrete mixes but
also provides a competitive advantage by reducing both material waste and project costs.
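The idea of pairing a predictive model with an optimization routine can be sketched as follows. This is an illustrative toy only: `predict_strength` is a hypothetical stand-in for a trained ML model (its formula is not a real strength law), the unit costs and emission factors are placeholder numbers, and plain random search is used here in place of whichever optimization algorithm a given study employs.

```python
import random

# Placeholder unit costs and CO2 factors per kg of material (illustrative only).
UNIT_COST = {"cement": 1.0, "fly_ash": 0.4, "water": 0.01}
UNIT_CO2 = {"cement": 0.9, "fly_ash": 0.05, "water": 0.0}


def predict_strength(mix):
    """Hypothetical surrogate standing in for a trained ML model (toy formula)."""
    return 20.0 * (mix["cement"] + 0.6 * mix["fly_ash"]) / mix["water"]


def objective(mix, co2_weight=1.0):
    """Weighted sum of material cost and CO2 emissions; lower is better."""
    cost = sum(UNIT_COST[name] * amount for name, amount in mix.items())
    co2 = sum(UNIT_CO2[name] * amount for name, amount in mix.items())
    return cost + co2_weight * co2


def random_search(target_strength, n_trials=5000, seed=0):
    """Sample candidate mixes and keep the cheapest one that meets the target."""
    rng = random.Random(seed)
    best_mix, best_score = None, float("inf")
    for _ in range(n_trials):
        mix = {
            "cement": rng.uniform(200.0, 500.0),   # assumed search bounds, kg/m3
            "fly_ash": rng.uniform(0.0, 200.0),
            "water": rng.uniform(140.0, 220.0),
        }
        if predict_strength(mix) < target_strength:
            continue  # constraint violated: predicted strength too low
        score = objective(mix)
        if score < best_score:
            best_mix, best_score = mix, score
    return best_mix, best_score


best_mix, best_score = random_search(target_strength=40.0)
```

In practice the surrogate would be replaced by the trained model's `predict` call, and the random search by a more efficient optimizer if the design space is large.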
A significant body of research confirms that data-driven machine learning (ML) models
consistently and substantially outperform traditional empirical methods when it comes to
predicting concrete properties. This superiority is based on the fundamental difference in how
these two approaches handle the complex, non-linear relationships inherent in concrete mix
design (Barbhuiya and Sharif, 2024).
Empirical methods rely on simplified formulas and tables derived from a limited set of
experiments. They are often rigid and struggle to predict the performance of concrete
accurately when new variables or new materials are introduced. ML models, by contrast, can
ingest a vast array of variables and learn the complex interactions between them. This
capability allows them to provide more accurate and generalized predictions for a wider range
of mix designs and conditions (Amar, 2025).
For example, while a traditional ACI mix design might be effective for standard concrete, it
cannot easily account for the complex effects of a new type of supplementary cementitious
material or the inconsistency in a specific aggregate supply. An ML model that has been trained
on a large dataset containing these variables, however, can predict the outcome well. This is
an advantage because it reduces the need for extensive physical testing, which requires a lot
of time, money, and resources. The consistent outperformance of ML models on key metrics such
as R² and Mean Absolute Error (MAE) supports their use as a reliable and efficient alternative
to conventional methods (Altuncı, 2024; Roy et al., 2025).
Evaluating and comparing the performance of machine learning models is an essential step in
research, providing a quantitative basis for selecting the most effective models. This
process relies on a set of statistical metrics that objectively measure the predictive accuracy
and reliability of a model.
i. Coefficient of Determination (R²): The R² score is a key metric that measures the
proportion of the variance in the dependent variable that is predictable from the independent
variables. A score of 1.0 indicates that the model perfectly predicts the outcome, while a
score of 0.0 suggests the model is no better than simply predicting the mean of the data. The
closer the R² value is to 1, the better the model's performance. In the context of concrete, a
high R² score means the model can accurately account for the variability in properties like
strength and slump (Altuncı, 2024).
ii. Mean Absolute Error (MAE): The MAE quantifies the average magnitude of the
errors in a set of predictions, without considering their direction. It is the average of the
absolute differences between the predicted and actual values. The smaller the MAE value, the
more precise the model.
iii. Mean Squared Error (MSE): The MSE is similar to the MAE, but it squares the
differences between predicted and actual values before averaging them, so large errors are
penalized heavily. A smaller MSE indicates a better model.
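The three metrics can be written out directly from their definitions. The pure-Python sketch below is for illustration; the function names are not taken from the project's scripts (which rely on scikit-learn), and the sample strengths are placeholder values rather than measured data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; large errors dominate the result."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder compressive strengths in MPa (illustrative, not project data).
actual = [30.0, 40.0, 50.0]
predicted = [32.0, 38.0, 50.0]

mae = mean_absolute_error(actual, predicted)   # (2 + 2 + 0) / 3
mse = mean_squared_error(actual, predicted)    # (4 + 4 + 0) / 3
r2 = r2_score(actual, predicted)               # 1 - 8 / 200 = 0.96
```

Predicting the mean of the actual values for every sample gives an R² of exactly 0, which matches the interpretation of the 0.0 baseline given above.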
Studies consistently show that ensemble models (such as Random Forest and Gradient
Boosting) and Artificial Neural Networks (ANNs) tend to perform better in predicting
concrete properties than simpler algorithms (Muliauwan et al., 2020). These models
are particularly well suited to this task because they can capture the complex, non-linear
interactions between concrete's components. For example, Altunci (2024) found
that ensemble models achieved a higher R² for compressive strength prediction than a single,
less complex model. This highlights their effectiveness in providing more accurate and
reliable predictions, which is vital for real-world engineering applications.
CHAPTER THREE
3. RESEARCH METHODOLOGY
3.1 Materials
The dataset uses seven input features that characterize a concrete mix design and its
conditions. These features serve as the independent variables of the machine learning models.
i. Cement, Water: These are the primary binding and hydration agents of the mix. The
ratio between the two is a critical determinant of strength and workability.
ii. Fly Ash, Superplasticizer: Fly ash is a supplementary cementitious material and
superplasticizer is a chemical admixture; they are used to enhance workability, durability,
and long-term strength. The presence of these variables makes the dataset suitable for
modelling sustainable concrete.
iii. Coarse Aggregate, Fine Aggregate: The granular materials that provide volume
stability and structural integrity to the mix.
iv. Age (days): The age of the concrete sample at the time of testing. This is
an essential factor since the strength of concrete increases considerably with time
as the hydration process continues.
The dataset contains four key target variables, which are the continuous properties the models
are designed to predict. These variables represent the concrete's performance in both its fresh
and hardened states.
The diversity and comprehensiveness of this dataset are paramount for this research. By
including a wide range of material proportions and ages, the dataset enables the models to
learn the complex, non-linear relationships between the mix components and the final
properties. This approach is fundamental to creating a predictive system that can generalize
effectively to new and varied concrete formulations.
i. Pandas: This library was used for data manipulation and analysis. It was essential for
loading the data file into a structured data frame, allowing for easy handling and exploration
of the dataset.
ii. NumPy: A fundamental library for numerical operations. NumPy's high-performance
arrays and mathematical functions were used for efficient calculations and data
handling throughout the machine learning workflow.
iii. Scikit-learn: Scikit-learn provided the tools for building, training, and evaluating the
predictive models, including main algorithms like Random Forest, Gradient Boosting, and
Linear Regression, as well as utility functions to split the data and calculate the metrics.
iv. Joblib: This library was used for serialising and saving the trained models and
performance metrics. By saving the models as .pkl files, they can be easily loaded and reused
by the web application's backend without needing to be retrained.
3.2 Methodology
The project began by loading the dataset file into a pandas DataFrame. Initial inspection
confirmed that the dataset was clean, with no missing values or significant inconsistencies
that would require imputation or extensive cleaning. This allowed the project to proceed
directly to feature and target selection.
The columns were clearly defined as either input features or output targets based on their role
in the predictive task.
(I) The input features (X), which are the variables used by the models to make
predictions, were defined as follows: ['Cement', 'Water', 'FlyAsh', 'Superplasticizer',
'Coarse Aggregate', 'Fine Aggregate', 'Age'].
(II) The target variables (y), which are the properties the models were trained to predict,
were defined as: ['Slump mm', 'Setting Time min', 'Compressive Strength MPa',
'Flexural Strength MPa'].
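The feature and target selection can be sketched with pandas as follows. The column names follow the two lists above (assuming no stray whitespace in the headers); the two data rows are placeholder values included only so that the sketch runs, and in the project the frame would instead be read from the dataset file.

```python
import pandas as pd

FEATURES = ["Cement", "Water", "FlyAsh", "Superplasticizer",
            "Coarse Aggregate", "Fine Aggregate", "Age"]
TARGETS = ["Slump mm", "Setting Time min",
           "Compressive Strength MPa", "Flexural Strength MPa"]

# Two placeholder rows; the numbers are illustrative, not project data.
df = pd.DataFrame(
    [
        [350.0, 175.0, 50.0, 5.0, 1000.0, 750.0, 28, 120.0, 300.0, 40.0, 5.0],
        [300.0, 180.0, 0.0, 0.0, 1050.0, 800.0, 7, 90.0, 260.0, 22.0, 3.5],
    ],
    columns=FEATURES + TARGETS,
)

# Guard against headers that carry leading or trailing spaces.
df.columns = df.columns.str.strip()

# Quick cleanliness check: count missing values in every column.
missing_per_column = df.isna().sum()

X = df[FEATURES]   # independent variables (mix proportions and age)
y = df[TARGETS]    # dependent variables (the four predicted properties)
```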
To obtain a sound and objective assessment of the models, the dataset was split into two
distinct subsets: a training set and a testing set. This was done using the train_test_split
function from the scikit-learn library, with a standard ratio of 80% training data to 20%
testing data. The training set was used to fit the machine learning models, allowing them to
learn the patterns and relationships between the input features and the target properties. The
testing set was then used to evaluate the models' performance on unseen data. This step is
crucial for assessing how well the models will generalize to new, real-world data and for
guarding against data leakage, since the test samples play no part in training. To ensure the
reproducibility of the split, a fixed random state was applied.
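A minimal sketch of such an 80/20 split with scikit-learn is given below. The arrays are synthetic placeholders, and the seed value of 42 is an assumption: the text states only that a fixed random state was used, not which one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(size=(100, 7))   # placeholder: 100 mixes, 7 input features
y = rng.uniform(size=(100, 4))   # placeholder: 4 target properties per mix

# 80% of the rows go to training and 20% to testing; fixing random_state
# makes the shuffle, and therefore the split, reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Repeating the call with the same seed yields the same partition.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```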
The training process, as detailed in the FIXED_training.py script, was executed for each of
the following algorithms:
(I) Random Forest Regressor: An ensemble model known for its accuracy and ability to
handle non-linear data.
(II) Gradient Boosting Regressor: Another powerful ensemble model that builds upon the
errors of previous models to achieve high predictive performance.
For each target variable (Slump mm, Setting Time min, Compressive Strength MPa, and
Flexural Strength MPa), a fresh instance of each of the three models was trained on the
X_train and y_train datasets.
After training, each model's performance was evaluated on the held-out test data (X_test and
y_test) using three key metrics:
(I) Mean Absolute Error (MAE): The average magnitude of the errors, which
provides a straightforward measure of how close the predictions are to the actual
values.
(II) Mean Squared Error (MSE): A metric that penalizes larger errors more heavily,
providing insight into the model's consistency.
(III) Coefficient of Determination (R²): A crucial metric that indicates the proportion of
the variance in the target variable that can be predicted from the input features. An
R² score close to 1.0 signifies a strong predictive fit.
The model that achieved the highest R² score for each target property was selected as the
"best performer." This best-performing model for each target was then serialized and saved to
a single file, concrete_models.pkl, using the joblib library. This allows the best models to be
easily loaded and used by the web application's backend API, ensuring that the deployed
system is using the most accurate models identified during the training phase.
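The train, evaluate, select, and save loop described above might look roughly like the sketch below. It is not the project's FIXED_training.py script: the data are synthetic, the hyperparameters are assumptions, and Linear Regression is assumed to be the third candidate because it is named alongside Random Forest and Gradient Boosting in the scikit-learn description in Section 3.1. The file is written to a temporary directory here purely so the sketch leaves nothing behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

TARGETS = ["Slump mm", "Setting Time min",
           "Compressive Strength MPa", "Flexural Strength MPa"]

# Synthetic placeholder data: 120 mixes, 7 features, one series per target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 7))
Y = {name: X @ rng.uniform(size=7) + 0.05 * rng.normal(size=120) for name in TARGETS}


def make_candidates():
    """Return a fresh, untrained instance of every candidate model."""
    return {
        "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
        "Gradient Boosting": GradientBoostingRegressor(random_state=42),
        "Linear Regression": LinearRegression(),
    }


best_models, metrics = {}, {}
for target in TARGETS:
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y[target], test_size=0.2, random_state=42
    )
    fitted, scores = {}, {}
    for name, model in make_candidates().items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        fitted[name] = model
        scores[name] = {
            "MAE": mean_absolute_error(y_test, predictions),
            "MSE": mean_squared_error(y_test, predictions),
            "R2": r2_score(y_test, predictions),
        }
    # Keep the candidate with the highest R2 on the held-out test set.
    best_name = max(scores, key=lambda n: scores[n]["R2"])
    best_models[target] = fitted[best_name]
    metrics[target] = {"best": best_name, "scores": scores}

# Serialize all best models into one file, then load them back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "concrete_models.pkl")
    joblib.dump(best_models, model_path)
    reloaded = joblib.load(model_path)
```

Storing a dictionary keyed by target name means the backend can look up the right model for each property after a single `joblib.load` call.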
The web application was developed with a clear separation between the backend and
frontend, allowing for a modular and scalable design.
Backend
The backend of the application was built using FastAPI, a modern Python web framework. Its
primary function is to serve the pre-trained machine learning models and handle all
prediction requests. The backend script was set up to perform the following key tasks:
1. Load Assets: It loads the pre-trained models from the concrete_models.pkl file and
the saved performance metrics when the application starts.
2. API Endpoints: It provides several API endpoints to facilitate communication with the
frontend.
(I) A predict endpoint receives a JSON object containing the user's concrete mix
design input. It then uses the loaded models to make predictions for each of
the four target properties and returns the results.
(II) A metrics endpoint serves the saved performance metrics, allowing the
frontend to display the models' accuracy.
(III) A graph endpoint generates and returns an image of a graph based on user-
selected parameters, providing a visual analysis tool.
3. Data Handling: It uses Pydantic to validate incoming user data, ensuring it is in the
correct format before it is passed to the models.
Frontend
The frontend of the application is a user-friendly interface that interacts with the backend
API. It was built using a standard web development stack:
1. HTML: Provides the basic structure and content of the web page,
including the input forms for the mix design, the section for displaying results, and the
area for graphs.
2. CSS: Handles all the styling and visual presentation of the application,
ensuring a clean, intuitive, and modern user interface. It defines the layout, colours,
fonts, and responsive design for a consistent user experience.
3. JavaScript: Manages all the dynamic and interactive functionality. It listens
for user actions, such as submitting the form, collects the input data, and sends it to
the backend API via asynchronous fetch requests. Once the API returns a response,
the JavaScript code processes the data and dynamically updates the HTML page to
display the predictions, performance metrics, and generated graphs.
The web application functions on a standard client-server architecture, with the frontend
acting as the client and the FastAPI application serving as the backend. This architecture
ensures a clear separation of concerns, allowing for efficient processing and dynamic content
delivery.
The user's journey begins on the frontend, a web page rendered using HTML, CSS, and
JavaScript. This page, served by the FastAPI server, presents a user-friendly form where the
user can input the proportions of a concrete mix design (e.g., Cement, Water, Aggregates, and
Age).
When the user enters the mix data and clicks the "Predict" button, the [Link] file takes over.
It gathers the input values from the HTML form and packages them into a structured JSON
object. This data is then sent to the backend via an HTTP POST request to the /api/predict
endpoint, which is defined in the [Link] script.
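The shape of this request can be illustrated with a short Python stand-in for the browser's fetch call. The application itself sends the request from JavaScript; the sketch below only shows the same JSON-over-POST exchange using the standard library. The base URL and the helper names are assumptions made for the example.

```python
import json
from urllib.request import Request


def build_predict_request(mix, base_url="http://localhost:8000"):
    """Package a mix design as a JSON POST request to /api/predict."""
    body = json.dumps(mix).encode("utf-8")
    return Request(
        url=f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predict_response(raw_bytes):
    """Decode the JSON body returned by the backend into a dictionary."""
    return json.loads(raw_bytes.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it to a running server; building and sending are kept separate here so the payload can be inspected without a backend.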
Upon receiving the HTTP request, the FastAPI backend springs into action. It first validates
the incoming JSON data using a Pydantic model to ensure all required fields are present and
in the correct format. It then accesses the pre-trained machine learning models, which were
previously saved in the concrete_models.pkl file using joblib.
The backend feeds the user's input data into each of the models (Random Forest, Gradient
Boosting, etc.) to generate predictions for the four key concrete properties: Slump, Setting
Time, Compressive Strength, and Flexural Strength.
After making the predictions, the backend organizes the results into a new JSON object and
sends it back to the frontend as an HTTP response. The [Link] on the client side receives
this response, parses the JSON data, and dynamically updates the HTML elements on the
page. The user then sees the predicted values displayed on the interface, without needing to
reload the entire web page. This seamless interaction provides an immediate and efficient
user experience.
1. User Input (Frontend): The process is initiated when a user enters concrete mix design
parameters, such as Cement, Water, and Age, into the form on the frontend
([Link]). The input is then captured by the interactive JavaScript code in [Link].
2. Data Transmission (Client-to-Server): The [Link] file packages the user's input into
a JSON object. It then sends this data to the FastAPI backend through a structured
HTTP POST request directed at the /api/predict endpoint defined in [Link].
3. Backend Processing (Server-Side): The [Link] script receives the request and
validates the incoming data. It then loads the pre-trained machine learning models for
each of the four target properties from the concrete_models.pkl file using the joblib
library.
4. Prediction Generation: The validated user input is fed into the loaded models,
including the Random Forest Regressor, Gradient Boosting Regressor, and Linear
Regression, which then generate a prediction for each target property (Slump in mm,
Setting Time in min, Compressive Strength in MPa, and Flexural Strength in MPa).
5. Results Transmission (Server-to-Client): The backend compiles the predictions into a
new JSON response. This response is then sent back to the frontend, completing the
server-side portion of the process.
6. Results Display (Frontend): Finally, the [Link] file receives the JSON response,
parses the prediction data, and dynamically updates the [Link] page. The
predicted values for the concrete's properties are displayed to the user without a full
page reload, providing an immediate and efficient experience.
The entire process is a continuous loop, allowing users to quickly test different concrete mix
designs and see the predicted outcomes in real-time.
CHAPTER FOUR
An essential first step in any data-driven project is to understand the characteristics of the
dataset. This involves performing a descriptive statistical analysis to summarize the main
features and distributions of the input variables. For this project, a thorough examination of
the expanded_concrete_data.csv file was conducted.
The analysis revealed that the dataset provides a wide and diverse range of values for each
input feature, which is crucial for training a robust machine learning model. A diverse dataset
helps the model learn to generalize to various concrete mix designs beyond what it saw
during training.
Here is a summary of the key descriptive statistics for the main input features:
(I) Cement: The amount of cement used per cubic meter of concrete ranged from 100 kg
to 600 kg. This wide range is representative of various mix designs, from low-strength
to high-strength concrete.
(II) Water: The water content varied from 120 kg to 250 kg. The water-to-cement ratio is
a critical factor, and this range allows the models to learn its significant impact on
both workability and strength.
(III) FlyAsh and Superplasticizer: These variables included values from 0 kg up to 150 kg,
indicating the dataset contains both traditional mixes and those with supplementary
cementitious materials and chemical admixtures.
(IV) Coarse Aggregate and Fine Aggregate: The aggregates, which make up the bulk of the
concrete's volume, varied from 600 kg to 1800 kg, providing a comprehensive view of
different aggregate grading curves and proportions.
(V) Age: The age of the concrete ranged from 7 to 365 days, which is vital for modelling
the progressive increase in compressive and flexural strength over time due to the
hydration process.
This analysis confirms that the dataset is well-suited for a machine learning task, as it
contains a rich variety of data points that allow the models to learn the complex relationships
between mix proportions and concrete properties effectively.
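A descriptive summary of this kind can be produced with a few lines of Python. The sketch below uses only the standard library and a tiny made-up sample. The column names and values are placeholders, not rows from expanded_concrete_data.csv.

```python
import csv
import io
import statistics

# Made-up sample in an assumed column layout; NOT taken from the project dataset.
SAMPLE_CSV = """Cement,Water,Age
300,180,7
450,160,28
200,200,90
"""


def describe(csv_text):
    """Return {column: {"min", "max", "mean"}} for every numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0]:
        values = [float(row[column]) for row in rows]
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
        }
    return summary


summary = describe(SAMPLE_CSV)
```

Reading the real file would only require replacing the in-memory string with the opened CSV file.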
To gain a deeper understanding of the relationships within the dataset, a correlation analysis
was performed. This statistical technique measures the strength and direction of a linear
relationship between two variables, providing valuable insights into how changes in one input
feature relate to changes in a target property. The analysis was conducted by generating a
correlation matrix, which provides a quantitative overview of these relationships.
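As an illustration of what each entry of such a matrix represents, the Pearson correlation coefficient can be computed as sketched below. The function names and the layout of the input (a dictionary of equally long columns) are assumptions made for the example. The project may well have used a library routine instead.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of columns."""
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in columns}
        for a in columns
    }
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates only that no linear relationship was detected. A non-linear dependence can still be present.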
The analysis yielded several key findings that align with established principles of civil
engineering:
(I) Age and Strength: A strong positive correlation was found between the Age input
feature and both Compressive Strength MPa and Flexural Strength MPa. This
confirms the fundamental principle that concrete strength increases over time as the
hydration process continues. The positive correlation means that as the age of the
concrete increases, its strength tends to increase as well.
(III) Other Ingredients: Features such as FlyAsh and Superplasticizer showed more
complex relationships. While they may not have a strong linear correlation on their
own, their effect is highly synergistic and often captured by more advanced models
like Random Forest and Gradient Boosting, which are designed to handle non-linear
interactions.
The performance of each trained model was rigorously evaluated to identify the most
accurate algorithm for each of the four concrete properties. This analysis was crucial for the
final model selection and was based on a suite of standard regression metrics, including the
Coefficient of Determination (R2), Mean Absolute Error (MAE), and Mean Squared Error
(MSE). The results were stored in the [Link] file, providing a clear summary of each
model's predictive capability.
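For reference, the three metrics are commonly defined as follows. The notation is introduced here for illustration and does not appear in the original text: y_i is the measured value of sample i, \hat{y}_i the corresponding prediction, \bar{y} the mean of the measured values, and n the number of test samples.

```latex
\[
\begin{aligned}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .
\end{aligned}
\]
```

MAE is expressed in the units of the target (for example MPa), MSE in squared units, and R2 is dimensionless.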
The model with the highest R2 score for each target property was selected as the best
performer. The R2 metric indicates the proportion of the variance in the target variable that
can be predicted from the input features. It is at most 1 and typically falls between 0 and 1,
although it can be negative on test data when a model performs worse than simply predicting
the mean. A score closer to 1 signifies a more accurate model.
As noted, the Random Forest model was found to have the highest R2 score for predicting
Compressive Strength MPa. Its high performance is attributed to its ability to capture the
complex, non-linear interactions between the various concrete mix components that influence
strength.
[Link] Slump
For the Slump mm property, which measures the workability of fresh concrete, a different
model emerged as the top performer. This highlights the importance of evaluating models
individually for each target.
Predicting the setting time of concrete is another crucial factor for on-site application. For
this target, a different ensemble model was identified as the best.
Finally, for predicting Flexural Strength MPa, a different model proved to be the most
accurate.
Best Model: Random Forest Regressor
In conclusion, the systematic evaluation of each model's performance on the test data
confirmed that ensemble methods, specifically Random Forest and Gradient Boosting,
consistently outperformed the simpler Linear Regression model. This result aligns with the
understanding that the relationship between concrete mix ingredients and final properties is
highly complex and non-linear, making these advanced models particularly well-suited for
the prediction task. The model with the highest R2 for each property was then selected and
saved, forming the core of the web application's predictive engine.
A comprehensive comparative analysis of the performance metrics confirmed that the choice
of machine learning model is paramount for accurate prediction. The results consistently
demonstrated that the ensemble models, specifically the Random Forest Regressor and the
Gradient Boosting Regressor, significantly outperformed the simpler Linear Regression model
across all four target properties.
This finding is not surprising and is highly consistent with existing literature in the field of
concrete science and machine learning (Altuncı, 2024; Roy et al., 2025). The reason for this
clear superiority lies in the fundamental nature of these models and the complexity of the
concrete dataset.
(I) Linear Regression assumes a linear relationship between the input features and the
target variables. This assumption is a major limitation, as the interactions between
concrete components (e.g., cement, water, fly ash, and aggregates) are highly
complex, non-linear, and synergistic. For instance, the effect of FlyAsh on
Compressive Strength MPa is not a simple, straight-line relationship; it depends on
the proportion of other materials in the mix and the age of the concrete.
(II) Ensemble models, on the other hand, are designed to handle exactly this type of
complexity. Both Random Forest and Gradient Boosting are built from a collection of
simpler decision trees. By combining the predictions of many trees, these models can
capture intricate, non-linear patterns that a single, simple model would miss. The high
R2 scores achieved by these models (often above 0.95) are a direct result of this
capability, indicating that they can account for over 95% of the variability in the
target properties.
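The contrast between the two model families can be reproduced on a small synthetic example. The sketch below is not the project's experiment: the data are generated from an arbitrary non-linear function chosen for illustration, and the hyperparameters are assumptions. It only shows the kind of gap that appears when a straight-line model is fitted to a curved relationship.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, noiseless data with a deliberately non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = X[:, 0] ** 2 + np.sin(2 * X[:, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each candidate on the training split and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
```

On data like this the linear model cannot represent the curvature, so its test R2 stays low, whereas the averaged trees approximate the surface piece by piece.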
The consistent outperformance of ensemble models, as shown by their lower MAE and MSE
values and higher R2 scores, provides a data-driven justification for their selection as the best-
performing models for this project. This aligns with the consensus in academic research that
ensemble methods are a more reliable and powerful tool for predicting complex material
properties like those of concrete.
The final selection of the models for the web application was a data-driven decision based on
the comprehensive evaluation of the performance metrics. For each of the four target
properties, the model that achieved the highest R2 score was chosen. This selection was based
on a clear and widely accepted principle in regression analysis: the R2 value is the most
reliable single metric for indicating a model's predictive capability for a given dataset.
The Coefficient of Determination (R2) measures the proportion of the variance in the
dependent variable that can be predicted from the independent variables. In simpler terms, it
quantifies how well the model's predictions align with the actual, observed values.
(I) An R2 score of 1 indicates a perfect fit, where the model's predictions perfectly match
the actual data.
(II) An R2 score of 0 means the model is no better at predicting the outcome than simply
using the mean of the dataset.
By selecting the model with the highest R2 score, we ensured that the final deployed models
were the ones that best explained the variability in each concrete property. For example, the
Random Forest Regressor was chosen for Compressive Strength MPa because its high R2
score demonstrated its superior ability to account for the complex interactions between the
various mix ingredients and the final strength of the concrete.
While MAE (Mean Absolute Error) and MSE (Mean Squared Error) are also important
metrics for understanding a model's performance, they primarily measure the average
magnitude of the prediction errors. The R2 score, however, provides a more holistic view of
the model's overall fit and reliability, making it the most suitable metric for justifying the
final model choice for a real-world predictive application.
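The interpretation of these metrics can be checked with a short, dependency-free sketch. The function names and the sample strengths are placeholders for illustration, not project data or the project's evaluation code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error; large errors are weighted more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative strengths in MPa; placeholder numbers only.
measured = [20.0, 30.0, 40.0]
perfect = r2_score(measured, measured)               # predictions equal the data
mean_only = r2_score(measured, [30.0, 30.0, 30.0])   # always predict the mean
```

Here perfect evaluates to 1 and mean_only to 0, matching the two reference cases: a perfect fit, and a model that is no better than the mean of the data.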
The user interface (UI) of the web application was designed with a focus on simplicity,
clarity, and ease of use. It provides a clean and intuitive experience for anyone, regardless of
their background in concrete science or machine learning. The entire UI is contained within a
single HTML page ([Link]) and is styled with CSS ([Link]) to ensure a modern and
responsive design that works well on different devices.
The main component of the UI is the input form. This form presents users with a
straightforward set of fields where they can enter the quantities of the seven key concrete
ingredients:
Cement
Water
FlyAsh
Superplasticizer
Coarse Aggregate
Fine Aggregate
Age
Each input field is clearly labelled to minimize confusion. Once the user has entered the
desired values and submitted the form, the JavaScript ([Link]) code takes over to handle the
communication with the backend.
In addition to the input form, the interface includes a dedicated area for displaying the
prediction results. After the backend processes the data, the frontend dynamically updates this
section to show the predicted values for the four target properties. The UI also features a
separate section where users can view the performance metrics of the models (R2, MAE,
MSE), as well as an interactive section for generating visualizations of the data. This design
ensures that the user can both input new data and review the models' performance and
predictions all from a single, cohesive interface.
The web application's core functionality lies in its ability to take user input and provide near-
instantaneous predictions for the properties of a given concrete mix. This feature serves as the
primary demonstration of the successful integration between the frontend and backend.
Upon entering a set of valid input parameters for a new concrete mix design into the user
interface and clicking the "Predict" button, the following process is initiated:
(I) API Call: The [Link] file, which manages all frontend interactions, sends an
asynchronous HTTP POST request to the /api/predict endpoint on the FastAPI
backend. This request includes a JSON payload containing the user-defined quantities
of all seven concrete ingredients.
(II) Backend Processing: The backend ([Link]) receives the request and, in less than a
second, processes the data. It uses the pre-trained machine learning models to
generate predictions for Slump, Setting Time, Compressive Strength, and Flexural
Strength.
(III) Instantaneous Results: The backend then sends a JSON response back to the frontend
with the predicted values. The [Link] file immediately receives this response and
updates the [Link] page to display the results in a clear and organized format. The
predictions are presented in real time, offering a seamless and efficient user
experience.
This demonstration confirms that the prediction pipeline—from user input to frontend-
backend communication and finally to the display of results—is fully operational, enabling
users to leverage the power of the trained models without any knowledge of the underlying
code.
A key feature of the web application is its commitment to transparency and interpretability.
Beyond simply providing predictions, the user interface includes dedicated sections that
allow users to explore the performance of the underlying machine learning models and to
visualize the relationships between the input data and the predictions. This adds a critical
layer of trust and understanding for the end-user.
A dedicated section on the web page retrieves and displays the performance metrics for each
of the four trained models. This data, sourced from the [Link] file via a call to the
/api/metrics endpoint, shows key metrics like R-squared (R2), Mean Absolute Error (MAE),
and Mean Squared Error (MSE). This allows users to quickly verify the reliability of the
model for each specific concrete property (e.g., Slump, Compressive Strength) and
understand the models' accuracy in a quantitative way.
The application also features an interactive graph generation tool. Users can select any input
parameter (e.g., Age days) and any target variable (e.g., Compressive Strength MPa) from a
dropdown menu. Upon submission, the JavaScript sends a request to the
/api/graph/{parameter}/{target} endpoint. The FastAPI backend processes this request,
generates a scatter plot showing the relationship between the two selected variables, and
returns the plot as an image. This visual representation allows users to intuitively understand
how each ingredient influences the final properties of the concrete, offering valuable insights
that complement the numerical predictions.
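The plotting step behind such an endpoint might look like the sketch below. It is an assumption-based illustration: the function name, figure size, and sample values are hypothetical, and the project's own plotting code is not shown in the text. The function renders a scatter plot off-screen and returns the PNG bytes that a web endpoint could send back with an image/png media type.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display, as on a server
import matplotlib.pyplot as plt


def scatter_png(x_values, y_values, x_label, y_label):
    """Draw a scatter plot of y against x and return it as PNG bytes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(x_values, y_values, s=12)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(f"{y_label} vs {x_label}")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)  # release the figure so repeated requests do not leak memory
    return buffer.getvalue()


# Placeholder values for illustration only.
png_bytes = scatter_png([7, 28, 90], [20.0, 30.0, 35.0],
                        "Age_days", "Compressive_Strength_MPa")
```

Writing the figure to an in-memory buffer avoids creating temporary files on the server for every request.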
CHAPTER FIVE
5.2 Conclusion
This study successfully demonstrated that a data-driven approach using machine learning is a
powerful and reliable method for predicting the mechanical properties of concrete. By
leveraging a comprehensive dataset and systematically evaluating multiple models, the
project validated its core hypothesis: that the complex, non-linear relationships within
concrete can be accurately modelled by algorithms.
The project's key achievement is the integration of the machine learning models into a fully
functional web application. This not only proves the technical feasibility of the solution but
also transforms a complex model into a practical, accessible tool. This application can assist
engineers and material scientists in optimizing concrete mix designs, potentially leading to
significant reductions in material waste and costly, time-consuming laboratory testing.
In essence, this work bridges the gap between theoretical machine learning and real-world
engineering challenges, offering a valuable resource for advancing sustainable and efficient
practices in the construction industry.
1. Incorporating a Wider Range of Datasets: Future work should focus on expanding the
dataset to include new variables, such as different types of aggregates, recycled
materials, and admixtures. This will make the models more robust and applicable to a
wider range of sustainable concrete formulations.
3. Integrating Optimization Algorithms for Mix Design: The current application predicts
properties based on a given mix. A powerful future development would be to integrate
optimization algorithms to suggest the ideal mix proportions to achieve a desired set
of properties, providing a complete solution for concrete design (Mahjoubi et al.,
2025).
The primary contribution of this project is the creation of a practical, accessible tool that
bridges the gap between theoretical data science and civil engineering practice. It provides a
tangible proof of concept for how AI can be leveraged to streamline design processes,
improve material optimization, and support more sustainable construction.