TextSter: An all in one text processing website
Satyam Subham Swain
School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar - 751024, India
22053543@[Link]
Dibyajyoti Mohapatra
School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar - 751024, India
22053685@[Link]
Dharitri Danee Rout
School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar - 751024, India
22054132@[Link]
Dr. Suneeta Mohanty
School of Computer Engineering
KIIT Deemed to be University
Bhubaneswar - 751024, India
smohantyfcs@[Link]
Abstract—Agriculture is critical to humanity's survival, yet farmers face the same questions every growing season: which crop is best to grow, whether fertilizers are applied correctly and at the right time, and whether diseases are identified as soon as they appear on the crop, if not before. Getting any of these wrong lowers yields and can ultimately waste environmental resources. The present system, Machine Learning Driven Crop and Fertilizer Advisor with Computer Vision Based Disease Detection System, addresses these issues through intelligent automation aimed at increasing yields. The machine learning component predicts and recommends an optimal crop and fertilizer for the specified soil and environmental parameters, and the computer vision component detects plant diseases using image-based classification techniques. The system is implemented with the Flask web development framework, giving a deployable and maintainable version of the agricultural tools, and incorporates a MySQL database for data capture and user interaction. NumPy, Pandas, Scikit-learn, Seaborn, PyTorch, and ResNet-9 together support an integrated approach that couples artificial intelligence prediction with real-time usability in an end-user application. Overall, the project shows improved accuracy and efficiency and offers a one-stop solution for precision agriculture.
Keywords—Machine Learning, Computer Vision, Precision Agriculture, Smart Farming, Crop Recommendation
I. INTRODUCTION
Agriculture is the cornerstone of the world economy, and more than half of the world's population depends on it, directly or indirectly, for their livelihoods. Although agriculture, like other industries, has developed progressively over the years through the application of new technology, it still relies on manual labor, particularly in many underdeveloped economies, which leads to inconsistent and inefficient production. Suboptimal crop management, poor fertilization practices, slow disease diagnostics, and environmental variability all act as substantial barriers to improved yield.
In traditional agricultural systems, decisions about what crops to grow, when to apply fertilizer, and how much to apply are typically driven by farmers' experience and instinct, which can be seriously misguided because they do not account for the changing environmental parameters and in-soil factors affecting the crop. Manual diagnosis of plant diseases also requires expertise and is subject to error or oversight, particularly when portions of the same plant exhibit similar symptoms while multiple diseases are present.
In this context, Machine Learning (ML) and Computer Vision (CV) technologies offer transformative alternatives. For instance, ML can leverage extensive databases to discern patterns across a variety of data types and generate accurate predictions of which crops and fertilizer combinations are suitable, based on soil composition, temperature, humidity, and rainfall. Similarly, CV technologies can detect disease in physical leaves automatically by analyzing photographic images, minimizing reliance on human input so that affected plants can be treated promptly.
The ultimate goal of this project is to create and deliver a web-based application combining ML and CV models for precision agriculture. The proposed application would help farmers make data-driven decisions, including predicting suitable crops, suggesting fertilizer combinations, and identifying plant diseases from photographic images of leaves. It will comprise a robust web interface (built in Flask) that is easy to access and user-friendly for any farmer, including non-technical users.
The system also unites computational intelligence and agricultural practice, delivering precise results through a simple, interactive user interface. It demonstrates how the combination of ML, CV, and web-based technologies can support the agricultural sector with automation, sustainability, and real-time intelligence.
II. TECHNICAL DEFINITIONS
The planned system employs a range of software technologies, libraries, and frameworks to develop and deliver a stable and responsive platform for users.
A) Programming Language
Python 3: Selected for its flexibility and its robust ecosystem of ML and data manipulation libraries.
B) Front-End
HTML, CSS, JavaScript, Bootstrap, Flask: Used to develop a responsive and dynamic end-user interface for the web application.
C) Back-End
MySQL: Used as a relational database for user data storage and communication between backend routes and the ML models.
D) Python Libraries
NumPy: Used for efficient numerical computation and array calculations.
Pandas: Used for dataset preprocessing and tabular data management.
Scikit-Learn: Used for ML classification algorithms and model evaluation.
Seaborn: A visualization library used for plotting model accuracy and performance metrics.
PyTorch: A flexible framework for building deep learning and computer vision models.
ResNet-9: A CNN architecture used for detecting leaf disease.
SQLAlchemy: Used to connect to MySQL in Flask.
Pickle: Used for model serialization and real-time deployment.
These libraries form the foundation for computation and data handling across ML training, testing, and communication with the web application.
III. RELATED WORK
Over the past decade, there has been extensive research into applications of Artificial Intelligence (AI), Machine Learning (ML), and Computer Vision (CV) in agriculture to improve productivity, assist farmers in real time, and support data-driven decision-making. This smart farming innovation has shown that computational intelligence can be used to improve productivity and sustainability.
Chlingaryan et al. (2018) examined whether ML models can estimate crop yield and predict nitrogen status using multispectral and hyperspectral data. Their research demonstrated that supervised learning approaches (specifically Support Vector Machines (SVM) and Random Forests) are more effective than traditional regression models, provided the model is trained on a large and structured dataset. However, the process was not immediate, and the use of satellite imagery to collect multispectral and hyperspectral data was expensive, limiting the potential use of the model for smallholder farmers.
Patil and Kale (2016) developed a fertilizer recommendation method based on Decision Tree and Naïve Bayes algorithms trained on soil- and crop-specific data. They demonstrated that data-based fertilizer advisory models could minimize the risk of over-fertilizing and result in improved soil health. However, their proposal dealt only with fertilizer recommendations and did not address disease detection or integrated recommendations involving multiple end-user agents.
Ramesh et al. (2019) proposed a deep learning approach to classify plant diseases based on Convolutional Neural Networks (CNN). Their model was trained on plant disease images from the PlantVillage dataset and achieved significant results in detecting early blight and leaf spot diseases. The authors nonetheless noted limitations: the approach covered a fixed number of crops and needed high-quality images for reliable classification, potentially reducing its flexibility across different agricultural contexts.
Liakos and coworkers (2018) presented a systematic review of machine learning (ML) in agriculture. They offered examples of how algorithms such as Random Forest, K-Means, and Gradient Boosting can help automate processes such as irrigation management, soil type classification, and crop yield prediction. They described the prospects and future potential of artificial intelligence in agriculture, while noting the challenge of limited and unreliable data regarding subsystems.
Tripathy et al. (2021) explored the fusion of IoT and ML for agricultural monitoring systems. They illustrated how prediction performance can be improved by combining real-time sensor data with ML algorithms. Their research, however, lacked computer vision integration and required considerable hardware, which may have limited its economic uptake.
Among other lines of work, the review by Kamilaris and Prenafeta-Boldú (2018) surveyed deep learning frameworks for plant phenotyping and plant stress detection, and Koirala et al. (2019) proposed CNN architectures for fruit detection and yield prediction. Taken together, these studies demonstrate that AI-based agricultural systems are rapidly evolving, albeit generally either in very targeted domains of deployment or in forms that are not computationally accessible.
Even with these advances, there is still a need for a cohesive and cost-effective platform that combines ML for predicting crops and fertilizers with CV for detecting diseases, and that is publicly accessible on the web. The proposed system fills this need by providing a complete agricultural advisory solution that includes predictive analytics, image classification, and real-time user interaction.
IV. METHODOLOGY
The proposed framework combines several computational methods and frameworks into a consolidated solution. It includes six stages: data collection, data preprocessing, model training, performance evaluation, framework integration, and framework deployment. Each stage is designed to provide confidence in the reliability, scalability, and interpretability of the resulting framework.
A. Data Collection
Data collection is the foundation of any ML-based agricultural initiative. The datasets used in this solution are publicly available from reliable open sources, including the Kaggle Agricultural Crop Dataset, the Indian Soil Data Portal, and the PlantVillage Leaf Dataset.
The datasets consist of the following:
Soil parameters: Nitrogen (N), Phosphorus (P), Potassium (K), pH level, soil moisture content, and soil type.
Environmental parameters: Values of weather data such as temperature, rainfall, and humidity.
Crop yield and fertilizer datasets: For training ML models for advisory recommendations.
Leaf images dataset: High-quality photos of healthy and diseased leaves for CV-based classification.
The integrated use of tabular and image datasets promotes sound model training and aids predictive generalization.
B. Data Preprocessing
The collected data is preprocessed using Pandas and NumPy. First, missing values are addressed through statistical imputation methods. The categorical data, which includes crop names and soil types, is label-encoded using Scikit-learn's LabelEncoder.
For the machine learning models, Min-Max scaling is applied so that all parameters contribute uniformly to model training. The data is then split into a training set of 80% and a testing set of 20% for unbiased evaluation.
For the image dataset used for disease detection, image augmentation (rotation, horizontal flipping, and zoom transformation) is applied to improve the robustness of the model. All images are resized to 128×128 pixels and then converted into tensors for compatibility with the PyTorch framework.
C. Predictive Models for Crop and Fertilizers
The system uses supervised ML algorithms to predict crop suitability and fertilizer recommendations. The algorithms chosen were Decision Tree, Logistic Regression, Support Vector Machine (SVM), and Random Forest.
All models were trained on the same dataset and compared using the conventional performance metrics of accuracy, precision, recall, and F1 score. Random Forest achieved the highest prediction accuracy of the four models at 99%. Because Random Forest is an ensemble learning method, it reduces overfitting and is good at modelling nonlinear relationships between features.
The Random Forest prediction can be written as:
$$\hat{y}(x) = \operatorname{mode}\{T_1(x), T_2(x), \dots, T_N(x)\}$$
where $T_i(x)$ is the prediction of the $i$-th decision tree in the ensemble and $N$ is the total number of trees. The final prediction is determined by a majority vote among all decision trees of the random forest.
Table 1. Performance Comparison of Machine Learning Models
Algorithm             Accuracy (%)
Decision Tree         90
Logistic Regression   95
SVM                   97
Random Forest         99
D. Disease Detection Based on Computer Vision
The disease detection component is based on a residual convolutional neural network (CNN) architecture, ResNet-9, which uses skip connections to enhance feature extraction and mitigate the vanishing gradient problem. The architecture comprises convolutions, batch normalization, activations, and residual blocks, enabling deep hierarchical learning from leaf images.
Each input image passes through many convolutional filters, which detect edges, colors, and textures, all of which are prominent in the development of disease symptoms such as blight, mildew, or rust. The model is trained with a Cross-Entropy Loss function and the Adam optimizer with a learning rate of 0.001.
The loss function is mathematically formulated as:
$$L = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$$
where $y_c$ is the true class label (1 for the correct class and 0 otherwise), while $\hat{y}_c$ is the predicted probability for class $c$ out of $C$ classes.
After training, the model achieved an accuracy of 94% on test images not seen during training. The predictions are then presented to the user as disease names, alongside suggested cures.
E. Model Integration and Deployment in Flask
Following training and evaluation, the trained ML and CV models are serialized using the Pickle module and served within a Flask web application. Flask is a lightweight web framework that supports routing, so each model responds to a specified endpoint:
crop for yield recommendation
fertilizer for fertilizer prediction
disease for image-based disease classification
The Flask server receives user requests, passes the data through the appropriate model, and returns the prediction result to the frontend. Flask is connected to the MySQL database through the SQLAlchemy ORM, which handles queries efficiently and supports user authentication and data persistence.
F. Frontend and Database Integration
The interface was developed in HTML, CSS, Bootstrap, and JavaScript to ensure a responsive frontend. The user can enter soil and environmental conditions or upload a leaf image through the graphical interface, and the predictions are displayed cleanly in the rendered Flask template.
The backend database is MySQL, which stores a continuous history of user data, prior prediction results, and feedback. The database schema is composed of Users, Predictions, and DiseaseRecords tables, which allow longitudinal tracking of the data.
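The three-table layout named for the backend database (Users, Predictions, and DiseaseRecords) can be illustrated with a small sketch. This is not the authors' schema: the paper gives only the table names, so every column below is an assumption, and Python's built-in sqlite3 module is used here as a stand-in for MySQL with SQLAlchemy so that the example runs without a database server. The helper names (open_db, add_user, log_prediction, log_disease, user_history) are likewise hypothetical.

```python
import sqlite3

# Hypothetical schema: only the three table names come from the paper;
# all columns are assumptions made for illustration.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    kind TEXT NOT NULL CHECK (kind IN ('crop', 'fertilizer')),
    result TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE disease_records (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    disease TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path=":memory:"):
    """Open a database (in-memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce user_id references
    conn.executescript(SCHEMA)
    return conn


def add_user(conn, username):
    """Insert a user and return the new row id."""
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    return cur.lastrowid


def log_prediction(conn, user_id, kind, result):
    """Store one crop or fertilizer recommendation for a user."""
    conn.execute(
        "INSERT INTO predictions (user_id, kind, result) VALUES (?, ?, ?)",
        (user_id, kind, result),
    )


def log_disease(conn, user_id, disease):
    """Store one image-based disease classification for a user."""
    conn.execute(
        "INSERT INTO disease_records (user_id, disease) VALUES (?, ?)",
        (user_id, disease),
    )


def user_history(conn, user_id):
    """Return (kind, value) pairs covering all stored results for a user."""
    return conn.execute(
        "SELECT kind, result FROM predictions WHERE user_id = ? "
        "UNION ALL "
        "SELECT 'disease', disease FROM disease_records WHERE user_id = ?",
        (user_id, user_id),
    ).fetchall()
```

Keeping a user_id foreign key on both result tables is what makes the longitudinal tracking possible: a single query per user returns every past recommendation and diagnosis. In a MySQL deployment the same layout would be declared through SQLAlchemy models instead of raw SQL.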
G. Model Evaluation Metrics and Performance
To assess the reliability of our model, we calculated the following evaluation measures:
Accuracy: The proportion of correctly predicted instances.
Precision & Recall: Measures used for model evaluation on imbalanced data.
F1-Score: The harmonic mean of recall and precision.
Confusion Matrix: A graphical representation of correct and incorrect predictions.
The Random Forest and SVM models had the top F1-scores, while ResNet-9 had superior recall, indicating it could detect disease under varying lighting and background conditions.
H. Conclusion
This methodology provides a timely, end-to-end workflow, from dataset formation to a web tool, that brings analytical intelligence into user-friendly tools and systems when needed. Its modular structure allows for scaling, such as incorporating real-time IoT-based data or adding cloud capabilities for automated, systematic updating and monitoring.
V. IMPLEMENTATION AND RESULTS
[Images]
VI. CONCLUSION AND FUTURE WORK
A. Conclusion
The presented Machine Learning-Based Crop and Fertilizer Advisor with Computer Vision-Based Disease Detection System illustrates the potential of artificial intelligence to convert traditional agricultural practices into an intelligent, data-informed pursuit. By employing machine learning algorithms for predictive modeling and computer vision models for image-based diagnostics, the system provides extensible decision support to farmers via an easy-to-use web framework.
The ML component analyzes various soil and environmental factors, including N (nitrogen), P (phosphorus), K (potassium), pH, temperature, humidity, and rainfall, to determine the most suitable crops and fertilizers. The algorithm with the best accuracy, and therefore the most robust for nonlinear agricultural data, was Random Forest at 99%. The Computer Vision (CV) module, based on the ResNet-9 deep learning network, achieved 94% accuracy in classifying leaf images with plant diseases. This means that farmers can obtain a rapid diagnosis, treat the disease early, and stop it from causing even greater crop loss. Integration through the Flask framework with a MySQL database keeps the system user-friendly while accounting for scalability for use in rural environments.
This experimental analysis demonstrates that the integrated system improves overall predictive modeling outcomes while shortening the time spent by the decision maker. Farmers could obtain in seconds recommendations that might otherwise have taken a whole day to reach through guesswork alone.
As a result, the project advances precision agriculture, resource reduction, and environmental sustainability by democratizing rational fertilizer application and timely disease identification.
From a research standpoint, this work bridges an existing technology gap: previous efforts have treated crop forecasting, fertilizer recommendation, and disease identification as independent modules.
The work presented here integrates these into a single holistic system, representing a step toward fully autonomous agricultural intelligence.
B. Future Work
Though the prototype is highly accurate and usable, more can be done to improve its functionality and scalability.
Future work will pursue the following directions:
1. IoT Integration for Real-time Monitoring:
The next version of the system will add Internet of Things (IoT) sensors such as soil-moisture probes, temperature and humidity sensors, and pH meters to monitor the fields continuously. Readings will be sent to the server via Wi-Fi or LoRa gateways, and the ML models will monitor the incoming readings and change recommendations as needed. The combination of IoT and ML will create a self-learning adaptive system for real-time precision farming.
2. Cloud and Edge Deployment:
Deployment to a cloud platform (e.g., AWS IoT Core, Azure ML) will provide scalability and access for multiple users, while edge deployment on devices like Raspberry Pi might allow low-power inference directly on farms without requiring high-speed internet.
3. Mobile Application Interface:
A multilingual Android application will be created so that remote farmers can access data. The app will connect to the Flask server using REST APIs and provide offline caching for areas with intermittent connectivity.
4. Expansion of Regional Datasets:
By adding localized soil and crop data from additional states in India, we will refine regional accuracy. The disease classes, in addition to the fertiliser categories, can be updated through continuous retraining of the model.
5. Decision Visualization Dashboard:
An analytical dashboard developed using Seaborn and
Plotly will present trends of soil health, fertiliser
consumption, and disease occurrence to provide longer-
term insights into agricultural trends.
These enhancements would move the system from a decision-support tool
toward a fully automated smart-farming ecosystem. They would also
support Digital India and Sustainable Agriculture 2030 by
allowing farmers to use IoT sensors and mobile technology
for more efficient monitoring and management of
farming practices.
In summary, the work here offers a concrete route to
integrate data analytics, machine learning, and computer
vision into an easy-to-deploy agriculture advisory platform.
Its modular architecture, high accuracy, and real-time
performance underscore its potential for adoption by
farmers, researchers, and agricultural organizations. With
the planned integration of IoT and mobile-cloud features,
the system is intended to evolve into an intelligent,
autonomous, and inclusive platform that bridges the
divide between digital innovation and better farming
practices.