Crop Yield Prediction with ML Techniques

The document discusses the use of machine learning algorithms, particularly the Random Forest algorithm, for predicting crop yields in India, which is vital for the agricultural sector. It highlights the importance of accurate yield predictions for farmers to make informed decisions about crop selection and management, addressing challenges posed by environmental factors. The proposed system utilizes data from various districts and employs machine learning techniques to enhance agricultural productivity and sustainability.


CROP YIELD PREDICTION USING MACHINE LEARNING ALGORITHM

Ms. Ranjani J, Assistant Professor, [Link]@[Link], Sri Sai Ram Engineering College (Autonomous Institution), Chennai
Ms. V.K.G Kalaiselvi, Assistant Professor, [Link]@[Link], Sri Sai Ram Engineering College (Autonomous Institution), Chennai
Ms. [Link], Assistant Professor, [Link]@[Link], Sri Sai Ram Engineering College (Autonomous Institution), Chennai
Deepika Sree D, UG Scholar, e8it104@[Link], Sri Sai Ram Engineering College (Autonomous Institution), Chennai
Janaki G, UG Scholar, e8it057@[Link], Sri Sai Ram Engineering College (Autonomous Institution), Chennai

2021 4th International Conference on Computing and Communications Technologies (ICCCT) | 978-1-6654-1447-0/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICCCT53315.2021.9711853

I. Abstract:

Agriculture is the backbone of the Indian economy, with more than half of the country's people relying on it for subsistence. Crop production is predicted using machine learning techniques based on parameters such as rainfall, crop, and meteorological conditions. Random Forest, one of the most popular and powerful supervised machine learning algorithms, can perform both classification and regression tasks. It is used in crop selection to reduce crop yield losses, regardless of a disruptive environment. Weather, climate, and other related environmental elements have posed a significant danger to agriculture's long-term viability. Machine learning (ML) is significant since it offers a decision-support tool for Crop Yield Prediction (CYP), which may help with decisions such as which crops to cultivate and what to do during the crop's growing season. The major purpose of crop yield estimation is to boost agricultural crop production, and it does so using a variety of well-established models. Machine learning is increasingly used around the world due to its success in a range of disciplines such as forecasting, fault detection, and pattern identification. Yield prediction is a key agricultural concern. Using the results of this study, farmers will be able to determine the yield of their crop before growing it in the field, allowing them to make informed decisions. To assist farmers in maximizing agricultural yield, timely guidance on forecasting future crop output, together with analysis, is required.

Keywords: Crop Yield Prediction, Random Forest Algorithm

II. Introduction:

In terms of farm output, India is ranked second in the world. Agriculture and related industries such as forestry and fisheries accounted for 16.6% of GDP in 2009, employing about half of the country's workers. Agriculture's monetary contribution to India's GDP is steadily decreasing. Plant crop yield is influenced by
a variety of factors, including meteorological, geographical, organic, political, and economic considerations. When there are multiple crops to raise, choosing among them can be challenging for farmers, especially if they are unfamiliar with market values. According to Wikipedia, the farmer suicide rate in India has fluctuated between 1.4 and 1.8 per 100,000 people over the last decade. In 2015, the number of farmer suicides surpassed 8000, up from 5650 in 2014. The use of technology to raise cultivation awareness has become unavoidable in recent years. Seasonal climate change is also wreaking havoc on key assets such as land, water, and air, resulting in food insecurity. Agricultural yields are continually falling short of demand, necessitating the development of a smart system to address the issue of declining crop yields. To address this issue, we propose a system that recommends crops based on economic and environmental variables, allowing farmers to get the most yield from their crops while also helping to meet the country's rising demand for food. The proposed system uses machine learning to produce predictions. It provides crop yield and crop selection depending on the weather attributes appropriate for the crop, and it produces crop output projections based on characteristics such as rainfall, temperature, area (in hectares), and season. Crop yield forecasting is a significant agricultural issue. Every farmer is interested in knowing how much yield will be generated and whether it will match their expectations. Yield prediction used to be done by looking at a farmer's previous experience with a specific crop. Agricultural yield is mostly determined by weather conditions, pests, and harvest process planning. Accurate information about crop production history is critical for making decisions about agricultural risk management.

III. Literature Survey:

Farmers can use a variety of programs to forecast crop yields based on climate variables. In one such work, crops were predicted using machine learning algorithms. The random forest technique was used to train the model on five meteorological parameters, but additional agricultural inputs such as soil quality, pests, and chemicals used were not taken into account. The random forest was built from 200 decision trees, and the trained model's accuracy was tested using 10-fold cross-validation.

Machine learning, a good empirical approach for classification and prediction, is another method for crop yield estimation. One study estimated corn yield in the state of Iowa using four machine learning strategies, including RF (Random Forest), ERT (Extremely Randomized Trees), and DL (Deep Learning). Comparisons of the validation statistics among them were also presented. To observe the seasonal sensitivities of the corn yields, three period groups were set up: (1) MJJAS (May to September), (2) JA (July and
August), and (3) OC (optimal combination of months). In terms of the correlation coefficient, the DL approach had the highest accuracies for the three period groups. The accuracies in the OC group were relatively good, indicating that the best month combination can be important in statistical agricultural yield modeling.

R. Ghadge et al. [1] concluded that their work aids in enhancing agricultural production rates by employing several classification methods and comparing various characteristics. Various machine learning techniques were examined for predicting crop yield. The algorithms included for comparative analysis are artificial neural networks, support vector machines, K-Nearest Neighbors, decision trees, random forests, gradient boosted decision trees, regularised greedy forests, and the proposed CSM technique (Crop Selection Method), which aids in predicting the sequence of crops that can be considered for planning in the coming seasons.

Instead of using MLR (multiple linear regression) and RF (random forest) models, Khaki et al. [6] suggested an ELM model based on artificial intelligence for coffee yield prediction on small farms. Different machine learning models were compared with the ELM models. In terms of feature extraction, the author claims that ELM models are more efficient than RF and MLR models. To predict agricultural yield, many supervised machine learning techniques such as linear regression, polynomial regression, decision tree, random forest, and support vector machine can be used.

IV. Existing System:

One existing crop yield prediction model is based on a CNN and a Geographical Index. The existing model had an issue with agricultural drifts for crop cultivation that were not compatible with environmental elements such as temperature, weather, and soil condition. A BPNN was used to train the CNN model, which took spatial characteristics as input for error prediction. The model had the advantage of being deployed on a real-time dataset derived from legitimate geospatial resources. However, while the new model reduced relative error, it decreased crop yield forecast efficiency.

A previous model employed SVM to classify crop data based on the texture, shape, and color of patterns on the diseased surface, since these give a clear perception of the faults. A previously used technology, CNN, reduced the relative error as well as the crop production forecast. Similarly, an existing model that combined a time series model with a Back Propagation Neural Network (BPNN) and used a smaller dataset had inferior performance, since fewer samples were used for prediction. Machine learning methods were used for the sake of selection stability and precision.

ML offers several useful techniques for determining the relationship between inputs and outputs in yield and crop prediction. In agriculture,
machine learning approaches are used for yield prediction, smart irrigation, crop disease prediction, crop selection, weather forecasting, and determining the minimum support price, among other things. These strategies will increase the production of the fields while reducing the farmers' input efforts. Furthermore, machine learning and technology advancements have been accurate because they draw on considerable data, and they have played a key part. This study examines the many agricultural strategies that employ machine learning, as well as their benefits and drawbacks.

V. Architecture Diagram:

Figure 1: Architecture Diagram

Figure 2: Pictorial Representation of Random Forest Algorithm
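
As a rough illustration of the structure that Figure 2 depicts (many decision trees, each grown on its own data sample, whose votes are combined), the following Python sketch builds a tiny forest from scratch. It is not the authors' implementation: the function names, the toy rainfall and temperature values, the Gini impurity criterion, the midpoint thresholds, and the depth limit are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy data: (rainfall in mm, temperature in deg C) -> yield class.
ROWS = [(300, 38), (350, 36), (420, 35), (500, 34),
        (900, 28), (1000, 26), (1100, 25), (1200, 24)]
LABELS = ["low"] * 4 + ["high"] * 4


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def majority(labels):
    """Most common label; used as the prediction stored in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, feature_ids):
    """Best (feature, threshold) among the given features, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, k, rng, depth=0, max_depth=5):
    """Grow one tree; every node looks at k randomly chosen features."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    feature_ids = rng.sample(range(len(rows[0])), k)  # k of the m features
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # An internal node is a tuple; a leaf is a plain (string) label.
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       k, rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       k, rng, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, k=1, seed=0):
    """Build n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], k, rng))
    return forest


def predict_forest(forest, row):
    """Collect one vote per tree and return the most popular label."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


forest = train_forest(ROWS, LABELS)
print(predict_forest(forest, (400, 36)), predict_forest(forest, (1000, 26)))
```

The sketch treats yield as a class label so that voting applies directly. For a numeric yield target, the usual random forest variant averages the trees' outputs instead of taking a majority vote.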

VI. Proposed System:

In machine learning, data is crucial. Data from numerous cities in India was used to create and implement a crop yield forecast system. The data is considered at the district level because latitude varies from location to location. To run the system, three things are required: data about the crop, the climate of a specific district, and the region. The data for our project comes from several districts and regions in India as well as the government website [Link], and the climate data comes from numerous districts and regions in India, as well as the government website [Link].

Random forest is a supervised learning technique (as shown in figure 2) that may be applied to both classification and
regression. The Random Forest algorithm generates decision trees on distinct data samples, obtains a prediction from each subset, and then selects the final prediction by voting. We employed the Random Forest technique to achieve high accuracy, where accuracy compares the model's predictions with the actual outcomes in the dataset. In the random forest, a decision tree is created from a sample of data, and each tree provides its own prediction. The best solution is chosen by voting, which improves the model's accuracy. It produces the best results for the system.

Pseudocode of the Proposed System:

1. Randomly select k features out of the total m features in the model.
2. Among the k features, calculate node d using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until the required number of nodes has been reached.
5. Repeat steps 1 to 4 n times to build a forest of n trees.

Prediction with the trained random forest algorithm uses the pseudocode below (see figure 1):

1. Take the test features, use each randomly created decision tree to predict the outcome, and save each predicted outcome.
2. Calculate the votes given by the decision trees for each predicted outcome.
3. Take the most popular predicted outcome as the random forest algorithm's final forecast.

VII. Conclusion:

Data cleaning and processing, missing value analysis, exploratory analysis, and model creation and evaluation were all part of the analytical process. Finally, we use a machine learning method to predict the crop, with varying outcomes. This leads to some of the following crop forecast insights. Because this system will cover the widest range of crops, farmers will be able to learn about crops that have never been farmed before and will be able to see a list of all possible crops, which will aid them in deciding which crop to cultivate. Furthermore, this method takes previous production data into account, allowing the farmer to gain insight into market demand and costs for particular crops. The user-friendly web page built for estimating crop yield can be used by any user for their choice of crop by providing climate data for that location.

VIII. References:

[1] R. Ghadge, J. Kulkarni, P. More, S. Nene, and R. L. Priya, "Prediction of crop yield using machine learning," Int. Res. J. Eng. Technology, vol. 5, 2018.

[2] [Link], [Link], [Link], et al., Random forests used to forecast global and regional crop yields, PLoS ONE (peer-reviewed journal).

[3] Aruvansh Nigam, Saksham Garg, Archit Agrawal, and Parul Agrawal, "Crop Yield Prediction Using Machine Learning Algorithms," 2019 Fifth International Conference on Image Information Processing (ICIIP), pp. 125-130.

[4] "Techniques for spatial prediction of soil parameters and corn production," Computers and Electronics in Agriculture, vol. 153, pp. 213-225, October 2018.

[5] "Crop Production-Ensemble Machine Learning Model for Prediction," International Journal of Computer Science and Software Engineering (IJCSSE), vol. 5, no. 7, July 2016.

[6] S. Khaki and L. Wang, "Crop yield prediction using deep neural networks," Frontiers in Plant Science, vol. 10, p. 621, 2019.

[7] T. Vijayakumar, Journal of Innovative Image Processing (JIIP), vol. 2, no. 03, pp. 121-127, 2020.

[8] T. Senthil Kumar, "Data Mining Based Marketing Decision Support System Using Hybrid Machine Learning Algorithm," Journal of Artificial Intelligence, vol. 2, no. 03, pp. 185-193, 2020.

[9] S. Khaki and L. Wang, "Crop yield prediction using deep neural networks," Frontiers in Plant Science, vol. 10, p. 621, 2019.
