Concept Paper
A. Basic Information
Project Title:
HyperAi - Web-Based Detection of Hyperpartisan News Articles Using Natural Language Processing and Machine Learning Algorithms
Topic:
Web Application, NLP, LSTM
Proponent:
CSA11 - Labing Ryz Vincent
Technical Description:
Crop yield estimation is crucial in agriculture. Remote sensing (RS) methods are increasingly employed in decision support systems for modern agricultural systems to boost crop output and improve nitrogen control while lowering operating expenses and environmental impact. However, because RS-based systems require processing large amounts of remote sensing data from a variety of platforms, machine learning (ML) techniques are gaining popularity, owing to the ability of ML algorithms to handle non-linear problems and large volumes of data. Over the last 15 years, there have been significant breakthroughs in ML-based methods for reliable crop production estimation. According to the study, rapid advances in sensor technology and ML methodologies will provide cost-effective and comprehensive solutions for improved crop and ecological status estimation and decision making. In the near future, precision agriculture (PA) is likely to include more targeted application of sensor platforms and ML techniques, the fusion of different sensor modalities and expert knowledge, and hybrid systems that combine different ML and signal processing techniques.
Several Internet of Things (IoT) technologies have been designed for PA management [6]-[8]. A smart farming system was designed to gather crop information and enhance agricultural production through correlation analysis between crop statistical data and farming environment data [9]. The portals in [10] and [11] provide monitoring and control functionality based on the observed information.
Several IoT solutions have recently been created to control crop water usage. The solution developed in [12] is one example. Modern devices, such as the one described in [13], enable users to control the watering system via smart devices.
Statement of the Problem:
Because the green plants on the farm are exposed to the rising sun every morning, the sun's rays damage the green texture of the leaves, which can decrease the plants' chlorophyll content. In addition, pests on the plants absorb their nutrients, causing further damage.
1. How can the farmers protect the chlorophyll content of the plants and prevent pests from absorbing their nutrients?
2. How can a web application help them monitor the texture and condition of the plants?
3. How can the analysis of time-series data help in monitoring the plants every day?
4. How accurate can the predictions be after the model is trained and tested?
Objectives: General and Specific
General Objective:
The main objective of this study is to design and develop a website for internal auditors that improves their time efficiency.
Specifically, it aims to:
1. Protect the chlorophyll content of the plants by not exposing them to direct sunlight every day and rearranging them in places that receive the right amount of sun.
2. Use the web application to monitor the plants efficiently, without requiring many manual actions to check on them from time to time.
3. Use time-series data to help farmers decide which plants to monitor, for example whether pests or the plants themselves are at risk, based on the percentages gathered periodically during the monitoring process.
4. Ensure the accuracy of the predictions by testing the model and fixing any issues that are found.
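Objective 3 describes tracking the percentage of at-risk observations over time. A minimal sketch of that idea, assuming daily pest-risk scores between 0 and 1 and a hypothetical at-risk threshold of 0.7 (neither the score scale nor the threshold comes from this paper), computes a rolling percentage of at-risk days:

```python
from collections import deque
from typing import Iterable, List

# Hypothetical threshold: a daily score at or above this value counts as "at risk".
RISK_THRESHOLD = 0.7


def rolling_risk_percentage(readings: Iterable[float], window: int = 7,
                            threshold: float = RISK_THRESHOLD) -> List[float]:
    """For each day, return the percentage of at-risk readings in the last `window` days."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # keeps only the most recent `window` flags
    percentages = []
    for value in readings:
        recent.append(value >= threshold)
        percentages.append(100.0 * sum(recent) / len(recent))
    return percentages


daily_scores = [0.2, 0.8, 0.9, 0.1, 0.75]  # illustrative values only
print(rolling_risk_percentage(daily_scores, window=3))
```

A rising rolling percentage would signal that a plant group needs closer attention; the window length trades responsiveness against noise.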
How did others solve the problem?
1. Remotely Monitored Web-Based Smart Hydroponics System for Crop Yield Prediction Using IoT
The proposed web application provides farmers with an estimate of the crop yield that will be produced, based on sensor data and user input. The crop yield prediction is given in tonnes and is estimated using the Random Forest algorithm.
2. Intelligent decision support system for crop yield prediction using hybrid machine learning algorithms
Based on experimental results on agricultural data, the following observations were made. The performance of individual and hybrid ML algorithms was compared using cross-validation to identify the most promising performers for the agricultural dataset. The accuracies of the random forest regressor, gradient boosted tree regression, and stacked generalization ensemble methods were 87.71%, 86.98%, and 88.89%, respectively.
3. Development of crop chlorophyll detector based on a type of interference filter optical sensor
To achieve non-destructive detection of chlorophyll content in the field crop canopy, based on differences in the reflectance characteristics of chlorophyll in the visible and NIR spectra (400 nm to 950 nm), the hardware part of the designed sensor performed module integration and sensor calibration, while the software part performed data acquisition and processing. The best wavelength for modeling was selected through field experiments, and the selected model was embedded in the
4. Time-Series Monitoring of Transgenic Maize Seedlings Phenotyping Exhibiting Glyphosate Tolerance
In this study, a glyphosate-sensitive cultivar exhibited significant changes in leaf reflectance and photosynthetic activity at 6 DAT, which may be useful for phenotyping. The authors then demonstrated the feasibility of using leaf chlorophyll content (LCC) predictions from robust machine learning models to analyze physiological responses to glyphosate stress. By updating the source domain and applying the TCA algorithm, model performance improved considerably (R²p increased from 0.65 to 0.84, and RMSEp decreased from 4.94 to 4.03 μg·cm⁻²). The study also investigated the potential of using a time series of rapid chlorophyll fluorescence (ChlF) transients to dynamically dissect the photosynthetic physiological responses of maize cultivars with different glyphosate tolerance. Combining LCC with ChlF data, it concluded that the oxidative stress caused by glyphosate and its detrimental effects on photosynthesis are interconnected. In glyphosate-sensitive cultivars, inhibition of the shikimic acid pathway changed the redox status and affected photosynthesis. Moreover, according to the PCA loadings, φE0, VJ, ψE0, and M0 could be used to indicate damage caused by glyphosate stress and to differentiate resistant cultivars. This research was the first attempt to analyze the response of LCC to glyphosate using improved machine learning algorithms with transfer strategies based on datasets from different experiment times. In addition, the rapid ChlF transient technique shows potential for monitoring glyphosate stress responses and can facilitate the screening and development of superior genotypes.
5. Smart Pest Control System in Agriculture
This process is aided by a temperature and humidity sensor interfaced with a Raspberry Pi 3B, which checks the temperature and humidity levels in the atmosphere. When the temperature and humidity values exceed their threshold values, images of the plant are captured and compared with an image of a healthy plant. The Raspberry Pi processes all of the obtained parameters. If the temperature and humidity both exceed the predetermined levels and the image obtained from the camera does not match the healthy leaf, the real-time values are compared with a database to identify the affected leaf, and the farmer is alerted through a GSM module.
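The trigger logic of the Smart Pest Control System can be sketched as two nested threshold checks. This is a minimal sketch, assuming hypothetical thresholds (30 °C, 80% humidity) and an image-similarity score between 0 and 1 produced elsewhere; the cited system's actual thresholds, image-matching method, database lookup and GSM alerting are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the cited system's actual values are not stated.
TEMP_THRESHOLD_C = 30.0
HUMIDITY_THRESHOLD_PCT = 80.0
SIMILARITY_THRESHOLD = 0.9  # below this, the leaf image "does not match" the healthy one


@dataclass
class Reading:
    temperature_c: float
    humidity_pct: float


def needs_image_check(r: Reading) -> bool:
    """Capture an image only when both sensor values exceed their thresholds."""
    return r.temperature_c > TEMP_THRESHOLD_C and r.humidity_pct > HUMIDITY_THRESHOLD_PCT


def should_alert(r: Reading, image_similarity: float) -> bool:
    """Alert when conditions are exceeded and the leaf does not look healthy.

    image_similarity is an assumed 0-1 score comparing the captured image with a
    healthy reference image; how it is computed is outside this sketch.
    """
    return needs_image_check(r) and image_similarity < SIMILARITY_THRESHOLD


print(should_alert(Reading(32.0, 85.0), image_similarity=0.6))  # True
print(should_alert(Reading(25.0, 85.0), image_similarity=0.6))  # False
```

Gating the camera on the cheaper sensor check mirrors the described design: image capture and comparison run only when the environment suggests elevated pest risk.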
Other Related Studies (Data Gathering and Machine Learning Methods)
6. Anomaly detection in smart agriculture systems
This research studied anomaly detection systems as a way to increase the security of smart agriculture systems. The work sought to develop a first anomaly detection prototype; to this end, an architecture based on a machine learning approach was designed. MLR and LSTM are two different ML techniques that could also be combined to make the best use of both. On the dataset used for the experiments, the results obtained with LSTM networks are really close to
7. [Link] content for millet leaf using hyperspectral imaging and an attention-convolutional neural network
In this study, hyperspectral imaging technology was used to obtain spectral and image information from millet leaves at different growth stages, and the average spectra of the leaves were extracted by intelligently extracting the region of interest (ROI). The original spectral data were pre-processed with multiplicative scatter correction (MSC) and subsequently analyzed to study the effect of pre-treatment and the resulting correlations. The CC-SPA model was used for data reduction, and characteristic parameters were extracted from the spectral and image information. Single-characteristic and multi-characteristic parameter fusion were then used to build the PLSR model. The multi-characteristic parameter fusion achieved accurate predictions (R²v = 0.813, RMSEv = 1.766) and exhibited better prediction accuracy than the single-characteristic parameter model. Based on the multi-characteristic parameter fusion model, the attention-CNN model yielded more accurate results (R²v = 0.839, RMSEv = 1.451, RPD = 2.355) than the PLSR model (R²v = 0.813, RMSEv = 1.766, RPD = 2.167) and the LS-SVM model (R²v = 0.806, RMSEv = 1.576, RPD = 2.061). In addition, the difference between the modeling accuracy and the actual prediction ability of the attention-CNN model was the smallest (0.026). These results demonstrate that the attention-CNN model has higher prediction accuracy and a better regression fit than the conventional models and adapts better to the sample data. The attention-CNN model is therefore a highly advantageous novel method for non-invasive measurement of chlorophyll content in millet plants.
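The studies above report R², RMSE and RPD. A minimal sketch of how these metrics are commonly computed follows; it assumes RPD is the sample standard deviation of the reference values divided by RMSE, which is a common convention but not confirmed by the cited studies (definitions vary). The sample values are illustrative only.

```python
import math
import statistics
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error of the predictions."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Ratio of performance to deviation (assumed: sample SD of reference / RMSE)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


measured = [30.0, 35.0, 40.0, 45.0]   # illustrative chlorophyll readings
predicted = [31.0, 34.0, 41.0, 44.0]
print(r2_score(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```

Higher R² and RPD and lower RMSE indicate a better model, which is how the attention-CNN results compare against PLSR and LS-SVM above.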
Conceptual Framework
Figure 2. Conceptual Framework of the Study
Data cleaning and pre-processing are essential steps to ensure the accuracy and reliability of
subsequent data analysis. This phase involves systematically identifying and addressing
incomplete or inconsistent data entries that could introduce bias or inaccuracies. During this
process, missing values, outliers, and inconsistencies within the dataset are carefully
scrutinized and handled. In addition to removing problematic data entries, this stage also
involves standardizing the data format across the dataset. This standardization ensures that all
data entries follow a consistent structure, such as uniform date formats or consistent labeling
of categorical variables, which is crucial to avoid errors in analysis or misinterpretations of
results. Handling missing data is another key aspect of this stage. Rather than simply
discarding entries with missing values, imputation techniques are often employed to estimate
and fill in these gaps, preserving the overall integrity and comprehensiveness of the dataset.
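The cleaning steps described above (standardizing formats, normalizing category labels, imputing missing values and removing duplicates) can be sketched as follows. This is a minimal sketch under stated assumptions: the field names `date`, `category` and `amount`, the accepted date formats, and median imputation are all hypothetical choices for illustration, and outlier handling is left out.

```python
import statistics
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical raw record layout; field names are illustrative only.
Record = Dict[str, Optional[str]]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")  # assumed input formats


def parse_date(text: str) -> str:
    """Normalize a date string to ISO format (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(records: List[Record]) -> List[dict]:
    """Standardize dates and labels, impute missing amounts, drop duplicates."""
    observed = [float(r["amount"]) for r in records if r.get("amount") not in (None, "")]
    median_amount = statistics.median(observed)  # imputation value for gaps
    cleaned, seen = [], set()
    for r in records:
        row = {
            "date": parse_date(r["date"]),
            "category": (r.get("category") or "unknown").strip().lower(),
            "amount": float(r["amount"]) if r.get("amount") not in (None, "") else median_amount,
        }
        key = tuple(row.values())
        if key not in seen:  # exact duplicates only become visible after normalization
            seen.add(key)
            cleaned.append(row)
    return cleaned


raw = [
    {"date": "2024-01-05", "category": "Travel ", "amount": "120"},
    {"date": "01/05/2024", "category": "travel", "amount": "120"},  # duplicate once normalized
    {"date": "06-Jan-2024", "category": "Supplies", "amount": None},  # missing amount
    {"date": "2024-01-07", "category": None, "amount": "80"},
]
for row in clean(raw):
    print(row)
```

Normalizing before de-duplicating matters: the first two records only reveal themselves as duplicates once their dates and labels share one format.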
Once the data has been cleaned and pre-processed, the next step is data visualization, where
raw data is transformed into visual representations that are easier to interpret. Creating charts,
graphs, and dashboards provides a clearer understanding of data distribution and patterns.
Tools like Tableau and Power BI are particularly effective for this purpose, offering dynamic
and interactive ways to explore the data. These visualization tools allow users to drill down
into specific data points, apply filters, and visualize trends over time, making it easier to
identify key insights. Data visualization is not only useful for initial data exploration but also
plays a vital role in communicating findings to stakeholders who may not have a technical
background.
Following the visualization, data clustering is used to group similar data points together. This
step typically employs algorithms like K-means and hierarchical clustering to identify natural
groupings within the dataset. K-means clustering partitions the data into a predefined number
of clusters, each represented by a centroid, which reflects the average of the points within that
cluster. This approach is particularly effective for identifying distinct groups within large
datasets. In contrast, hierarchical clustering builds a tree-like structure known as a
dendrogram, grouping data points based on varying levels of similarity. This method offers
flexibility in determining the number of clusters, as the tree can be cut at different levels to
form larger or smaller clusters. Utilizing both methods provides a comprehensive
understanding of the data's underlying structure, which is essential for subsequent analysis.
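The K-means procedure described above alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points (Lloyd's algorithm). A minimal pure-Python sketch for 2-D points follows; the random initialization from input points and the convergence test are common choices assumed here, not details taken from this study. In practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # start from k input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, assignment = kmeans(data, k=2)
print(centers, assignment)
```

Because the result depends on the initial centroids, implementations usually run several initializations and keep the one with the lowest within-cluster variance.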
The next critical step is risk scaling, where internal auditors assess and rate various risks
associated with the dataset or its outcomes. Risks are quantified on a scale from 1 to 5, with 1
indicating low risk and 5 representing extreme risk. This quantification helps prioritize risks,
determining which need immediate attention and which can be monitored over time. To
enhance understanding, a risk matrix is created, visually mapping the risks according to their
severity and likelihood. This matrix provides a clear overview of the risk landscape, aiding in
informed decision-making and effectively communicating risk levels to stakeholders.
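The 1-to-5 risk scale and severity-by-likelihood matrix described above can be sketched as follows. This is a minimal sketch, assuming severity and likelihood are each rated 1 to 5 and their product (1 to 25) is banded into the five ratings; the band boundaries and the example risk names are hypothetical, not taken from this study.

```python
from typing import Dict, List, Tuple

LEVELS = {1: "low", 2: "moderate", 3: "significant", 4: "high", 5: "extreme"}


def risk_rating(severity: int, likelihood: int) -> int:
    """Map severity x likelihood (each 1-5) onto a single 1-5 rating.

    The banding of the 1-25 product into five ratings is a hypothetical choice.
    """
    for name, value in (("severity", severity), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    score = severity * likelihood  # ranges from 1 to 25
    if score <= 4:
        return 1
    if score <= 8:
        return 2
    if score <= 12:
        return 3
    if score <= 19:
        return 4
    return 5


def risk_matrix() -> List[List[int]]:
    """5x5 grid: rows are severity 5..1 (top to bottom), columns are likelihood 1..5."""
    return [[risk_rating(sev, lik) for lik in range(1, 6)] for sev in range(5, 0, -1)]


def prioritize(risks: Dict[str, Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Sort named risks (severity, likelihood) from highest to lowest rating."""
    rated = [(name, risk_rating(sev, lik)) for name, (sev, lik) in risks.items()]
    return sorted(rated, key=lambda item: item[1], reverse=True)


for row in risk_matrix():
    print(row)
for name, rating in prioritize({"duplicate payments": (4, 3), "late filing": (2, 2),
                                "data breach": (5, 4)}):
    print(name, LEVELS[rating])
```

Keeping the banding in one function makes it easy for auditors to adjust the thresholds without changing how the matrix or the priority list is produced.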
Data analysis follows, where the cleaned and clustered data is subjected to various analytical
techniques to uncover insights. Descriptive analysis summarizes the main characteristics of the
data, such as the mean, median, and standard deviation, providing a statistical overview that
highlights central tendencies and variability within the dataset. This summary forms the
foundation for more complex analyses. Pattern recognition techniques are then applied to
identify recurring trends or relationships within the data, such as analyzing time series data to
detect patterns over time or using sequence analysis to understand event order and occurrence.
Additionally, association rule mining is employed to discover interesting relationships
between variables, often used in market basket analysis to identify items that frequently co-
occur in transactions. This technique can reveal hidden connections in the data that may not be
immediately apparent through other methods.
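Association rule mining rests on two measures: support (how often items occur together) and confidence (how often the consequent appears given the antecedent). A simplified sketch that only considers rules between pairs of items follows; it is not the full Apriori algorithm, and the thresholds and basket contents are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pair_rules(transactions: List[Set[str]], min_support: float = 0.3,
               min_confidence: float = 0.6) -> List[Tuple[str, str, float, float]]:
    """Return rules A -> B over item pairs as (A, B, support, confidence)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"}, {"milk", "eggs"}]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

The support filter runs first because it prunes rare pairs cheaply; a full Apriori implementation extends the same idea to larger itemsets.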
Finally, machine learning models are implemented to refine insights gained from earlier steps.
Gaussian Mixture Models (GMM) are used to enhance clustering results by accommodating
clusters of varying shapes and sizes, including those that overlap. This flexibility makes GMM
particularly useful for datasets that do not fit neatly into the rigid clusters formed by methods
like K-means. Beyond clustering, classification models such as Decision Trees, Random
Forests, and Support Vector Machines are utilized to predict outcomes based on the data.
These models can classify data into predefined categories or forecast future trends based on
historical data. Additionally, anomaly detection algorithms like Isolation Forest are employed
to identify outliers or unusual patterns that could indicate potential issues, such as fraud or data
entry errors. By detecting these anomalies early, the system can flag them for further
investigation, ensuring the overall integrity of the data analysis process.
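Isolation Forest flags anomalies by how quickly random splits isolate a point: outliers need fewer splits, so their average path length is shorter and their score is closer to 1. A minimal pure-Python sketch of the idea (after Liu et al.) follows; the class name `SimpleIsolationForest` and its parameters are hypothetical, and production code would more likely use `sklearn.ensemble.IsolationForest`.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(data: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(data) <= 1:
        return ("leaf", len(data))
    dim = rng.randrange(len(data[0]))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return ("leaf", len(data))  # this feature cannot separate the points
    split = rng.uniform(lo, hi)
    left = [p for p in data if p[dim] < split]
    right = [p for p in data if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x: Point, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for leaves holding several points."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


class SimpleIsolationForest:
    """Hypothetical minimal Isolation Forest; scores near 1 suggest anomalies."""

    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees: list = []
        self.psi = 0

    def fit(self, data: Sequence[Point]) -> "SimpleIsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit used by the original method
        self.trees = [build_tree(rng.sample(list(data), self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; values well above 0.5 suggest an outlier."""
        mean_path = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / average_path_length(self.psi))


normal = [(10.0 + i % 5, 20.0 + i % 3) for i in range(30)]  # illustrative data
forest = SimpleIsolationForest(seed=42).fit(normal + [(100.0, 100.0)])
print(forest.score((100.0, 100.0)), forest.score((12.0, 21.0)))
```

Flagged points would then be routed for manual review rather than deleted, consistent with the investigation step described above.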
Target Users / Beneficiaries:
This study will be beneficial to the internal auditors since it will help them reduce paper
waste and increase efficiency in their work.
Significance of the Study:
The findings of this study may be useful to the staff of the branches of Laguna State Polytechnic University.
References
● Pehlivan, D., & Cicek, K. (2021). A knowledge-based model on quality management system compliance assessment for maritime higher education institutions. Quality in Higher Education, 27(2), 239–263.
○ [Link]
● Haelterman, H. (2020). Breaking silos of legal and regulatory risks to outperform traditional compliance approaches.
○ [Link]
● Remmerbach, K.-U. (2020). The effectiveness of compliance management systems.
○ [Link] 341828773_The_effectiveness_of_compliance_management_systems
● Rifaut, A. (2011). Compliance management with measurement frameworks. 2011 Fourth International Workshop on Requirements Engineering and Law, Trento, Italy, pp. 15–24. doi: 10.1109/RELAW.2011.6050268.
○ [Link]