Research Proposal Latest

This research proposal aims to compare hybrid methods for predicting stock market prices of FAANG(MAANG) companies before and after the Covid-19 pandemic. It explores the integration of machine learning models, particularly Deep Learning techniques like LSTM, with news sentiment analysis to enhance prediction accuracy. The study will analyze historical stock price data and economic factors to identify the most effective hybrid model for stock market forecasting.


Comparative Study of Hybrid Methods to Predict Stock Market Prices of FAANG (MAANG) Companies in the Pre-Covid and Post-Covid Era

Chunduri Sai Satya Teja

Research Proposal

MAY 2023
Abstract

The stock market is a dynamic and unpredictable system, characterised by its nonlinear
nature. Forecasting stock prices is a challenging task due to the influence of various factors,
including political conditions, the global economy, financial reports, company performance,
and more. FAANG is an acronym for five noteworthy technology companies: Meta
Platforms (formerly known as Facebook), Amazon, Apple, Netflix, and Google (now
Alphabet). Collectively, these companies possess a market capitalization of nearly $5 trillion
and represent approximately 13 percent of the NASDAQ index. If that combined market
capitalization were set against national gross domestic product (GDP) figures, it would be
comparable to the fourth-largest economy in the world.
The question of whether investing in FAANG stocks is a sound decision is frequently
debated, owing to the significant impact these stocks have on the broader market due to their
substantial price movements. For instance, during the onset of the Covid-19 pandemic in
2020, all FAANG stocks performed well for various reasons. However, by the end of 2021,
each of these stocks faced pressure, resulting in a sell-off in the tech sector and subsequently
affecting the broader market.
Machine learning (ML) models have gained considerable value in financial activities,
encompassing portfolio management, bankruptcy prediction, financial risk analysis, and stock
trading. The two most commonly used models for financial prediction tasks are Artificial
Neural Networks (ANN) and Support Vector Machines (SVM). These models can identify
non-linear patterns in data without requiring prior knowledge. In financial research, other
preferred methods for statistical analysis include Random Forest (RF), Linear Discriminant
Analysis (LDA), Logistic Regression (LR), and Evolutionary Computation methods.
However, as the feature space expands, the training time of the models increases, and
interpreting the outputs becomes more challenging. To address this issue and improve
generalisation in machine learning models affected by high-dimensional feature spaces and
data sparsity, dimensionality reduction techniques are often employed. Dealing with
non-linear and noisy data poses challenges in selecting appropriate features, and traditional
feature selection methods may prove inadequate.
In recent studies, Deep Learning (DL) models have emerged as powerful alternatives to
feature selection methods. These models act as feature extractors, generating complex feature
representations from raw data or simpler features at different levels of abstraction in each
layer. One popular DL model is Long Short-Term Memory (LSTM), which excels in financial
forecasting by utilising feature representations derived from time series data. LSTM's ability
to consider long-term dependencies and temporal effects in time series data through feedback
links sets it apart from traditional artificial neural networks. This research aims to predict
outcomes using the aforementioned Deep Learning Models.
Furthermore, studies have shown that news has a significant impact on stock prices. Positive
news about a company, for example, can lead to an increase in stock prices, trading volume,
and overall market activity. This research focuses on comparing the accuracy of Hybrid Time
Series Prediction Models that combine News Sentiment Analysis of news headlines and
content with forecasting models such as LSTM and CNN (deep learning models) and GARCH
(a statistical volatility model). By integrating news sentiment analysis with these models,
this research aims to explore the relationship between news events and stock prices, thereby
enhancing prediction accuracy in financial forecasting.
LIST OF TABLES

Table 7.1 Description of Categorical Data

Table 7.2 Description of Numerical Data


LIST OF FIGURES

Figure 9.1 Gantt Chart


Table of Contents

LIST OF TABLES

LIST OF FIGURES

Abstract

1. Background

2. Problem Statement

3. Related Works

4. Aims and Objectives

5. Significance of the Study

6. Scope of the Study

7. Research Methodology

8. Required Resources

9. Research Plan

References
1. Background

1.1 Predicting stock market trends is crucial to helping investors make informed
decisions. As a complex and dynamic system, the stock market is influenced by
various factors, including economic and political conditions, company-specific
news, events, and investor sentiment. By researching stock market prediction, we
can gain a better understanding of these factors and create models that effectively
capture their impact on stock prices.

1.2 The primary goal of stock market prediction research is to support informed
investment decisions. By analysing past and present trends in the stock market,
researchers can provide insights into the potential future performance of stocks.
This can help investors optimise their portfolios and minimise risk exposure, which
is crucial for long-term investment success.

1.3 Advantages: The advantages of using time series forecasting for stock
price prediction are numerous, and they have significant implications for
investors. Some of the primary benefits of using time series forecasting
for stock price prediction include:

● Identifying trends and patterns: Time series analysis can help investors identify
trends and patterns in stock market data, providing insight into the underlying
behaviour of financial markets. This information can be used to make more
informed investment decisions and potentially optimise investment portfolios.
● Predicting future prices: Time series forecasting can be used to generate
predictions about future stock prices, which can be valuable in making
investment decisions. Accurate predictions can help investors identify buying
and selling opportunities, take advantage of market trends, and optimise returns
on investment.
● Managing risk: By analysing past trends and patterns in the stock market,
investors can identify potential risks and take steps to mitigate them. Time
series analysis can help investors develop a more complete understanding of
market behaviour and identify patterns that may signal potential changes in
market conditions.
● Improving efficiency: Time series forecasting can help investors streamline
their investment decisions by providing a data-driven approach to decision-
making. With access to accurate predictions and insights about market trends
and patterns, investors can make more informed decisions in a shorter
amount of time.
● Enhancing decision-making: Time series forecasting can improve the quality of
investment decisions by providing a more complete picture of the market. By
taking a data-driven approach to investment decisions, investors can avoid
relying solely on intuition or emotion, leading to more objective and potentially
more successful outcomes.
1.4 Statistical Methods for Stock Market Prediction using Time Series
Analysis:
Time series analysis is a statistical technique that helps identify patterns and
trends in data collected at regular intervals over time, enabling predictions about
future values.
● The Autoregressive (AR) model is a time series model that uses past values of
a series to predict future values. The order of the model determines the number
of past values used in the prediction.
● The Moving Average (MA) model is a time series model that predicts future
values based on past prediction errors. The order of the model specifies the
number of past errors used in the prediction.
● The Autoregressive Moving Average (ARMA) model is a combination of the AR
and MA models, making predictions based on both past values and errors. The
model assumes that the current value is a function of both past values and past
errors.
● The Autoregressive Integrated Moving Average (ARIMA) model is a more
sophisticated version of the ARMA model that can handle non-stationary time
series data. The ARIMA model includes a differencing step that can be used
to transform non-stationary data into stationary data, which is easier to model.
● The Seasonal Autoregressive Integrated Moving Average (SARIMA) model is
an extension of the ARIMA model that includes a seasonal component. The
SARIMA model is used to analyse time series data that exhibit seasonal patterns,
such as monthly or quarterly data.
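
As an illustration of how the differencing and autoregressive steps above fit together, the following is a minimal sketch in plain Python of a one-step forecast in the spirit of ARIMA(1, 1, 0). The function names and the price list are hypothetical and chosen only for this example; a real study would more likely rely on an established statistics library and would also select the model order and check stationarity.

```python
def difference(series):
    """First-order differencing: the 'integrated' step of ARIMA with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(x_prev, x_next))
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_next_price(prices):
    """Difference the prices, fit AR(1) on the differences, then undo the differencing."""
    diffs = difference(prices)
    c, phi = fit_ar1(diffs)
    next_diff = c + phi * diffs[-1]
    return prices[-1] + next_diff


# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0, 105.0, 108.0]
print(forecast_next_price(prices))
```

The sketch omits the moving-average and seasonal terms, so it should be read as a simplified instance of the AR and differencing ideas rather than a full ARIMA or SARIMA implementation.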

1.5 Machine Learning and Deep Learning Methods for Stock Price
Prediction:

● Linear regression: A simple and widely used machine learning model
for stock market prediction, linear regression models the relationship
between a dependent variable (e.g. stock price) and one or more
independent variables (e.g. market trends or other financial indicators).
● Support Vector Machines (SVMs): A popular machine learning model for
classification and regression, SVMs work by finding the maximum-margin boundary
that separates the data points into different categories. Kernel functions allow this
boundary to be non-linear, and a related formulation handles regression.
● Random Forests: An ensemble of decision trees, random forests are used
for classification and regression tasks. They train multiple decision trees on random
subsets of the data and features and combine their results to provide more accurate
predictions.
● Artificial Neural Networks (ANNs): A type of deep learning model inspired by
the structure of the human brain, ANNs consist of layers of interconnected
nodes that can learn to recognize patterns and make predictions.
● Convolutional Neural Networks (CNNs): A type of ANN that specialises in
image processing and has been adapted for time series analysis. They work by
using convolutional layers to extract features from the input data and then
passing those features through fully connected layers to make predictions.
● Long Short-Term Memory (LSTM) Networks: A type of recurrent neural
network (RNN), LSTMs are designed to handle sequential data and can
model long-term dependencies in time series data.
● Gated Recurrent Unit (GRU) Networks: A variation of RNNs, GRU
networks are similar to LSTMs but have fewer parameters and can be trained
faster.
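
Sequence models such as LSTM and GRU networks are usually trained on fixed-length windows of past observations. The sketch below shows one possible way to turn a price series into (window, next value) pairs and to split them chronologically. The function names, the window length, and the 80/20 split are assumptions made for illustration and are not taken from the proposal.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling, so the test set lies strictly after the training set."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Made-up closing prices, for illustration only.
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4]
x, y = make_windows(closes, window=3)
(train_x, train_y), (test_x, test_y) = chronological_split(x, y)
print(len(train_x), len(test_x))
```

Keeping the split chronological avoids training on observations that occur after the ones being predicted, which matters for any time series comparison.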
1.6 News Sentiment Analysis for Stock Price Prediction:

1.6.1 News sentiment analysis is a natural language processing technique
used to determine the sentiment or emotion behind news articles or
social media posts. Analysing the sentiment of news articles related
to a particular stock can provide a signal that helps predict its future
price movements. Here are some of the text representations that have
been used in news sentiment analysis for stock price prediction:

● Bag-of-Words (BoW): A simple and widely used representation for sentiment
analysis, BoW represents a text document as a bag of words without considering
the order or structure of the words. It can be used to determine the
overall sentiment of a news article.
● Word Embeddings: A more sophisticated approach to sentiment
analysis, word embeddings represent words as dense vectors
based on their context in a corpus of text. They can
capture more nuanced relationships between words and can be used
to detect more subtle sentiment in news articles.
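
To make the Bag-of-Words idea concrete, the following minimal sketch builds a vocabulary from a few headlines and converts each headline into a vector of word counts. The headlines, the tokenisation rule, and the function names are invented for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Sorted list of every distinct token seen in the documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text, ignoring word order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Made-up headlines, for illustration only.
headlines = ["Apple shares rise", "Apple shares fall as sales fall"]
vocabulary = build_vocabulary(headlines)
for headline in headlines:
    print(bag_of_words(headline, vocabulary))
```

Because only counts are kept, "Apple shares rise" and "shares rise Apple" map to the same vector, which is the loss of order and structure described above.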

1.6.2 News sentiment analysis for stock price prediction typically involves
using classification models to classify the sentiment of news articles
as positive, negative, or neutral. Here are some of the
classification models that have been used for news sentiment analysis
in stock price prediction:

● Gradient Boosting: A machine learning algorithm known for its
ability to create a robust model by combining multiple weaker
models. In the context of news sentiment analysis, gradient boosting
has been applied by sequentially training decision trees, with each
subsequent tree aiming to rectify the errors made by the previous
trees.
● K-Nearest Neighbours (KNN): A simple classification algorithm
that determines the class of a new sample based on the classes of its k
nearest neighbours in the feature space, KNN has been used for news
sentiment analysis by treating news articles as vectors in a
high-dimensional feature space and finding the k nearest neighbours to a
new news article.
● Ensemble Learning: A technique that combines multiple
classification models to make predictions, ensemble learning has been
used for news sentiment analysis by combining the outputs of
multiple models such as SVMs, logistic regression, and decision trees
to improve the accuracy and robustness of the prediction.
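
The K-Nearest Neighbours procedure described above can be sketched in a few lines. In this hypothetical example each article is reduced to a two-dimensional vector (count of positive words, count of negative words). That feature choice, the labelled examples, and the function names are assumptions made purely for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: [positive-word count, negative-word count] per article.
vectors = [[3, 0], [2, 1], [0, 3], [1, 2], [1, 1]]
labels = ["positive", "positive", "negative", "negative", "neutral"]

print(knn_predict(vectors, labels, [3, 1]))
print(knn_predict(vectors, labels, [0, 2]))
```

In practice the vectors would come from a representation such as Bag-of-Words or word embeddings, and the value of k would be chosen by validation.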
2. Problem Statement

Stock market price prediction is a complex and challenging problem that requires an
in-depth understanding of market dynamics, economic trends, and financial data analysis.
The objective of this project is to compare machine learning models on how accurately
and robustly they forecast the future price trends of a selected set of stocks.

To achieve this, each model should be trained on historical market data, which includes
price trends, trading volumes, news sentiment, and other relevant market factors. Each
model should also be able to adapt and evolve over time by incorporating new data
inputs and adjusting its predictions accordingly. The success of each model will be
evaluated based on its ability to predict stock prices over different time horizons, ranging
from short-term (e.g., daily or weekly) to long-term (e.g., monthly or quarterly).

The stock market is highly dynamic and unpredictable, which makes it difficult for
investors and traders to predict stock prices accurately. The unprecedented
impact of the COVID-19 pandemic has further added to the complexity of stock market
forecasting, making it an intriguing research area. FAANG (MAANG) companies
(Facebook, Amazon, Apple, Netflix, Google, Microsoft) are among the most valued
companies in the world and are known for their volatility in the stock market. Various
methods have been used to predict stock prices, ranging from traditional time series
analysis to modern machine learning algorithms.

The study will leverage historical stock price data of the FAANG (MAANG) companies
and economic factors to develop the hybrid models. The research will analyse the
performance of the hybrid models in predicting stock prices in the pre-COVID and
post-COVID era and identify the most effective hybrid method. The outcome of the research
will provide insights into the suitability of different prediction techniques for stock
market forecasting in the current dynamic market conditions.
3. Related Works

1. In (Shu and Gao, 2020); Predicting stock prices accurately is crucial for
investors seeking to maximise profits. However, this task is challenging
owing to the non-linear and non-stationary nature of stock price series.
The emergence of machine learning has provided a potential solution, as
artificial neural networks (ANNs) have proven to be an effective tool for
analysing and predicting time series, thanks to their ability to model
non-linear relationships.

2. In (Wang et al., 2019); Fama proposed the efficient market hypothesis,
suggesting that stock prices reflect all available information, but this view
only considers historical and current stock prices and overlooks news as a
significant source of price volatility. Preis et al. found that Google Trends
search volumes for certain events prior to news releases can predict future
trends, indicating that search volume can reflect the current state and forecast
future trends.

3. In (Agrawal et al., 2021); In this study, technical indicators were utilised for
stock trend forecasting using EDLA and two benchmark ML algorithms. The
experiment was conducted on three prominent stocks from NSE India, and
performance evaluation demonstrated the superiority of the proposed
approach. Additionally, various optimization techniques and other STIs can be
applied to enhance the deep learning model further.

4. In (Joshi et al., 2016); The prediction of stock price trends is a dynamic field
of research, as more precise predictions are directly correlated with higher
returns on stocks. As a result, considerable efforts have been made in recent
years to create models that can predict the future trend of a particular stock or
the overall market. Many existing techniques employ technical indicators.
However, some researchers have shown that news articles about a company
are strongly related to fluctuations in its stock prices.

5. In (Yasir et al., 2020); This study aims to enhance the accuracy of interest
rate forecasting in five countries, namely the UK, Turkey, China, Hong
Kong, and Mexico, by leveraging a deep learning model that incorporates
both daily interest rate and exchange rate data from Jan 2010 to Oct 2019, as
well as Twitter sentiments related to six major events. The events include
the 2012 US and Mexican elections, the 2014 Gaza conflict and Hong Kong
protests, the 2015 refugee welcome, and the 2016 Brexit referendum. The
study finds that incorporating event sentiment into the deep learning model
leads to a significant reduction in forecasting error, with a reported 266%
decline in error for the Hong Kong interest rate. These results
highlight the potential of deep learning models and social media analysis in
improving financial forecasting accuracy.
6. In (Maqsood et al., 2022); In this study, a hybrid deep learning and machine
learning model is proposed that utilises social media sentiment analysis of
Brexit to predict stock market trends. The model incorporates support vector
machines (SVM) and linear regression (LR) as machine learning models,
and convolutional neural networks (CNNs) as a deep learning model. To
further enhance its accuracy, the model incorporates around 1.82 million
tweets related to major and minor contributors to the EU budget.

7. In (Pahlawan et al., 2021); The study employed an RNN to forecast future
stock prices based on daily closing stock prices and the rupiah exchange rate
against the dollar. Results show that the RNN model had a MAPE value of
1.546% without incorporating the exchange rate variable, and 1.558% when
it was included in the model, so adding the exchange rate did not improve
accuracy in this case.

8. In (Ju and Chen, 2022); The innovative approach proposed in this study
utilises deep convolutional neural networks to organise market price
activities into a bell-shaped curve, providing valuable insight into the levels
of greed and fear in the market. Additionally, the study explores the impact
of time causality on patterns, a critical consideration when analysing market
trends. A significant challenge in this type of analysis is capturing rapid
market changes and identifying the causal links between patterns and trend
reversal behaviour using only static features extracted by AI models. To
overcome this challenge, the study employs advanced techniques to generate
images that retain time-variant features and can be effectively analysed
using convolutional neural networks. This approach represents a significant
step forward in our ability to predict market trends and identify key market
indicators.

9. In (Lin et al., 2021); This study utilised Natural Language
Processing (NLP) to extract daily news topics, which serve as a crucial
feature, particularly in the case of unexpected events such as COVID-19.
The study therefore compared the model's performance during normal
periods and the COVID-19 period.

10. In (Li et al., 2020); This study also incorporated Natural Language
Processing (NLP) to extract daily news topics, which are a crucial part of
the feature set, especially during unexpected events such as the COVID-19
pandemic. As a result, the performance of the model was compared between
normal periods and the COVID-19 period.

11. In (Jabeen et al., 2021); This paper presents a stock prediction system that
utilises a layered deep learning model to learn sequential information within
market snapshot series. The system employs technical analysis to represent
numerical price data through technical indicators and sentiment analysis to
represent textual news articles via sentiment vectors. Additionally, a fully
connected neural network is used to make stock predictions.
12. In (Chou and Ramachandran, 2021); Time series models perform well in
predicting periodic patterns, but they are not practical for finance because
they often result in over- or under-predictions of prices. Graphs of time
series models show a regression line that resembles a smooth moving
average line, but it lags behind actual prices. While sentiment scaling
systems are a popular method for extracting the polarity of text, they are
limited by the availability of sentiment lexicons. One solution to this
limitation is the use of word embedding models, which can convert text into
informative matrices to complement the missing sentiment lexicons. These
models can be particularly useful for capturing sudden and short-term
influences.

13. In (Heidein and Parpinelli, 2022); The proposed model consists of two
stages. The first stage involves performing sentiment analysis on news
collected from the New York Times newspaper using VADER. The second
stage is a stock price prediction model based on the LSTM architecture. The
model incorporates a robust parameter tuning strategy.

14. In (Zhong and Enke, 2019); This paper proposes an innovative ensemble of
advanced methods to predict the stock prices of Apple Inc, a renowned
company listed on NASDAQ. To achieve this, the paper first performs a
comprehensive sentiment analysis of news and headlines using an advanced
version of BERT, which is a pre-trained transformer model by Google for
Natural Language Processing (NLP). Subsequently, a cutting-edge
Generative Adversarial Network (GAN) is employed to forecast the stock
price for Apple Inc by incorporating various technical indicators, stock
indexes of different countries, selected commodities, historical prices, and
sentiment scores. To assess the performance of the proposed model, the
paper compares it with several baseline models, including Long Short-Term
Memory (LSTM), Gated Recurrent Units (GRU), vanilla GAN, and Auto-
Regressive Integrated Moving Average (ARIMA) models.

15. In (Haque et al., 2022); The study employs rigorous hypothesis testing
procedures to investigate the classification accuracy of Deep Neural
Networks (DNNs) on two datasets that underwent pre-processing using
Principal Component Analysis (PCA). The simulation results revealed that
the DNN models utilising these PCA-represented datasets exhibited
significantly higher classification accuracy compared to the original dataset,
as well as other hybrid machine learning algorithms that were tested. This
highlights the potential of employing PCA as a pre-processing step to
enhance the performance of DNNs in classification tasks.
4. Aim and Objectives

The purpose of this study is to compare the effectiveness of different hybrid models for
predicting the stock market prices of FAANG (Facebook, Apple, Amazon, Netflix,
Google) or MAANG (Microsoft, Apple, Amazon, Netflix, Google) companies. The study
aims to evaluate and compare the performance of hybrid methods that combine
machine learning algorithms, time-series analysis, and sentiment analysis, in order to
identify the most accurate method for predicting stock prices. Moreover, the study aims
to explore the potential of incorporating other advanced techniques, such as reinforcement
learning and deep reinforcement learning, to further improve the accuracy of hybrid
methods in stock market prediction. Additionally, the study will examine the robustness of
the proposed hybrid models by analysing their performance in different market conditions,
including bull and bear markets. Overall, the study's contribution to the field of financial
forecasting can help investors and traders make better-informed decisions and reduce risks
associated with stock market investments.
The research objectives, formulated from the aim of this study, are as follows:

● To compare the accuracy of hybrid methods in predicting stock market prices of
FAANG (Facebook, Apple, Amazon, Netflix, Google) or MAANG (Microsoft,
Apple, Amazon, Netflix, Google) companies over a specified time period.
● To assess the effectiveness of machine learning algorithms, time-series analysis, and
sentiment analysis in the hybrid methods for predicting stock market prices of
FAANG or MAANG companies.
● To investigate the impact of external factors, such as macroeconomic indicators,
market trends, and news events, on the accuracy of the hybrid methods for
predicting stock market prices of FAANG or MAANG companies.
5. Significance of the Study

Studying hybrid models for stock market price prediction is significant because it can
improve the accuracy and robustness of the predictive models. Hybrid models combine
techniques such as statistical models, neural networks, and fuzzy logic. This approach is
designed to capitalise on the advantages of each model while mitigating their individual
shortcomings, resulting in an effective and versatile solution.

One of the main reasons for using hybrid models for stock market price prediction is that
they can handle non-linear relationships and complex patterns in the data more effectively.
This is particularly important in financial markets, where the relationships between
different market factors can be intricate and dynamic. The advantages of using Hybrid
Models for Stock Market Price Prediction are as follows:

● Improved Accuracy: Research studies have shown that hybrid models can outperform
individual models in terms of accuracy in stock market price prediction. For example,
a study conducted by Zhang et al. (2019) compared the performance of a hybrid model
combining deep neural networks and fuzzy logic with individual models. The study
found that the hybrid model had a higher prediction accuracy than the statistical
models.
● Interpretability: Hybrid models can provide greater interpretability in stock market
price prediction. For instance, a study by Li et al. (2019) developed a hybrid model
combining wavelet transform, principal component analysis, and support vector
machine to predict stock prices. It was found that the hybrid model provided valuable
insights into the underlying factors driving the predictions.
● Robustness: Hybrid models improve the robustness of stock market price prediction by
reducing errors or biases from a single model. For example, a study by Nguyen et al.
(2021) developed a hybrid model combining LSTM, ANN with News Sentiment
Analysis to predict stock prices in the Vietnamese stock market. The study found that
the hybrid model provided more accurate and robust predictions compared to classical
models.
● Real-world applications: Hybrid approaches are also relevant beyond academic
studies. For instance, general-purpose automated machine learning platforms such as
Google's AutoML allow practitioners to build customised predictive models, which
investment firms could use to develop models for their portfolios.
6. Scope of the Study

The scope of this study, stock market price prediction using hybrid models, can include
various aspects. These aspects may include the evaluation of the effectiveness of different
hybrid models proposed in recent papers, analysing the impact of news sentiment and
technical analysis indicators, investigating the role of financial variables, analysing
hardware constraints, and identifying future research directions.
One major focus of this study could be to evaluate the effectiveness of different hybrid
models proposed in recent papers. These models often combine various techniques, such as
deep learning, machine learning, and sentiment analysis, for predicting stock prices. The
study can compare and evaluate the effectiveness of these models based on different
metrics such as accuracy, precision, recall, and F1 score. The strengths and limitations of
these models can also be analysed, and areas for improvement can be identified.
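
The metrics named above apply when the prediction is framed as classification, for example whether a stock moves up or down on a given day. The following minimal sketch computes them from two label lists. The "up"/"down" framing, the function name, and the sample labels are assumptions for illustration only.

```python
def binary_metrics(actual, predicted, positive="up"):
    """Accuracy, precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up daily directions, for illustration only.
actual = ["up", "up", "down", "down", "up"]
predicted = ["up", "down", "down", "up", "up"]
print(binary_metrics(actual, predicted))
```

When the models output prices rather than directions, error measures such as MAPE, which appears in the related works, would be the more natural basis for comparison.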
Another important aspect of this study could be to analyse the impact of news sentiment on
stock prices. Many of the recent papers incorporate news sentiment analysis into their
hybrid models for stock price prediction. The study can investigate the impact of news
sentiment on stock prices and evaluate the effectiveness of different news sentiment
analysis techniques. The potential of incorporating other types of sentiment analysis, such
as social media sentiment analysis, in hybrid models for stock price prediction can also be
explored.
In addition to news sentiment analysis, technical analysis indicators such as moving
averages and relative strength index have also been used as input features in recent hybrid
models. This study can analyse the role of these indicators in stock price prediction and
evaluate the effectiveness of different indicators. The use of other technical analysis
indicators can also be explored, and areas for improvement in using these indicators for
stock price prediction can be identified.
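
As a small example of one such indicator, the sketch below computes a relative strength index from closing prices. It uses simple averages of gains and losses over the last `period` price changes, which is a simplification of the smoothed averaging in Wilder's original definition. The function name, the default period, and the handling of flat or loss-free windows are assumptions.

```python
def rsi(prices, period=14):
    """Relative strength index over the last `period` price changes (simple averages)."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    if len(changes) < period:
        raise ValueError("need at least period + 1 prices")
    recent = changes[-period:]
    avg_gain = sum(c for c in recent if c > 0) / period
    avg_loss = sum(-c for c in recent if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # assumed convention for a completely flat window
    if avg_loss == 0:
        return 100.0  # only gains in the window
    relative_strength = avg_gain / avg_loss
    return 100 - 100 / (1 + relative_strength)


# Made-up closing prices, for illustration only.
print(rsi([10.0, 11.0, 10.0, 11.0, 10.0], period=4))
```

Values near 100 indicate a window dominated by gains and values near 0 a window dominated by losses, so the indicator can be fed to a hybrid model as a bounded numerical feature.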
Furthermore, this study can investigate the role of different financial variables, for example
trading volume, market cap, and dividend income, in stock price prediction. The effects of
these variables on stock prices can be analysed, and the benefit of incorporating them into
hybrid models can be evaluated. The use of other financial variables can also be explored,
and areas for improvement in using these variables for stock price prediction can be
identified.
In addition to the above aspects, this study can also analyse hardware requirements and
constraints associated with implementing hybrid models for stock price prediction. The
studies mentioned above use different hardware configurations and requirements for
implementing their models. This study can analyse the hardware requirements and
constraints associated with implementing these models and evaluate their scalability and
efficiency. The potential of using cloud computing or other technologies to overcome these
constraints and improve the scalability and efficiency of hybrid models for stock price
prediction can also be explored.
Finally, this study can identify areas for future research in stock price prediction using
hybrid models. For example, the use of other data sources, such as social media and
internet search trends, in predicting stock prices can be explored. The use of
reinforcement learning techniques in stock price prediction can also be
investigated, and the impact of regulatory changes on stock prices can be analysed. By
identifying future research directions, this study can contribute to the development of more
effective and accurate hybrid models for stock price prediction.
In summary, this study can have a comprehensive scope that includes evaluating the
effectiveness of different hybrid models, analysing the impact of news sentiment and
technical analysis indicators, investigating the role of financial variables, analysing
hardware constraints, and identifying future research directions. By addressing these
aspects, this study can contribute to the development of more accurate and effective models
for predicting stock prices and provide insights for investment decision making.
7. Research Methodology

Dataset: The dataset has 10 features and 3000 instances. Seven of the 10 features
are numerical and the remaining three are categorical.
Summary statistics of the dataset:

Table 7.1 Numerical Data

Table 7.2 Categorical Data


Pre-processing: This project involved two main components:

● Technical Indicators: Calculation of the technical indicators most widely used by
investors (7-day and 21-day moving averages, exponential moving average, momentum,
Bollinger Bands, MACD), together with some trend features, from historical asset data.
● News Sentiment Analysis: Analysis of daily news from [Link] using NLP methods to
determine sentiment (positive, neutral, or negative) with FinBERT, which gives a
score between -1 and 1.
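
The indicators listed above can be sketched with pandas as follows. This is a minimal illustration rather than the project's actual pipeline: the column name `Close`, the 21-day EMA span, the 10-day momentum lookback, the two-standard-deviation band width, and the 12/26/9 MACD settings are all assumptions chosen because they are common defaults, and the price series is toy data.

```python
import pandas as pd


def add_technical_indicators(prices: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `prices` with indicator columns derived from 'Close'."""
    out = prices.copy()
    close = out["Close"]

    # Simple moving averages over 7 and 21 trading days.
    out["ma7"] = close.rolling(window=7).mean()
    out["ma21"] = close.rolling(window=21).mean()

    # Exponential moving average (span=21 is an assumed setting).
    out["ema21"] = close.ewm(span=21, adjust=False).mean()

    # Momentum: price change over the last 10 trading days (assumed lookback).
    out["momentum10"] = close.diff(10)

    # Bollinger Bands: 21-day mean plus/minus two rolling standard deviations.
    std21 = close.rolling(window=21).std()
    out["bb_upper"] = out["ma21"] + 2 * std21
    out["bb_lower"] = out["ma21"] - 2 * std21

    # MACD: 12-day EMA minus 26-day EMA, with a 9-day EMA as the signal line.
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    out["macd"] = ema12 - ema26
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()
    return out


# Toy data: 30 business days of steadily rising prices.
dates = pd.bdate_range("2023-01-02", periods=30)
toy = pd.DataFrame({"Close": [100.0 + i for i in range(30)]}, index=dates)
features = add_technical_indicators(toy)
```

The rolling columns are empty (NaN) until enough history is available, so the first 20 rows would normally be dropped before the features are passed to a model.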
To prepare the text data for further analysis, a standardised pre-processing approach is
necessary. The process involved the following steps:

● Non-trading days were removed to align news dates with stock trading dates.
● The NLTK word segmentation module in Python 3 was utilised to perform
spell checks on the news text, remove special symbols, and divide the
news headlines into sets of words.
● Stop words were eliminated from the news to reduce noise, and a
continuous distributed representation was obtained for each news article
using the word co-occurrence matrix and corresponding word vectors.
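
The first steps above (date alignment, symbol removal, word segmentation, and stop-word removal) can be sketched with the Python standard library alone. The stop-word list, the helper names `keep_trading_days` and `tokenize`, and the headlines are hypothetical placeholders; the proposal itself relies on NLTK, and the spell-check and word-vector steps are not covered here.

```python
import re
from datetime import date

# Small illustrative stop-word list; a real run would load a fuller list
# (for example from NLTK), which is assumed here rather than used.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "as", "for", "is"}


def keep_trading_days(news, trading_days):
    """Drop news items whose date has no matching stock-price row."""
    return [(day, text) for day, text in news if day in trading_days]


def tokenize(headline):
    """Lower-case a headline, strip special symbols, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return [w for w in words if w not in STOP_WORDS]


trading_days = {date(2023, 5, 5), date(2023, 5, 8)}
news = [
    (date(2023, 5, 5), "Apple beats earnings forecasts as iPhone sales rise!"),
    (date(2023, 5, 6), "Weekend wrap: tech stocks in focus"),  # Saturday
    (date(2023, 5, 8), "Netflix shares fall on subscriber outlook"),
]
aligned = keep_trading_days(news, trading_days)
tokens = [tokenize(text) for _, text in aligned]
```

Here the Saturday item is discarded because no price row exists for that date, and each remaining headline becomes a list of lower-case words ready for vectorisation.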

Modelling: This step chiefly requires modelling two factors: historical price data
(such as open, close, and adjusted close) together with the technical indicators
derived from it, which requires analysing past and present data with hybrid models
to predict future values; and sentiment analysis of news articles and news headlines.
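
Using past and present data to predict future values is commonly framed as a sliding-window problem, where each training example pairs the last few observations with the next one. The sketch below shows that framing under stated assumptions: `make_windows` is a hypothetical helper, the lookback of 3 is arbitrary, and the prices are toy values.

```python
def make_windows(series, lookback):
    """Split a sequence into (past window, next value) training pairs."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])  # the `lookback` values before `end`
        targets.append(series[end])                # the value to be predicted
    return inputs, targets


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = make_windows(closes, lookback=3)
```

With six closing prices and a lookback of three, this yields three examples; in a multivariate setting each window element would be a row of price, indicator, and sentiment features instead of a single number.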

7.1 Sentiment Analysis of News Articles and News Headlines

Some prevalent hybrid models used in sentiment analysis of news articles are:
● Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN):
This hybrid model combines the strengths of CNN and RNN to extract both local and
global features from the news articles. CNN is used to extract local features such as
sentiment words and phrases, while RNN is used to capture the temporal dynamics of
the sentiment expressed in the news article.
● Support Vector Machines (SVM) and Naive Bayes (NB): This hybrid model
combines the strengths of SVM and NB to improve the accuracy of sentiment
analysis. SVM is used to identify the polarity of sentiment expressed in the news
article, while NB is used to classify the sentiment into positive, negative, or neutral
categories.
● Deep Belief Networks (DBN) and Restricted Boltzmann Machines (RBM): This
hybrid model combines the strengths of DBN and RBM to perform unsupervised
feature learning and classification of sentiment. DBN is used to learn the underlying
structure of the news articles, while RBM is used to classify the sentiment expressed
in the news articles.
● Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU): This hybrid
model combines the strengths of LSTM and GRU to improve the accuracy and
efficiency of sentiment analysis. LSTM is used to capture long-term dependencies,
that is, the memory of sentiment expressed earlier in the news articles, while GRU
is used to model the short-term dynamics of sentiment.
● Hybrid Sentiment Analysis Model (HSAM): This model combines the strengths of
different sentiment analysis techniques, such as lexicon-based, machine learning, and
deep learning methods, to provide a comprehensive and accurate analysis of sentiment
expressed in news articles.
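
To illustrate the general idea of combining a lexicon-based signal with a learned classifier, the sketch below averages the two into a single score in [-1, 1]. It is not the architecture of any model named above: the word lists, the 0.7 weight, the helper names, and the class probabilities (which a classifier such as FinBERT would supply in practice) are all made-up assumptions.

```python
POSITIVE = {"beats", "gain", "growth", "rise", "surge"}
NEGATIVE = {"fall", "loss", "miss", "drop", "lawsuit"}


def lexicon_score(tokens):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def model_score(probabilities):
    """Collapse class probabilities into one score in [-1, 1]."""
    return probabilities["positive"] - probabilities["negative"]


def hybrid_score(tokens, probabilities, model_weight=0.7):
    """Weighted average of the classifier score and the lexicon score."""
    return (model_weight * model_score(probabilities)
            + (1.0 - model_weight) * lexicon_score(tokens))


tokens = ["apple", "beats", "forecasts", "sales", "rise"]
probs = {"positive": 0.8, "neutral": 0.15, "negative": 0.05}  # made-up values
score = hybrid_score(tokens, probs)
```

A fixed weight keeps the example simple; the weight could instead be tuned on a validation set, or the two scores could be passed as separate features to the price model.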
7.2 Hybrid Models for Predicting Stock Prices using Historical Data:
In recent years, several hybrid models have been developed, combining deep learning and
machine learning techniques to improve prediction accuracy.

7.2.1 HTPNN: HTPNN (Hierarchical Text-Integrated Prediction Neural Network) is a
deep learning model that integrates news text with basic stock data for stock price
prediction. The proposed model architecture includes a combination of layers,
including a deep convolutional layer, a Long Short-Term Memory (LSTM) layer, a
feature fusion layer, and a fully connected layer. HTPNN takes in the word vectors
matrix obtained from the news text using SAE and combines it with the stock data
to predict whether the stock price will rise or fall.

7.2.2 LSTM-MSVR: LSTM-MSVR is a hybrid model that combines Long
Short-Term Memory (LSTM) and Multi-Stage Vector Regression (MSVR).
Developed in 2020, the model aims to overcome the limitations of traditional
time-series models by incorporating both past and present information to make
accurate predictions. LSTM-MSVR has been used for stock price prediction in the
Chinese stock market.

7.2.3 HMF-LSTM: Hybrid Matrix Factorization (HMF) and LSTM is a model
developed in 2020 for stock price prediction. The HMF-LSTM model combines the
power of HMF, a collaborative filtering technique used for recommender systems,
with LSTM, a popular deep learning model used for time-series forecasting. This
model can handle high-dimensional data and can predict multiple time-series
simultaneously.

7.2.4 TCN-LSTM: Temporal Convolutional Network (TCN) and LSTM is a hybrid
model developed in 2021 for stock price prediction. TCN-LSTM combines the
strengths of both models, using TCN to capture long-term dependencies and LSTM
to handle short-term dependencies in the time-series data. The model has been used
for stock price prediction in the Indian stock market.

7.2.5 GAN-LSTM: Generative Adversarial Networks (GAN) and LSTM is a hybrid
model developed in 2022 for stock price prediction. The GAN-LSTM model uses
GANs to generate synthetic stock price data and LSTM to predict future stock
prices. This model can handle the problem of data scarcity and can make accurate
predictions in data-sparse scenarios. GAN-LSTM has been used for predicting stock
prices in the Korean stock market.
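
The way a TCN, as used in TCN-LSTM above, reaches far back in time can be illustrated with stacked causal dilated convolutions. The sketch below is a plain-Python illustration with fixed weights, not a trained network and not the published TCN-LSTM model; the kernel size of 2, the dilations 1, 2, 4, and the helper names are assumptions.

```python
def causal_dilated_conv(series, kernel, dilation):
    """Causal 1-D convolution: output[t] uses only series[t], series[t - d], ...

    Positions before the start of the series are treated as zeros.
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * series[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of time steps visible to the last layer of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [0.5, 0.5]  # assumed fixed weights; a TCN would learn these
layer1 = causal_dilated_conv(series, kernel, dilation=1)
layer2 = causal_dilated_conv(layer1, kernel, dilation=2)
layer3 = causal_dilated_conv(layer2, kernel, dilation=4)
span = receptive_field(kernel_size=2, dilations=[1, 2, 4])
```

Three layers with doubling dilations already cover eight time steps, so the final output of `layer3` depends on every value in the toy series while never looking at future values.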
7.3 Evaluation metrics

To analyse the performance of the above hybrid models, three evaluation
metrics will be used:

7.3.1 Mean Absolute Error (MAE):


The Mean Absolute Error (MAE) is a metric used to assess the accuracy of a
forecasting model. It measures the average magnitude of the errors between the
predicted values and the actual values, regardless of their direction. The MAE is
particularly useful for evaluating the accuracy of forecasts of continuous variables.
It is calculated by taking the average of the absolute differences between the
forecasted values and the corresponding actual values in the verification sample.
Because it is a linear score, the MAE weights all the individual differences equally.
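
Written as a formula, and assuming the notation $y_i$ for the actual value, $\hat{y}_i$ for the forecast, and $n$ for the number of observations in the verification sample (symbols introduced here, not taken from the proposal):

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|
```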

7.3.2 Root mean squared error (RMSE):


The RMSE is a quadratic scoring rule that measures the average magnitude of the
error. In words, the differences between the predicted values and the corresponding
actual values are each squared and then averaged over the sample, and the square
root of that average is taken. Because the errors are squared before they are
averaged, the RMSE gives comparatively high weight to large errors. The RMSE is
therefore particularly useful in scenarios where minimising large errors is of
utmost importance.
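
As a formula, assuming the notation $y_i$ for the actual value, $\hat{y}_i$ for the forecast, and $n$ for the sample size (symbols introduced here for illustration):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```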

By utilising both MAE and RMSE, one can gain insight into the distribution of
errors within a set of forecasts. RMSE is always greater than or equal to MAE, and
the larger the difference between the two metrics, the greater the variability of
the individual errors within the sample. When RMSE equals MAE, all errors have
the same magnitude.

The MAE and RMSE are negatively oriented scores that can range from 0 to ∞ and
are used to evaluate forecasting errors, with lower values indicating better
performance.
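
The relationship between the two metrics can be checked on a small worked example. The two forecast sets below are invented so that both have the same MAE; the function names `mae` and `rmse` are illustrative and not taken from any library.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [100.0, 100.0, 100.0, 100.0]
even_errors = [102.0, 98.0, 102.0, 98.0]      # every error has magnitude 2
uneven_errors = [100.0, 100.0, 100.0, 108.0]  # a single large error of 8

mae_even, rmse_even = mae(actual, even_errors), rmse(actual, even_errors)
mae_uneven, rmse_uneven = mae(actual, uneven_errors), rmse(actual, uneven_errors)
```

Both forecasts have an MAE of 2.0, but the RMSE is 2.0 when the errors are equal in magnitude and 4.0 when the same total error is concentrated in one observation.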

7.3.3 Mean Absolute Percentage Error (MAPE):


MAPE is the average of the absolute percentage errors of the forecasts, where each
percentage error is taken without regard to its sign. It is a popular measure in
forecasting because it avoids positive and negative errors cancelling each other
out and expresses errors as percentages.
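
A minimal sketch of the calculation is given below, assuming each absolute error is divided by the corresponding actual value and the result is reported in percent. The function name `mape` and the sample values are illustrative. Because of the division, the metric is undefined whenever an actual value is zero, which the sketch rejects explicitly.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error_pct = mape(actual, predicted)
```

The three forecasts are off by 10%, 5%, and 0% respectively, so the result is about 5%. Since the score is scale-free, it allows comparison across stocks that trade at very different price levels.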
8. Required Resources

● Processor: AMD Ryzen 7 5000H with Radeon Vega Mobile Gfx 2.10 GHz
● RAM: 8.00 GB
● Hard Disk: 20 GB
● Graphics Card: NVIDIA GeForce RTX 3050 8 GB
● Operating System: Windows
● Programming Language: Python
● Integrated Development Environment: Spyder, Jupyter
● Python Libraries: Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, PyTorch,
TensorFlow, Keras
9. Research Plan

Below is a Gantt chart illustrating the research plan timeline.

Figure 9.1: Gantt Chart


References

Agrawal, M., Shukla, P.K., Nair, R., Nayyar, A. and Masud, M., (2021) Stock prediction based on
technical indicators using deep learning model. Computers, Materials and Continua, 701, pp.287–
304.

Chou, H.-C. and Ramachandran, K.M., (2021) Combining Time Series and Sentiment Analysis for
Stock Market Forecasting. In: Proceedings of the 3rd International Conference on Statistics: Theory
and Applications. Avestia Publishing.

Haque, S., Eberhart, Z., Bansal, A. and McMillan, C., (2022) Semantic Similarity Metrics for Evaluating
Source Code Summarization. In: IEEE International Conference on Program Comprehension. IEEE
Computer Society, pp.36–47.

Heidein, A. and Parpinelli, R.S., (2022) Financial News Effect Analysis on Stock Price Prediction Using
a Stacked LSTM Model. In: Communication Papers of the 17th Conference on Computer Science and
Intelligence Systems. PTI, pp.233–240.

Jabeen, A., Afzal, S., Maqsood, M., Mehmood, I., Yasmin, S., Niaz, M.T. and Nam, Y., (2021) An LSTM
based forecasting for major stock sectors using COVID sentiment. Computers, Materials and
Continua, 671.

Joshi, K., H. N, B. and Rao, J., (2016) Stock Trend Prediction Using News Sentiment Analysis.
International Journal of Computer Science and Information Technology, 83, pp.67–76.

Ju, C. bin and Chen, A.P., (2022) Identifying Financial Market Trend Reversal Behavior with Structures
of Price Activities Based on Deep Learning Methods. IEEE Access, 10, pp.12853–12865.

Li, X., Wu, P. and Wang, W., (2020) Incorporating stock prices and news sentiments for stock market
prediction: A case of Hong Kong. Information Processing and Management, 575.

Lin, H.C., Chen, C., Huang, G.F. and Jafari, A., (2021) Stock Price Prediction using Generative
Adversarial Networks. Journal of Computer Science, 173, pp.188–196.

Maqsood, H., Maqsood, M., Yasmin, S., Mehmood, I., Moon, J. and Rho, S., (2022) Analyzing the
Stock Exchange Markets of EU Nations: A Case Study of Brexit Social Media Sentiment. Systems, 102.

Pahlawan, M.R., Riksakomara, E., Tyasnurita, R., Muklason, A., Mahananto, F. and Vinarti, R.A.,
(2021) Stock price forecast of macro-economic factor using recurrent neural network. IAES
International Journal of Artificial Intelligence, 101, pp.74–83.

Shu, W. and Gao, Q., (2020) Forecasting stock price based on frequency components by emd and
neural networks. IEEE Access, 8, pp.206388–206395.

Wang, Y., Liu, H., Guo, Q., Xie, S. and Zhang, X., (2019) Stock volatility prediction by hybrid neural
network. IEEE Access, 7, pp.154524–154534.

Yasir, M., Afzal, S., Latif, K., Chaudhary, G.M., Malik, N.Y., Shahzad, F. and Song, O.Y., (2020) An
efficient deep learning based model to predict interest rate using twitter sentiment. Sustainability
(Switzerland), 124.

Zhong, X. and Enke, D., (2019) Predicting the daily return direction of the stock market using hybrid
machine learning algorithms. Financial Innovation, 51.
