Stock Market AI Prediction
Madhur Gupta
Enrollment: 22131010480
Admission: 22SCSF1010376
Ramashankar Yadav
Enrollment: 22131010464
Admission: 225CSE1010355
AI for Stock Market Prediction 2
Abstract
Stock market prediction is a complex and challenging task due to the dynamic interplay
of quantitative trends, investor behavior, and qualitative factors such as news and social
media sentiment [1]. Traditional forecasting methods often rely solely on historical price
and volume data, which may not fully capture market sentiment or sudden shifts caused
by external events [2].
This project proposes a hybrid AI-based approach that integrates sentiment analysis
of financial news and social media with technical indicators to improve the accuracy of
stock price prediction. Using Natural Language Processing (NLP) models like BERT
and RoBERTa, textual data is analyzed to derive sentiment scores reflecting investor
optimism, pessimism, or neutrality [3]. These sentiment scores are then combined with
technical indicators such as Moving Averages (SMA/EMA), Relative Strength Index
(RSI), and MACD to construct a comprehensive feature set [4].
The integrated dataset is used to train machine learning and deep learning models,
including Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and
Convolutional Neural Networks (CNN), enabling the prediction of short-term stock
price movements and trends [1, 5]. Experimental results on real-world financial datasets
demonstrate that incorporating sentiment analysis significantly enhances prediction
accuracy compared to models based solely on historical price data [1].
The study highlights the effectiveness of AI-driven, sentiment-informed forecasting
and its potential to support investors, traders, and financial analysts in making more
informed and timely decisions [2, 3]. This approach bridges the gap between qualitative
market insights and quantitative analysis, providing a more holistic framework for stock
market prediction [1].
Keywords: Artificial Intelligence, Machine Learning, Sentiment Analysis, Technical
Indicators, Stock Price Prediction, Natural Language Processing, Deep Learning, LSTM,
Investment Decisions.
Contents
1 Introduction
1.1 Background
1.2 Problem Statement
1.3 Research Objectives
1.4 Research Scope
2 Literature Review
3 Research Methodology
3.1 Research Design
3.2 Data Collection
3.2.1 Financial Market Data
3.2.2 Textual and Sentiment Data
3.3 Feature Engineering
3.4 Model Development
3.5 Evaluation Strategy
1. Introduction
1.1 Background
The stock market represents one of the most dynamic and complex financial systems in
the modern economy [1]. Stock price movements are influenced by a multitude of factors
including company fundamentals, macroeconomic indicators, geopolitical events, and
investor sentiment [1, 4]. Accurate stock price prediction has been a longstanding challenge
for investors, traders, and financial analysts worldwide, as even marginal improvements
in predictive accuracy can translate into significant financial gains [5]. The ability to
forecast price movements with reasonable accuracy can significantly improve investment
returns and minimize risks [2].
Traditional approaches to stock market analysis have relied primarily on two
methodologies: technical analysis and fundamental analysis [1]. Technical analysis focuses on
historical price patterns, trading volumes, and various technical indicators derived from
market data [6]. Fundamental analysis, on the other hand, examines company financial
statements, earnings reports, and economic factors to determine intrinsic value [7].
However, both approaches have significant limitations. Technical analysis, while easy to
implement, often produces unreliable predictions due to its simplicity and susceptibility
to pattern overfitting [1]. Fundamental analysis requires extensive manual research
and can be time-consuming, often missing the real-time market sentiment that drives
short-term price movements [2].
With the advent of artificial intelligence and machine learning, researchers have
developed increasingly sophisticated approaches to stock market prediction [1, 5]. Recent
advances in deep learning, particularly Long Short-Term Memory (LSTM) networks,
have demonstrated superior performance compared to traditional statistical and machine
learning methods on financial time-series forecasting tasks [8, 9]. Furthermore, the
integration of natural language processing (NLP) techniques to extract sentiment from
financial news and social media has opened new avenues for capturing qualitative market
information that has traditionally been ignored in quantitative models [3, 10].
1.2 Problem Statement
a) Sentiment volatility: Market sentiment can shift rapidly based on news events,
company announcements, or macroeconomic developments, but traditional models
fail to capture these qualitative factors [1, 10].
b) Non-linear relationships: Stock price movements exhibit non-linear behavior
that simple statistical models cannot adequately capture [5].
c) Data heterogeneity: Integration of multiple data sources (price data, technical
indicators, news sentiment, social media) requires sophisticated modeling
approaches [1, 2].
d) Overfitting and generalization: Models trained on historical data often fail to
generalize to new market conditions due to changing market dynamics and regime
changes [1, 9].
e) Real-time adaptation: Market conditions change rapidly, requiring models that
can adapt to evolving market structures and behaviors [17].
There is therefore a critical need for a predictive model that combines quantitative
technical indicators with qualitative sentiment analysis to provide more accurate
forecasts [1]. This project addresses this problem by developing an AI-based framework that
integrates sentiment analysis with technical indicators for improved stock price
prediction [2, 10].
1.4 Research Scope
• Short-term price prediction: The model aims to predict stock price movements
in the short term (1–5 days ahead), where sentiment and technical indicators are
most impactful [1, 2].
• Target markets: The study includes major indices and stocks from established
exchanges such as the National Stock Exchange of India (NSE), the Bombay Stock
Exchange (BSE), and selected international stocks [8].
• AI methodologies: The research emphasizes machine learning and deep learning
techniques, particularly LSTM, RNN, CNN, and ensemble methods [5, 9].
• Sentiment data: The study incorporates both financial news sentiment and social
media sentiment through NLP models like BERT and RoBERTa [3, 10].
• Technical indicators: Standard technical indicators including SMA, EMA, RSI,
MACD, Bollinger Bands, and momentum indicators are integrated into the predictive
framework [4, 6].
2. Literature Review
Technical Analysis Era (1960s–1990s): Traders used historical price and volume data
to construct indicators and chart patterns to inform trading decisions [6].
Algorithmic Trading Era (1990s–2000s): Increased computing power enabled
automation of trading strategies with rule-based systems [17].
Machine Learning Era (2000s–2010s): Supervised learning algorithms such as SVM,
Decision Trees, and Random Forests were applied to predict price direction [7, 11].
Deep Learning Era (2010s–Present): Deep neural networks, particularly LSTM and
CNN, have been used to model complex temporal dependencies [8, 9].
Sentiment Analysis Era (2015–Present): Integration of textual data from financial
news and social media through NLP techniques has enabled models to capture
qualitative market sentiment [3, 10].
Support Vector Machines (SVM) have been widely used for stock market classification
tasks. When applied to stock prediction, SVM models can classify future price movements
as upward or downward, often achieving accuracy levels of 75–85% [1, 7].
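To illustrate how an upward/downward classification target of this kind can be set up, the following minimal Python sketch derives binary direction labels from closing prices for a chosen forecast horizon. The function name, the `horizon` parameter, and the sample prices are hypothetical and are not taken from the cited studies.

```python
from typing import List, Optional


def direction_labels(closes: List[float], horizon: int = 1) -> List[Optional[int]]:
    """Label day t with 1 if the close `horizon` days later is higher, else 0.

    The last `horizon` days have no known future price, so they get None
    and would be dropped before training a classifier.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels: List[Optional[int]] = []
    for t in range(len(closes)):
        if t + horizon < len(closes):
            labels.append(1 if closes[t + horizon] > closes[t] else 0)
        else:
            labels.append(None)
    return labels


# Made-up closing prices, for illustration only.
closes = [100.0, 101.5, 101.0, 102.2, 101.8]
labels_1d = direction_labels(closes, horizon=1)
labels_2d = direction_labels(closes, horizon=2)
```

Any classifier, SVM included, would then be trained on features observed up to day t against the label for day t, with the unlabeled tail removed.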
Random Forest, an ensemble of decision trees, can model non-linear interactions between
features while providing feature importance measures. Studies have shown Random
Forest models attaining 85–90% accuracy on daily or weekly stock direction
prediction [6, 11].
Gradient boosting methods, particularly XGBoost and LightGBM, have also been
employed. These methods iteratively improve model performance by fitting each new
tree to the residual errors of the current ensemble [5]. Logistic regression remains useful
as a baseline model due to its interpretability and capacity to highlight the predictive
contribution of individual features [12].
Artificial Neural Networks (ANN) and Deep Neural Networks (DNN) have been used to
learn non-linear relationships in financial data. When combining multiple feature types,
DNNs have demonstrated improved predictive performance [9, 16].
Long Short-Term Memory (LSTM) networks are particularly well-suited for stock
prediction because they can capture long-range dependencies in sequential data [8, 13].
Empirical studies report that LSTM-based models outperform linear regression,
with RMSE reductions of 20–40% [8, 14].
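For reference, an RMSE reduction of this kind is computed as the relative drop in root mean squared error between a baseline and a candidate model. The sketch below shows the arithmetic in Python; the function names and all price values are illustrative assumptions, not results from the cited studies.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse: float, model_rmse: float) -> float:
    """Fraction by which the model lowers RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Made-up prices and forecasts, for illustration only.
actual = [100.0, 102.0, 101.0, 103.0]
baseline_pred = [98.0, 104.0, 103.0, 101.0]   # every error has magnitude 2.0
model_pred = [101.4, 100.6, 102.4, 101.6]     # every error has magnitude 1.4

baseline_rmse = rmse(actual, baseline_pred)
model_rmse = rmse(actual, model_pred)
reduction = relative_reduction(baseline_rmse, model_rmse)
```

In this toy case the baseline RMSE is 2.0 and the model RMSE is about 1.4, a reduction of roughly 30%, which falls inside the 20–40% range quoted above.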
Recurrent Neural Networks (RNN) and Gated Recurrent Units (GRU) have been
applied to both price sequences and textual data. While standard RNNs suffer from
vanishing gradients, LSTM and GRU architectures mitigate this issue [9, 10].
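For context, the standard LSTM cell can be written as follows. The notation is an assumption introduced here: x_t is the input at time t, h_t the hidden state, c_t the cell state, σ the logistic sigmoid, and ⊙ element-wise multiplication.

```latex
\begin{align}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
```

Because the cell state c_t is updated additively, the gradient along the direct path from c_t to c_{t-1} is scaled only by the forget gate f_t rather than passing repeatedly through a squashing non-linearity, which is the usual explanation for why LSTMs are less prone to vanishing gradients than standard RNNs.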
Convolutional Neural Networks (CNN) have been adapted to 1D financial time-series
and textual embeddings, proving effective in extracting local patterns [15, 16]. Ensemble
architectures combining CNN and LSTM have achieved state-of-the-art performance [9,
16].
Investor sentiment has long been recognized as a key driver of short-term price
movements [3]. Positive news is often associated with increased buying pressure and price
appreciation, while negative news can trigger sell-offs [10]. Traditional quantitative
models typically ignore this qualitative information [3].
Sentiment analysis offers a way to systematically quantify textual information from
financial news and social media. Studies have shown that sentiment indicators can improve
predictive models by providing early signals of changing market expectations [10, 17].
Modern NLP models such as BERT and RoBERTa have dramatically improved sentiment
classification performance by leveraging transformer architectures [3]. When fine-tuned
on domain-specific financial corpora, these models achieve high accuracy in classifying
news and posts [10].
Aspect-based sentiment analysis further refines this approach by identifying sentiment
toward specific aspects, enabling more granular understanding [3]. Real-time
aggregation of sentiment scores can produce sentiment indices that correlate meaningfully with
subsequent returns [17].
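One simple way to aggregate per-item sentiment into a daily index is sketched below in Python. It assumes each news item or post has already been classified as positive, neutral, or negative with a confidence score; the label set, the signed-confidence scoring, and the sample data are assumptions for illustration, not the scheme used in the cited work.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from classifier label to sign.
SIGN = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def daily_sentiment_index(
    items: Iterable[Tuple[str, str, float]]
) -> Dict[str, float]:
    """Mean signed confidence per date.

    Each item is (ISO date, label, confidence in [0, 1]). The result lies
    in [-1, 1]: negative values indicate net pessimism, positive values
    net optimism.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for date, label, confidence in items:
        totals[date] += SIGN[label] * confidence
        counts[date] += 1
    return {date: totals[date] / counts[date] for date in sorted(totals)}


# Made-up classifier outputs, for illustration only.
scored_items = [
    ("2024-01-02", "positive", 0.90),
    ("2024-01-02", "negative", 0.60),
    ("2024-01-02", "neutral", 0.80),
    ("2024-01-03", "negative", 0.75),
    ("2024-01-03", "negative", 0.25),
]
index = daily_sentiment_index(scored_items)
```

An unweighted mean treats every item equally; weighting by source reliability or engagement is a possible refinement but is not shown here.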
3. Research Methodology
3.1 Research Design
1. Collection and preprocessing of historical stock price data, technical indicators, and
textual data.
2. Construction of integrated feature sets combining numerical and sentiment
variables.
3. Development and training of baseline, machine learning, and deep learning models.
4. Backtesting of prediction-driven trading strategies on historical data.
5. Statistical evaluation using regression, classification, and profitability metrics.
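Steps 4 and 5 can be illustrated with a minimal Python sketch of a long/flat backtest and a directional-accuracy metric. The strategy rule, the 0.1% cost per position change, the function names, and the sample returns are all assumptions made for illustration; they are not the project's actual trading rules.

```python
from typing import List, Sequence


def backtest_long_flat(
    returns: Sequence[float], signals: Sequence[int], cost: float = 0.001
) -> List[float]:
    """Equity curve of a long/flat strategy starting from 1.0.

    signals[t] is the position (1 = long, 0 = flat) held over the period
    with simple return returns[t]; it must be decided before that period
    starts. `cost` is charged as a fraction of equity on every position
    change.
    """
    if len(returns) != len(signals):
        raise ValueError("returns and signals must have the same length")
    equity = 1.0
    position = 0
    curve: List[float] = []
    for r, s in zip(returns, signals):
        if s != position:          # entering or exiting the market
            equity *= 1.0 - cost
            position = s
        equity *= 1.0 + s * r      # flat periods earn nothing
        curve.append(equity)
    return curve


def directional_accuracy(returns: Sequence[float], signals: Sequence[int]) -> float:
    """Share of periods where a long signal coincides with a positive return
    or a flat signal coincides with a non-positive return."""
    hits = sum(1 for r, s in zip(returns, signals) if (r > 0) == (s == 1))
    return hits / len(returns)


# Made-up daily returns and model signals, for illustration only.
returns = [0.01, -0.02, 0.015, 0.005]
signals = [1, 0, 0, 1]
curve = backtest_long_flat(returns, signals, cost=0.001)
accuracy = directional_accuracy(returns, signals)
```

Even in this toy example, three position changes cost about 0.3% of equity, which shows why profitability metrics should be reported net of transaction costs.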
3.2 Data Collection
3.2.1 Financial Market Data
Historical stock prices (OHLCV) are obtained from Yahoo Finance, NSE/BSE feeds, and
Kaggle datasets, covering 5–10 years [8, 9]. Technical indicators, including SMA, EMA,
RSI, MACD, and Bollinger Bands, are computed using TA-Lib and pandas-ta [4, 6].
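To make the indicator definitions concrete, the following dependency-free Python sketch computes them directly from closing prices. It does not use the TA-Lib or pandas-ta APIs, and its conventions are assumptions that may differ from those libraries: the EMA is seeded with the first value, the RSI uses simple averages rather than Wilder's smoothing, and the Bollinger Bands use the population standard deviation.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def sma(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def ema(values: Sequence[float], span: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(values: Sequence[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float]]:
    """MACD line (fast EMA minus slow EMA) and its signal line."""
    line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    return line, ema(line, signal)


def rsi(values: Sequence[float], window: int = 14) -> Optional[float]:
    """RSI over the last `window` price changes, using simple averages."""
    if len(values) < window + 1:
        return None
    changes = [values[i] - values[i - 1]
               for i in range(len(values) - window, len(values))]
    avg_gain = sum(c for c in changes if c > 0) / window
    avg_loss = sum(-c for c in changes if c < 0) / window
    if avg_loss == 0:
        return 100.0 if avg_gain > 0 else 50.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def bollinger(values: Sequence[float], window: int = 20,
              k: float = 2.0) -> Tuple[float, float, float]:
    """Lower band, middle band, and upper band for the latest window."""
    if len(values) < window:
        raise ValueError("not enough data for the requested window")
    recent = list(values[-window:])
    mid = sum(recent) / window
    sd = statistics.pstdev(recent)
    return mid - k * sd, mid, mid + k * sd


# Made-up closing prices, for illustration only.
closes = [10, 11, 12, 11, 12, 13]
sma3 = sma(closes, 3)
ema3 = ema(closes, 3)
macd_line, macd_signal = macd(closes)
rsi5 = rsi(closes, window=5)
lower, mid, upper = bollinger(closes, window=3, k=2.0)
```

In practice the library implementations would be preferred; the sketch only shows what each feature measures.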
3.2.2 Textual and Sentiment Data
Textual data is collected from financial news providers and social platforms, filtered for
relevance to selected stocks [3, 10]. Sentiment scores are generated using fine-tuned BERT
or RoBERTa models trained on financial sentiment corpora [3].
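Once daily sentiment scores exist, they have to be aligned with the numeric features by date. The Python sketch below shows one possible alignment that uses the previous trading day's sentiment, so that a feature row never contains text published after the prices it accompanies. The one-day lag, the neutral default for days without text, and all names and values are assumptions for illustration.

```python
from typing import Dict, List


def build_feature_rows(
    price_features: Dict[str, Dict[str, float]],
    sentiment_by_date: Dict[str, float],
    neutral: float = 0.0,
) -> List[Dict[str, object]]:
    """Join per-date numeric features with lagged sentiment.

    Dates are ISO strings, so lexicographic order is chronological order.
    Each row carries the sentiment of the previous date in
    `price_features`; dates with no sentiment fall back to `neutral`.
    """
    rows: List[Dict[str, object]] = []
    previous_sentiment = neutral
    for date in sorted(price_features):
        row: Dict[str, object] = {"date": date}
        row.update(price_features[date])
        row["sentiment_lag1"] = previous_sentiment
        rows.append(row)
        previous_sentiment = sentiment_by_date.get(date, neutral)
    return rows


# Made-up indicator values and sentiment scores, for illustration only.
price_features = {
    "2024-01-02": {"rsi": 55.0, "macd": 0.12},
    "2024-01-03": {"rsi": 48.0, "macd": -0.05},
    "2024-01-04": {"rsi": 51.0, "macd": 0.02},
}
sentiment_by_date = {"2024-01-02": 0.1, "2024-01-03": -0.5}
rows = build_feature_rows(price_features, sentiment_by_date)
```

Whether same-day sentiment can be used safely depends on publication timestamps relative to the market close; lagging by one day is the conservative choice taken here.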
• Models trained on historical data may not capture unprecedented market conditions.
• Transaction costs and market impact can reduce profitability in real trading.
• Market efficiency varies across stocks and time periods.
• Algorithmic trading is subject to regulatory constraints, including rules against
market manipulation.
• Access to real-time social media data is limited by API restrictions.
• The sentiment models are trained primarily on English-language content.
References
[1] Prasad, A., & Seetharaman, A. (2021). Importance of machine learning in making
investment decision in stock market. Vikalpa: The Journal for Decision Makers,
46(4), 209–222.
[2] Chopra, R., & Sharma, G. (2021). Application of artificial intelligence in stock
market forecasting: A critique, review, and research agenda. Journal of Risk and
Financial Management, 14(11), 134.
[3] Goel, J., Nihalani, S., Bhagwat, S., & Agarwal, V. (2020). Artificial intelligence
in stock market: Concepts, applications and limitations. International Journal of
Advances in Engineering and Management, 2(9), 578–583.
[4] Nair, S., & Malik, G. (2020). A study on application of artificial intelligence in
stock market prediction. International Journal of Creative Research Thoughts, 8(6),
2320–2882.
[5] Patel, A., Patel, D., & Yadav, S. (2021). Prediction of stock market using artificial
intelligence. SSRN Working Paper.
[6] Agrawal, M., Khan, A. U., & Shukla, P. K. (2019). Stock price prediction using
technical indicators: A predictive model using optimal deep learning. International
Journal of Recent Technology and Engineering, 8(2), 2297–2305.
[7] Dingli, A., & Fournier, K. S. (2017). Financial time series forecasting: A machine
learning approach. Machine Learning and Applications: An International Journal,
4(1–2), 1–14.
[8] Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE
stock market prediction using deep-learning models. Procedia Computer Science,
132, 1351–1362.
[9] Zhong, X., & Enke, D. (2019). Predicting the daily return direction of the stock
market using hybrid machine learning algorithms. Financial Innovation, 5(1), 4.
[10] Hu, Z., Liu, W., Bian, J., Liu, X., & Liu, T.-Y. (2018). Listening to chaotic whispers:
A deep learning framework for news-oriented stock trend prediction. In Proceedings
of the Eleventh ACM International Conference on Web Search and Data Mining (pp.
261–269).
[11] Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market
prices using random forest. arXiv preprint arXiv:1605.00003.
[13] Lai, C. Y., Chen, R. C., & Caraka, R. E. (2019). Prediction stock price based on
different index factors using LSTM. In 2019 International Conference on Machine
Learning and Cybernetics (ICMLC) (pp. 1–6).
[14] Roondiwala, M., Patel, H., & Varma, S. (2017). Predicting stock prices using LSTM.
International Journal of Science and Research, 6(4), 1754–1756.
[15] Vargas, M. R., de Lima, B. S., & Evsukoff, A. G. (2017). Deep learning for stock
market prediction from financial news articles. In 2017 IEEE International
Conference on Computational Intelligence and Virtual Environments for Measurement
Systems and Applications (CIVEMSA) (pp. 60–65).
[16] Vargas, M. R., dos Anjos, C. E., Bichara, G. L., & Evsukoff, A. G. (2018). Deep
learning for stock market prediction using technical indicators and financial news
articles. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp.
1–8).
[17] Weng, B., Ahmed, M. A., & Megahed, F. M. (2017). Stock market one-day ahead
movement prediction using disparate data sources. Expert Systems with Applications,
79, 153–163.