Stock Market Forecasting
Gabriele Carrivale - 872488
Martino Pettinari - 866496
Stefano Madona - 874799
Overview
01 Introduction
02 Datasets
03 The Methodological Approach
04 Results and Evaluation
05 Discussion
06 Conclusions
Introduction
Mission
Focus: Financial market
Goal: Evaluate deep learning and machine learning models for stock price prediction
Datasets
Data Preprocessing
1. Data collection: 250 stocks from the S&P 500 and 250 from the STOXX 600.
2. Data transformation: The dataset contains only 7 columns; each row holds the metrics and the company’s ticker.
3. Merging and cleaning: The datasets were merged and rows with null values were removed. Only the Close, Date and Volume columns were kept, with the date set as the index.
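The merging and cleaning step can be sketched without any dependencies, as below. This is a minimal illustration, not the project's code: the row layout, the sample values, the `merge_and_clean` helper and the use of the ticker as part of the key are all assumptions made for the example.

```python
from datetime import date

# Hypothetical raw rows: one record per ticker and day (values are made up).
sp500_rows = [
    {"Date": "2024-01-02", "Ticker": "AAA", "Open": 10.0, "Close": 10.5, "Volume": 1200},
    {"Date": "2024-01-03", "Ticker": "AAA", "Open": 10.5, "Close": None, "Volume": 900},
]
stoxx600_rows = [
    {"Date": "2024-01-02", "Ticker": "BBB", "Open": 20.0, "Close": 19.8, "Volume": 700},
]

KEPT_COLUMNS = ("Close", "Volume")


def merge_and_clean(*datasets):
    """Merge datasets, drop rows with nulls, keep Close/Volume, index by date."""
    indexed = {}
    for rows in datasets:
        for row in rows:
            if any(value is None for value in row.values()):
                continue  # drop rows that contain a null value
            # Assumption: the ticker is kept in the key so that two companies
            # trading on the same day do not overwrite each other.
            key = (date.fromisoformat(row["Date"]), row["Ticker"])
            indexed[key] = {column: row[column] for column in KEPT_COLUMNS}
    return indexed


clean = merge_and_clean(sp500_rows, stoxx600_rows)
```

Here `clean` holds two entries, because the row with a missing Close is dropped.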
Datasets Example
Data Exploration
Figure: Correlation between individual stocks and their own market (panels: STOXX 600 and S&P 500)
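One common way to measure how closely a stock follows its market is the Pearson correlation of daily returns. The sketch below assumes that reading; the slides do not say whether the correlation was computed on prices or on returns, and the price series are made up.

```python
import math


def daily_returns(prices):
    """Simple day-over-day returns: (p[t] - p[t-1]) / p[t-1]."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up closing prices for one stock and its index.
stock_close = [100.0, 102.0, 101.0, 105.0, 104.0]
index_close = [4000.0, 4040.0, 4020.0, 4100.0, 4090.0]

corr = pearson(daily_returns(stock_close), daily_returns(index_close))
```

A value of `corr` close to 1 means the stock moves with its index; a value near 0 means the two are largely unrelated.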
Data Sequence
Sequences were built for the classification and regression tasks.
For the classification task, three methods were applied:
1. Mean Value Approach
2. Maximum Value Approach
3. Initial Value Approach
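The slides do not define the three methods, so the sketch below shows one plausible reading: each sliding window of closing prices is labeled 1 when the next close is above a reference value taken from the window (its mean, its maximum, or its first value). The window size, the function names and the comparison rule are assumptions.

```python
def make_sequences(closes, window):
    """Return (window, next_value) pairs from a list of closing prices."""
    return [
        (closes[i:i + window], closes[i + window])
        for i in range(len(closes) - window)
    ]


# Assumed reference value for each of the three methods.
REFERENCES = {
    "mean": lambda w: sum(w) / len(w),
    "max": max,
    "initial": lambda w: w[0],
}


def label(window, next_value, method):
    """1 if the next close is above the window's reference value, else 0."""
    return int(next_value > REFERENCES[method](window))


closes = [10.0, 11.0, 12.0, 11.5, 12.5, 12.2]
pairs = make_sequences(closes, window=3)
labels = {m: [label(w, nxt, m) for w, nxt in pairs] for m in REFERENCES}
```

For the regression task, the same pairs can be used directly, with `next_value` as the target instead of a binary label.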
The Methodological Approach
Training Approach
Strategies for training the models:
1. Index-Specific Models: one model for each major index.
2. Generalized Models for Individual Stocks: a model trained on all the data.
3. General Model with Fine-Tuning.
Prediction Paradigm
Classification-Based Prediction: Determine whether the stock price will increase or decrease on the next trading day.
Regression-Based Prediction: Estimate the stock price for
the following trading day within a selected period.
Models
Long Short-Term Memory (LSTM) and bidirectional LSTM
(BiLSTM): Capture long-term temporal dependencies and
patterns.
[1]
CNN-BiLSTM: Combines CNNs and BiLSTMs. CNN extracts
key features, while BiLSTM processes sequential patterns.
Random Forest Regressor and XGBoost Regressor:
Traditional machine learning models optimized for non-linear
relationships and efficient prediction.
LSTM Fine-Tuning
Goal: improve prediction accuracy
Framework: KerasTuner
Optimal configuration:
First layer: 90–100 neurons and 0.2 dropout rate.
Second layer: 30–40 neurons and 0.5 dropout rate.
Tuning was performed on the S&P 500 index data.
Challenges
Key challenges:
Data retrieval and cleaning: Handling invalid tickers and
API restrictions for sentiment data.
Noisy and incomplete data: Addressed through
preprocessing techniques.
Computational constraints: Required limiting dataset size
and sequence length.
Results and Evaluation
Classification
Classification results on the entire dataset
Regression
Performance metrics for the entire dataset
Regression by Index
Performance metrics for the S&P 500 dataset
Performance metrics for the STOXX 600 dataset
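The slides do not name the performance metrics, so the sketch below assumes three that are commonly reported for price regression: MAE, RMSE and R². The sample values are made up and `regression_metrics` is a hypothetical helper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    # Total sum of squares; zero (and R^2 undefined) if y_true is constant.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mae": mae, "rmse": math.sqrt(mse), "r2": r2}


actual = [100.0, 102.0, 104.0, 103.0]
predicted = [101.0, 101.0, 105.0, 103.0]
metrics = regression_metrics(actual, predicted)
```

Lower MAE and RMSE indicate smaller errors in price units, while R² closer to 1 indicates that the model explains more of the variation in the actual prices.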
Predictions
Comparison between the best and worst predictions when the models are trained on the entire dataset.
Confidence Interval
[2]
Regression results when jointly predicting the mean and the log variance
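When a model outputs both a mean and a log variance, a common choice is to train it with the Gaussian negative log-likelihood and to read the interval off the predicted standard deviation. The sketch below assumes that setup; the slides do not state the loss or the confidence level, and the numbers are made up.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    squared_error = (y - mean) ** 2
    return 0.5 * (math.log(2 * math.pi) + log_var + squared_error / math.exp(log_var))


def confidence_interval(mean, log_var, z=1.96):
    """Interval mean +/- z * std; z=1.96 is roughly 95% under a Gaussian."""
    std = math.exp(0.5 * log_var)
    return mean - z * std, mean + z * std


# Hypothetical model output for one day: predicted mean and log variance.
mean, log_var = 100.0, math.log(4.0)  # variance 4, so std is 2
low, high = confidence_interval(mean, log_var)
loss = gaussian_nll(101.0, mean, log_var)
```

Predicting the log variance rather than the variance keeps the variance positive without constraining the network output, and a wider interval signals a day the model considers more volatile.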
Discussion
Future Work
Analyze sentiment related to S&P 500 companies.
Use VADER to generate sentiment scores and merge them with the stock price data.
Future research could investigate advanced sentiment analysis models such as BERT or FinBERT.
Conclusions
Recap
Evaluated classification and regression models.
Maximum Value-Based Classification achieved the highest accuracy, 75%.
XGBoost outperformed the other models.
Training on the full dataset captured broad market trends.
The BiLSTM-based confidence interval captured market volatility.
Sentiment analysis using Twitter data showed potential for future work.
Thank You!