Autoformer: Long-Term Forecasting Method

The document discusses the attention mechanism in neural networks, particularly its application in language translation and self-attention. It explains how attention allows models to focus on relevant input parts, improving accuracy in tasks like machine translation. Additionally, it introduces concepts like multi-head attention, cross attention, and the Autoformer architecture for long-term series forecasting, highlighting the use of autocorrelation in attention mechanisms.

Uploaded by

Ankit Prakash Ching

Attention Mechanism

The attention mechanism enables models to selectively focus on relevant parts of the input sequence while generating each element of the output sequence, improving accuracy and capturing long-range dependencies in tasks like machine translation.

[Figure: translating "i do not know" into "<start> मुझे नहीं पता <end>" (Hindi for "I do not know"). The encoder states h1, h2, h3, h4 (with initial state h0) are weighted by the attention weights α21, α22, α23, α24 and summed.]

Compute context vector: c2 = α21 h1 + α22 h2 + α23 h3 + α24 h4
Language translation with Attention Mechanism

Input: Sequence ("I do not know")
Output: Sequence ("मुझे नहीं पता", Hindi for "I do not know")
Encoder: [formula missing in source], where the function is an MLP
Decoder: [formula missing in source], where the context vector c is often [missing in source]

[Figure, step 1: alignment scores e1,0 ... e1,3 are computed from the encoder states h^e_0 ... h^e_3 and the decoder state h^d_0, and turned into attention weights a1,0 ... a1,3. The weights multiply (⨂) the encoder states to give the context vector c1. With c1 and the input <start>, the decoder reaches h^d_1 and outputs "मुझे".]

[Figure, step 2: scores e2,0 ... e2,3 and weights a2,0 ... a2,3 give the context vector c2. With c2 and the input "मुझे", the decoder reaches h^d_2; the output so far is "मुझे नहीं".]

[Figure, step 3: scores e3,0 ... e3,3 and weights a3,0 ... a3,3 give the context vector c3. With c3 and the input "नहीं", the decoder reaches h^d_3; the output so far is "मुझे नहीं पता".]

[Figure, step 4: scores e4,0 ... e4,3 and weights a4,0 ... a4,3 give the context vector c4. With c4 and the input "पता", the decoder reaches h^d_4 and outputs <end>.]

credit: from slide of Vikas sir
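
The first decoding step above can be sketched in NumPy. This is only an illustration under stated assumptions: the hidden size, the random states, and the one-hidden-layer MLP used as the scoring function (`W1`, `w2`, `score`) are hypothetical stand-ins, not values or formulas taken from the slides.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(x - np.max(x))
    return z / z.sum()

rng = np.random.default_rng(0)
D = 4                                  # hidden size (assumed)
enc_states = rng.normal(size=(4, D))   # h^e_0 .. h^e_3, one per token of "I do not know"
dec_state = rng.normal(size=D)         # h^d_0, the decoder state before step 1

# Hypothetical scoring MLP: one tanh hidden layer of width 8.
W1 = rng.normal(size=(2 * D, 8))
w2 = rng.normal(size=8)

def score(d, h):
    # Alignment score between a decoder state d and an encoder state h.
    return np.tanh(np.concatenate([d, h]) @ W1) @ w2

e = np.array([score(dec_state, h) for h in enc_states])  # alignment scores e_{1,i}
a = softmax(e)                                           # attention weights a_{1,i}
c1 = a @ enc_states                                      # context vector c_1

print("attention weights:", np.round(a, 3))
print("context vector c1:", np.round(c1, 3))
```

The weights are positive and sum to one, so `c1` is a convex combination of the encoder states. Repeating the same computation with `h^d_1`, `h^d_2`, `h^d_3` would give `c2`, `c3`, `c4`.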
Word Embedding

Word embeddings are obtained after a model gets trained on a large corpus of text data.
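Problem with Word Embedding

Apple = [taste, technology]

1. An apple a day keeps the doctor away. [0.6, 0]
2. Apple is healthy. [0.7, 0]
3. Apple is better than orange. [0.8, 0]
4. Apple makes great phones. [0.75, 0.2]

A static embedding gives "Apple" a single vector regardless of context, so the taste component still dominates in sentence 4, where the company is meant.

[Figure: the vector for "Apple" plotted on Taste and Technology axes.]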
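Self attention

Inputs:
• Input vectors: x_1, ..., x_T (each of shape D)

Operations:
• Key vectors: k_i = W_k x_i
• Value vectors: v_i = W_v x_i
• Query vectors: q_j = W_q x_j
• Alignment: e_{i,j} = q_j · k_i (commonly scaled by 1/√D)
• Attention: a_{:,j} = softmax(e_{:,j}), normalised over the keys i for each query j
• Output: y_j = Σ_i a_{i,j} v_i (mul + add)

Outputs:
• Context vectors: y_1, ..., y_T (each of shape D)

Permutation equivariant: permuting the input vectors permutes the outputs in the same way, so the layer itself has no notion of order.

Problem: how can we encode ordered sequences like language or spatially ordered image features?

[Figure: grid of alignment scores e_{i,j} between keys k_1 ... k_T and queries q_1 ... q_T, passed through softmax to give attention weights a_{i,j}, which weight the values v_1 ... v_T to produce y_1 ... y_T.]

credit: from slide of Vikas sir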
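
A minimal NumPy sketch of the operations listed above, assuming scaled dot-product alignment and random projection matrices (the sizes `T`, `D` and the names `Wq`, `Wk`, `Wv` are illustrative choices). It also checks the ordering issue raised on the slide: shuffling the inputs merely shuffles the outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv    # queries, keys, values (one row per input)
    E = Q @ K.T / np.sqrt(K.shape[1])   # E[j, i] = q_j . k_i / sqrt(D)
    A = softmax(E, axis=-1)             # normalise over keys i for each query j
    return A @ V                        # y_j = sum_i A[j, i] * v_i

rng = np.random.default_rng(0)
T, D = 3, 4                             # sequence length and vector size (assumed)
X = rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

Y = self_attention(X, Wq, Wk, Wv)

# Reorder the inputs and run the layer again.
perm = np.array([2, 0, 1])
Y_perm = self_attention(X[perm], Wq, Wk, Wv)

# The outputs are the same vectors in the new order: the layer ignores position.
print(np.allclose(Y_perm, Y[perm]))
```

Because the outputs carry no information about where each input sat in the sequence, Transformers add positional information to the inputs before this layer.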

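Self attention

Example sentence: "How Are You"

[Figure, projections: each word embedding e_how, e_are, e_you is multiplied by W_q, W_k and W_v to give a query, a key and a value per word, e.g. q_how, k_how, v_how.]

[Figure, attention: each query (q_how, q_are, q_you) is compared with all three keys. The scores pass through a softmax to give weights w11, w12, w13 for "how", w21, w22, w23 for "are" and w31, w32, w33 for "you". The weights multiply the three values and the products are summed, e.g. y_how = w11 v_how + w12 v_are + w13 v_you, and likewise for y_are and y_you.]

Image credit: [Link]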
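Multihead attention

"The man saw the astronomer with a telescope."

The sentence can be read in more than one way (who has the telescope?), and separate heads can attend to different relationships between the same words.

[Figure: two attention heads. Each head h has its own projection matrices W_q^h, W_k^h, W_v^h (h = 1, 2), so each word embedding (labelled E_bank and E_money in the figure) yields one query, key and value per head, e.g. q_how1, k, v for head 1 and q_how2, k, v for head 2.]

Image credit: [Link]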
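
The two-head setup can be sketched as follows. The sizes, the random weights and the final output projection `Wo` are assumptions for illustration; the output projection follows the standard Transformer design and does not appear on the slide.

```python
import numpy as np

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def multi_head_attention(X, heads, Wo):
    outputs = []
    for Wq, Wk, Wv in heads:
        # Each head has its own projections, so it can focus on different relations.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(K.shape[1]))
        outputs.append(A @ V)
    # Concatenate the per-head outputs and mix them with the output projection.
    return np.concatenate(outputs, axis=-1) @ Wo

rng = np.random.default_rng(0)
T, D, H = 8, 6, 2                  # 8 words in the example sentence, 2 heads (sizes assumed)
X = rng.normal(size=(T, D))        # stand-in word embeddings
heads = [tuple(rng.normal(size=(D, D // H)) for _ in range(3)) for _ in range(H)]
Wo = rng.normal(size=(D, D))

Z = multi_head_attention(X, heads, Wo)
print(Z.shape)                     # one D-dimensional vector per word
```

Splitting the model dimension across heads (here 6 = 2 × 3) keeps the total cost close to that of a single full-width head.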
Add and Norm

• Layer normalization is used in the Transformer.
• Normalization stabilizes the training process.
• The residual connection allows the model to learn more effectively without vanishing gradients.

[Figure: the inputs X1, X2, X3 ("how", "are", "you") pass through multi-head attention to give z1, z2, z3. The residual connection adds X1, X2, X3 to them, giving Z1, Z2, Z3. Layer normalization then yields Z1-norm, Z2-norm, Z3-norm.]

Image credit: [Link]
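
A small sketch of the add-and-norm step, assuming a random stand-in for the attention output and omitting the learnable gain and bias that layer normalization usually carries:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each token vector to zero mean and unit variance over its features.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))    # X1, X2, X3: embeddings of "how", "are", "you" (size assumed)
z = rng.normal(size=(3, 8))    # stand-in for the multi-head attention output z1, z2, z3

Z = X + z                      # residual connection ("Add")
Z_norm = layer_norm(Z)         # layer normalization ("Norm")

print(np.round(Z_norm.mean(axis=-1), 6))
print(np.round(Z_norm.std(axis=-1), 3))
```

Each row of `Z_norm` has mean close to 0 and standard deviation close to 1, which keeps activations on a comparable scale from layer to layer.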
Feed Forward Network

• First layer with 2048 neurons and a ReLU activation function.
• Second layer with 512 neurons and a linear activation function.
• The ReLU activation in the first layer introduces non-linearities into the model.
• This allows the FFNN to learn more complex patterns than it could with a simple linear transformation.

[Figure: Z1-norm, Z2-norm, Z3-norm feed a layer of 2048 neurons with ReLU activation (weights 512*2048), followed by a layer of 512 neurons with linear activation (weights 2048*512), producing Y1, Y2, Y3.]

Image credit: [Link]
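
The two layers can be written out directly. The weights below are random placeholders (a trained model would learn them), and the initialisation scale is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 512, 2048

# First layer: 512 -> 2048, second layer: 2048 -> 512.
W1 = rng.normal(scale=0.02, size=(d_model, d_ff))
b1 = np.zeros(d_ff)
W2 = rng.normal(scale=0.02, size=(d_ff, d_model))
b2 = np.zeros(d_model)

def feed_forward(z):
    hidden = np.maximum(0.0, z @ W1 + b1)   # ReLU introduces the non-linearity
    return hidden @ W2 + b2                 # linear activation in the second layer

Z_norm = rng.normal(size=(3, d_model))      # stand-ins for Z1-norm, Z2-norm, Z3-norm
Y = feed_forward(Z_norm)                    # Y1, Y2, Y3
print(Y.shape)
```

The same weights are applied to every position independently, so the three rows could be processed in any order or in parallel.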
Masked multi-head attention

• The decoder works in an auto-regressive manner, meaning it generates each token in a sequence using the tokens generated so far.
• During training, the decoder doesn't follow the auto-regressive approach.
• If we were to treat the training process as fully auto-regressive, similar to inference, it would slow down the entire process.
• Instead of predicting each word one by one, we can parallelize the entire process.
• To keep parallel training consistent with inference, vectors are prevented from looking at future vectors.
• This is done by manually setting the alignment scores of future positions to -infinity, so that their softmax weights become zero.

Image credit: from slide of Vikas sir

Image credit: [Link]
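
A minimal sketch of the masking trick, assuming scaled dot-product scores and random queries, keys and values:

```python
import numpy as np

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def masked_self_attention(Q, K, V):
    T = Q.shape[0]
    E = Q @ K.T / np.sqrt(K.shape[1])                    # alignment scores
    future = np.triu(np.ones((T, T), dtype=bool), k=1)   # True where key index > query index
    E = np.where(future, -np.inf, E)                     # block attention to future positions
    A = softmax(E, axis=-1)                              # exp(-inf) = 0, so masked weights vanish
    return A @ V, A

rng = np.random.default_rng(0)
T, D = 4, 8                                              # sizes assumed for illustration
Q, K, V = (rng.normal(size=(T, D)) for _ in range(3))

Y, A = masked_self_attention(Q, K, V)
print(np.round(A, 2))   # lower-triangular: row t only attends to positions <= t
```

All `T` positions are computed in one pass, yet position `t` sees exactly what it would see during step-by-step generation.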
Cross attention

• Cross attention identifies connections between two sequences.
• It generates query vectors from the output sequence (Hindi), while key and value vectors are derived from the input sequence (English).
• This process helps the model determine how similar or related words from the output sequence (Hindi) are to words from the input sequence (English).

Image credit: [Link]
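
A sketch of cross attention with random stand-ins for the two sequences. The token counts (4 English, 3 Hindi) match the translation example used earlier; the vector size and weights are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def cross_attention(dec_states, enc_states, Wq, Wk, Wv):
    Q = dec_states @ Wq                  # queries come from the output (Hindi) side
    K = enc_states @ Wk                  # keys come from the input (English) side
    V = enc_states @ Wv                  # values come from the input (English) side
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return A @ V, A

rng = np.random.default_rng(0)
D = 8
enc_states = rng.normal(size=(4, D))     # "I", "do", "not", "know"
dec_states = rng.normal(size=(3, D))     # "मुझे", "नहीं", "पता"
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

out, A = cross_attention(dec_states, enc_states, Wq, Wk, Wv)
print(A.shape)   # one row per Hindi token, one column per English token
```

Row `t` of `A` shows how strongly the `t`-th output token attends to each input token, which is the similarity the slide describes.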

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

• Published in 2021
• Conference on Neural Information Processing Systems (NeurIPS 2021), ranked A*
• Datasets used: ETT, Electricity, Exchange, Traffic, Weather and ILI (influenza-like illness)
Decomposition Layer

• Autoformer incorporates decomposition into the Transformer architecture.
• The encoder and decoder use a decomposition block to aggregate the trend-cyclical part and extract the seasonal part from the series progressively.
• For an input series X of length L, the decomposition layer returns the trend-cyclical part X_t and the seasonal part X_s, defined as:
  X_t = AvgPool(Padding(X)), X_s = X − X_t

Image credit: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
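
The decomposition can be sketched as a padded moving average. This is an illustration only: the kernel size, the edge-replication padding and the function name `series_decomp` are choices made here for a one-dimensional series.

```python
import numpy as np

def series_decomp(x, kernel_size=5):
    # Pad both ends by repeating the edge values so the trend keeps length L.
    pad_front = (kernel_size - 1) // 2
    pad_back = kernel_size - 1 - pad_front
    padded = np.concatenate([np.repeat(x[:1], pad_front), x, np.repeat(x[-1:], pad_back)])
    # Moving average (average pooling with stride 1) gives the trend-cyclical part.
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    # What remains after removing the trend is the seasonal part.
    seasonal = x - trend
    return seasonal, trend

t = np.arange(24)
x = 0.5 * t + np.sin(2 * np.pi * t / 4)   # linear trend plus a period-4 oscillation
seasonal, trend = series_decomp(x, kernel_size=5)

print(np.round(trend[:6], 2))
print(np.round(seasonal[:6], 2))
```

By construction `seasonal + trend` reproduces `x` exactly, so the block splits the series without losing information.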
Attention (Autocorrelation) Mechanism

• Autoformer employs a novel auto-correlation mechanism, which replaces self-attention.
• In Autoformer, attention weights are computed in the frequency domain (using the fast Fourier transform) and then aggregated by time delay.

Image credit: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
Frequency Domain Attention

• Given a time lag τ, the autocorrelation of a single discrete variable y is used to measure the "relationship" between the variable's current value at time t and its past value at time t−τ:
  Autocorrelation(τ) = Corr(y_t, y_{t−τ})
• Using autocorrelation, Autoformer extracts frequency-based dependencies from the queries and keys, instead of the standard dot-product between them.
• The theory behind computing autocorrelation using the FFT is based on the Wiener–Khinchin theorem.

Image credit: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
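
A sketch of the FFT route, checked against a direct sum over lags. It uses unnormalised circular correlation on a single one-dimensional series; the function names and the test signal are illustrative assumptions.

```python
import numpy as np

def autocorrelation_fft(q, k):
    # Multiply the spectrum of q by the conjugate spectrum of k, then transform
    # back: this yields the circular correlation for all lags at once.
    L = len(q)
    q_fft = np.fft.rfft(q)
    k_fft = np.fft.rfft(k)
    return np.fft.irfft(q_fft * np.conj(k_fft), n=L)

def autocorrelation_direct(q, k):
    # Reference: r[tau] = sum_t q[t] * k[t - tau], with wrap-around indexing.
    L = len(q)
    return np.array([np.sum(q * np.roll(k, tau)) for tau in range(L)])

t = np.arange(32)
y = np.sin(2 * np.pi * t / 8)        # a series with period 8

r_fft = autocorrelation_fft(y, y)
r_direct = autocorrelation_direct(y, y)

print(np.allclose(r_fft, r_direct))  # both routes agree
print(np.round(r_fft[[0, 4, 8]], 2)) # lags 0, 4 and 8
```

The FFT route costs O(L log L) instead of O(L²) for the direct sum, and the lags with the largest values (here multiples of 8) reveal the period of the series.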
Time Delay Aggregation

• Denote the autocorrelations (referred to as attn_weights) as R_{Q,K}.
• The value series V is aligned by calculating its value for each time delay τ1, τ2, ..., τk, which is also known as rolling.
• Subsequently, we conduct element-wise multiplication between the aligned V and the autocorrelations.
• In the figure, the left side shows the rolling of V by time delay, while the right side illustrates the element-wise multiplication with the autocorrelations.

Image credit: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
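
The rolling and weighting steps can be sketched for a single one-dimensional series. Several details are assumptions made for this illustration: keeping only the top-k lags, normalising their autocorrelations with a softmax, rolling with `np.roll(v, -tau)`, and the function name `time_delay_aggregation`.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def time_delay_aggregation(v, autocorr, top_k=2):
    # Pick the k time delays with the largest autocorrelation.
    lags = np.argsort(autocorr)[-top_k:]
    weights = softmax(autocorr[lags])
    out = np.zeros_like(v, dtype=float)
    for tau, w in zip(lags, weights):
        # Rolling: shift the series by tau (wrapping around) so that similar
        # sub-series line up, then weight it by its normalised autocorrelation.
        out += w * np.roll(v, -int(tau))
    return out, lags

t = np.arange(32)
v = np.sin(2 * np.pi * t / 8)                 # period-8 series used as Q, K and V

# Autocorrelation via FFT (Wiener-Khinchin), as in the frequency-domain step.
spec = np.fft.rfft(v)
autocorr = np.fft.irfft(spec * np.conj(spec), n=len(v))

out, lags = time_delay_aggregation(v, autocorr, top_k=2)
print(sorted(int(l) for l in lags))           # the selected delays
print(np.allclose(out, v))                    # rolled copies line up with the original
```

For this perfectly periodic series the selected delays are multiples of the period, so every rolled copy coincides with the original and the aggregate reproduces it.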
