In this third video of our Transformer series, we’re diving deep into the concept of Linear Transformations in Self Attention. Linear transformations are fundamental to the Self Attention mechanism, shaping how inputs are mapped to query, key, and value vectors.
In this lesson, we’ll explore the role of linear transformations, breaking down the math behind them to see why they’re essential for capturing dependencies in Self Attention. We’ll work through detailed mathematical proofs to show how linear transformations operate and why they’re crucial for capturing relevant similarities and generating word representations that reflect what the model learned during training.
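As a quick illustration of the core idea, here’s a minimal NumPy sketch (my own, not code from the video; the names W_Q, W_K, W_V and the toy dimensions are illustrative assumptions) showing how learnable weight matrices linearly transform input embeddings into query, key, and value vectors, which then drive the attention computation:

```python
import numpy as np

# Minimal sketch (illustrative, not the video's exact code):
# learnable weight matrices linearly transform each input embedding
# into query, key, and value vectors.

rng = np.random.default_rng(0)

seq_len, d_model, d_k = 4, 8, 8          # 4 tokens, embedding size 8 (toy sizes)
X = rng.normal(size=(seq_len, d_model))  # input embeddings, one row per token

# Learnable parameters (random here; learned during training in practice)
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q = X @ W_Q   # queries
K = X @ W_K   # keys
V = X @ W_V   # values

# Scaled dot-product attention built on the transformed vectors
scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarity scores
scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True) # softmax over each row
output = weights @ V                          # updated word representations

print(output.shape)  # (4, 8): one new representation per token
```

With these shapes, each of the three weight matrices holds d_model × d_k parameters (3 × 8 × 8 = 192 in this toy example), which is the kind of counting the “Number of parameters” chapter walks through.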
If you’re ready to master the theory behind Transformers & Self Attention, hit play, and let’s get started! Don’t forget to like, subscribe, and share if you find this valuable.
Timestamps:
0:00 Intro
1:31 Recap of Self Attention
9:33 Without Learnable Parameters
14:01 Linear Transformation
15:44 Changing Dimensions
16:34 Feature Extraction with Linear Transformation
18:00 Math of Linear Transformation in Self Attention
22:33 Math of capturing dependencies
25:12 Training the parameters
26:50 Number of parameters
28:37 Outro
Follow my entire Transformers playlist:
Transformers Playlist: https://www.youtube.com/watch?v=lRylkiFdUdk&list=PLuhqtP7jdD8CQTxwVsuiFYGvHtFpNhlR3&index=1&t=0s
RNN Playlist: https://www.youtube.com/watch?v=lWPkNkShNbo&list=PLuhqtP7jdD8ARBnzj8SZwNFhwWT89fAFr&index=1&t=0s
CNN Playlist: https://www.youtube.com/watch?v=E5Z7FQp7AQQ&list=PLuhqtP7jdD8CD6rOWy20INGM44kULvrHu&t=0s
Complete Neural Network: https://www.youtube.com/watch?v=mlk0rddP3L4&list=PLuhqtP7jdD8CftMk831qdE8BlIteSaNzD&t=0s
Complete Logistic Regression Playlist: https://www.youtube.com/watch?v=U1omz0B9FTw&list=PLuhqtP7jdD8Chy7QIo5U0zzKP8-emLdny&t=0s
Complete Linear Regression Playlist: https://www.youtube.com/watch?v=nwD5U2WxTdk&list=PLuhqtP7jdD8AFocJuxC6_Zz0HepAWL9cF&t=0s