On this article, we’ll embark on a playful journey by way of the world of transformers, unraveling the complexities of their structure utilizing the Einstein notation.
Introduction:
Transformer fashions have revolutionized the sphere of pure language processing (and past), reaching state-of-the-art outcomes on quite a lot of duties. They’ve spectacular efficiency however the underlying mathematical operations may be complicated and tough to know — particularly with out breaking down the person layers. On this article, I suggest utilizing the Einstein notation to precise the mathematical operations inside a transformer mannequin.
Be aware that the Einstein notation is generally utilized in Physics and Arithmetic comparable to in Common Relativity, Electromagnetism, Quantum and Fluid Mechanics but additionally in Linear Algebra to signify matrix operations in a extra compact type.
The aim is to put in writing the mathematical operations of each layer in a concise and chic manner. By leveraging implicit summation over repeated indices, Einstein notation can simplify the illustration of tensor operations, making it (probably) simpler to grasp and due to this fact implement the person layers of the transformer fashions…