§ 01 · Writing
Thoughts,
written down.
Ideas about math, computer science, life, and everything in between.
The Intuition Behind Self-Attention
Attention has several advantages over its RNN predecessors. This article builds the intuition behind attention, so you can see why it's so powerful and widely used.
The Geometry Behind L1/L2 Regularization
L1 favors sparse weights, while L2 favors small weights. We'll explore why, and how circles and squares help explain the difference.
PCA is the Answer to a Constrained Optimization
Eigenvectors of the covariance matrix aren't a coincidence. They fall out of maximizing variance under a unit-norm constraint.
Every Gradient in Your Neural Network Is Just the Chain Rule
Hand-compute every gradient in a neural network. By the end, you'll know why we train one with backpropagation.
Eigenvectors: The Unifying Language Behind Matrix Decomposition
We'll define what an eigenvector is, then relate it to three common forms of matrix decomposition, showing how each builds on the previous one.
Hello, World
First post! Why I'm starting this blog, and what to expect.