Mechanistic Interpretability - Some concepts

Here are some quick notes on concepts in Mechanistic Interpretability. The field is vast and very recent, and it tries to interpret the features learned by neural networks, specifically transformers and LLMs.

These notes are based on my own studies and mainly on this blog post: Link to blog. In the future, as I advance my studies on this subject, I will update this post.