LLM Quantization
8-bit Quantization Implementation for the Llama-2-7b Model
Notes on linear attention and related topics.
Lecture Notes on ML Compilation by Tianqi Chen
Write a memory allocator from scratch.
Guide to implementing a parallel FFT in C++
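As a pointer for the quantization items above, here is a minimal sketch of symmetric absmax 8-bit weight quantization, one common scheme in 8-bit LLM inference. The function names and the use of NumPy are illustrative assumptions, not part of any of the listed resources.

```python
import numpy as np

def quantize_absmax(w):
    # Symmetric absmax quantization: map weights into int8 range [-127, 127].
    # scale is the real-valued step size; hypothetical helper for illustration.
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

# Toy example on a small random weight matrix.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

In a real Llama-2-7b setting this would be applied per tensor (or per channel/row, which typically reduces error), with activations handled separately.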