LLM Quantization

8-bit Quantization Implementation for the Llama-2-7b Model

February 23, 2025 · 18 min · Zhiyang Shen

Notes on Linear Attention

Notes on topics related to linear attention.

January 18, 2025 · 4 min · Zhiyang Shen

Machine Learning Compilation

Lecture notes on the ML Compilation course by Tianqi Chen

January 15, 2025 · 1 min · Zhiyang Shen

Write a Memory Allocator for PyTorch

Write a memory allocator from scratch.

May 16, 2024 · 7 min · Zhiyang Shen

FFT On the Road

A guide to implementing FFT in C++ with parallelism

April 20, 2024 · 5 min · Zhiyang Shen