Hi there! I’m Jesse Cai, an ML engineer focused on making deep learning models faster and more efficient.
Most recently, I worked on PyTorch model performance at Meta Superintelligence Labs, where I focused on accelerating training and inference with sparsity and quantization. You can read more about that here and here.
Before that, I spent two years at Cultivate, a Series A startup that was later acquired by Perceptyx. Most of my work there was making BERT do cool things with very little labeled data, the kind of problem you’d reach for prompt engineering to solve today. I wrote a five-part series about it here.
Prior to that, I was a research intern for Professor Kai-Wei Chang at the UCLA NLP lab while finishing my B.S. in Computer Science. I worked on sentence embeddings and spent a lot of time trying to replicate QuickThoughts.
I also spent some time as a machine learning engineer at Blend, a mortgage fintech, where I used RNNs to predict user behavior.
Now
[What I’m currently reading, working on, or thinking about — updated occasionally.]
Publications & Talks
Papers
- To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training
- Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
- TorchAO: PyTorch-Native Training-to-Serving Model Optimization — ICML 2025 CODEML Workshop
Blog Posts
- When Quantization Isn’t Enough: Why 2:4 Sparsity Matters
- Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity
- Speeding up ViTs using Block Sparsity
- (beta) Accelerating BERT with semi-structured (2:4) sparsity
Talks
Projects
- TorchAO — PyTorch-native quantization and sparsity library, covering FP8, INT4, INT8, and 2:4 sparsity from training to serving.
GitHub · LinkedIn · Google Scholar · jcjessecai@gmail.com