Hi there! I’m Jesse Cai, an ML engineer focused on making deep learning models faster and more efficient.


Most recently, I worked on PyTorch model performance at Meta Superintelligence Labs, where I focused on accelerating training and inference with sparsity and quantization. You can read more about that here and here.

Before that, I spent two years at Cultivate, a Series A startup that was eventually acquired by Perceptyx. Most of my work there was making BERT do cool things with very little labeled data, the kind of problem you’d reach for prompt engineering to solve today. I wrote a five-part series about it here.

Prior to that, I was a research intern for Professor Kai-Wei Chang at the UCLA NLP lab while finishing my B.S. in Computer Science. I worked on sentence embeddings and spent a lot of time trying to replicate QuickThoughts.

I also spent some time as a machine learning engineer at Blend, a mortgage fintech, where I used RNNs to predict user behavior.


Now

[What I’m currently reading, working on, or thinking about — updated occasionally.]


Publications & Talks

Papers

Blog Posts

Talks


Projects

  • TorchAO — PyTorch-native quantization and sparsity library, covering FP8, INT4, INT8, and 2:4 sparsity from training to serving.

GitHub · LinkedIn · Google Scholar · jcjessecai@gmail.com