
Exla
Active
An SDK to run transformer models anywhere
W25·Winter 2025·B2B·San Francisco, CA, USA·Team of 2·Founded 2025
About
Exla aggressively quantizes AI models to minimize memory usage and maximize inference speed. Whether you're deploying LLMs, VLMs, VLAs, or custom models, Exla reduces memory footprint by up to 80% and accelerates inference by 3–20x, all with just a few lines of code. Schedule a call: https://cal.com/exla-ai/schedule
Founders
Pranav Nair · Co-Founder
CTO at Exla. Previously an OS engineer at Apple leading sleep/hibernation for all Apple devices. B.S. Computer Science from Purdue.

Product launches · 1 launch
Optimize models to run on edge devices (e.g. Jetsons) with 3–20x faster inference and 80% lower memory requirements