Playground

Local LLM in your browser

Pick a model, hit load, and chat. Everything runs on your GPU via WebGPU. No API key. No server. Nothing leaves your tab.

WebGPU · WebLLM · Zero servers · Open source

Load Phi-3.5 Mini

Model weights (~2.2 GB) download once and are cached in your browser.

Requires a WebGPU-capable browser (Chrome or Edge on desktop).
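A support check of this kind can be sketched as below. This is a minimal illustration, not the playground's actual code: `NavigatorLike`, `WebGpuStatus`, and `checkWebGpu` are hypothetical names, and the structural types exist only so the sketch compiles without WebGPU type definitions. The one real API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available.

```typescript
// Minimal structural types (hypothetical) standing in for the WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "ready" | "no-api" | "no-adapter";

// Reports whether the browser exposes WebGPU and can hand out an adapter.
async function checkWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  // Browsers without WebGPU do not define navigator.gpu at all.
  if (!nav.gpu) return "no-api";
  // The API can exist while no usable adapter is found (resolves to null).
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "ready" : "no-adapter";
}
```

In a page this would be called as `checkWebGpu(navigator as NavigatorLike)`, with the load button enabled only when the result is `"ready"`.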

How it works

Model weights are fetched from Hugging Face, stored in your browser's Cache API, and executed via WebLLM — a WebGPU-native inference engine that runs the same MLC-compiled models as the desktop mlc-llm runtime.
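The fetch-then-cache step described above follows a cache-first pattern, which can be sketched as follows. This is an illustration under stated assumptions, not WebLLM's internal implementation: `fetchWithCache`, `CacheLike`, `ResponseLike`, and `FetchLike` are hypothetical names, and the interfaces mirror only the parts of the browser's `Cache` and `Response` objects that the sketch needs.

```typescript
// Hypothetical structural types covering the subset of Response / Cache used here.
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Cache-first lookup: serve from the cache when possible, otherwise
// download once and store a copy for later (including offline) use.
async function fetchWithCache(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<ResponseLike> {
  const hit = await cache.match(url);
  if (hit) return hit; // already downloaded, no network needed

  const response = await fetchFn(url);
  // Only successful responses are stored; clone() because a body can be read once.
  if (response.ok) await cache.put(url, response.clone());
  return response;
}
```

In a browser, the real objects would be passed in, for example `fetchWithCache(url, await caches.open("weights"), fetch)`, where the cache name `"weights"` is an arbitrary placeholder. Injecting the cache and fetch function keeps the logic testable outside a browser.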

Chrome Prompt API

If you're on Chrome 126+ with the Gemini Nano flag enabled, the Chrome Prompt API option appears automatically. It uses the model already on your device — zero download, near-instant start.

Privacy

Your messages never leave the browser tab. Once the weights are cached (typically 1–4 GB depending on model), the playground works completely offline. No telemetry. No account required.
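Because the cached weights run to gigabytes, a page can check whether the browser has room before starting a download. The sketch below is an assumption about how such a check might look, not something the playground is known to do: `hasRoomFor` and `StorageEstimateLike` are hypothetical names, and the input is the `{ quota, usage }` object returned by the browser's `navigator.storage.estimate()`.

```typescript
// Hypothetical shape matching the fields returned by navigator.storage.estimate().
interface StorageEstimateLike {
  quota?: number; // bytes the origin may use in total
  usage?: number; // bytes the origin uses already
}

const GIB = 1024 * 1024 * 1024;

// Returns true only when the remaining quota is known and large enough.
function hasRoomFor(estimate: StorageEstimateLike, neededBytes: number): boolean {
  // Either field may be missing; treat an unknown budget as "not enough".
  if (estimate.quota === undefined || estimate.usage === undefined) return false;
  return estimate.quota - estimate.usage >= neededBytes;
}
```

In a browser this might be called as `hasRoomFor(await navigator.storage.estimate(), 2.2 * GIB)` before fetching a model of roughly that size. The reported quota is an estimate, so a download can still fail even when the check passes.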
