I built the world's first Chrome extension that runs LLMs entirely in-browser—WebGPU, Transformers.js, and Chrome's Prompt API

February 11, 2026

Someone has built what they describe as the world's first Chrome extension that runs large language models directly in your browser: no servers, no subscriptions, just local inference. According to /u/psgganesh, it runs models like Llama 3.2, Qwen3, and Mistral right in Chrome using WebGPU, Transformers.js, and Chrome's Prompt API. All models are cached offline in IndexedDB, so it works even without internet, and you can knock out quick drafts, summaries, or code help without worrying about API costs or privacy. This isn't about replacing GPT-4, but for everyday tasks a 3-billion-parameter model running locally is more than enough. As /u/psgganesh points out, it's a good fit for organizations with strict data restrictions or anyone who wants complete privacy: fast, private AI, right in the browser, anytime.

There are plenty of WebGPU demos out there, but I wanted to ship something people could actually use day-to-day.

It runs Llama 3.2, DeepSeek-R1, Qwen3, Mistral, Gemma, Phi, SmolLM2—all locally in Chrome, through three inference backends (sketched after the list):

  • WebLLM (MLC/WebGPU)
  • Transformers.js (ONNX)
  • Chrome's built-in Prompt API (Gemini Nano—zero download)
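
To make the three paths above concrete, here's a rough sketch of how each backend is typically called from extension code. The package names are the real libraries; the model IDs and the Prompt API global are my assumptions for illustration, not the extension's actual source.

```ts
// A minimal sketch of the three backend paths, not the extension's actual code.
// The imports are real packages; model IDs and the Prompt API global are assumptions.
import { CreateMLCEngine } from "@mlc-ai/web-llm";
import { pipeline } from "@huggingface/transformers";

// 1) WebLLM (MLC / WebGPU): downloads and compiles the model for WebGPU,
//    then exposes an OpenAI-style chat API.
async function askWebLLM(prompt: string): Promise<string> {
  const engine = await CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC"); // illustrative model ID
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content ?? "";
}

// 2) Transformers.js (ONNX): pipeline API, using WebGPU when the browser supports it.
async function askTransformersJs(prompt: string): Promise<string> {
  const generate = await pipeline(
    "text-generation",
    "onnx-community/Llama-3.2-1B-Instruct", // illustrative model ID
    { device: "webgpu" },
  );
  const out = await generate(prompt, { max_new_tokens: 128 });
  return (out as any)[0].generated_text; // [{ generated_text }] for a plain string prompt
}

// 3) Chrome's built-in Prompt API (Gemini Nano): nothing for the extension to
//    download; the exact global varies by Chrome version, so treat this as a guess.
async function askPromptApi(prompt: string): Promise<string> {
  const session = await (globalThis as any).LanguageModel.create();
  return session.prompt(prompt);
}
```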

No Ollama, no servers, no subscriptions. Models cache in IndexedDB. Works offline. Conversations stored locally—export or delete anytime.
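
On the storage side this is all plain web-platform APIs. Here's a minimal sketch of keeping, exporting, and wiping conversations in IndexedDB; the database name, store name, and record shape are illustrative assumptions, not the extension's actual schema.

```ts
// Illustrative local conversation storage; names and shapes are assumptions.
interface Conversation {
  id: string;
  title: string;
  messages: { role: "user" | "assistant"; content: string }[];
  updatedAt: number;
}

// Open (or create) the database with a single "conversations" store.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("local-llm-chat", 1);
    req.onupgradeneeded = () =>
      req.result.createObjectStore("conversations", { keyPath: "id" });
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Persist one conversation locally.
async function saveConversation(c: Conversation): Promise<void> {
  const db = await openDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction("conversations", "readwrite");
    tx.objectStore("conversations").put(c);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

// "Export anytime": dump every stored conversation as JSON.
async function exportAll(): Promise<string> {
  const db = await openDb();
  const all = await new Promise<Conversation[]>((resolve, reject) => {
    const req = db
      .transaction("conversations", "readonly")
      .objectStore("conversations")
      .getAll();
    req.onsuccess = () => resolve(req.result as Conversation[]);
    req.onerror = () => reject(req.error);
  });
  return JSON.stringify(all, null, 2);
}

// "Delete anytime": drop the whole database.
function deleteAll(): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.deleteDatabase("local-llm-chat");
    req.onsuccess = () => resolve();
    req.onerror = () => reject(req.error);
  });
}
```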

Free: https://noaibills.app/?utm_source=reddit&utm_medium=social&utm_campaign=launch_artificial

I'm not claiming it replaces GPT-4. But for the 80% of tasks—drafts, summaries, quick coding questions—a 3B parameter model running locally is plenty.

Not positioned as a cloud LLM replacement—it's for local inference on basic text tasks (writing, communication, drafts) with zero internet dependency, no API costs, and complete privacy.

Core fit: organizations with data restrictions that block cloud AI and can't install desktop tools like Ollama or LM Studio. Think quick drafts, grammar checks, and basic reasoning without budget or setup barriers.

Need real-time knowledge or complex reasoning? Use cloud models. This serves a different niche—**not every problem needs a sledgehammer** 😄.

Would love feedback from this community 🙌.

submitted by /u/psgganesh