There are plenty of WebGPU demos out there, but I wanted to ship something people could actually use day-to-day.
It runs Llama 3.2, DeepSeek-R1, Qwen3, Mistral, Gemma, Phi, SmolLM2—all locally in Chrome. Three inference backends:
- WebLLM (MLC/WebGPU)
- Transformers.js (ONNX)
- Chrome's built-in Prompt API (Gemini Nano—zero download)
No Ollama, no servers, no subscriptions. Models cache in IndexedDB. Works offline. Conversations stored locally—export or delete anytime.
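For anyone curious what the WebLLM path looks like in the browser, here's a minimal sketch. This is not the app's actual code: the model ID string and the `runLocal` helper are illustrative, so check the WebLLM model list for current IDs.

```typescript
// Minimal sketch of browser-side inference with WebLLM (@mlc-ai/web-llm).
// Assumptions: the model ID below is illustrative; pick one from WebLLM's model list.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function runLocal(prompt: string): Promise<string> {
  // First call downloads and compiles the model, then the weights are served
  // from the browser cache, so later runs work offline.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f32_1-MLC", {
    initProgressCallback: (report) => console.log(report.text), // download/compile progress
  });

  // OpenAI-style chat completion, but everything runs on the local GPU via WebGPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content ?? "";
}
```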
Free: https://noaibills.app/?utm_source=reddit&utm_medium=social&utm_campaign=launch_artificial
I'm not claiming this replaces GPT-4, and it isn't positioned as a cloud LLM replacement. But for the everyday 80% of tasks (drafts, summaries, quick coding questions, short writing and communication), a 3B-parameter model running locally is plenty: zero internet dependency, no API costs, and complete privacy.
Core fit: people in organizations whose data restrictions block cloud AI and where desktop tools like Ollama or LM Studio can't be installed, but who still want quick drafts, grammar checks, and basic reasoning without budget or setup barriers.
Need real-time knowledge or complex reasoning? Use cloud models. This serves a different niche—**not every problem needs a sledgehammer** 😄.
Would love feedback from this community 🙌.