Matching the world's top multi-hop RAG systems, with no GPU, no fine-tuning, just pip install

June 20, 2026

Imagine matching the top multi-hop retrieval-augmented generation systems without GPUs or fine-tuning. That is what /u/ObjectiveEntrance740 reports with MOTHRAG. Unlike HippoRAG 2, CoRAG, or NeocorRAG, which depend on a GPU, fine-tuning, or constrained decoding, MOTHRAG runs entirely on commodity API calls after a pip install. Its average F1 is within 0.7 points of the best GPU-bound system, and it edges ahead of that system on 2Wiki. In practice, this makes strong multi-hop QA deployable on a budget without giving up much accuracy. The pipeline is modular: readers, embedders, and retrieval judges can be swapped without retraining. As /u/ObjectiveEntrance740 points out, a lower-cost setup can still reach top-tier results. The takeaway is that top-tier multi-hop retrieval no longer requires the infrastructure of a million-dollar lab. The open question is how long it takes for this to become the standard for accessible AI.

The three systems below (HippoRAG 2, CoRAG, NeocorRAG) are among the strongest multi-hop QA frameworks published. Every one of them depends on a GPU, fine-tuning, or constrained decoding to get there.

MOTHRAG sits right alongside them on F1, while running entirely on commodity API calls. No GPU. No fine-tuning. No constrained decoding. No non-commercial licenses.

System         | Deployment             | HotpotQA | 2Wiki | MuSiQue | AVG
HippoRAG 2     | offline graph + GPU    | 75.5     | 71.0  | 48.6    | 65.0
CoRAG          | trained retrieval      | 75.1     | 75.1  | 52.9    | 67.7
NeocorRAG      | GPU constrained decode | 78.3     | 76.1  | 52.6    | 69.0
MOTHRAG (ours) | commodity APIs only    | 78.1     | 76.3  | 50.5    | 68.3

MOTHRAG has the highest average F1 among commercially deployable frameworks, sits within 0.7 points of the GPU-bound state of the art (68.3 vs. 69.0 for NeocorRAG), and is ahead of it on 2Wiki (76.3 vs. 76.1). The point isn't to beat these systems; it's to reach their tier with none of their infrastructure.
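The AVG column and the 0.7-point gap can be re-derived from the table itself. A minimal sketch in plain Python, with the scores copied from the rows above; the dictionary and variable names are only illustrative, and AVG is assumed to be the unweighted mean of the three benchmarks:

```python
# F1 scores copied from the table above, in the order (HotpotQA, 2Wiki, MuSiQue).
scores = {
    "HippoRAG 2": (75.5, 71.0, 48.6),
    "CoRAG": (75.1, 75.1, 52.9),
    "NeocorRAG": (78.3, 76.1, 52.6),
    "MOTHRAG": (78.1, 76.3, 50.5),
}

# Assumption: AVG is the unweighted mean of the three benchmarks, rounded to one decimal.
averages = {name: round(sum(f1) / len(f1), 1) for name, f1 in scores.items()}

# Best average and best 2Wiki score among the systems other than MOTHRAG.
best_other_avg = max(avg for name, avg in averages.items() if name != "MOTHRAG")
best_other_2wiki = max(f1[1] for name, f1 in scores.items() if name != "MOTHRAG")

gap_to_best = round(best_other_avg - averages["MOTHRAG"], 1)
two_wiki_lead = round(scores["MOTHRAG"][1] - best_other_2wiki, 1)

for name, avg in averages.items():
    print(f"{name}: {avg}")
print(f"gap to best average: {gap_to_best}")
print(f"lead on 2Wiki: {two_wiki_lead}")
```

Under that assumption the recomputed means agree with the AVG column as posted.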

Deployment is a pip install plus API keys:

pip install mothrag

from mothrag import MothRAG

# Load two short in-memory documents.
m = MothRAG.from_documents(["Paris is the capital of France.", "The Eiffel Tower is in Paris."])

# Two-hop question: Eiffel Tower -> Paris -> France.
result = m.query("In which country is the Eiffel Tower?")
print(result.answer)
print(result.confidence)

The pipeline is fully modular. Readers, embedders, and retrieval judges can all be swapped without retraining, and their backends install as optional extras: gemini and openai for API readers and embedders, sentence-transformers for a local embedding fallback, faiss for vector stores in the 100k-10M chunk range, retrieval for classic BM25 and graph features, and prod for the full stack.
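To make "swap without retraining" concrete, here is a toy sketch of a pipeline whose components are plain callables. Everything in it (Pipeline, toy_embed, the two readers) is hypothetical and written only for illustration; it is not MOTHRAG's actual interface, and the letter-count embedding stands in for a real API or local embedder:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical component signatures; these are NOT MOTHRAG's real interfaces.
Embedder = Callable[[str], List[float]]
Reader = Callable[[str, List[str]], str]


@dataclass
class Pipeline:
    embed: Embedder
    read: Reader

    def query(self, question: str, docs: List[str], top_k: int = 2) -> str:
        q = self.embed(question)
        # Rank documents by dot product with the question vector, highest first.
        ranked = sorted(
            docs,
            key=lambda d: -sum(a * b for a, b in zip(q, self.embed(d))),
        )
        return self.read(question, ranked[:top_k])


def toy_embed(text: str) -> List[float]:
    # Toy letter-count embedding, a stand-in for a real embedder.
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


def first_passage_reader(question: str, passages: List[str]) -> str:
    # Returns only the top-ranked passage.
    return passages[0]


def joining_reader(question: str, passages: List[str]) -> str:
    # Returns all retrieved passages joined together.
    return " ".join(passages)


docs = ["Paris is the capital of France.", "The Eiffel Tower is in Paris."]

pipe = Pipeline(embed=toy_embed, read=first_passage_reader)
# Swapping the reader is a constructor argument; nothing is retrained.
swapped = Pipeline(embed=toy_embed, read=joining_reader)

print(pipe.query("In which country is the Eiffel Tower?", docs))
print(swapped.query("In which country is the Eiffel Tower?", docs))
```

The design point is that each stage only has to honor a small call signature, so a different reader or embedder can be dropped in without touching the rest.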

A one-flag economy tier swaps the retrieval judge and drops cost from ~$0.032 to ~$0.018 per query at statistical parity on HotpotQA and 2Wiki.
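As a rough worked example of what that difference means at volume, the sketch below uses the two approximate per-query costs quoted above. The 100,000-query workload is a hypothetical figure chosen only for illustration:

```python
# Approximate per-query costs quoted in the post (USD).
default_cost = 0.032
economy_cost = 0.018

saving_per_query = default_cost - economy_cost
saving_pct = 100 * saving_per_query / default_cost  # roughly 44%

# Hypothetical workload, chosen only for illustration.
queries = 100_000

print(f"saving per query: ${saving_per_query:.3f} ({saving_pct:.1f}%)")
print(f"default tier, {queries:,} queries: ${default_cost * queries:,.0f}")
print(f"economy tier, {queries:,} queries: ${economy_cost * queries:,.0f}")
```

Since both costs are approximate, the resulting totals are order-of-magnitude estimates rather than a quote.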

Every answer is proof-tree-structured so you can inspect each reasoning hop, and the per-query outputs behind every table in the paper are released so you can verify the numbers.
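For readers unfamiliar with the idea, a proof tree can be pictured as a root claim supported by one child node per reasoning hop. The sketch below is a hypothetical layout built by hand for the Eiffel Tower example; the Hop class, its fields, and the walk helper are assumptions for illustration, and MOTHRAG's real output schema may differ:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical node layout for illustration; MOTHRAG's real output schema may differ.
@dataclass
class Hop:
    claim: str
    evidence: str
    children: List["Hop"] = field(default_factory=list)


def walk(node: Hop, depth: int = 0) -> List[Tuple[int, str, str]]:
    """Return (depth, claim, evidence) for every node, parents before children."""
    rows = [(depth, node.claim, node.evidence)]
    for child in node.children:
        rows.extend(walk(child, depth + 1))
    return rows


# Hand-built tree for the two-document example; not produced by the library.
tree = Hop(
    claim="The Eiffel Tower is in France.",
    evidence="(derived from the two hops below)",
    children=[
        Hop("The Eiffel Tower is in Paris.", "The Eiffel Tower is in Paris."),
        Hop("Paris is in France.", "Paris is the capital of France."),
    ],
)

for depth, claim, evidence in walk(tree):
    print("  " * depth + f"{claim}  <- {evidence}")
```

Inspecting an answer then amounts to walking the tree and checking each hop's claim against its cited evidence.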

Paper: https://zenodo.org/records/20668567
Code (Apache 2.0): https://github.com/juliangeymonat-jpg/mothrag
Site: https://mothrag.com

Happy to answer questions about the pipeline or the judge design.

submitted by /u/ObjectiveEntrance740