A single-node search engine for e-commerce catalogs. It runs hybrid retrieval and Learning-to-Rank in under 15 milliseconds on a single CPU core. No clusters. No GPUs. No vector databases. No SaaS subscriptions. Based on Mercadona Tech's 2026 architecture.
    $ git clone https://github.com/2701/findegil.git
    $ cd findegil && uv sync
    $ uv run findegil quickstart --catalog wands
Candidates are retrieved using keyword matching and semantic embeddings. A machine learning model trained on your own click data ranks the results. Everything executes locally.
Most catalogs fit in memory.
Findegil is an experimental research preview based on the architecture Mercadona Tech published in 2026. It was built to validate a single assumption: the vast majority of e-commerce catalogs fit comfortably in the memory of a single machine.
We discard the operational overhead of Elasticsearch clusters, dedicated vector databases, and GPUs. Findegil explores whether a single process can provide the ranking quality of an enterprise search engine without the complexity of distributed systems.
Keyword matching uses tantivy-py. The index lives in-process. You do not have to maintain Elasticsearch or babysit shards.
Queries are encoded on the CPU via ONNX Runtime. Embeddings are stored as a NumPy matrix in memory. You do not need Pinecone, Qdrant, or Milvus.
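A minimal sketch of what searching an in-memory embedding matrix can look like. This is not Findegil's code: the function name `top_k_cosine`, the toy matrix size, and the assumption that all vectors are L2-normalized are illustrative choices.

```python
import numpy as np

def top_k_cosine(query_vec, doc_matrix, k=5):
    """Return (indices, scores) of the k rows most similar to query_vec.

    Assumes query_vec and every row of doc_matrix are L2-normalized,
    so the dot product equals cosine similarity.
    """
    scores = doc_matrix @ query_vec            # one matrix-vector product
    k = min(k, scores.shape[0])
    # argpartition selects the top k without sorting the whole array
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]        # sort only the k winners
    return top, scores[top]

# Toy data: 1,000 hypothetical products with 64-dimensional embeddings.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = docs[42].copy()                        # a query identical to product 42
idx, sims = top_k_cosine(query, docs, k=3)
```

A full scan like this is a single matrix-vector product, which is why catalogs of this size may not need an approximate nearest-neighbor index at all.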
Lexical and semantic results are merged using Reciprocal Rank Fusion. This combines ranked lists without requiring manual score weight tuning. Adding semantic retrieval to a BM25 baseline costs about 5 ms and 66 MB.
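Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over every list it appears in, so only ranks matter and raw scores never need to be calibrated against each other. The sketch below assumes `k = 60`, the constant commonly used in the RRF literature; the value Findegil uses is not stated here, and the product ids are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

lexical = ["p1", "p2", "p3", "p4"]   # hypothetical BM25 ranking
semantic = ["p3", "p1", "p5"]        # hypothetical embedding ranking
fused = reciprocal_rank_fusion([lexical, semantic])
# fused == ["p1", "p3", "p2", "p5", "p4"]
```

Documents that appear in both lists (`p1`, `p3`) rise to the top, while documents found by only one retriever still keep a place in the fused ranking.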
Multiple store locations share a single master index. Availability is filtered per tenant using NumPy bitsets. You do not need to duplicate indexes for different regions.
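One way to picture per-tenant filtering with NumPy bitsets is shown below. The store names, catalog size, and helper names are hypothetical, and the layout (one bit per product, packed with `np.packbits`) is an assumption rather than Findegil's documented format.

```python
import numpy as np

N_PRODUCTS = 20  # hypothetical catalog size

def make_bitset(available_ids, n=N_PRODUCTS):
    """Pack a per-store availability mask into a uint8 bitset (1 bit per product)."""
    mask = np.zeros(n, dtype=bool)
    mask[list(available_ids)] = True
    return np.packbits(mask)

def filter_candidates(candidate_ids, bitset):
    """Keep only candidates whose bit is set, preserving ranking order."""
    ids = np.asarray(candidate_ids)
    # packbits is big-endian: product i lives in byte i // 8, bit 7 - (i % 8)
    bits = (bitset[ids >> 3] >> (7 - (ids & 7))) & 1
    return ids[bits.astype(bool)].tolist()

stores = {
    "madrid": make_bitset({0, 2, 5, 9, 17}),
    "valencia": make_bitset({1, 2, 9}),
}
ranked = [9, 1, 17, 2, 4]  # ranking from the shared master index
madrid_results = filter_candidates(ranked, stores["madrid"])      # [9, 17, 2]
valencia_results = filter_candidates(ranked, stores["valencia"])  # [9, 1, 2]
```

Each store costs one bit per product, so the same ranked list can be filtered for any tenant without building a second index.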
WANDS is Wayfair's public dataset for e-commerce search. It contains 43,000 furniture products and 480 queries with human relevance judgments. The numbers below come directly from continuous integration latency tests and frozen evaluations in our repository.
| Configuration | MRR@10 | NDCG@10 | Recall@50 |
|---|---|---|---|
| Lexical only (Tantivy BM25) | 0.873 | 0.648 | 0.226 |
| Hybrid (BM25 + Semantic + RRF) | 0.922 | 0.712 | 0.236 |
Adding semantic retrieval lifts MRR@10 by about 5 points (0.873 to 0.922) and NDCG@10 by about 6 points (0.648 to 0.712) over a BM25 baseline. This costs 5 ms of CPU and 66 MB of RAM.
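For readers unfamiliar with the metrics in the table, here is a sketch of how MRR@k and NDCG@k are commonly computed for one query. It is not the repository's evaluation code. It assumes graded labels mapped to a linear gain, and it normalizes against the ideal ordering of the retrieved results only, which is a simplification.

```python
import math

def mrr_at_k(relevances, k=10):
    """Reciprocal rank of the first relevant result in the top k (0 if none)."""
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG of the ranking divided by the DCG of its ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query, in ranked order (2 = exact, 1 = partial).
labels = [0, 2, 1]
mrr = mrr_at_k(labels)    # first relevant hit is at rank 2, so 0.5
ndcg = ndcg_at_k(labels)  # roughly 0.67
```

The figures in the table are these per-query values averaged over the evaluation queries.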
Architectural budget is 15 ms. Real measurements stay an order of magnitude under it.
Findegil includes a Learning-to-Rank pipeline built on CatBoost YetiRank. On a cold start, it falls back to hybrid retrieval. Once you have collected enough click data, you can train a ranking model on it.
The pipeline automatically corrects for position bias, the tendency for users to click the top result just because it is at the top.
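The text does not say which correction method is used. One common approach is inverse propensity weighting, sketched below: each click is weighted by the inverse of the estimated probability that its position was examined. The `1 / position` propensity model and the function name are assumptions for illustration only.

```python
def ipw_click_weights(clicks, propensity):
    """Sum inverse-propensity weights per document.

    clicks: list of (doc_id, position) pairs with 1-based positions.
    propensity: dict mapping position to the estimated probability
                that a user examines that position.
    """
    weights = {}
    for doc_id, pos in clicks:
        weights[doc_id] = weights.get(doc_id, 0.0) + 1.0 / propensity[pos]
    return weights

# Hypothetical examination model: propensity falls off as 1 / position.
propensity = {pos: 1.0 / pos for pos in range(1, 11)}
clicks = [("a", 1), ("a", 1), ("a", 1), ("b", 4)]
weights = ipw_click_weights(clicks, propensity)
# weights == {"a": 3.0, "b": 4.0}
```

Under this model, a single click at position 4 counts for more than three clicks at position 1, because few users ever look that far down the list.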
Models are trained on past data and evaluated on future data. No look-ahead, no inflated scores.
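A temporal split of this kind can be sketched in a few lines. The event fields (`query`, `ts`) and the cutoff date are hypothetical.

```python
from datetime import datetime

def temporal_split(events, cutoff):
    """Train on events strictly before cutoff, evaluate on events at or after it."""
    train = [e for e in events if e["ts"] < cutoff]
    test = [e for e in events if e["ts"] >= cutoff]
    return train, test

events = [
    {"query": "sofa", "ts": datetime(2026, 1, 5)},
    {"query": "lamp", "ts": datetime(2026, 1, 20)},
    {"query": "desk", "ts": datetime(2026, 2, 3)},
    {"query": "rug", "ts": datetime(2026, 2, 10)},
]
train, test = temporal_split(events, cutoff=datetime(2026, 2, 1))
```

Unlike a random split, no event in the training set is newer than any event in the evaluation set, so the model is never scored on data from its own past.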
The training pipeline evaluates new models against an immutable golden set and refuses to ship them if quality drops.
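A quality gate of this kind can be as simple as a metric-by-metric comparison. The sketch below is an assumption about the shape of such a check, not the pipeline's actual code. The baseline uses the hybrid figures from the table above, and the candidate figures are invented for the example.

```python
def should_ship(candidate, baseline, tolerance=0.0):
    """Return (ok, regressions); ok is False if any metric drops beyond tolerance."""
    regressions = {
        name: (baseline[name], candidate.get(name, float("-inf")))
        for name in baseline
        # A metric missing from the candidate counts as a regression
        if candidate.get(name, float("-inf")) < baseline[name] - tolerance
    }
    return (not regressions), regressions

baseline = {"mrr@10": 0.922, "ndcg@10": 0.712}    # hybrid row from the table
candidate = {"mrr@10": 0.930, "ndcg@10": 0.705}   # made-up candidate model
ok, regressions = should_ship(candidate, baseline)
# ok is False: NDCG@10 fell from 0.712 to 0.705 even though MRR@10 improved
```

Checking every metric separately prevents a gain on one measure from hiding a loss on another.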
Findegil is not for billions of logs or documents. It is strictly for e-commerce catalogs that fit in memory.
No Pinecone, no Qdrant, no Milvus. Embeddings live in memory as standard NumPy matrices.
No API keys, no usage limits, no hidden fees. MIT licensed. You run it on your own machine.
Bundled adapters for the Wayfair WANDS dataset let you spin up a real catalog in seconds. Bring your own catalog with a small Python adapter.
    # install
    git clone https://github.com/2701/findegil.git
    cd findegil && uv sync
    # fetch WANDS bundle and start FastAPI
    uv run findegil quickstart --catalog wands
    # query it
    curl -s -X POST http://127.0.0.1:8000/search \
      -H 'content-type: application/json' \
      -d '{"query": "coffee table", "limit": 3}' | jq
Findegil is an open implementation of the search architecture that Mercadona Tech published under the MIT license in 2026. The original team designed and validated the production system. Findegil contains no proprietary code or data. It serves as a research preview to test that design as an open-source package.
Built and maintained by 2701 Labs.
Findegil was the scribe in Minas Tirith who copied the Red Book of Westmarch in the Fourth Age. An archivist for a search engine.