Architecture
Reverse RAG
sembr inverts the standard RAG query pattern:
| | Traditional RAG | sembr (Reverse RAG) |
|---|---|---|
| When | On user query | Continuously, on each intent's own schedule |
| What is stored | Documents | User intent vectors |
| What is searched | Documents for a query | Articles for each intent |
| Latency | Query-time | Background job |
Intent vectors are computed once at creation time and stored in Qdrant. Each scheduled tick (or each freshly-ingested article, in event mode) drives an ANN search against the news collection — never a re-embed of the intent text.
Data flow
[RSS sources]
│ HTTP poll (APScheduler, per-feed interval)
▼
[collector] ──► article text + metadata
│
▼
[embedder] ──► BGE-M3 via SiliconFlow /v1/embeddings (batch=32)
│
▼
[vector_store / news_current alias] ──► Qdrant, payload includes ingested_at_ts + feed_id
│
│ per-intent schedule (cron or event)
▼
[matcher] ──► query_points(query=intent_vector, score_threshold=..., query_filter=...)
│
▼
[summarizer] ──► chat completions (Jinja2 templates, per-intent system + instruction)
│
▼
[notifier / email] ──► SMTP digest, rendered in the intent's timezone
The same diagram holds for both schedule modes — the matcher's trigger differs (APScheduler tick vs ingestion-driven event buffer drain) but everything downstream is identical.
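The batch=32 step of the embedder can be sketched as below. This is an illustrative sketch, not the project's code: the helper names are hypothetical, and the model identifier "BAAI/bge-m3" is an assumption about what the provider expects. Only the request body of an OpenAI-compatible /v1/embeddings call is built; no network request is made.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 32  # batch size stated in the data-flow diagram


def batched(texts: Sequence[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def embedding_payload(batch: list[str], model: str = "BAAI/bge-m3") -> dict:
    """Body for an OpenAI-compatible POST /v1/embeddings call.

    The model name is an assumption; substitute the provider's identifier.
    """
    return {"model": model, "input": batch}


articles = [f"article {i}" for i in range(70)]
payloads = [embedding_payload(batch) for batch in batched(articles)]
print([len(p["input"]) for p in payloads])  # [32, 32, 6]
```

Seventy pending articles therefore cost three embedding requests rather than seventy.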
Per-intent schedules
Two modes, picked per intent at creation time:
- Cron mode: preset: "hourly" | "daily" | "weekly", with hour/minute/weekday as appropriate; lookback_seconds (default 86400, range 5 min – 30 days); skip_seen (default true). The matcher job runs against articles ingested within the lookback window.
- Event mode: trigger_count (1–10, default 3) and max_wait_seconds (60–86400, default 1800). Articles matching the intent buffer in event_pending and fire the summarizer once either the count is reached or the wait expires, whichever comes first.
schedule.mode is immutable — changing modes requires DELETE + POST. Within a mode, every other field is editable via PUT.
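The event-mode rule (fire on count or on timeout, whichever comes first) can be sketched with a small in-memory buffer. EventBuffer is a hypothetical stand-in for event_pending, and the sketch assumes the wait clock starts when the first article is buffered, which the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventBuffer:
    """Hypothetical in-memory stand-in for the event_pending buffer."""
    trigger_count: int = 3        # documented default, valid range 1-10
    max_wait_seconds: int = 1800  # documented default, valid range 60-86400
    pending: list = field(default_factory=list)
    first_seen_ts: Optional[float] = None

    def __post_init__(self) -> None:
        if not 1 <= self.trigger_count <= 10:
            raise ValueError("trigger_count must be in 1..10")
        if not 60 <= self.max_wait_seconds <= 86400:
            raise ValueError("max_wait_seconds must be in 60..86400")

    def add(self, article_id: str, now: float) -> None:
        if not self.pending:
            # Assumption: the wait clock starts with the first buffered match.
            self.first_seen_ts = now
        self.pending.append(article_id)

    def should_fire(self, now: float) -> bool:
        if not self.pending:
            return False
        waited = now - self.first_seen_ts
        return len(self.pending) >= self.trigger_count or waited >= self.max_wait_seconds

    def drain(self) -> list:
        """Hand the buffered articles to the summarizer and reset the buffer."""
        batch, self.pending, self.first_seen_ts = self.pending, [], None
        return batch


buf = EventBuffer()
buf.add("a1", now=0.0)
buf.add("a2", now=10.0)
print(buf.should_fire(now=20.0))    # False: 2 of 3 matches, 20 s waited
print(buf.should_fire(now=1800.0))  # True: max_wait_seconds elapsed
buf.add("a3", now=30.0)
print(buf.should_fire(now=31.0))    # True: trigger_count reached
print(buf.drain())                  # ['a1', 'a2', 'a3']
```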
Dual-collection Qdrant design
Two Qdrant collections, both accessed via aliases so a model swap can re-point them atomically:
- intents_current: pre-computed intent vectors, one point per intent, full precision (no quantization, because query-side precision matters)
- news_current (today: news_bge-m3_v1): article vectors with INT8 scalar quantization in RAM and full vectors on disk; payload indexes on ingested_at_ts and feed_id
The matcher calls query_points(query=vector, score_threshold=..., query_filter=...) once per intent. (query_points belongs to the Query API introduced in qdrant-client 1.10, which supersedes the older search and search_batch methods.) Intent vectors live in their own collection so that either side can be refreshed independently of the other.
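As a rough illustration of what one matcher query carries, the sketch below builds the JSON body for Qdrant's REST query endpoint (POST /collections/news_current/points/query), restricted to a cron-mode lookback window. The function name, the threshold of 0.5, and the limit of 20 are assumptions for illustration; the project itself goes through the Python client rather than raw REST.

```python
import time
from typing import Optional, Sequence


def build_match_query(intent_vector: Sequence[float],
                      lookback_seconds: int = 86400,
                      score_threshold: float = 0.5,  # assumed value
                      limit: int = 20,               # assumed value
                      now_ts: Optional[float] = None) -> dict:
    """Request body for one intent: ANN search limited to recent articles."""
    now_ts = time.time() if now_ts is None else now_ts
    return {
        "query": list(intent_vector),  # stored intent vector, never re-embedded
        "filter": {
            "must": [
                # Uses the ingested_at_ts payload index on the news collection.
                {"key": "ingested_at_ts", "range": {"gte": now_ts - lookback_seconds}},
            ]
        },
        "score_threshold": score_threshold,
        "limit": limit,
        "with_payload": True,
    }


body = build_match_query([0.1, 0.2, 0.3], now_ts=100000.0)
print(body["filter"]["must"][0]["range"])  # {'gte': 13600.0}
```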
Collection naming follows news_{model}_{version} to enable zero-downtime model upgrades: provision a new collection in parallel, re-embed in background, then atomically switch the news_current alias. Every payload carries embedding_model_version so a partial cutover is identifiable.
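The cutover step can be sketched as a single alias update. The helpers below are hypothetical; they only build the collection name and the body for Qdrant's POST /collections/aliases endpoint, which applies all listed actions atomically. The version tag "v2" is an invented example.

```python
def collection_name(model: str, version: str) -> str:
    """Follow the news_{model}_{version} convention, e.g. news_bge-m3_v1."""
    return f"news_{model}_{version}"


def alias_swap_body(alias: str, new_collection: str) -> dict:
    """Drop the old alias and recreate it on the new collection in one request."""
    return {
        "actions": [
            {"delete_alias": {"alias_name": alias}},
            {"create_alias": {"collection_name": new_collection, "alias_name": alias}},
        ]
    }


target = collection_name("bge-m3", "v2")  # hypothetical next version
body = alias_swap_body("news_current", target)
print(target)  # news_bge-m3_v2
```

Readers that query through news_current see either the old collection or the new one, never a mix; the embedding_model_version payload field covers points written while the background re-embed is still running.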
Deduplication
Two layers:
- Exact: MD5(url + title) fingerprint stored in feed_items; the collector skips already-seen articles before they reach pending_articles
- Per-intent semantic: match_seen rows record (intent_id, article_id) after each successful summarize; cron-mode intents that re-scan the same lookback window won't re-fire the same article
match_seen cascades on intent delete. A PUT that changes the intent's text clears match_seen for that intent so the re-embedded vector can re-match articles it would otherwise have skipped.
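Both layers can be sketched in memory as follows. SeenStore and its method names are hypothetical stand-ins for the feed_items and match_seen tables; only the MD5(url + title) fingerprint comes from the description above.

```python
import hashlib


def fingerprint(url: str, title: str) -> str:
    """MD5(url + title) as hex; used as an identity key, not for security."""
    return hashlib.md5((url + title).encode("utf-8")).hexdigest()


class SeenStore:
    """Hypothetical in-memory stand-in for feed_items and match_seen."""

    def __init__(self) -> None:
        self.feed_items: set[str] = set()
        self.match_seen: set[tuple[int, str]] = set()

    def admit(self, url: str, title: str) -> bool:
        """Layer 1: return True only the first time an article is seen."""
        fp = fingerprint(url, title)
        if fp in self.feed_items:
            return False
        self.feed_items.add(fp)
        return True

    def unseen(self, intent_id: int, article_ids: list[str]) -> list[str]:
        """Layer 2: drop articles this intent has already summarized."""
        return [a for a in article_ids if (intent_id, a) not in self.match_seen]

    def mark_summarized(self, intent_id: int, article_ids: list[str]) -> None:
        self.match_seen.update((intent_id, a) for a in article_ids)

    def clear_intent(self, intent_id: int) -> None:
        """Mirror of the PUT-changes-text rule: forget what this intent has seen."""
        self.match_seen = {p for p in self.match_seen if p[0] != intent_id}


store = SeenStore()
print(store.admit("https://example.com/a", "Chip news"))  # True
print(store.admit("https://example.com/a", "Chip news"))  # False
store.mark_summarized(1, ["a1"])
print(store.unseen(1, ["a1", "a2"]))  # ['a2']
store.clear_intent(1)
print(store.unseen(1, ["a1", "a2"]))  # ['a1', 'a2']
```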
Prompt templates
Templates are flat .md files under /app/prompts/{system,instruction}/, bind-mounted read-write from the host's ./prompts/ so the dashboard can edit them at runtime. The summarizer reads the file on every tick (sembr/summarizer/templates.py::load_template) — there is no in-memory cache to invalidate, so a host-side or dashboard-driven save reaches the next digest with no restart.
templates layer (filesystem-only on save/delete; cross-boundary on rename):
POST /api/prompts/templates/{kind} ──► save_template_atomic (.tmp + os.replace)
PUT /api/prompts/templates/{kind}/{name} ──► save_template_atomic
DELETE /api/prompts/templates/{kind}/{name} ──► delete_template (after 409 ref-check)
POST /api/prompts/templates/{kind}/{name}/rename ──► os.rename → db.transaction() UPDATE intents
(UPDATE failure → reverse os.rename)
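The .tmp + os.replace pattern named in the diagram can be sketched as below. save_atomic is a hypothetical helper, not the project's save_template_atomic; it shows only why a reader that opens the file on every tick never observes a half-written template.

```python
import os
import tempfile
from pathlib import Path


def save_atomic(path: Path, text: str) -> None:
    """Write text to a sibling .tmp file, then swap it into place."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the swap
    os.replace(tmp, path)      # atomic on one filesystem; overwrites the old file


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "brief.md"
        save_atomic(target, "v1")
        save_atomic(target, "v2")
        return target.read_text(encoding="utf-8"), sorted(os.listdir(root))


print(demo())  # ('v2', ['brief.md'])
```

Because the temporary file sits in the same directory as the target, the final os.replace stays on one filesystem, which is what makes the swap atomic.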
The reserved name default is enforced both at the API layer (HTTP 403 on writes / 422 on reuse-as-target) and in the BUILTIN_NAMES frozenset in sembr/summarizer/templates.py. Per-file size cap is 64 KiB. Strict placeholder validation (try_render) runs on every save so a typo in {intent_text} cannot poison the next digest. Rename is the only template operation that crosses into SQLite: the file move runs first, the cascade UPDATE intents SET {kind}_template = ? runs inside db.sqlite.transaction() afterwards, and a SQLite failure triggers reverse os.rename so file and DB never diverge.
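The rename ordering (file move first, database update second, reverse move on failure) can be sketched with the standard-library sqlite3 module. Everything here is illustrative: rename_template is a hypothetical function, the WHERE clause is an assumption about how affected intents are selected, and the real code uses aiosqlite behind db.sqlite.transaction().

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def rename_template(conn: sqlite3.Connection, root: Path, kind: str, old: str, new: str) -> None:
    if kind not in ("system", "instruction"):  # whitelist before building SQL
        raise ValueError(f"unknown template kind: {kind}")
    src, dst = root / kind / f"{old}.md", root / kind / f"{new}.md"
    if dst.exists():
        raise FileExistsError(dst)
    os.rename(src, dst)                        # step 1: move the file
    try:
        with conn:                             # step 2: commit, or roll back on error
            conn.execute(
                f"UPDATE intents SET {kind}_template = ? WHERE {kind}_template = ?",
                (new, old),
            )
    except sqlite3.Error:
        os.rename(dst, src)                    # step 3: undo the move
        raise


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "system").mkdir()
        (root / "system" / "brief.md").write_text("You are terse.", encoding="utf-8")
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE intents (id INTEGER PRIMARY KEY, system_template TEXT)")
        conn.execute("INSERT INTO intents (system_template) VALUES ('brief')")
        conn.commit()
        rename_template(conn, root, "system", "brief", "short")
        stored = conn.execute("SELECT system_template FROM intents").fetchone()[0]
        after_ok = sorted(os.listdir(root / "system"))
        broken = sqlite3.connect(":memory:")   # no intents table, so the UPDATE fails
        try:
            rename_template(broken, root, "system", "short", "tiny")
        except sqlite3.OperationalError:
            pass
        after_fail = sorted(os.listdir(root / "system"))
        conn.close()
        broken.close()
        return stored, after_ok, after_fail


print(demo())  # ('short', ['short.md'], ['short.md'])
```

The second call fails inside the database step, and the directory listing shows the file back under its previous name.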
Delivery
Email is the only channel that ships. EmailChannel is a BaseChannel subclass with no abstract methods (the marker-ABC pattern — per-channel send signatures diverge enough that a common ABC would erase typing). The dispatcher in main.py pattern-matches on the channel config type:
for ch in intent.channels:
    if isinstance(ch, EmailChannelConfig):
        await email_ch.send(result, config=ch, intent_name=intent.name,
                            intent_timezone=intent.timezone)
Adding a channel — Telegram, Discord, Slack — is additive: a new XConfig: BaseModel with its own Literal["x"] discriminator value, a new channel class, and a new isinstance arm. The discriminated union on Intent.channels makes the API boundary validate channel-specific parameters before any side effect runs.
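A self-contained sketch of what a second isinstance arm might look like is shown below. Plain dataclasses stand in for the Pydantic models so the example runs on the standard library alone; TelegramChannelConfig, TelegramChannel, the field names, and the return values are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class EmailChannelConfig:
    to: str
    type: Literal["email"] = "email"        # discriminator value


@dataclass(frozen=True)
class TelegramChannelConfig:                # hypothetical second channel
    chat_id: int
    type: Literal["telegram"] = "telegram"  # discriminator value


ChannelConfig = Union[EmailChannelConfig, TelegramChannelConfig]


class EmailChannel:
    async def send(self, result: str, *, config: EmailChannelConfig,
                   intent_name: str, intent_timezone: str) -> str:
        return f"email to {config.to}: [{intent_name}] {result}"


class TelegramChannel:
    # Different signature from EmailChannel.send, hence no shared abstract method.
    async def send(self, result: str, *, config: TelegramChannelConfig) -> str:
        return f"telegram to {config.chat_id}: {result}"


async def dispatch(result: str, channels: list, intent_name: str, intent_timezone: str) -> list:
    email_ch, telegram_ch = EmailChannel(), TelegramChannel()
    sent = []
    for ch in channels:
        if isinstance(ch, EmailChannelConfig):
            sent.append(await email_ch.send(result, config=ch, intent_name=intent_name,
                                            intent_timezone=intent_timezone))
        elif isinstance(ch, TelegramChannelConfig):  # the new arm
            sent.append(await telegram_ch.send(result, config=ch))
        else:
            raise TypeError(f"unsupported channel config: {type(ch).__name__}")
    return sent


log = asyncio.run(dispatch("3 new articles",
                           [EmailChannelConfig("a@example.com"), TelegramChannelConfig(42)],
                           intent_name="chips", intent_timezone="UTC"))
print(log)
```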
There is no notification_log retry/DLQ machinery in 1.0 — a failed send is logged and dropped. Cron-driven intents pick up missed deliveries on the next tick by virtue of their lookback window; event-driven intents lose the buffered tick on send failure. A stateful retry pipeline is post-1.0 work.
SQLite + WAL
WAL mode (journal_mode=WAL; synchronous=NORMAL; cache_size=-64000; busy_timeout=5000) is initialized at startup and verified by reading the pragma back. Readers never block writers; the application uses a single shared aiosqlite connection per process and serializes multi-statement writes through an asyncio lock exposed as transaction().
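The set-then-read-back startup check can be sketched with the standard-library sqlite3 module, used here as a synchronous stand-in for aiosqlite. The function names and the database filename are hypothetical; the pragma values are the ones listed above.

```python
import os
import sqlite3
import tempfile

PRAGMAS = {
    "journal_mode": "WAL",
    "synchronous": "NORMAL",
    "cache_size": -64000,  # negative values are KiB, so roughly 64 MB of page cache
    "busy_timeout": 5000,  # milliseconds
}


def open_wal(path: str) -> sqlite3.Connection:
    """Apply the pragmas, then verify that WAL was actually enabled."""
    conn = sqlite3.connect(path)
    for name, value in PRAGMAS.items():
        conn.execute(f"PRAGMA {name}={value}")
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    if mode.lower() != "wal":  # e.g. in-memory databases report "memory"
        conn.close()
        raise RuntimeError(f"WAL not enabled, journal_mode={mode}")
    return conn


def read_pragmas(conn: sqlite3.Connection) -> dict:
    return {name: conn.execute(f"PRAGMA {name}").fetchone()[0] for name in PRAGMAS}


with tempfile.TemporaryDirectory() as root:
    conn = open_wal(os.path.join(root, "sembr.db"))
    settings = read_pragmas(conn)
    conn.close()

print(settings)
# {'journal_mode': 'wal', 'synchronous': 1, 'cache_size': -64000, 'busy_timeout': 5000}
```

Reading the value back matters because SQLite does not raise an error when it cannot switch journal modes; it simply reports the mode that is in effect.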
APScheduler integration
An AsyncIOScheduler is started inside the FastAPI lifespan context manager. Every job sets coalesce=True so that runs missed during downtime collapse into a single run on recovery instead of a backlog; shutdown uses wait=False so uvicorn teardown isn't blocked by an in-flight tick. The same scheduler runs the per-feed RSS polls, the per-intent matcher jobs (cron mode), the embedder worker, and the hourly dashboard log retention prune.
Source registration
The 1.0 source registry is a hardcoded SOURCE_REGISTRY: dict[str, type[BaseSource]] in collector.scheduler (currently {"rss": RSSSource}). The dashboard reads this dict to populate the create-feed form, so adding an HTTP/JSON source means subclassing BaseSource and registering in that dict — the form picks it up on the next dashboard page load. A pyproject.toml entry-points-driven plugin discovery layer is on the post-1.0 roadmap but does not exist today.
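Registering a new source type might look like the sketch below. The BaseSource interface shown (a constructor taking a URL and a fetch method) is an assumption, as are JSONSource, create_source, and form_choices; only SOURCE_REGISTRY, BaseSource, and RSSSource are names from the text above.

```python
from abc import ABC, abstractmethod


class BaseSource(ABC):
    """Assumed minimal interface; the real base class may differ."""

    def __init__(self, url: str) -> None:
        self.url = url

    @abstractmethod
    def fetch(self) -> list:
        """Return a list of article dicts."""


class RSSSource(BaseSource):
    def fetch(self) -> list:
        return []  # feed parsing omitted in this sketch


class JSONSource(BaseSource):  # hypothetical new source type
    def fetch(self) -> list:
        return []  # HTTP/JSON retrieval omitted in this sketch


SOURCE_REGISTRY: dict[str, type[BaseSource]] = {"rss": RSSSource}
SOURCE_REGISTRY["json"] = JSONSource  # the one-line registration step


def create_source(kind: str, url: str) -> BaseSource:
    try:
        cls = SOURCE_REGISTRY[kind]
    except KeyError:
        raise ValueError(f"unknown source type: {kind}") from None
    return cls(url)


def form_choices() -> list:
    """What a create-feed form could list, read from the registry at call time."""
    return sorted(SOURCE_REGISTRY)


print(form_choices())  # ['json', 'rss']
print(type(create_source("json", "https://example.com/feed.json")).__name__)  # JSONSource
```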
Docker Compose memory limits
api: mem_limit: 1500m, mem_reservation: 512m
qdrant: mem_limit: 2g, mem_reservation: 1g
rsshub: mem_limit: 512m
Right-sized against live measurement (api ~125 MiB, rsshub ~355 MiB, qdrant ~520 MiB at the default 53-source workload, about 1 GB in use in total). The limits sum to roughly 4 GB, so the stack as a whole has ~4× headroom for cron-mode batch scans, concurrent LLM summarizer calls, and Qdrant ANN bursts; the margin per service is uneven (api ~12×, qdrant ~4×, rsshub ~1.4×). Raise qdrant.mem_limit to 4G+ if you ingest millions of articles; raise api.mem_limit to 3G if you run tens of intents with concurrent fire bursts.
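Qdrant keeps the INT8 scalar-quantized vectors in RAM (always_ram=True) and the raw vectors on disk (on_disk=True). At one byte per dimension, a 1024-dim vector takes about 1 KiB once quantized, so budget roughly 1 GB of RAM per million articles (10 M vectors ≈ 10 GB) before index overhead.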