summarizer¶
Per-intent LLM digest. The matcher hands off a list of `Match` objects via `app.state.on_match`; the pipeline renders a system + instruction prompt from disk-backed Markdown templates, calls the configured LLM backend, wraps the result in `SummaryResult`, and forwards it to the notifier-installed `on_summary` callback.
Responsibility¶
- Define the LLM backend ABC (`BaseLLMBackend`) so additional backends (Ollama, mlx-lm, Claude, Gemini) can be plugged in without touching the pipeline
- Ship a working OpenAI-compatible backend (`APIBackend`) that targets any `/v1/chat/completions` endpoint
- Discover, load, and render prompt templates under `prompts_dir` with a strict placeholder whitelist; expose a small reusable helper module for the API and intent layers
- Receive a per-intent batch of `Match` objects and produce one `SummaryResult` per call; there is no in-pipeline grouping or fan-out. The LLM structures the digest itself; the matcher / event buffer decides what to send
- Convert article HTML to plain text (entity-decoded, link-stripped) before the body reaches the LLM, and water-fill bodies into the backend's published prompt budget so short articles stay whole and only the longest get truncated
- Build a list of `Citation` objects so the notifier can render footnotes without re-querying the database
- Provide a pre-push hook seam so a future feature can veto a summary right before delivery (rate limiting, content filters)
- Surface a separate `on_template_error` callback so the operator gets a notification when a renamed or syntactically broken template stops a tick; silent failure on prompts is worse than a delivery error
Not in scope¶
- Match selection: the matcher decides which articles a tick covers
- Per-article scoring or grouping by similarity: `summarizer.grouping` exists for the event buffer, not for the cron path. The pipeline treats the whole batch as one digest
- Channel formatting and retries: `notifier` owns those
- Prompt engineering for specific use cases: bundled `prompts/system/default.md` and `prompts/instruction/default.md` are the starting point; users override them per-intent
Public interface¶
Domain types (models.py)¶
```python
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class Citation:
    article_id: str
    title: str
    url: str
    source: int  # feed_id; raw integer for downstream lookups
    published_at: str | None
    source_name: str | None  # resolved feed.name; None when the feed was deleted
    score: float | None  # cosine similarity from the matcher; None when the citation came from a non-match path


@dataclass
class SummaryResult:
    intent_id: int
    summary: str
    citations: list[Citation]  # canonical ordered list; [N] in `summary` indexes into this
    primary: Citation | None  # citations[0] for legacy callers
    other_sources: list[Citation]  # citations[1:]


PrePushHook = Callable[[SummaryResult], Awaitable[bool]]
OnSummaryCallback = Callable[[SummaryResult], Awaitable[None]]
```
citations is the new contract; primary and other_sources are preserved for callers (notifier templates, tests) that predate the unified list.
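A minimal sketch of how the legacy fields relate to the unified list, assuming they are plain projections of `citations`. The trimmed-down dataclasses and the helpers `build_result` and `citation_for_marker` are illustrative names, not part of the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Citation:
    # Reduced to three fields for the sketch; the real type carries more.
    article_id: str
    title: str
    url: str


@dataclass
class SummaryResult:
    intent_id: int
    summary: str
    citations: list[Citation]
    primary: Citation | None
    other_sources: list[Citation]


def build_result(intent_id: int, summary: str, citations: list[Citation]) -> SummaryResult:
    """Hypothetical constructor: derive the legacy fields from `citations`."""
    return SummaryResult(
        intent_id=intent_id,
        summary=summary,
        citations=list(citations),
        primary=citations[0] if citations else None,
        other_sources=list(citations[1:]),
    )


def citation_for_marker(result: SummaryResult, n: int) -> Citation | None:
    """Resolve a 1-based [N] marker from the summary text to its citation."""
    if 1 <= n <= len(result.citations):
        return result.citations[n - 1]
    return None
```

Deriving both legacy fields from one list in a single place keeps the three views from drifting apart.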
Pipeline (pipeline.py)¶
```python
class SummaryPipeline:
    def __init__(
        self,
        llm: BaseLLMBackend,
        on_summary: OnSummaryCallback | None = None,
        pre_push_hook: PrePushHook | None = None,
        get_intent_prompt_ctx: IntentPromptCtxFetcher | None = None,
        get_feed_names: FeedNameFetcher | None = None,
        on_template_error: OnTemplateError | None = None,
        prompts_dir: Path = Path("/app/prompts"),
    ) -> None: ...

    async def handle(self, matches: list[Match]) -> None: ...
```
handle is the entry point installed as app.state.on_match. It is contractually never-raise — any exception is logged and the tick is silently skipped, mirroring the matcher's log_matches contract.
A single tick:
1. Sort `matches` newest-first by `published_at` (None sorts last)
2. Resolve `(system_template_name, instruction_template_name, intent_text, language)` via the supplied `get_intent_prompt_ctx`; an empty `intent_text` short-circuits the tick
3. Render the system template with `{language}`; a missing or broken system template routes to `on_template_error` and stops the tick
4. Resolve feed names for citations via `get_feed_names`; failure here logs but does not abort
5. Render the instruction template once with `articles=""` to measure the wrapper's character cost
6. Compute the body budget: `llm.max_prompt_chars × 0.85 − len(system) − len(instruction wrapper) − per-entry boilerplate`. If the budget is negative (system + instruction alone exceed 85% of the model's prompt budget), log an error and stop the tick; that is a configuration problem, not a data problem
7. Water-fill the article bodies into the body budget: short articles stay whole, only bodies above the cap are truncated. Log a warning when truncation happens, including the cap level and how many bodies were affected
8. Re-render the instruction template with the assembled `articles` block; the same template-error routing as step 3 applies
9. Call `llm.summarize(prompt, system=system_prompt)`; an LLM error logs and stops the tick (no fallback)
10. Build the `SummaryResult` (citations indexed 1..N matching the LLM's `[N]` references)
11. Optionally consult `pre_push_hook`; a False return drops the result silently
12. Hand off to `on_summary`
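The budget and water-fill steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the names `body_budget`, `water_fill_cap` and `water_fill` are hypothetical, and the per-entry boilerplate is assumed to be a fixed character cost multiplied by the number of articles.

```python
from __future__ import annotations


def body_budget(max_prompt_chars: int, system: str, wrapper: str,
                per_entry_overhead: int, n_entries: int) -> int:
    """85% of the backend budget minus everything that is not article body.

    Integer arithmetic avoids float rounding. A negative result means the
    templates alone are too large and the tick should stop.
    """
    usable = max_prompt_chars * 85 // 100
    return usable - len(system) - len(wrapper) - per_entry_overhead * n_entries


def water_fill_cap(lengths: list[int], budget: int) -> int | None:
    """Return the per-body cap, or None when every body fits untruncated."""
    if sum(lengths) <= budget:
        return None
    remaining = budget
    ordered = sorted(lengths)
    for i, length in enumerate(ordered):
        # Equal share of what is left among the bodies not yet placed.
        share = remaining // (len(ordered) - i)
        if length <= share:
            remaining -= length  # short body stays whole, frees budget for the rest
        else:
            return share  # every remaining (longer) body is cut to this level
    return None  # not reached when sum(lengths) > budget


def water_fill(bodies: list[str], budget: int) -> list[str]:
    """Truncate only the bodies above the cap; order is preserved."""
    cap = water_fill_cap([len(b) for b in bodies], budget)
    if cap is None:
        return list(bodies)
    return [b[:cap] for b in bodies]
```

With bodies of 100, 500 and 2000 characters and a budget of 1500, the first two stay whole and the third is cut to 900, which uses the budget exactly.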
LLM backend ABC (llm/base.py)¶
```python
from abc import ABC, abstractmethod


class BaseLLMBackend(ABC):
    @property
    @abstractmethod
    def max_prompt_chars(self) -> int: ...  # total prompt budget the backend accepts

    async def summarize(self, prompt: str, *, system: str | None = None) -> str: ...  # raises LLMError
    async def health(self) -> bool: ...
    async def aclose(self) -> None: ...
```
max_prompt_chars is the contract that lets the pipeline size articles correctly without re-tuning a separate setting on every model swap. Backends that front more than one model must publish the budget for the actually-configured one. Characters (not tokens) because the pipeline operates on strings; tokens-per-character varies by language and tokenizer, so callers should set this conservatively.
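As a rough illustration of setting the budget conservatively, the sketch below converts a context window in tokens to a character budget using the worst-case ratio among the languages an intent is expected to see. The ratios are the coarse heuristics quoted under Known constraints (English about 4 chars/token, Chinese about 1 char/token); the function name and the lookup table are assumptions made for this example.

```python
# Coarse chars-per-token heuristics; real ratios depend on the tokenizer.
CHARS_PER_TOKEN = {"english": 4.0, "chinese": 1.0}


def safe_max_prompt_chars(context_tokens: int, languages: list[str]) -> int:
    """Size the character budget for the densest language in the mix."""
    if not languages:
        raise ValueError("at least one language is required")
    worst_ratio = min(CHARS_PER_TOKEN[lang] for lang in languages)
    return int(context_tokens * worst_ratio)
```

For a 1M-token model this gives 4,000,000 characters for English-only content but only 1,000,000 once Chinese is in the mix, which is why a budget tuned on English content can overflow elsewhere.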
summarize returns a Markdown string per the system template's contract. system is sent as the OpenAI system message; backends that don't support a system role should prepend it to prompt. health is reachable-only — it does not validate the model name; expect upstream proxies that respond 200 on /models even when the configured model is wrong.
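A minimal sketch of a backend for a model without a system role, which prepends `system` to `prompt` as described above. The base class is restated here so the example runs on its own; `NoSystemRoleBackend`, its `complete` callable, and the `LLMError` definition are hypothetical stand-ins rather than code from the project.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from collections.abc import Awaitable, Callable


class LLMError(Exception):
    """Stand-in for the project's error type."""


class BaseLLMBackend(ABC):
    @property
    @abstractmethod
    def max_prompt_chars(self) -> int: ...

    @abstractmethod
    async def summarize(self, prompt: str, *, system: str | None = None) -> str: ...


class NoSystemRoleBackend(BaseLLMBackend):
    def __init__(self, complete: Callable[[str], Awaitable[str]],
                 max_prompt_chars: int = 16_000) -> None:
        self._complete = complete  # hypothetical async text-completion call
        self._max_prompt_chars = max_prompt_chars

    @property
    def max_prompt_chars(self) -> int:
        return self._max_prompt_chars

    async def summarize(self, prompt: str, *, system: str | None = None) -> str:
        # No system role available: fold the system text into the prompt.
        full_prompt = f"{system}\n\n{prompt}" if system else prompt
        try:
            return await self._complete(full_prompt)
        except Exception as exc:
            raise LLMError(str(exc)) from exc


if __name__ == "__main__":
    async def fake_complete(text: str) -> str:  # stand-in for a real model call
        return text.upper()

    backend = NoSystemRoleBackend(fake_complete)
    print(asyncio.run(backend.summarize("summarize this", system="be brief")))
```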
OpenAI-compatible backend (llm/api.py)¶
```python
class APIBackend(BaseLLMBackend):
    def __init__(
        self,
        base_url: str, api_key: str, model: str,
        timeout: float, max_prompt_chars: int,
    ) -> None: ...
```
Targets /v1/chat/completions on any OpenAI-shaped endpoint (SiliconFlow / DeepSeek / OpenAI / a self-hosted vLLM behind a proxy). Errors are translated to LLMError; non-2xx response bodies are scrubbed of the API key before being raised, so a misconfigured proxy that echoes the Authorization header in a 401 body does not leak the key into logs.
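The key scrubbing could look roughly like the sketch below. The function names, the `***` mask and the error-message format are assumptions for illustration; only the behaviour (remove the key from a non-2xx body before raising) comes from the description above.

```python
class LLMError(Exception):
    """Stand-in for the project's error type."""


def scrub_secret(text: str, secret: str, mask: str = "***") -> str:
    """Replace every occurrence of `secret` in `text` with `mask`."""
    if not secret:
        # str.replace("", mask) would insert the mask between every character.
        return text
    return text.replace(secret, mask)


def error_from_response(status: int, body: str, api_key: str) -> LLMError:
    """Build the error for a non-2xx response without leaking the key."""
    return LLMError(f"LLM request failed with HTTP {status}: {scrub_secret(body, api_key)}")
```

The empty-key guard matters because the documented default for `llm_api_key` is an empty string.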
Factory (llm/factory.py)¶
Currently returns APIBackend unconditionally. Local backends (mlx-lm, Ollama) plug in here when available; the factory is the only place Settings is read so the backend constructors stay test-friendly.
Templates (templates.py)¶
```python
from pathlib import Path
from typing import Final

PROMPTS_DIR: Final[Path] = Path("/app/prompts")
BUILTIN_NAMES: frozenset[str] = frozenset({"default"})
MAX_TEMPLATE_BYTES: Final[int] = 64 * 1024

# Read-side
def template_exists(prompts_dir, kind, name) -> bool: ...
def list_templates(prompts_dir, kind) -> list[str]: ...
def load_template(prompts_dir, kind, name) -> str: ...
def render_system(prompts_dir, name, *, language) -> str: ...
def render_instruction(prompts_dir, name, *, intent_text, articles) -> str: ...

# Write-side (used by sembr/api/prompts.py)
def save_template_atomic(prompts_dir, kind, name, content) -> Path: ...
def delete_template(prompts_dir, kind, name) -> None: ...
def rename_template(prompts_dir, kind, old_name, new_name) -> Path: ...
def try_render(kind, content) -> None: ...  # strict-placeholder gate

class TemplateNotFoundError(FileNotFoundError): ...
class TemplateRenderError(ValueError): ...
```
Templates live under PROMPTS_DIR/{system,instruction}/{name}.md. name is validated to reject path separators, leading dots, and .. segments; the resolved path is checked with Path.is_relative_to(prompts_dir) so a user-supplied name cannot escape the prompts root via symlinks. Files are read on every call (no in-process caching) so an operator's edit takes effect on the next tick without a restart.
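A sketch of the name validation and containment check, assuming the rules listed above are applied before the path is resolved. `resolve_template_path` is a hypothetical helper name, and raising `ValueError` is an assumption about how rejections are reported.

```python
from pathlib import Path

KINDS = ("system", "instruction")


def resolve_template_path(prompts_dir: Path, kind: str, name: str) -> Path:
    """Map (kind, name) to a file path that cannot leave prompts_dir."""
    if kind not in KINDS:
        raise ValueError(f"unknown template kind: {kind!r}")
    # Reject empty names, path separators, and leading dots (covers "..").
    if not name or "/" in name or "\\" in name or name.startswith("."):
        raise ValueError(f"invalid template name: {name!r}")
    root = prompts_dir.resolve()
    candidate = (root / kind / f"{name}.md").resolve()  # follows symlinks
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"template path escapes prompts root: {name!r}")
    return candidate
```

Resolving before the `is_relative_to` check is what catches a symlink inside the prompts directory that points elsewhere; the string checks alone would not.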
PROMPTS_DIR is a module-level Final[Path] constant — the legacy Settings.prompts_dir field was removed in the template-management refactor. Tests redirect via monkeypatch.setattr("sembr.summarizer.templates.PROMPTS_DIR", tmp_path). The SummaryPipeline.__init__(prompts_dir=...) kwarg is unchanged so unit tests can still inject a custom directory.
Rendering is str.format_map with a strict __missing__ that raises KeyError for any placeholder outside the documented whitelist:
| Kind | Allowed placeholders |
|---|---|
| system | {language} |
| instruction | {intent_text}, {articles} |
Anything else triggers TemplateRenderError, which the pipeline routes to on_template_error so the operator gets a notification rather than a silently-broken digest.
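The strict rendering can be sketched with a `dict` subclass whose `__missing__` raises, wrapped so that both unknown placeholders and malformed braces surface as one error type. The names `ALLOWED`, `_StrictValues` and `render` are illustrative, and `TemplateRenderError` is redefined locally so the example is self-contained.

```python
class TemplateRenderError(ValueError):
    """Local stand-in for the module's error type."""


ALLOWED = {
    "system": frozenset({"language"}),
    "instruction": frozenset({"intent_text", "articles"}),
}


class _StrictValues(dict):
    def __missing__(self, key):
        # Called by format_map for any placeholder not present in the mapping.
        raise KeyError(key)


def render(kind: str, content: str, **values: str) -> str:
    """Render `content`, allowing only the placeholders whitelisted for `kind`."""
    allowed = ALLOWED[kind]
    strict = _StrictValues({k: v for k, v in values.items() if k in allowed})
    try:
        return content.format_map(strict)
    except KeyError as exc:
        raise TemplateRenderError(
            f"unknown placeholder {{{exc.args[0]}}} in {kind} template"
        ) from exc
    except (ValueError, IndexError, AttributeError) as exc:
        # Unbalanced braces, positional fields, attribute lookups and the like.
        raise TemplateRenderError(f"malformed {kind} template: {exc}") from exc
```

Literal braces in a template still need doubling (`{{` and `}}`), as with any `str.format` input.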
The write-side helpers are pure-filesystem and side-effect-narrow:
- `save_template_atomic` writes to `{kind}/.{name}.md.tmp.<pid>.<monotonic_ns>` then `os.replace`s onto the final path. Same-directory POSIX rename is atomic; readers see either the old or the new bytes, never half-written. The hidden tmp filename is filtered out by `list_templates`'s glob.
- `delete_template` raises `TemplateNotFoundError` instead of silently succeeding so callers can map to HTTP 404.
- `rename_template` is the single-step filesystem rename (validation + `os.rename`); it does NOT pre-check existence and does NOT touch SQLite. The API layer (`sembr/api/prompts.py::rename_template_endpoint`) orchestrates the cross-boundary 2PC: pre-existence check → `os.rename` → `db.transaction(): rename_intent_template`, with reverse `os.rename` on UPDATE failure.
- `try_render(kind, content)` runs `format_map` with empty-string-valued placeholders so the API can reject typo'd `{...}` keys at save-time before the bad bytes reach disk.
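The write-then-replace pattern behind `save_template_atomic` can be sketched as below. The `_sketch` function names are hypothetical, and the `mkdir`, `fsync` and cleanup-on-failure steps are assumptions added for robustness; the source only specifies the tmp filename shape and the `os.replace` onto the final path.

```python
import os
import time
from pathlib import Path


def save_template_atomic_sketch(prompts_dir: Path, kind: str, name: str, content: str) -> Path:
    target_dir = prompts_dir / kind
    target_dir.mkdir(parents=True, exist_ok=True)  # assumption: create on demand
    final_path = target_dir / f"{name}.md"
    # Same directory as the target, so the rename stays on one filesystem.
    tmp_path = target_dir / f".{name}.md.tmp.{os.getpid()}.{time.monotonic_ns()}"
    try:
        with open(tmp_path, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())  # assumption: push bytes to disk before the swap
        os.replace(tmp_path, final_path)  # atomic swap on POSIX
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # do not leave a stray tmp file behind
        raise
    return final_path


def list_templates_sketch(prompts_dir: Path, kind: str) -> list[str]:
    # "*.md" never matches the tmp name, which ends in ".<monotonic_ns>".
    return sorted(p.stem for p in (prompts_dir / kind).glob("*.md"))
```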
Title grouping (grouping.py)¶
```python
def normalize(title: str) -> str: ...

class GroupingStep:
    def __init__(self, threshold: float = 0.85) -> None: ...
    def group(self, matches: list[Match]) -> list[list[Match]]: ...
```
SequenceMatcher-based union-find clustering of titles. The pipeline does not call this — it is exported for the matcher's event-buffer (matcher/event_buffer.py), which uses it to merge near-duplicate cross-source reports inside an event tick. The threshold defaults to 0.85 (tight; only catches near-identical headlines).
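A small sketch of SequenceMatcher-based union-find clustering, working on plain title strings rather than `Match` objects. The `normalize` body here (lowercase, collapse whitespace) is an assumption, since the real normalization rules are not documented on this page, and `group_titles` is a hypothetical name.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    # Assumed normalization: case-fold and collapse runs of whitespace.
    return " ".join(title.lower().split())


def group_titles(titles: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Cluster title indices whose pairwise similarity reaches `threshold`."""
    parent = list(range(len(titles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    normalized = [normalize(t) for t in titles]
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    # Keep the smaller index as root so groups start at their first member.
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    groups: dict[int, list[int]] = {}
    for i in range(len(titles)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Union-find makes the grouping transitive: if A matches B and B matches C, all three land in one cluster even when A and C fall below the threshold. The pairwise comparison is quadratic in the number of titles, which is fine for the small batches an event tick holds.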
Configuration¶
| Field | Default | Notes |
|---|---|---|
| llm_api_base_url | https://api.siliconflow.cn/v1 | OpenAI-compatible base; SiliconFlow shares its key with the embedder |
| llm_api_key | "" (empty) | Empty value logs a warning at startup; every LLM call returns 401 |
| llm_model | deepseek-ai/DeepSeek-V4-Flash | Passed verbatim as "model" in the request body |
| llm_timeout_seconds | 60.0 | Per-request HTTP timeout |
| llm_max_prompt_chars | 2_000_000 | Total prompt-side character budget for the LLM backend (system + instruction + articles). Tune to the configured model's context window: 2_000_000 is roomy for DeepSeek-V4-Flash (1M token ctx); drop to ~16_000 for an 8K-token local model. Pipeline reserves 15% for the LLM response and water-fills bodies into the rest |
| (constant) PROMPTS_DIR | /app/prompts | Module-level Final[Path] in sembr/summarizer/templates.py; not configurable since the template-management refactor (legacy Settings.prompts_dir field and SEMBR_PROMPTS_DIR env var both removed). Bind-mount the host ./prompts directory in docker-compose.yml to persist edits across rebuilds. Tests redirect via monkeypatch.setattr("sembr.summarizer.templates.PROMPTS_DIR", tmp_path) |

Aggregate pipeline (aggregate.py)¶

```python
class AggregateResult:
    summary: str
    rows_used: int
    rows_total: int

async def aggregate_history(
    rows: list[dict],
    *,
    llm: BaseLLMBackend,
    system_tpl: str,
    inst_tpl: str,
    language: str,
    history_placeholder: str = "{history}",
) -> AggregateResult: ...
```

Aggregates multiple history-row summaries through a single LLM call. The pipeline:
- Joins the per-row summary text into a single `{history}` block
- Renders the intent's system and instruction templates, replacing `{history}` with the joined text (the template must contain `{history}` or the call short-circuits with `rows_used=0`)
- Water-fills the joined history text into the backend's prompt budget, truncating the oldest rows first when the combined text exceeds capacity, so the LLM sees the most recent summaries in full
- Calls the LLM backend with the rendered prompt
- Returns the LLM-produced aggregate summary plus `rows_used` / `rows_total` counts

`aggregate_history` is the shared core used by both the preview endpoint (POST .../aggregate) and the send endpoint (POST .../aggregate/send). The send path delegates delivery to `dispatch_summary` in `notifier.dispatcher`.

Template placeholders¶

The history feature adds one placeholder beyond the per-call ones documented above:

| Kind | Additional placeholder | Purpose |
|---|---|---|
| system / instruction | {history} | Joined summary rows from summary_history, replaced by the aggregate pipeline |

`{history}` is only meaningful inside aggregate calls; the standard tick-based pipeline never sees it.
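The oldest-first truncation in the aggregate pipeline could be sketched as follows. This assumes the summaries arrive ordered oldest to newest and that a row which does not fit is dropped whole rather than cut mid-text; the real row shape and truncation granularity are not documented here, and `fit_history` is a hypothetical name.

```python
def fit_history(summaries: list[str], budget: int, sep: str = "\n\n") -> tuple[str, int]:
    """Keep the newest summaries that fit in `budget` characters.

    Returns the joined text (still oldest to newest) and the number of rows used.
    """
    kept: list[str] = []
    used = 0
    for text in reversed(summaries):  # walk from newest to oldest
        cost = len(text) + (len(sep) if kept else 0)
        if used + cost > budget:
            break  # this row and everything older is dropped
        kept.append(text)
        used += cost
    kept.reverse()  # restore chronological order for the prompt
    return sep.join(kept), len(kept)
```

The second return value corresponds to `rows_used`; `rows_total` would simply be `len(summaries)`.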
Upstream dependencies¶
- `config.Settings`: every LLM setting (the prompts path is no longer a setting; see the `PROMPTS_DIR` constant)
- `matcher.callback.Match`: the input shape `handle` consumes
- `db.intents.get_intent` (indirectly, via the lifespan-installed `get_intent_prompt_ctx`): resolves per-intent template names, intent text, and language
- `db.feeds` (indirectly, via `get_feed_names`): resolves `Citation.source_name`
Downstream consumers¶
- `notifier.email.EmailChannel`: receives the `SummaryResult` through the lifespan-installed `on_summary` callback
- `notifier.email.EmailChannel.send_error`: receives template errors through `on_template_error`
- `dashboard.read_model`: reads `notification_log` rows whose state machine the notifier owns; the summarizer itself never touches that table
- `api.prompts` and `api.intents`: reuse `templates.list_templates`, `template_exists`, `load_template` to power the `/api/prompts` browser and the intent-create template-name validation
Known constraints¶
- One summary per intent per tick: the pipeline does not split a batch into sub-events. The matcher decides what reaches a single tick: under cron mode that is everything that scored above threshold in the lookback window; under event mode that is the flushed `event_pending` rows. If you want per-event splitting on the cron path, wrap `handle` and call it once per pre-grouped sub-list.
- Templates are read on every call: no in-process caching. Disk I/O cost is negligible at 1.0 scale (a few intents firing per minute) but a future high-fan-out deployment should add a cache or move templates to a database table.
- Truncation is character-aligned, not sentence-aware: when water-filling forces a body shorter than its full length, the cut happens at character position N, with no sentence boundary lookup and no Markdown fence repair. A long article keeps a clean head and gets a sliced tail. Acceptable for monitoring digests; not appropriate for a summarization product where the full text matters.
- Token-vs-character heuristics live with the operator: `llm_max_prompt_chars` is in characters. The pipeline never tokenizes; English ≈ 4 chars/token and Chinese ≈ 1 char/token, so a setting that fits a 1M-token model on English content (4M chars) might overflow on a Chinese-heavy intent. Set conservatively for non-English deployments.
- `APIBackend` constructs `httpx.AsyncClient` in `__init__`: the client is bound to whichever event loop is running at construction time. Production wiring constructs it inside `lifespan`, so this is fine; a test that constructs the backend at module import time and then runs an asyncio test will fail with "event loop is closed".
- `health()` does not validate the model name: it pings `/models` for reachability. A misconfigured `llm_model` (typo, removed from upstream) will only surface on the first real `summarize` call.
- No retry policy on transient LLM errors: `summarize` raises on the first non-2xx; the pipeline logs and drops the tick. Cron mode retries naturally on the next schedule; event mode loses the buffered batch (it was already cleared by `flush` before `on_summary`). A future change should add a small retry budget for 429 / 5xx specifically.
- Default fallback strings have been removed: earlier versions of the pipeline carried hardcoded copies of `default.md` for both system and instruction templates. Those were never used at runtime (template errors route to `on_template_error` instead) and have been deleted to avoid two versions of the same prompt drifting apart. The on-disk `prompts/system/default.md` and `prompts/instruction/default.md` are now the single source of truth.
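For the per-event splitting mentioned in the first constraint, a wrapper along these lines would call `handle` once per pre-grouped sub-list. `per_group_handler` and the `split` callable are hypothetical; how matches are grouped is left to the caller.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


def per_group_handler(
    handle: Callable[[list], Awaitable[None]],
    split: Callable[[list], Iterable[list]],
) -> Callable[[list], Awaitable[None]]:
    """Wrap `handle` so each group produced by `split` becomes its own digest."""

    async def wrapped(matches: list) -> None:
        for group in split(matches):
            if group:  # skip empty groups rather than trigger an empty tick
                # Sequential on purpose: keeps LLM calls and deliveries ordered.
                await handle(list(group))

    return wrapped


if __name__ == "__main__":
    async def fake_handle(batch: list) -> None:  # stand-in for SummaryPipeline.handle
        print(f"digest for {len(batch)} matches")

    def split_in_pairs(items: list) -> list[list]:
        return [items[i:i + 2] for i in range(0, len(items), 2)]

    asyncio.run(per_group_handler(fake_handle, split_in_pairs)(["a", "b", "c"]))
```

Because `handle` is contractually never-raise, a failure in one group does not prevent the remaining groups from being processed.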