Every `add()` call passes through a four-layer gate before extraction is queued. Blocked calls return HTTP 200 with a status indicating the blocking layer, so inspect the body, not just the status code.
Layers
| Layer | What it checks | Block reason |
|---|---|---|
| L1 | Per-user rate limit: requests per minute | rate_limit_exceeded |
| L2 | Content quality: score below 0.35 | low_quality |
| L3 | Semantic dedup: similarity above 0.92 | duplicate_query |
| L4 | Budget: tenant monthly quota exhausted under a block policy | budget_exhausted |
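
Because a blocked call still returns HTTP 200, the client has to read the response body to learn which layer stopped it. The following is a minimal sketch, not the service's client code: it assumes the parsed JSON body is a dict that carries the reason string under a `blocked_reason` key, and the helper name `blocking_layer` is hypothetical.

```python
from typing import Optional

# Block reasons from the Layers table, mapped to the gate layer that emits them.
REASON_TO_LAYER = {
    "rate_limit_exceeded": "L1",
    "low_quality": "L2",
    "duplicate_query": "L3",
    "budget_exhausted": "L4",
}


def blocking_layer(body: dict) -> Optional[str]:
    """Return the gate layer that blocked an add() call, or None if not blocked.

    Assumption: the parsed response body exposes the reason under a
    "blocked_reason" key. Adjust this to the real response shape.
    """
    reason = body.get("blocked_reason")
    if reason is None:
        return None
    # Unrecognized reasons are surfaced rather than silently treated as success.
    return REASON_TO_LAYER.get(reason, "unknown")
```

For example, `blocking_layer({"blocked_reason": "low_quality"})` returns `"L2"`, while a body without the key returns `None`.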
What gets blocked
| Layer | Example input |
|---|---|
| L1 | Same user sends many writes in one minute |
| L2 | "ok", "hi", "??" |
| L3 | Sending the same preference statement repeatedly |
| L4 | Tenant is out of monthly calls or tokens |
Quality scoring
The L2 score is based on message count, average message length, lexical diversity, and whether the conversation contains a question signal. Short or content-free messages score poorly.
How to keep block rates low
- Send meaningful facts, preferences, goals, or procedures rather than filler text.
- Don’t send the same memory-worthy statement on every turn.
- Batch coherent conversational turns instead of single-word fragments.
- Watch `blocked_reason` and `budget_remaining_pct` on `add()` responses.
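
One way to act on the first and third tips is a client-side pre-check that skips writes likely to be blocked as `low_quality`. The sketch below is only an illustration built from the four signals named under Quality scoring. The equal weights, the saturation points (3 messages, 8 words per message, 10 words for diversity), and the function names are all assumptions made for this example, not the service's actual formula. Only the 0.35 threshold comes from the Layers table.

```python
import re

QUALITY_THRESHOLD = 0.35  # L2 block threshold from the Layers table


def rough_quality_score(messages: list[str]) -> float:
    """Illustrative 0..1 score from the four signals named under Quality scoring.

    Assumption: equal weights and hand-picked saturation points. The real
    scoring formula is not documented here and will differ.
    """
    if not messages:
        return 0.0
    words = [w.lower() for m in messages for w in re.findall(r"\w+", m)]
    if not words:
        return 0.0  # punctuation-only input such as "??"
    count_signal = min(len(messages) / 3, 1.0)
    length_signal = min(len(words) / len(messages) / 8, 1.0)
    # Unique-word ratio, with a floor of 10 on the denominator so that a
    # single word does not count as perfectly diverse.
    diversity_signal = len(set(words)) / max(len(words), 10)
    question_signal = 1.0 if any("?" in m for m in messages) else 0.0
    return (count_signal + length_signal + diversity_signal + question_signal) / 4


def worth_sending(messages: list[str]) -> bool:
    """Return False for writes that would likely be blocked as low_quality."""
    return rough_quality_score(messages) >= QUALITY_THRESHOLD
```

Under these assumed weights, `worth_sending(["ok"])` is `False`, while a full sentence stating a preference passes. A local check like this cannot reproduce the server's decision, so responses should still be inspected for `blocked_reason`.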