Every add() call passes through a four-layer gate before extraction is queued. A blocked call still returns HTTP 200, with a status that identifies the blocking layer, so inspect the response body rather than relying on the status code alone.
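Because a blocked write still comes back as HTTP 200, a client has to read the body to know whether extraction was queued. The following is a minimal sketch, assuming the parsed JSON body is a dict that carries the blocked_reason field mentioned later on this page; the helper name and the exact body shape are hypothetical, not the documented API.

```python
from typing import Any, Mapping, Optional


def blocked_reason(http_status: int, body: Mapping[str, Any]) -> Optional[str]:
    """Return the block reason for an add() response, or None if it was accepted.

    Assumption: a blocked call carries a non-empty `blocked_reason` in its body.
    """
    if http_status != 200:
        # Transport or server errors are a separate case from gate blocks.
        raise RuntimeError(f"add() failed with HTTP {http_status}")
    reason = body.get("blocked_reason")
    return reason or None


# HTTP 200 alone does not mean the write was queued.
print(blocked_reason(200, {"blocked_reason": "low_quality"}))
print(blocked_reason(200, {"blocked_reason": None}))
```

Keeping this check in one place makes it harder to treat a 200 as success by accident.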

Layers

| Layer | What it checks | Block reason |
|-------|----------------|--------------|
| L1 | Per-user rate limit: requests per minute | rate_limit_exceeded |
| L2 | Content quality: score below 0.35 | low_quality |
| L3 | Semantic dedup: similarity above 0.92 | duplicate_query |
| L4 | Budget: tenant monthly quota exhausted on block policy | budget_exhausted |
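Each block reason calls for a different client reaction. The sketch below maps the four reasons from the table to one possible handling strategy. The reason strings come from this page; the Action enum and the chosen actions are illustrative assumptions, not behaviour prescribed by the service.

```python
from enum import Enum


class Action(Enum):
    RETRY_LATER = "retry_later"
    DROP = "drop"
    ALERT = "alert"


# Block reasons are taken from the table; the actions are assumptions.
ACTIONS = {
    "rate_limit_exceeded": Action.RETRY_LATER,  # L1: back off, then retry
    "low_quality": Action.DROP,                 # L2: resending the same text will not help
    "duplicate_query": Action.DROP,             # L3: a near-identical statement was already sent
    "budget_exhausted": Action.ALERT,           # L4: needs a quota or policy change
}


def action_for(reason: str) -> Action:
    """Pick a handling strategy; unknown reasons are surfaced, not silently dropped."""
    return ACTIONS.get(reason, Action.ALERT)


print(action_for("rate_limit_exceeded"))
```

Only the L1 case is worth retrying automatically; the other three will keep failing until the input or the tenant budget changes.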

What gets blocked

| Layer | Example input |
|-------|---------------|
| L1 | Same user sends many writes in one minute |
| L2 | "ok", "hi", "??" |
| L3 | Sending the same preference statement repeatedly |
| L4 | Tenant is out of monthly calls or tokens |

Quality scoring

The L2 score is based on message count, average message length, lexical diversity, and whether the conversation contains a question signal. Short or content-free messages score poorly.
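To make the four signals concrete, here is a rough illustration of how such a score could be combined. It is not the service's actual formula: the weights, the normalisation constants (4 messages, 80 characters, 8 words), and the use of "?" as the question signal are all assumptions chosen for the example. Only the four inputs and the 0.35 threshold come from this page.

```python
import re
from typing import List

QUALITY_THRESHOLD = 0.35  # from the Layers table


def quality_score(messages: List[str]) -> float:
    """Hypothetical score in [0, 1] built from the four signals named above."""
    words = [w for m in messages for w in re.findall(r"[a-z']+", m.lower())]
    if not words:
        return 0.0  # nothing but punctuation or whitespace

    count_term = min(len(messages) / 4, 1.0)
    avg_len = sum(len(m) for m in messages) / len(messages)
    length_term = min(avg_len / 80, 1.0)
    # Unique-word ratio, damped so that a single word does not count as "diverse".
    diversity_term = (len(set(words)) / len(words)) * min(len(words) / 8, 1.0)
    question_term = 1.0 if any("?" in m for m in messages) else 0.0

    # Assumed weights; they sum to 1.0.
    return (0.2 * count_term + 0.5 * length_term
            + 0.2 * diversity_term + 0.1 * question_term)


for sample in (["ok"], ["??"]):
    print(sample, quality_score(sample) < QUALITY_THRESHOLD)
```

Under these assumed weights, one-word replies land well below the threshold, while a couple of full sentences clear it comfortably.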

How to keep block rates low

  • Send meaningful facts, preferences, goals, or procedures rather than filler text.
  • Don’t send the same memory-worthy statement on every turn.
  • Batch coherent conversational turns instead of single-word fragments.
  • Watch blocked_reason and budget_remaining_pct on add() responses.
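The batching and no-repeat advice above can be applied on the client before add() is ever called. The sketch below is one way to do that, assuming exact-match deduplication after whitespace and case normalisation; this is far cruder than the service's semantic check. The WriteBuffer class and its batch_size parameter are hypothetical names for this example.

```python
import re
from typing import List, Optional, Set


class WriteBuffer:
    """Buffers turns into batches and skips statements that were already offered."""

    def __init__(self, batch_size: int = 4) -> None:
        self.batch_size = batch_size
        self._pending: List[str] = []
        self._seen: Set[str] = set()

    @staticmethod
    def _normalise(text: str) -> str:
        # Collapse whitespace and case so trivial variants count as repeats.
        return re.sub(r"\s+", " ", text.strip().lower())

    def offer(self, turn: str) -> Optional[List[str]]:
        """Queue a turn; return a full batch when one is ready, else None."""
        key = self._normalise(turn)
        if not key or key in self._seen:
            return None  # empty or repeated: nothing to send
        self._seen.add(key)
        self._pending.append(turn)
        if len(self._pending) < self.batch_size:
            return None
        batch, self._pending = self._pending, []
        return batch


buf = WriteBuffer(batch_size=2)
for turn in ["I like tea", "i  like TEA", "I run daily"]:
    batch = buf.offer(turn)
    if batch is not None:
        print(batch)  # a batch that would be passed to add()
```

Filtering locally keeps obvious repeats and fragments from counting against the per-user rate limit or the tenant budget in the first place.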