Quality Gate - MemoryOS

Every `add()` call passes through a four-layer gate before extraction is queued. A blocked call still returns HTTP 200, with a status in the response body that identifies the blocking layer, so inspect the body rather than relying on the status code alone.
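Because a blocked call is not signalled by the HTTP status, client code has to branch on the parsed body. The following is a minimal sketch of that check. It assumes the body is already parsed into a dict and carries the reason under a `blocked_reason` key. The `status` values in the usage note and the `is_retryable` policy are hypothetical and not part of any confirmed response schema.

```python
from typing import Any, Mapping, Optional

# Block reasons listed in the Layers table on this page.
KNOWN_BLOCK_REASONS = {
    "rate_limit_exceeded",
    "low_quality",
    "duplicate_query",
    "budget_exhausted",
}


def blocked_reason(body: Mapping[str, Any]) -> Optional[str]:
    """Return the block reason from a parsed add() response body, or None.

    Assumption: the reason is exposed under a "blocked_reason" key and is
    missing or empty when the call was accepted.
    """
    reason = body.get("blocked_reason")
    return reason if reason else None


def is_retryable(reason: str) -> bool:
    """Hypothetical client policy: only a rate-limit block is worth
    retrying unchanged; the other reasons need different input or quota."""
    return reason == "rate_limit_exceeded"
```

With this in place, a caller would check `blocked_reason(body)` after every `add()` call, even when the HTTP status is 200, and treat an unfamiliar reason (one not in `KNOWN_BLOCK_REASONS`) as a signal to log and investigate.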

Layers

| Layer | What it checks | Block reason |
| --- | --- | --- |
| `L1` | Per-user rate limit (requests per minute) | `rate_limit_exceeded` |
| `L2` | Content quality (blocks when the score is below 0.35) | `low_quality` |
| `L3` | Semantic dedup (blocks when similarity is above 0.92) | `duplicate_query` |
| `L4` | Budget (blocks when the tenant's monthly quota is exhausted and the budget policy is block) | `budget_exhausted` |
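The layer table can be read as a short-circuiting pipeline. The sketch below is an illustration of that reading, not the service's implementation. It assumes the layers run in `L1` to `L4` order and that the first failing layer determines the block reason. `GateInput`, its fields, and the `"block"` / `"allow"` policy values are hypothetical names. Only the two thresholds and the four reason strings come from the table.

```python
from dataclasses import dataclass
from typing import Optional

QUALITY_THRESHOLD = 0.35     # from the table: a score below 0.35 is blocked
SIMILARITY_THRESHOLD = 0.92  # from the table: similarity above 0.92 is blocked


@dataclass
class GateInput:
    requests_this_minute: int   # writes by this user in the current minute
    rate_limit_per_minute: int  # hypothetical per-user limit
    quality_score: float        # L2 content quality score
    max_similarity: float       # highest similarity to an existing memory
    quota_exhausted: bool       # tenant monthly quota is used up
    budget_policy: str          # hypothetical values: "block" or "allow"


def evaluate_gate(x: GateInput) -> Optional[str]:
    """Return the first block reason in L1..L4 order, or None if the call passes."""
    if x.requests_this_minute > x.rate_limit_per_minute:
        return "rate_limit_exceeded"
    if x.quality_score < QUALITY_THRESHOLD:
        return "low_quality"
    if x.max_similarity > SIMILARITY_THRESHOLD:
        return "duplicate_query"
    if x.quota_exhausted and x.budget_policy == "block":
        return "budget_exhausted"
    return None
```

Note the strict comparisons: under this reading a score of exactly 0.35 passes `L2`, and a similarity of exactly 0.92 passes `L3`.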

What gets blocked

| Layer | Example input |
| --- | --- |
| `L1` | The same user sends many writes within one minute |
| `L2` | `"ok"`, `"hi"`, `"??"` |
| `L3` | The same preference statement is sent repeatedly |
| `L4` | The tenant is out of monthly calls or tokens |
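To make the `L3` row concrete, here is one way a similarity check against the 0.92 threshold could look. This page does not say which similarity measure or embedding model is used. Cosine similarity over precomputed embedding vectors is an assumption chosen for illustration, and `is_duplicate` is a hypothetical helper.

```python
import math
from typing import Iterable, Sequence

SIMILARITY_THRESHOLD = 0.92  # from the Layers table


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(candidate: Sequence[float],
                 existing: Iterable[Sequence[float]]) -> bool:
    """True if the candidate is more similar than the threshold to any stored vector."""
    return any(cosine_similarity(candidate, e) > SIMILARITY_THRESHOLD
               for e in existing)
```

Under this sketch, re-sending an identical statement yields a similarity of 1.0 and is flagged, while an unrelated statement with a near-orthogonal embedding is not.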

Quality scoring

The L2 score is based on message count, average message length, lexical diversity, and whether the conversation contains a question signal. Short or content-free messages score poorly.
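The four signals can be combined in many ways, and the actual formula is not given here. The sketch below is a toy scorer that shows how such signals might interact. The weights, the saturation points (4 messages, 80 characters, 20 distinct words), and the use of a literal `?` as the question signal are all assumptions. It also uses a saturating distinct-word count as a stand-in for lexical diversity, because a plain type-token ratio would rate a single word as maximally diverse.

```python
import re
from typing import List


def quality_score(messages: List[str]) -> float:
    """Toy score in [0, 1] built from the four signals named above.

    All weights and saturation points are illustrative assumptions.
    """
    texts = [m.strip() for m in messages if m.strip()]
    if not texts:
        return 0.0
    words = [w.lower() for t in texts for w in re.findall(r"\w+", t)]
    if not words:
        return 0.0  # punctuation-only input such as "??"

    count_signal = min(len(texts) / 4.0, 1.0)            # message count
    avg_len = sum(len(t) for t in texts) / len(texts)
    length_signal = min(avg_len / 80.0, 1.0)             # average message length
    diversity_signal = min(len(set(words)) / 20.0, 1.0)  # distinct words
    question_signal = 1.0 if any("?" in t for t in texts) else 0.0

    return (0.2 * count_signal
            + 0.4 * length_signal
            + 0.3 * diversity_signal
            + 0.1 * question_signal)
```

With these assumed weights, one-word inputs such as `"ok"` or `"hi"` land well below 0.35, while a single full sentence that states a preference clears it.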

How to keep block rates low

- Send meaningful facts, preferences, goals, or procedures rather than filler text.
- Don’t send the same memory-worthy statement on every turn.
- Batch coherent conversational turns instead of single-word fragments.
- Watch `blocked_reason` and `budget_remaining_pct` on `add()` responses.
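Several of these practices can be applied on the client before `add()` is ever called. The sketch below is one possible approach. `TurnBuffer` and `budget_is_low` are hypothetical helpers, not part of any SDK. The buffer only catches exact repeats (after case and whitespace normalization), and `budget_remaining_pct` is assumed to be a number on a 0 to 100 scale.

```python
from typing import Any, List, Mapping, Set


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


class TurnBuffer:
    """Collects turns client-side, drops exact repeats, and releases batches."""

    def __init__(self, batch_size: int = 4) -> None:
        self.batch_size = batch_size
        self._seen: Set[str] = set()
        self._pending: List[str] = []

    def offer(self, text: str) -> List[str]:
        """Add a turn; return a full batch ready to send, or an empty list."""
        key = normalize(text)
        if not key or key in self._seen:
            return []  # empty or already offered: nothing new to send
        self._seen.add(key)
        self._pending.append(text.strip())
        if len(self._pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self) -> List[str]:
        """Return whatever is pending (e.g. at end of session) and clear it."""
        batch, self._pending = self._pending, []
        return batch


def budget_is_low(body: Mapping[str, Any], threshold_pct: float = 10.0) -> bool:
    """True if the response reports less budget remaining than threshold_pct.

    Assumption: budget_remaining_pct is a percentage in the range 0..100.
    """
    remaining = body.get("budget_remaining_pct")
    return isinstance(remaining, (int, float)) and remaining < threshold_pct
```

A caller would pass each turn to `offer()`, send the returned batch in a single `add()` call when it is non-empty, and check `budget_is_low()` on the response to alert before the quota runs out.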
