A Canary for LLMs: How to Tell When Your AI Chat Starts Degrading

Picture this: you've been working with Claude for an hour on a project architecture. At the start of the session you set a rule — "always respond in JSON format." The first ten responses are perfect. Then, quietly, the model stops following the format, starts contradicting what it said earlier, invents details that weren't there. You keep trusting the answers — and only half an hour later do you realize you've been working with a degraded context.

That's exactly what the "canary" technique is for: a simple way to catch context degradation before it becomes a problem.

Where the Term Comes From

For much of the 20th century, coal miners carried canaries underground. The birds are more sensitive to carbon monoxide than humans, so if a canary stopped singing or collapsed, the miners knew it was time to get out.

In cybersecurity, a canary token is a hidden marker embedded in data or a system that raises an alert when someone touches it. If the marker fires, someone has reached the protected data.

With LLMs the idea is the same: you embed a marker in your instructions that the model must reproduce in every response. If the marker disappears, that's your alarm.

How the Method Works

The idea is simple: in your system prompt or at the start of the conversation, instruct the model to always include a specific element in every response. This could be:

A name. "Always address me as Alex at the start of each response."
An emoji or symbol. "Begin every response with the symbol 🟢."
A structural marker. "Start every response with the line STATUS: OK."
A tag. "Wrap your final output in <answer></answer> tags."

As long as the model keeps reproducing the marker, your early instructions are still in effect. When the marker disappears, it's time to pay attention.
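Checking for the marker is mechanical. Below is a minimal sketch, assuming you have the model's reply as a plain string; the function names and the exact matching rules (first non-empty line, exact string match, regular expression for the tags) are illustrative choices, not part of any tool. Each function covers one marker type from the list above.

```python
import re


def has_name_marker(response: str, name: str = "Alex") -> bool:
    # The first non-empty line should address the user by name ("Alex, ...").
    lines = [ln for ln in response.splitlines() if ln.strip()]
    return bool(lines) and re.search(rf"\b{re.escape(name)}\b", lines[0]) is not None


def has_symbol_marker(response: str, symbol: str = "🟢") -> bool:
    # The reply must open with the agreed symbol; leading whitespace is ignored.
    return response.lstrip().startswith(symbol)


def has_status_line(response: str, line: str = "STATUS: OK") -> bool:
    # The first non-empty line must match the status line exactly.
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return bool(lines) and lines[0] == line


def has_answer_tags(response: str) -> bool:
    # A complete <answer>...</answer> pair must appear somewhere (may span lines).
    return re.search(r"<answer>.*?</answer>", response, re.DOTALL) is not None


reply = "STATUS: OK\nHere is the plan.\n<answer>Use a queue.</answer>"
print(has_status_line(reply), has_answer_tags(reply))  # True True
```

Exact matching is deliberately strict: a canary that the model paraphrases ("Status is OK") is already a sign that the instruction is slipping.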

Why Context Degrades

Language models process text within a context window: a fixed-size buffer, measured in tokens, that holds the entire conversation history. As the conversation grows, instructions from the start have to compete with more and more text, and once the history outgrows the window, some clients truncate or summarize the oldest messages.

This isn't a bug; it's a consequence of how transformer models work. Attention weights are normalized across the entire context, so the longer the context, the more tokens compete with the instructions you set at the beginning, and the less "weight" those instructions tend to receive.
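A toy softmax calculation illustrates why a longer context leaves less weight for any single early instruction. It is a deliberate simplification, not a model of a real transformer (which has many layers and heads, positional effects, and learned scores): assume one instruction token scores higher than every other token by a fixed margin m, and all other tokens score equally. Softmax then gives the instruction a weight of e^m / (e^m + n − 1), which shrinks as the context length n grows.

```python
import math


def instruction_weight(context_len: int, margin: float = 3.0) -> float:
    """Softmax weight of one 'instruction' token among context_len tokens.

    Toy assumption: the instruction's attention score exceeds every other
    token's score by `margin`, and all other tokens share the same score.
    """
    if context_len < 1:
        raise ValueError("context_len must be at least 1")
    boosted = math.exp(margin)
    return boosted / (boosted + (context_len - 1))


# Same margin, growing context: the instruction's share keeps falling.
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} tokens -> weight {instruction_weight(n):.6f}")
```

Even with a fixed advantage over every other token, the instruction's share of attention falls roughly in proportion to the context length once n is large.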

In practice, this shows up as the model ignoring formatting rules, inventing details, and dropping constraints you set at the start of the session.

How to Add a Canary in Claude and Cursor

Claude (claude.ai)

The easiest option is to put the canary in your profile preferences, which Claude applies automatically to every new chat.

Open claude.ai and go to Settings → Profile.
In the personal preferences field, add the following instruction:

At the start of each response, add the line: [✓ context active]
This line tells me your instructions are still in effect, so never skip it.

If the line [✓ context active] stops appearing in responses, start a new session.

For a more robust check, also add a control fact:

Remember: the code word for this session is "malachite".
If I ask "code word?" — respond with exactly that word.
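Both checks are easy to automate if you copy replies out of the chat or run the same instructions through the API. A minimal sketch, assuming each reply arrives as a string; the helper names and the normalization rules (case-insensitive, surrounding quotes and punctuation ignored) are illustrative assumptions, not Claude behavior.

```python
import string

CANARY_LINE = "[✓ context active]"
CODE_WORD = "malachite"


def canary_line_present(reply: str) -> bool:
    # The first non-empty line must be the canary, exactly as instructed.
    for line in reply.splitlines():
        if line.strip():
            return line.strip() == CANARY_LINE
    return False


def code_word_ok(reply: str, expected: str = CODE_WORD) -> bool:
    # Drop the canary line, then compare what remains with the code word,
    # tolerating case, quotes and trailing punctuation ("Malachite.").
    body = "\n".join(ln for ln in reply.splitlines() if ln.strip() != CANARY_LINE)
    cleaned = body.strip().strip(string.punctuation + "“” ").lower()
    return cleaned == expected.lower()
```

The code-word check is strict on purpose: the instruction asks for exactly that word, so a full sentence around it counts as a failure.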

Cursor

In Cursor, the canary goes in the .cursor/rules/ directory — the current format since 2025 (the legacy .cursorrules file in the project root still works but is considered deprecated).

Create a file .cursor/rules/canary.mdc:

---
description: Context canary — detects context window degradation
alwaysApply: true
---

At the start of every response, output this exact line:
[CURSOR-OK] Context active.

If this line is missing from any response, the context window may be degrading.
Remind the user to start a new chat session.

You can also add a structural marker:

Always wrap your final answer in <answer> tags.
Example: <answer>Here is the solution...</answer>
If you cannot locate previous context, say: [CONTEXT-LOST] Please restart the session.

The alwaysApply: true flag tells Cursor to include the rule in every request. Restart Cursor after creating the file.
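If you save Cursor's replies as text, a small script can sort them into three states. A sketch, assuming both rules above are in place (the [CURSOR-OK] line and the <answer> tags); the state names and the classify_reply helper are hypothetical.

```python
import re
from enum import Enum

CANARY = "[CURSOR-OK] Context active."
LOST_MARKER = "[CONTEXT-LOST]"


class ContextState(Enum):
    OK = "ok"              # canary line and <answer> block both present
    LOST = "lost"          # the model itself reported losing context
    DEGRADED = "degraded"  # a marker was silently dropped


def classify_reply(reply: str) -> ContextState:
    if LOST_MARKER in reply:
        return ContextState.LOST
    # The first non-empty line must be the canary from canary.mdc.
    first = next((ln.strip() for ln in reply.splitlines() if ln.strip()), "")
    has_answer = re.search(r"<answer>.*?</answer>", reply, re.DOTALL) is not None
    if first == CANARY and has_answer:
        return ContextState.OK
    return ContextState.DEGRADED
```

DEGRADED is the case worth watching: the model reports LOST only when it notices the problem, while a silently missing marker means it didn't.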

How to Strengthen the Method

The canary is a useful tool. Here's how to use it well and what to combine it with.

Canary ≠ quality guarantee. The model can keep reproducing the marker while hallucinating in the substantive part of the response. The canary says: "context is alive." It doesn't say: "the answer is correct." Use it as a first signal, not as the only check.

Give your marker meaning. A plain emoji can be reproduced by inertia. A more reliable option is a marker that requires the model to actually reference the context: a code word, the user's name, the current step number of the task.
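A step counter is a good example of a marker with meaning: the model has to know where it is in the task to get it right. A sketch, assuming you told the model to open every reply with a line such as [step 3/7] and you track the expected step on your side; the marker format and helpers are hypothetical.

```python
import re
from typing import Optional, Tuple

STEP_RE = re.compile(r"\s*\[step (\d+)/(\d+)\]", re.IGNORECASE)


def parse_step(reply: str) -> Optional[Tuple[int, int]]:
    # Return (current, total) from a leading "[step N/M]" marker, or None.
    m = STEP_RE.match(reply)
    return (int(m.group(1)), int(m.group(2))) if m else None


def step_marker_ok(reply: str, expected_step: int, total_steps: int) -> bool:
    # Unlike a fixed emoji, a stale or skipped step number exposes lost context.
    return parse_step(reply) == (expected_step, total_steps)
```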

Token counter. Most APIs return the number of tokens used in each request. Track this and restart the session when approaching a threshold — for example, at 70–80% of the context window. Claude Sonnet 4.6 and Haiku 4.5 can track their remaining context budget natively.
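In the Anthropic Python SDK, every response carries a usage object with input_tokens and output_tokens, plus cache_creation_input_tokens and cache_read_input_tokens when prompt caching is involved. Because each request resends the whole history, the latest call's usage approximates how full the window is. A sketch, assuming you look up the context window size for your model and pass it in; the Usage dataclass is a stand-in so the logic runs offline, and with the real SDK you would pass response.usage directly. The 0.75 default simply sits inside the 70–80% range above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    # Stand-in mirroring the token fields of `response.usage` in the SDK.
    input_tokens: int
    output_tokens: int
    cache_creation_input_tokens: Optional[int] = 0
    cache_read_input_tokens: Optional[int] = 0


def context_fraction(usage, context_window: int) -> float:
    # Cached prompt tokens are reported separately from input_tokens,
    # so add them back in; the cache fields may be None.
    used = (
        usage.input_tokens
        + (getattr(usage, "cache_creation_input_tokens", 0) or 0)
        + (getattr(usage, "cache_read_input_tokens", 0) or 0)
        + usage.output_tokens
    )
    return used / context_window


def should_restart(usage, context_window: int, threshold: float = 0.75) -> bool:
    return context_fraction(usage, context_window) >= threshold


usage = Usage(input_tokens=140_000, output_tokens=2_000, cache_read_input_tokens=10_000)
print(should_restart(usage, context_window=200_000))  # True: 152k / 200k = 0.76
```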

Structured output with validation. Ask the model to respond in JSON with a fixed schema. If the response stops parsing or loses required fields — context has degraded. This is especially useful in automated pipelines.
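Validation needs nothing beyond the standard library. A sketch, assuming a flat schema with two required string fields, status and answer; REQUIRED_FIELDS and validate_reply are illustrative names. Returning a list of problems rather than a bare boolean makes the log entry useful when a check fails.

```python
import json
from typing import List

REQUIRED_FIELDS = {"status": str, "answer": str}  # hypothetical schema


def validate_reply(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the reply passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for {name}")
    # Only inspect the status value once the shape itself is valid.
    if not problems and data["status"] != "ok":
        problems.append(f"status is {data['status']!r}")
    return problems
```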

Periodic control questions. Every few messages, ask the model a verification question whose answer clearly follows from the start of the conversation. A wrong answer is a signal to restart.
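Scheduling is the part that's easy to forget. A sketch, assuming you keep a few question/answer pairs drawn from the start of the conversation and ask one every N user turns; the ControlQuestions class, the default interval of 5, and the substring match in answer_matches are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ControlQuestions:
    pairs: List[Tuple[str, str]]  # (question, expected answer) from early in the session
    every: int = 5                # ask after every N user turns
    turns: int = field(default=0, init=False)
    next_index: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        if self.every < 1:
            raise ValueError("every must be at least 1")

    def on_user_turn(self) -> Optional[Tuple[str, str]]:
        """Count one user turn; return a (question, answer) pair when a check is due."""
        self.turns += 1
        if self.pairs and self.turns % self.every == 0:
            pair = self.pairs[self.next_index % len(self.pairs)]
            self.next_index += 1
            return pair
        return None


def answer_matches(reply: str, expected: str) -> bool:
    # Lenient check: the expected answer only has to appear somewhere in the reply.
    return expected.lower() in reply.lower()
```

Rotating through several pairs avoids asking the same question each time, which the model could answer from its most recent reply rather than from the start of the session.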

When to Restart the Session

There's no universal rule — it depends on the model. Current picture as of June 2026:

Claude Sonnet/Opus 4.6+ — 1M token context window, minimal degradation in typical sessions. Restart when errors appear or the canary disappears.
GPT-5.5 — 1M token context window (922K input). For intensive work with large codebases, restart after filling ~70–80% of the window.
Gemini 3.5 Flash — 1M token context, the default model in the Gemini app since May 2026. Gemini 3.1 Pro offers 2M tokens — restarts are rarely needed.

General rule: regardless of model, restart as soon as the canary disappears or the model clearly contradicts something established earlier in the conversation.
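The general rule can be written as one decision function that combines the signals above. A sketch: SessionSignals and restart_reasons are hypothetical names, and each field would come from whatever checks you actually run (canary detection, control questions, your own judgment about contradictions, token usage). Any single failing signal triggers a restart. The token threshold is model-dependent; for models where you only restart on errors, pass token_threshold=1.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionSignals:
    canary_present: bool      # marker found in the latest reply
    control_answer_ok: bool   # last control question answered correctly
    contradiction_seen: bool  # reply contradicts earlier content (your call)
    context_fraction: float   # share of the context window in use, 0..1


def restart_reasons(s: SessionSignals, token_threshold: float = 0.75) -> List[str]:
    reasons = []
    if not s.canary_present:
        reasons.append("canary missing")
    if not s.control_answer_ok:
        reasons.append("control question failed")
    if s.contradiction_seen:
        reasons.append("contradiction with earlier content")
    if s.context_fraction >= token_threshold:
        reasons.append("context window nearly full")
    return reasons  # a non-empty list means: start a new session
```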

Use in Automated Systems

In pipelines where the LLM runs without human oversight, manual canary monitoring isn't possible. Here the method becomes a programmatic check.

A typical implementation: every API request instructs the model to return JSON with a required "status": "ok" field. After each response, a script checks that field. If it is missing or has any other value, the script logs the event and restarts the session automatically.

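import json
import logging
import textwrap

import anthropic

logger = logging.getLogger("canary")
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = textwrap.dedent("""
    Respond ONLY with valid JSON in this exact format:
    {
      "status": "ok",
      "answer": "your response here"
    }
    If context seems lost, set status to "degraded".
""").strip()


class CanaryError(Exception):
    """Raised when the JSON canary is missing or reports degradation."""


def restart_session(history: list) -> None:
    # Hypothetical placeholder: what "restart" means depends on your pipeline
    # (clear history, re-send the system prompt, alert someone, ...).
    history.clear()


def call_llm_with_canary(user_message: str, history: list) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=history + [{"role": "user", "content": user_message}],
    )
    text = response.content[0].text

    # Unparseable output counts as a failed canary, same as a bad status.
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        result = None

    if not isinstance(result, dict) or result.get("status") != "ok":
        # Log the event, restart, and let the caller decide how to retry.
        logger.warning("Canary failed, context degraded: %r", text[:200])
        restart_session(history)
        raise CanaryError("Canary failed: context degraded")

    return result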

Checks like this suit unattended pipelines, where continuous operation matters and nobody is watching the output.

Summary

The canary is a sensor, not a silver bullet. It doesn't prevent context degradation; it gives you a chance to react before errors accumulate. Add it now: one line in Claude's settings or one .mdc file in Cursor gives you working context monitoring at zero cost.
