Cursor often feels like a mind-reading pair programmer.
It understands intent across files, reasons about architecture, and even respects project-level rules. Until suddenly… it doesn’t.
Sometimes Cursor confidently contradicts itself. Sometimes it ignores project rules. Sometimes it “forgets” conclusions it made just a moment ago. After running into this one too many times, I stopped treating it as magic and decided to systematically reverse-engineer how Cursor actually manages context.
This post documents:
- the exact experiments I ran,
- what Cursor consistently did,
- where it failed,
- and what that reveals about its internal context hierarchy.
No insider access. No leaked prompts. Just behavioral evidence you can reproduce.
Why This Is Worth Understanding
Cursor is not just an LLM in an editor.
It is a context orchestration system layered on top of a language model. Every response you get depends less on “model intelligence” and more on:
- which files Cursor includes,
- how it weighs recency vs relevance,
- when it drops long documents,
- and how explicit user actions override everything else.
If you don’t understand these mechanics, failures feel random. If you do, Cursor becomes far more predictable — and far more powerful.
Important Caveats
Before we begin, a few constraints:
- Cursor can hallucinate.
- Responses are non-deterministic across runs.
- This is behavioral reverse-engineering, not architectural truth.
- Findings may change as Cursor evolves.
That said, the signals below were strong, repeatable, and internally consistent across multiple trials.
Methodology
Cursor does not expose its prompt structure or internal tools. So the only viable approach is black-box experimentation.
The method was deliberately simple:
- Use one realistic use case
- Introduce controlled contradictions
- Vary cursor position, file recency, and selection
- Ask the same questions again
- Observe when Cursor changes its conclusions
- Use canary strings to detect context loss
No trick prompts. No leading language.
The Use Case
I created a small Python repository simulating a real production scenario: debugging a suspected logic bug.
The repo contained:
- `risk_score_main.py` – the active file
- `risk_score_reference.py` – the intended implementation
- `risk_score_decoy.py` – misleading experimental logic
- `cursor_rules.md` – project-level intent
- `README_long.md` – a very long document with a hidden canary string
The task: determine whether the risk classification logic is correct.
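To make the setup concrete, here is a minimal sketch of the contradiction. The actual files in my repo were noisier; the function bodies and comments below are illustrative, not the literal contents:

```python
# risk_score_main.py (sketch) -- the active file, with suspect logic.
def classify_risk(score: int) -> str:
    # NOTE: possible bug -- thresholds look inverted vs the reference
    if score > 60:
        return "LOW_RISK"
    return "HIGH_RISK"

# risk_score_reference.py (sketch) -- the intended implementation.
# In the real repo this lives in its own file under the same function name.
def classify_risk_intended(score: int) -> str:
    if score > 60:
        return "HIGH_RISK"
    return "LOW_RISK"
```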
Phase 1: Cursor Locality Dominates
Prompt:
“Is the risk classification logic correct?”
Observation: Cursor immediately flagged the logic as inverted. Its reasoning relied heavily on:
- code near the cursor
- comments in the active file
Inference: Cursor strongly prioritizes cursor-local context, and comments are treated as first-class signals early on.
Phase 2: Semantic Retrieval Appears
Prompt:
“Compare this logic with the intended approach in this project.”
Observation: Cursor referenced:
- `risk_score_reference.py`
- `cursor_rules.md`
even though neither had been explicitly opened at this point.
Inference: Cursor performs semantic (embedding-based) retrieval across the repository. However, this retrieval is soft and easily overridden.
Phase 3: Recency Beats Relevance
I opened risk_score_reference.py, scrolled once, closed it, and re-asked the Phase 1 question.
Observation: Cursor reversed its earlier conclusion:
“The logic is actually correct. The bug comment appears outdated.”
Inference: Recently opened files heavily outweigh semantic relevance. Cursor does not reconcile contradictions — it re-anchors to the most recent context.
Phase 4: Decoy Resistance
After briefly opening risk_score_decoy.py:
Prompt:
“Which risk scoring logic is this function based on?”
Observation: Cursor correctly ignored the decoy and identified the production logic.
Inference: Cursor has strong file-level isolation. It does not blindly merge same-named functions across files — a well-designed safeguard.
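For context, the decoy had roughly this shape: the same function name as production, deliberately unrelated logic (illustrative, not the literal file):

```python
# risk_score_decoy.py (sketch) -- experimental scoring that shares the
# production function's name but uses unrelated thresholds.
def classify_risk(score: int) -> str:
    if score > 80:
        return "CRITICAL_RISK"
    if score > 40:
        return "MEDIUM_RISK"
    return "LOW_RISK"
```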
Phase 5: Long Context Compression (Canary Test)
I buried a single unique line deep inside a very long README:
CTX_CANARY_DO_NOT_FORGET
Prompt:
“Does the documentation mention CTX_CANARY_DO_NOT_FORGET?”
Observation: Cursor recalled it correctly.
Inference: Long files are chunked and keyword-indexed, not fully dropped. High-level recall survives; fine-grained nuance likely does not.
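Reproducing this test takes one small script. A minimal sketch, assuming arbitrary filler text and a canary buried about three-quarters of the way through:

```python
# generate_canary_readme.py -- builds a long document with one unique
# canary line buried deep inside, for testing long-context recall.
FILLER = "This paragraph is deliberately unremarkable filler text. " * 8

with open("README_long.md", "w") as f:
    f.write("# Project Documentation\n\n")
    for section in range(200):
        f.write(f"## Section {section}\n\n{FILLER}\n\n")
        if section == 150:  # bury the canary ~75% of the way through
            f.write("CTX_CANARY_DO_NOT_FORGET\n\n")
```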
Phase 6: Selection Override (The Key Finding)
I selected only the following code:
```python
if score > 60:
    return "LOW_RISK"
else:
    return "HIGH_RISK"
```
Prompt:
“Explain ONLY the selected logic.”
Observation: Cursor concluded:
“This logic is inverted relative to intended behavior.”
This directly contradicted its conclusions from earlier phases.
The Critical Insight
This was not inconsistency. It was architecture.
Explicit selection causes a hard context reset.
When code is selected, Cursor:
- ignores project rules,
- ignores reference files,
- ignores prior reasoning,
- ignores inferred intent.
It switches to literal, local, syntax-only analysis.
This explains a huge class of “Cursor forgot everything” moments.
How This Fits Cursor’s Designed Context System
Other Cursor guides mention features like:
- explicit context control via `@Files`, `@Code`, `@Docs`, `@Codebase`
- persistent project rules
- intent vs state context
These experiments reveal how those ideas actually behave in practice:
- Selection = pure state context
- Chat without selection = intent + state
- Recent files override semantic similarity
- Rules are persistent but soft
This aligns Cursor’s documented capabilities with its observed behavior.
Reconstructed Context Priority Stack
Based purely on experimental evidence (a toy code model follows the list):
- Explicit selection (absolute override)
- Cursor-local window (~200–400 lines)
- Recently opened files (very strong)
- Active file (compressed)
- Semantic retrieval (embeddings)
- Project rules / intent (soft)
- Long docs (chunked, keyword-indexed)
- Irrelevant files (dropped)
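One way to internalize the stack is as an ordered override list: the highest-priority source present in a session dominates the answer. The snippet below is purely a mental model of the observed behavior, not Cursor's actual implementation:

```python
# Mental model only -- not Cursor's actual implementation.
PRIORITY = [
    "explicit_selection",
    "cursor_local_window",
    "recently_opened_files",
    "active_file",
    "semantic_retrieval",
    "project_rules",
    "long_docs_chunked",
]

def dominant_context(present: set[str]) -> str:
    """Return the source that, per the observed hierarchy, most
    strongly shapes Cursor's answer among those present."""
    return next(s for s in PRIORITY if s in present)

# Selecting code overrides everything else, including project rules.
assert dominant_context({"project_rules", "explicit_selection"}) == "explicit_selection"
```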
Practical Takeaways
If you want intent-aware reasoning:
- Avoid selecting code
- Place cursor near relevant logic
- Reference intent explicitly
If you select code:
- Cursor behaves like a static analyzer
- Architecture and intent are ignored
Comments are dangerous (see the sketch after this list):
- Cursor trusts comments early
- Outdated comments actively mislead
- Contradictory comments cause flip-flops
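To illustrate the hazard: a single stale comment like this (hypothetical) is enough to flip Cursor's verdict, exactly as happened in Phase 3:

```python
def classify_risk(score: int) -> str:
    # FIXED 2023-01: thresholds were inverted, now correct  <-- outdated!
    # The code below is still inverted; Cursor may trust this comment
    # over the code and report the logic as correct.
    if score > 60:
        return "LOW_RISK"
    return "HIGH_RISK"
```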
Why Cursor Sometimes “Feels Random”
Most frustration comes from unintentionally switching Cursor between three modes:
- Intent-aware reasoning (chat, no selection)
- Heuristic reasoning (comments + recent files)
- Literal analysis (selection-only)
Once you understand this, Cursor stops feeling unpredictable.
Conclusion
Cursor isn’t confused; it’s consistent within its context hierarchy.
Understanding that hierarchy is the difference between fighting the tool and working with it.
Once you internalize how context is selected, weighted, and dropped, Cursor becomes not just more reliable but genuinely impressive.