Context Engineering with Anthropic: From Skiing Accidents to Reliable AI
Recently I watched Anthropic's “Prompting 101” session, and it was one of the clearest illustrations of why large language models need more than clever wording. They need disciplined context engineering.
The team showed a scenario based on a real customer: a Swedish insurance company that wants to automate accident claim reviews. The system gets two inputs: a filled-out Swedish car accident report form (with 17 standardized checkboxes) and a rough, hand-drawn sketch of the collision. The model’s task is to determine what happened and who might be at fault.
At first, the results were almost comical. With a simple prompt, Claude confidently concluded that the case involved a skiing accident on a Swedish street. On some level it made sense: incomplete context pushed the model toward a plausible, but wildly incorrect, story.
The demo became an unfolding lesson in how to tighten prompts until they work reliably. A few of the strongest principles stood out:
Define the task context upfront. Instruct the model specifically: "You are assisting a human claims adjuster reviewing car accident forms in Swedish." This small adjustment shifts reasoning away from distractions like skiing.
Set tone and confidence rules. The model must remain factual. If a checkbox is unclear or the sketch is illegible, it should say so. Better to admit uncertainty than to generate fiction.
Provide invariants. The accident report form's structure never changes. Embedding the form schema in the system prompt means Claude doesn't have to guess how to read the form on every request, and because that text is identical across claims it's an ideal candidate for prompt caching.
Use structure and delimiters. Wrapping the different inputs in XML tags gives Claude clear reference points. Tags like <formdata> and <sketchnotes> let the model reliably tell the evidence types apart.
Work in sequence. The order mattered. Claude was told to review the form carefully first and only then analyze the sketch. Much like a human claims adjuster, it needed the stable, structured data before decoding a messy drawing.
Reinforce guidelines at the end. Explicit closing reminders helped: do not invent details, refer back to specific checkboxes when making factual claims, and only state conclusions you are confident in.
Shape the output. Wrapping the final result in <final_verdict> tags made the output concise and machine-parseable, ready to drop into a claims database.
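To make these principles concrete, here is a minimal sketch of how such a call might be assembled with the Anthropic Python SDK. The schema text, tag contents, and model name are placeholders of mine, not the actual prompt from the session, and a real pipeline would pass the hand-drawn sketch as an image block rather than text.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

# Invariant background knowledge: the form layout never changes, so it lives in
# the system prompt and is flagged for prompt caching.
FORM_SCHEMA = (
    "The Swedish accident report form has 17 standardized checkboxes, one per "
    "circumstance (parked, turning, reversing, changing lanes, and so on), with "
    "a column for each vehicle. A marked box applies to the vehicle in that "
    "column. (A production prompt would spell out all 17 rows verbatim.)"
)

system_blocks = [
    {
        # Task context plus tone and confidence rules.
        "type": "text",
        "text": (
            "You are assisting a human claims adjuster reviewing car accident "
            "report forms written in Swedish. Stay factual. If a checkbox is "
            "unclear or the sketch is illegible, say so explicitly instead of "
            "guessing."
        ),
    },
    {
        # The invariant schema, cached so it is not re-processed on every claim.
        "type": "text",
        "text": FORM_SCHEMA,
        "cache_control": {"type": "ephemeral"},
    },
]

# Per-claim inputs wrapped in XML-style delimiters, followed by the ordered
# steps and the end-of-prompt reminders.
user_message = """<formdata>
(the filled-out form contents for this claim go here)
</formdata>

<sketchnotes>
(the hand-drawn collision sketch goes here)
</sketchnotes>

First, carefully review the form and note which checkboxes are marked for each
vehicle. Second, analyze the sketch and check whether it is consistent with the
form. Do not invent details, refer back to specific checkboxes when making
factual claims, and only state conclusions you are confident in.

Wrap your conclusion about fault in <final_verdict> tags."""

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute whichever Claude model you use
    max_tokens=1024,
    system=system_blocks,
    messages=[{"role": "user", "content": user_message}],
)

print(response.content[0].text)
```

Notice how each principle maps to a specific place in the request: role and tone in the first system block, the cached invariant schema in the second, delimiters and ordering in the user message, and the reminders plus output tag at the very end.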
As the iterations continued, the difference was staggering. The first version: skiing. The second: some recognition of vehicles, but still gaps. The third: Claude matched checked boxes to vehicle behaviors, compared them against the sketch, and concluded that Vehicle B was likely at fault. It cited its evidence, stayed within its confidence limits, and packaged the conclusion in structured XML.
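That last step is what makes the result usable downstream: because the verdict arrives inside a predictable tag, the claims pipeline can extract it without parsing free-form prose. A minimal sketch of that extraction step (the function name is mine, and a real system would validate the verdict before writing it to a database):

```python
import re

def extract_verdict(model_output: str) -> str | None:
    """Pull the text inside <final_verdict>...</final_verdict>, if present."""
    match = re.search(r"<final_verdict>(.*?)</final_verdict>", model_output, re.DOTALL)
    return match.group(1).strip() if match else None

# Example: verdict = extract_verdict(response.content[0].text)
```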
That transformation is the essence of context engineering. Not magic words. Not prompt "hacks." Just systematic practice: define roles, add background knowledge, enforce structure, iterate.
Why does this matter? Because in the real world, whether you’re parsing accident reports, processing medical forms, or reviewing contracts, the cost of guessing is higher than the cost of saying "I don’t know." Enterprises will only trust AI if outputs are both accurate and explainable.
This video convinced me of one thing: the path from toy prompts to production AI runs through context. The more disciplined the structure, the more grounded the model becomes. What began as a skiing accident ended as a working prototype of an insurance claims system, with Claude acting not as a storyteller but as an assistant claims adjuster.
That’s the journey - from hallucinations to reliability - and it all comes down to how you engineer context.