
Testing Anthropic's Claude Fable 5 for Automated Detection Rule Generation
On June 14, Anthropic’s public reporting said it was expanding its AI cybersecurity models with Claude Fable 5 and Mythos 5. The headline is worth a look, but the narrower question for defenders is the one that matters: can a model turn messy incident notes into detection content that survives linting, replay, and analyst review?
That is the test I care about. Not whether the model sounds smart, and not whether it can write a convincing paragraph about threat hunting. The bar is much higher: can it draft a rule that maps to real telemetry, avoid inventing fields, keep false positives under control, and give a human enough context to ship it safely?
What Anthropic’s June 14 announcement changes for detection engineering
What the reporting actually says about Claude Fable 5 and Mythos 5
The source reporting is sparse, and that matters. It says Anthropic expanded its AI cybersecurity models with Claude Fable 5 and Mythos 5. It does not include a full benchmark suite, a public red-team review, or a detailed breakdown of where each model works best.
So treat the announcement as a signal, not proof. A model family can be positioned for cybersecurity and still miss the practical test that detection teams run every day: turning noisy observations into a deployable rule.
For detection engineering, the only output that matters is something you can validate against:
- known-good logs
- known-bad logs
- backend schema
- your SIEM’s syntax
- your analysts’ patience
Why model announcements matter only if the output survives validation
I have seen too many AI-generated detections that look fine on the page and fail everywhere that counts.
The usual failure chain looks like this:
- The model invents a field name that does not exist in your source.
- The rule passes casual review because the logic sounds right.
- The query compiles nowhere.
- Someone “fixes” it by weakening the selectors.
- The rule now alerts on half the environment.
That is why model announcements only matter if the output survives validation. A good model can shorten the first draft. It cannot remove the need for a compiler, a replay harness, and a human who knows which telemetry sources are real.
Test setup: defining the rule-generation problem before touching the model
Choosing a target format: Sigma, Splunk SPL, KQL, or plain logic
I usually start by forcing the model to draft in a format that is easy to inspect and easy to translate. For a cross-platform workflow, Sigma is usually the best first stop because it keeps the detection logic readable and fairly portable.
Here is how I think about the options:
| Format | Best use | Risk |
|---|---|---|
| Sigma | Portable draft detection logic | Backend field mappings can still break it |
| Splunk SPL | Direct use in Splunk-heavy shops | Easy to overfit to Splunk-specific field names |
| KQL | Microsoft-centric telemetry and hunting | Query looks good even when normalization is weak |
| Plain logic | Early-stage reasoning and analyst discussion | Not deployable until translated |
My rule is simple: ask the model for one canonical format, then translate only after the logic has been reviewed.
If you ask for three backends at once, the model often smooths away details that should stay visible. That makes validation harder later.
Building a safe benchmark corpus with benign, noisy, and adversarial examples
A model for detection rule generation should not be judged only on obvious attack patterns. That test is too easy. I prefer a small corpus with three classes:
- benign examples: normal admin behavior, software deployment, scheduled maintenance
- noisy examples: legitimate but unusual actions that often trigger detections
- adversarial examples: the behaviors the rule is supposed to catch
The key is to keep the examples safe and scoped. You do not need live payloads or destructive commands. You need enough structure to test whether the model understands the pattern.
A practical benchmark file might include:
| Case type | Example behavior | Expected outcome |
|---|---|---|
| Benign | An endpoint management tool launches a shell for updates | No alert or a filtered alert |
| Noisy | An admin runs a script with long command-line arguments | Alert only if other suspicious fields line up |
| Adversarial | A user shell spawns a second shell with encoded arguments | Alert with medium or high confidence |
The corpus should also include edge cases:
- alternate parent processes
- renamed binaries
- different field naming conventions
- missing command-line fields
- null values
- partially normalized events
If the model’s rule only works when every field is perfect, it is not a production rule. It is a lab demo.
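One way to keep such a corpus reviewable is to store each case with its label and expected outcome next to the event. The sketch below is a minimal illustration, not a prescribed format: the `CorpusCase` type, the field names, and every path and tool name in it are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorpusCase:
    case_id: str
    label: str                        # "benign", "noisy", or "adversarial"
    event: dict                       # one log event; fields may be absent or None
    should_alert: bool                # the outcome a reviewer expects
    edge_case: Optional[str] = None   # e.g. "null_command_line"


# Hypothetical events: paths, tools, and field names are placeholders.
POWERSHELL = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"

CORPUS = [
    CorpusCase("benign-01", "benign", {
        "Image": r"C:\Windows\System32\cmd.exe",
        "ParentImage": r"C:\Program Files\ExampleMgmt\agent.exe",
        "CommandLine": "cmd.exe /c run-updates.bat",
    }, should_alert=False),
    CorpusCase("noisy-01", "noisy", {
        "Image": POWERSHELL,
        "ParentImage": r"C:\Windows\explorer.exe",
        "CommandLine": "powershell.exe -File inventory.ps1 -Servers a,b,c -Verbose",
    }, should_alert=False),
    CorpusCase("adv-01", "adversarial", {
        "Image": POWERSHELL,
        "ParentImage": r"C:\Windows\System32\cmd.exe",
        "CommandLine": "powershell.exe -enc <placeholder>",
    }, should_alert=True),
    CorpusCase("edge-01", "noisy", {
        "Image": POWERSHELL,
        "CommandLine": None,          # command-line logging missing on this host
    }, should_alert=False, edge_case="null_command_line"),
]


def count_by_label(corpus):
    """Return how many cases each class contributes, to spot a lopsided corpus."""
    counts = {}
    for case in corpus:
        counts[case.label] = counts.get(case.label, 0) + 1
    return counts
```

Counting by label is a cheap guard against a corpus that quietly becomes all adversarial cases and no noise.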
Establishing scoring criteria for precision, recall, and analyst effort
A lot of teams score detections too loosely. “It found the thing” is not enough.
I like to score rule candidates on three axes:
| Metric | What it means | What failure looks like |
|---|---|---|
| Precision | How many alerts are worth analyst time | Alert flood from normal admin work |
| Recall | How many relevant cases the rule catches | Obvious variants slip through |
| Analyst effort | How much manual cleanup is required | Every alert needs a paragraph of explanation |
That third metric is the one people forget. A rule can have decent recall and still be useless if every match forces an analyst to reconstruct the context from scratch.
For AI-generated content, I also add a fourth check: telemetry confidence. If the model cannot explain which log source it expects, I downgrade the result immediately.
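The three axes can be computed from replay counts plus a list of triage times. This is a minimal sketch under stated assumptions: the function name `score_rule` is illustrative, and "analyst effort" is approximated here as mean triage minutes per reviewed alert, which is only one possible proxy.

```python
def score_rule(true_positives, false_positives, misses, triage_minutes):
    """Score one candidate rule from replay counts and analyst triage times."""
    alerts = true_positives + false_positives     # everything the rule fired on
    relevant = true_positives + misses            # everything it should have caught
    return {
        "precision": true_positives / alerts if alerts else 0.0,
        "recall": true_positives / relevant if relevant else 0.0,
        "mean_triage_minutes": (
            sum(triage_minutes) / len(triage_minutes) if triage_minutes else 0.0
        ),
    }
```

For example, `score_rule(8, 2, 2, [5, 15])` describes a rule that fired ten times, was right eight times, missed two relevant events, and cost ten minutes per reviewed alert on average.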
Prompt design for automated detection rule generation
Asking for structured output instead of free-form advice
The first prompt mistake is obvious once you see it: people ask the model to “suggest detections” and then wonder why they got a vague checklist.
I prefer a structured prompt that forces the model to output fields I can validate. Something like this:
You are writing a detection rule draft for a SIEM.
Return:
1. rule_title
2. target_format
3. detection_logic
4. required_fields
5. assumptions
6. exclusions
7. response_notes
Rules:
- Do not invent telemetry sources.
- If a required field is unknown, say so.
- Keep the logic specific enough to compile.
- Separate detection logic from response guidance.
That structure helps in two ways. First, it keeps the model from drifting into generic advice. Second, it makes the failure modes visible. If it cannot name the required fields, you know the draft is still too hand-wavy.
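If the model is also asked to return those seven fields as a JSON object (an assumption; the prompt above does not specify an encoding), the draft can be rejected mechanically before a human reads it. The helper name `check_draft` is illustrative.

```python
import json

# The seven keys requested in the prompt above.
REQUIRED_KEYS = (
    "rule_title", "target_format", "detection_logic", "required_fields",
    "assumptions", "exclusions", "response_notes",
)


def check_draft(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the shape is acceptable."""
    try:
        draft = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(draft, dict):
        return ["top-level value is not an object"]
    problems = []
    for key in REQUIRED_KEYS:
        if key not in draft:
            problems.append(f"missing key: {key}")
        elif draft[key] in ("", [], {}, None):
            problems.append(f"empty value: {key}")
    return problems
```

This only checks shape. A draft that passes can still name fields that do not exist, which is why the later schema and replay checks remain necessary.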
Forcing the model to name assumptions, fields, and required telemetry
This is the most useful constraint I add.
A rule is only as good as the telemetry underneath it. If the model says “look for suspicious shell execution” but never specifies whether it needs process creation logs, command-line logging, or parent-child process relationships, then you do not have a rule. You have a guess.
I want the model to answer these questions directly:
- Which event source does this rely on?
- Which fields must be present?
- Which fields are optional?
- What normalization assumptions are being made?
- What should happen when a field is missing?
That last question matters a lot. Missing fields are common in real environments. A useful draft should degrade gracefully instead of quietly becoming meaningless.
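One way to make “degrade gracefully” concrete is to have a field check return three states instead of two, so a missing field is reported rather than silently treated as a non-match. This is a sketch of that idea only; the `Verdict` enum and `contains_any` helper are hypothetical names, not part of any SIEM.

```python
from enum import Enum


class Verdict(Enum):
    MATCH = "match"
    NO_MATCH = "no_match"
    UNDETERMINED = "undetermined"   # the field needed to decide was missing


def contains_any(event: dict, field: str, needles: list[str]) -> Verdict:
    """Case-insensitive substring check that reports missing fields explicitly."""
    value = event.get(field)
    if value is None:
        return Verdict.UNDETERMINED
    lowered = str(value).lower()
    if any(needle.lower() in lowered for needle in needles):
        return Verdict.MATCH
    return Verdict.NO_MATCH
```

Counting `UNDETERMINED` results during replay shows how much of the environment the rule cannot actually see.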
Separating detection logic from response guidance
Detection content and response content are not the same thing.
The model might be good at writing:
- a selector for suspicious command-line behavior
- a threshold for repeated failures
- a correlation window across events
It might also write response notes like:
- isolate the host
- check for persistence
- review adjacent authentication events
Those are useful, but they should not be mixed into the logic itself. I usually keep them in separate sections so a reviewer can change one without accidentally changing the other.
That separation also cuts down on prompt sludge. If every prompt asks for both the rule and the incident response plan, the model starts blending them together and the detection gets weaker.
First-pass generation: what the model should be good at
Repeated patterns and obvious abuse chains
If Claude Fable 5 or Mythos 5 is going to be useful for detection generation, the first thing it should handle well is pattern compression.
That means it should recognize recurring structures like:
- shell spawned from an office app
- script interpreter launched with suspicious arguments
- repeated authentication failures followed by a successful login
- unusual privilege escalation after lateral movement indicators
These are not clever detections. They are the bread and butter of triage. A model should be able to draft them quickly without overcomplicating the logic.
The good sign is not novelty. It is consistency.
Mapping natural-language tactics to concrete field conditions
This is where a model can save time if it behaves.
Given a sentence like “detect suspicious PowerShell usage,” a good draft should translate that into field conditions such as:
- process image or executable path
- command-line content
- parent process
- user context
- maybe child process behavior if available
A weak draft stays at the tactic level:
- “look for malicious PowerShell”
- “flag encoded commands”
- “watch for suspicious execution”
That sounds fine until you try to build a real query. Then you realize the model never committed to a field or a log source.
Producing draft rules that are readable by humans first
I want the first-pass output to be understandable without a decoder ring.
A strong draft should read like a senior analyst wrote it in a hurry, not like a marketing copy generator was asked to improvise telemetry. The human review stage still needs to answer:
- what is the trigger?
- why is it suspicious?
- what is excluded?
- how noisy will this be?
If the rule is readable, the rest of the workflow gets easier. If it is not readable, every downstream step turns into a translation exercise.
Validation workflow: how to test whether a rule is actually useful
Static checks against schema, required fields, and syntax
Before I replay anything, I run static validation.
At minimum, the candidate rule should pass:
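sigma check rule.yml
sigma convert -t splunk -p splunk_windows rule.yml
sigma convert -t kusto -p microsoft_xdr rule.yml
These commands assume sigma-cli with the Splunk and Kusto backend plugins installed. Target and pipeline names depend on the plugins and versions you have, so confirm them with sigma list targets and sigma list pipelines. The exact tool names are less important than the workflow. The draft has to satisfy a schema and compile into the target backend without manual repair.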
I also check for missing required fields in the rule itself:
- title
- status
- log source
- detection block
- false positive notes
- level
- tags or metadata required by your pipeline
If a model cannot produce a rule that satisfies those constraints, it is not ready for unattended generation.
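The metadata check can be automated once the rule has been loaded into a dictionary (for example with a YAML parser, which is left out here to keep the sketch dependency-free). The key names follow Sigma conventions for the items listed above; the list is this workflow’s own requirement and is stricter than what a Sigma validator necessarily enforces.

```python
# Keys this workflow insists on, mirroring the list above.
REQUIRED_RULE_KEYS = ("title", "status", "logsource", "detection", "falsepositives", "level")


def missing_rule_keys(rule: dict) -> list[str]:
    """Return required keys that are absent or empty in a parsed rule."""
    missing = [key for key in REQUIRED_RULE_KEYS if not rule.get(key)]
    detection = rule.get("detection")
    # A detection block without a condition cannot be evaluated.
    if isinstance(detection, dict) and "condition" not in detection:
        missing.append("detection.condition")
    return missing
```

Pipeline-specific tags or metadata can be appended to `REQUIRED_RULE_KEYS` as needed.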
Replay testing against historical logs and known benign traffic
Static validation only proves the rule is syntactically valid. It does not prove it is useful.
I replay the candidate against:
- historical incident data
- a small set of confirmed benign logs
- a small set of known relevant events
That gives you an early read on precision and recall.
A simple evaluation loop looks like this:
for each candidate rule:
    run static validation
    replay against benign corpus
    replay against labeled incident corpus
    count true positives, false positives, and misses
    record analyst review notes
What you want to see is not perfection. You want a rule that is directionally correct and cheap to refine.
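The counting step of that loop can be sketched in a few lines. This assumes the candidate rule has already been translated into a Python predicate over event dictionaries, which is a simplification: in practice the replay usually runs inside the SIEM or a test index. The names `replay` and `encoded_arg_rule` are illustrative.

```python
from typing import Callable, Iterable


def replay(rule: Callable[[dict], bool],
           labeled_events: Iterable[tuple[dict, bool]]) -> dict:
    """Run a rule predicate over (event, is_relevant) pairs and tally outcomes."""
    counts = {"true_positives": 0, "false_positives": 0,
              "misses": 0, "true_negatives": 0}
    for event, is_relevant in labeled_events:
        fired = rule(event)
        if fired and is_relevant:
            counts["true_positives"] += 1
        elif fired and not is_relevant:
            counts["false_positives"] += 1
        elif not fired and is_relevant:
            counts["misses"] += 1
        else:
            counts["true_negatives"] += 1
    return counts


def encoded_arg_rule(event: dict) -> bool:
    """Toy candidate rule: fire when the command line contains '-enc'."""
    return "-enc" in (event.get("CommandLine") or "").lower()
```

Treating a missing `CommandLine` as an empty string keeps the harness from crashing, but it also hides telemetry gaps, so it is worth counting those events separately.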
Finding overbroad matches, brittle selectors, and missing context
Most AI-generated rules fail in one of three ways:
- Overbroad matches: the selector is so generic that it catches normal admin traffic.
- Brittle selectors: the rule depends on one exact binary path or one exact string fragment.
- Missing context: the rule finds suspicious events, but not enough supporting data to make a triage decision.
A useful replay test should show you which of those problems you have.
If a rule fires on every software deployment, it is probably too broad. If it only fires when the executable name matches one exact path, it is brittle. If the alert has no parent process, user, or host context, analysts will spend too much time chasing it.
Tuning the output without turning it into prompt sludge
Tightening match conditions with environment-specific fields
This is where many teams make the prompt worse instead of the rule better.
If the model returns something too generic, the instinct is to pile on more instructions. That often produces prompt sludge. The better move is to give the model environment-specific telemetry facts:
- exact field names from your SIEM
- which sources are normalized
- which logs are missing on some hosts
- which processes or tools are common in your environment
That lets the model tighten the logic without guessing.
For example, “watch for PowerShell” is weak. “Use process creation logs, include parent process, and exclude the signed endpoint management tool we know launches scripts during patching” is much more useful.
Adding exclusions, thresholds, and correlation windows
A lot of useful detections are not single-event rules. They need thresholds or correlations.
Examples:
- multiple failed logins within a short window
- repeated script launches from the same host
- suspicious process creation followed by network activity
- a parent-child chain that only matters if it repeats across accounts
The model should be asked to justify each threshold, not just invent one. If it suggests three attempts in five minutes, I want to know whether that comes from source behavior, known service noise, or a guess.
The same goes for exclusions. Good exclusions are narrow and documented. Bad exclusions quietly carve out the interesting cases.
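A threshold plus a correlation window is easier to argue about when it is written down as code. The sketch below implements the “multiple failed logins followed by a success” pattern with a sliding window. The defaults of three failures in 300 seconds simply echo the example above and are placeholders, not recommendations; the event tuple shape is also an assumption.

```python
from collections import defaultdict, deque


def failed_then_success(events, threshold=3, window_seconds=300):
    """Flag a success preceded by >= threshold failures inside the window.

    events: iterable of (timestamp_seconds, user, outcome), sorted by
    timestamp, where outcome is "failure" or "success".
    Returns a list of (user, success_timestamp, failure_count).
    """
    recent_failures = defaultdict(deque)   # user -> failure timestamps in window
    alerts = []
    for timestamp, user, outcome in events:
        failures = recent_failures[user]
        # Drop failures that have aged out of the correlation window.
        while failures and timestamp - failures[0] > window_seconds:
            failures.popleft()
        if outcome == "failure":
            failures.append(timestamp)
        elif outcome == "success":
            if len(failures) >= threshold:
                alerts.append((user, timestamp, len(failures)))
            failures.clear()               # a success resets the streak
    return alerts
```

Replaying this with different `threshold` and `window_seconds` values against known service noise is one way to turn a guessed threshold into a justified one.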
Deciding when the model should stop guessing and ask for more data
This is one of the best signs of maturity.
A model that can say “I need the process creation source and the command-line field names before I can draft this safely” is more useful than a model that confidently invents a query.
I treat that as a feature, not a failure.
If the model keeps guessing field names, it is safer to stop and provide the telemetry schema than to let it fabricate a working-looking rule.
That restraint matters in security work. A wrong answer with high confidence is worse than a partial answer that admits uncertainty.
Common failure modes when using a model for detection content
Hallucinated fields, nonexistent event sources, and fake certainty
This is the classic LLM failure, and it is especially dangerous in detection engineering.
The model may produce field names that look standard when they are not. It may reference a log source your environment does not collect. It may imply that a backend supports a function it does not.
The fake certainty is the real problem. The draft looks authoritative, so reviewers waste time debugging the wrong layer.
Defensive response:
- require schema-aware prompts
- lint against known field dictionaries
- reject rules that mention unsupported sources
- force the model to list assumptions explicitly
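Linting against a field dictionary can be as simple as collecting every field name in the detection block and comparing it to the fields your source actually emits. The sketch below assumes a Sigma-style detection block already parsed into a dictionary; `KNOWN_FIELDS` is a hypothetical dictionary that you would replace with your own schema.

```python
# Hypothetical field dictionary for one process-creation source.
KNOWN_FIELDS = {"Image", "CommandLine", "ParentImage", "User"}


def unknown_fields(detection: dict, known_fields: set[str]) -> set[str]:
    """Return field names used in a detection block that the schema lacks."""
    unknown = set()
    for name, block in detection.items():
        if name == "condition":
            continue
        # A selection may be a single mapping or a list of mappings.
        mappings = block if isinstance(block, list) else [block]
        for mapping in mappings:
            if not isinstance(mapping, dict):
                continue               # keyword lists carry no field names
            for key in mapping:
                field = key.split("|", 1)[0]   # strip modifiers like |endswith
                if field not in known_fields:
                    unknown.add(field)
    return unknown
```

A non-empty result is a reason to reject the draft or send the schema back to the model, not a reason to rename fields by hand until the query compiles.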
Rules that look clever but cannot be deployed
Some AI-generated detections are technically interesting and operationally useless.
Common reasons:
- they depend on a field you do not normalize
- they combine too many conditions and never fire
- they require correlation data you do not retain
- they assume full process ancestry where only partial ancestry exists
These are the “clever but dead” rules. They usually mean the model learned detection language but not your environment.
False positives from generic behavior and low-fidelity telemetry
A rule can also be too generic because the underlying telemetry is weak.
If all you have is process name and host, the model may draft a broad rule that catches half the help desk. If you add command-line logging, parent process, and user context, the same threat pattern becomes much easier to isolate.
This is why I always separate model quality from telemetry quality. A bad model can make good telemetry look messy. A weak log source can make a good model look bad.
How to integrate AI-generated rules into a defensive workflow
Analyst review gates and change control
AI-generated detection content should go through the same review discipline as any other production rule.
My minimum gates are:
- source prompt is saved
- draft is reviewed by an analyst
- syntax validation passes
- replay test results are recorded
- approval is captured before deployment
That creates a paper trail and keeps “helpful” edits from bypassing review.
Versioning, comments, and traceability back to the source prompt
If a model helped draft the rule, I want that traceable.
I usually store:
- the prompt summary
- the generated draft
- the final edited rule
- a short note on what was changed and why
That helps later when the rule drifts or starts generating noise. You can see whether the issue came from the model, the environment, or the manual edits.
Versioning also helps when the same pattern shows up with a new log source. You can reuse the reasoning without blindly copying the old output.
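The four stored items map naturally onto a small record. This is a sketch of one possible shape, with the hypothetical name `RuleProvenance`; it uses only the standard library to fingerprint the generated draft and to show what the manual edits changed.

```python
import difflib
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleProvenance:
    prompt_summary: str
    generated_draft: str
    final_rule: str
    change_note: str

    def draft_digest(self) -> str:
        """Stable fingerprint of the model's draft, useful as a commit trailer."""
        return hashlib.sha256(self.generated_draft.encode("utf-8")).hexdigest()

    def diff(self) -> list[str]:
        """Unified diff from the generated draft to the final edited rule."""
        return list(difflib.unified_diff(
            self.generated_draft.splitlines(),
            self.final_rule.splitlines(),
            fromfile="generated_draft",
            tofile="final_rule",
            lineterm="",
        ))
```

Storing the diff alongside the change note makes it clear later whether noise came from the model’s draft or from a manual edit.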
Safe rollout patterns in a SIEM or detection-as-code pipeline
A sensible rollout is staged:
- start in monitor-only mode
- ship to a low-volume environment first
- compare alert volume to baseline
- review a sample of matches
- only then promote to broader coverage
If your pipeline supports it, tag AI-generated drafts as such until they are fully reviewed. That makes it easier to track whether the generation workflow is improving or just adding noise.
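The promotion decision at the end of that sequence can be written as an explicit gate. The sketch below is illustrative only: the function name, the 1.5x volume ratio, and the 50% useful-alert floor are assumptions chosen for the example and should be replaced with values your team can defend.

```python
from statistics import mean


def ready_to_promote(monitor_daily_counts, baseline_daily_count, sample_verdicts,
                     max_volume_ratio=1.5, min_useful_fraction=0.5):
    """Decide whether a monitor-only rule may move to broader coverage.

    monitor_daily_counts: alerts per day observed in monitor-only mode.
    baseline_daily_count: expected alerts per day for comparable rules.
    sample_verdicts: booleans from analyst review (True = worth the time).
    """
    if not monitor_daily_counts or not sample_verdicts:
        return False   # nothing observed or nothing reviewed yet
    volume_ok = mean(monitor_daily_counts) <= baseline_daily_count * max_volume_ratio
    useful_fraction = sum(1 for verdict in sample_verdicts if verdict) / len(sample_verdicts)
    return volume_ok and useful_fraction >= min_useful_fraction
```

Returning `False` when there is no reviewed sample enforces the review step instead of letting a quiet rule promote itself.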
What good looks like in practice
Example of a model-generated draft and the corrections it needs
A first-pass draft for suspicious shell execution might look something like this:
title: Suspicious Shell Launch
status: experimental
logsource:
    category: process_creation
detection:
    selection:
        Image|endswith:
            - '\powershell.exe'
            - '\cmd.exe'
        CommandLine|contains:
            - '-enc'
            - 'IEX'
    condition: selection
falsepositives:
    - Admin scripts
level: medium
This is not terrible as a sketch. It is also not ready.
The usual corrections are:
- add the alternate shell binary you actually see in your environment
- narrow the command-line checks to reduce accidental matches
- exclude known software deployment tools
- require a parent process or user context if available
- document the telemetry assumptions
Example of a tuned rule that becomes deployable
After tuning, the same logic may look more like this:
title: Suspicious Shell Execution with Encoded Arguments
status: test
logsource:
    category: process_creation
detection:
    selection:
        Image|endswith:
            - '\powershell.exe'
            - '\pwsh.exe'
        CommandLine|contains:
            - ' -enc '
            - 'FromBase64String'
    filter_admin_tools:
        ParentImage|endswith:
            - '\your-endpoint-tool.exe'
            - '\your-deployment-agent.exe'
    condition: selection and not filter_admin_tools
falsepositives:
    - Scripted admin activity
    - Endpoint management actions
level: high
The point is not that this is the perfect rule. The point is that it is more honest about assumptions, easier to review, and less likely to flood the queue.
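To sanity-check what the tuned rule does and does not match, its logic can be mirrored as a small Python predicate and run over a few hand-written events. This is a rough approximation of the rule above, assuming case-insensitive matching; it is not how a Sigma backend evaluates rules, and the event values in any test are placeholders.

```python
def ends_with_any(value, suffixes):
    """Case-insensitive suffix check; a missing value never matches."""
    return value is not None and value.lower().endswith(
        tuple(suffix.lower() for suffix in suffixes))


def contains_any(value, needles):
    """Case-insensitive substring check; a missing value never matches."""
    return value is not None and any(
        needle.lower() in value.lower() for needle in needles)


def tuned_rule(event: dict) -> bool:
    """Mirror of: selection and not filter_admin_tools."""
    selection = (
        ends_with_any(event.get("Image"), [r"\powershell.exe", r"\pwsh.exe"])
        and contains_any(event.get("CommandLine"), [" -enc ", "FromBase64String"])
    )
    filter_admin_tools = ends_with_any(
        event.get("ParentImage"),
        [r"\your-endpoint-tool.exe", r"\your-deployment-agent.exe"],
    )
    return selection and not filter_admin_tools
```

Running this over a few variants exposes a recall gap worth reviewing: a command line that spells out `-EncodedCommand` does not contain `' -enc '`, so the rule as written would not fire on it.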
Defensive takeaways from the Claude Fable 5 test
Where automation helps and where human review stays mandatory
If Anthropic’s Claude Fable 5 and Mythos 5 are genuinely useful for security work, I expect them to help most with the first draft:
- mapping a tactic to likely fields
- proposing a candidate Sigma rule
- listing exclusions and assumptions
- suggesting response notes
That saves analyst time.
But human review stays mandatory anywhere the model can fail quietly:
- field mapping
- backend compatibility
- false positive tuning
- deployment scope
- incident severity
The best outcome is not “the model writes the rules.” The best outcome is that the model reduces the blank-page problem.
What to measure after deployment so drift does not go unnoticed
A detection rule should not be treated as finished once it ships.
After deployment, I watch:
- alert volume over time
- percent of alerts closed as benign
- time to triage
- percentage of alerts missing context
- how often the rule needs backend-specific fixes
- whether a new software rollout changes the noise profile
That is where drift shows up. A rule that was clean in week one can become useless after a logging change or a new admin tool rollout.
If the model helped draft the rule, those metrics also tell you whether the drafting workflow is learning anything. If every AI-generated rule needs the same kind of cleanup, the prompt is not improving.
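Two of those metrics, alert volume and the share of alerts closed as benign, are enough for a first drift check. The sketch below treats the first recorded week as the baseline. The function name, the 2x volume factor, and the 80% benign-rate limit are assumptions for illustration, not tuned values.

```python
def drift_flags(weekly_stats, volume_factor=2.0, benign_rate_limit=0.8):
    """Flag weeks where a rule's noise profile departs from its first week.

    weekly_stats: list of {"alerts": int, "closed_benign": int}, oldest first.
    Returns a list of (week_number, reason), with week numbers starting at 1.
    """
    if len(weekly_stats) < 2:
        return []
    baseline_alerts = weekly_stats[0]["alerts"]
    flags = []
    for week_number, week in enumerate(weekly_stats[1:], start=2):
        alerts = week["alerts"]
        benign_rate = week["closed_benign"] / alerts if alerts else 0.0
        if baseline_alerts and alerts > baseline_alerts * volume_factor:
            flags.append((week_number, "volume"))
        if benign_rate > benign_rate_limit:
            flags.append((week_number, "benign_rate"))
    return flags
```

A flagged week is a prompt to look for a logging change or a new tool rollout before anyone edits the rule.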
Conclusion: using the model as a drafting aid, not an oracle
The June 14 announcement is interesting because it suggests Anthropic is treating cybersecurity as a first-class use case for its newer models. That may matter for detection engineering, but only if the model can do the boring work: structured drafts, honest assumptions, and rules that survive validation.
For me, the right test is simple:
- can it produce a rule in a real format?
- can it name the telemetry it needs?
- can it avoid invented fields?
- can it pass static checks?
- can it replay cleanly against real logs?
- can an analyst understand why it exists?
If the answer is yes, the model is useful. If the answer is no, it is just another confident generator of plausible security text.
Short checklist for teams evaluating AI rule generation
- Pick one canonical output format first.
- Build a small corpus of benign, noisy, and relevant examples.
- Force the model to name assumptions and required fields.
- Run static validation before any replay testing.
- Compare alert volume against historical baselines.
- Require analyst review and versioned traceability.
- Measure drift after deployment, not just draft quality.
That workflow is the real evaluation. Everything else is branding.


