F3 Score (strict): 25.7
F2 Score (strict): 27.0
Recall (strict): 24.4%
Precision (strict): 47.7%
Repos Scored: 23
Model: claude-haiku-4-5-20251001
Total Cost: $4.94
Avg Latency: 19s
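The headline F2 and F3 scores appear consistent with the standard F-beta formula applied to the strict precision and recall shown above. The sketch below is a minimal illustration under that assumption; the function name `f_beta` is hypothetical and is not taken from the scoring tool itself.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Strict precision and recall from the summary, in percent.
precision, recall = 47.7, 24.4

f2 = f_beta(precision, recall, beta=2)
f3 = f_beta(precision, recall, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")  # F2 = 27.0, F3 = 25.7
```

Because recall (24.4%) is well below precision (47.7%), raising beta from 2 to 3 pulls the score further toward recall, which is why F3 comes out lower than F2.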
Per-Repository Breakdown TP / FP / FN
Per-Repository Scores
Repository F2 Recall Precision TP FP FN
damn-vulnerable-flask-application 43.6 42.2 51.0 6 6 9
damn-vulnerable-graphql-application 9.4 8.6 16.1 3 15 32
djangoat 9.1 8.0 21.7 4 14 46
dsvpwa 29.1 26.0 55.0 8 8 24
dsvw 26.8 24.7 41.2 7 10 20
dvblab 37.6 36.4 44.3 8 10 14
dvpwa 21.7 19.7 37.7 4 8 18
extremely-vulnerable-flask-app 29.4 26.2 60.8 7 5 21
flask-xss 45.4 41.7 70.9 12 5 16
insecure-web 63.4 63.0 65.3 6 3 3
intentionally-vulnerable-python-application 65.7 61.9 86.7 4 1 3
lets-be-bad-guys 32.5 30.6 44.7 7 10 17
pygoat 7.8 6.7 24.1 5 15 65
python-insecure-app 59.0 54.2 94.4 4 0 4
pythonssti 51.9 50.0 66.7 1 1 1
threatbyte 21.8 20.8 27.2 5 13 19
vampi 43.3 43.6 43.3 6 8 7
vfapi 16.1 18.5 10.6 2 15 7
vulnerable-api 62.8 59.5 80.6 8 2 6
vulnerable-flask-app 33.0 31.7 39.9 6 10 14
vulnerable-tornado-app 31.1 28.6 48.6 4 4 10
vulnpy 49.0 44.4 83.3 35 7 43
vulpy 26.4 23.5 53.2 13 11 41
Detection by Severity
critical: 36% (TP 28 / FP 1 / FN 50)
high: 28% (TP 59 / FP 2 / FN 154)
medium: 22% (TP 55 / FP 0 / FN 200)
low: 10% (TP 6 / FP 0 / FN 53)
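The per-severity percentages match recall, TP / (TP + FN), computed from the counts listed with them. That reading is an inference from the numbers rather than something the dashboard states, and the names `counts` and `recall_pct` below are illustrative only.

```python
# (TP, FN) per severity, copied from the breakdown above.
counts = {
    "critical": (28, 50),
    "high": (59, 154),
    "medium": (55, 200),
    "low": (6, 53),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall in percent; returns 0.0 when there are no positives to find."""
    total = tp + fn
    return 100 * tp / total if total else 0.0


# Round to whole percentages, as the dashboard displays them.
rates = {sev: round(recall_pct(tp, fn)) for sev, (tp, fn) in counts.items()}
print(rates)  # {'critical': 36, 'high': 28, 'medium': 22, 'low': 10}
```

False positives do not enter this calculation, so the FP counts shown per severity have no effect on the displayed rate.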
LLM Operational Metrics
Model & Prompt
Model: claude-haiku-4-5-20251001
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 54,965
Output: 3,312
Total: 58,278
Cost
Total: $4.94
Per Repo: $0.07
Per 100 LOC: $0.0250
Reliability
Success Rate: 100%
Timeouts: 0
JSON Repair Rate: 0%
Avg Latency: 19.0s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 0% 0% 100% 0% 33% 0% 25% 0%
damn-vulnerable-graphql-application 0% 0% 0% 0% 0% 18% 100% 0% 0% 0% 0% 0%
djangoat 0% 0% 0% 17% 0% 0% 0% 0% 50% 0% 0% 0% 0%
dsvpwa 50% 50% 0% 0% 0% 0% 0% 33% 0% 50% 0% 33%
dsvw 100% 0% 0% 0% 100% 0% 0% 0% 0% 0% 0% 67% 0% 0% 50% 0%
dvblab 50% 75% 0% 0% 38% 0% 0% 75%
dvpwa 67% 11% 33% 0% 0% 0%
extremely-vulnerable-flask-app 50% 0% 33% 0% 0% 0% 0% 0% 100% 100% 0%
flask-xss 0% 0% 0% 100% 38% 0% 0% 0% 67%
insecure-web 100% 100% 0% 0% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 50% 100% 0% 0%
lets-be-bad-guys 0% 0% 0% 0% 0% 43% 0% 0% 0% 0% 50%
pygoat 0% 0% 0% 11% 0% 0% 4% 0% 0% 20% 0% 0% 43% 0%
python-insecure-app 100% 100% 0% 0% 0% 50%
pythonssti 100% 0%
threatbyte 100% 50% 0% 0% 0% 50% 0% 0% 0% 33%
vampi 50% 0% 0% 0% 60% 100% 0%
vfapi 0% 0% 0% 0%
vulnerable-api 100% 100% 0% 100% 50% 0% 100% 50% 100%
vulnerable-flask-app 50% 25% 0% 29% 0% 0% 0% 0% 0%
vulnerable-tornado-app 100% 100% 0% 0% 50% 100% 0% 100% 0%
vulnpy 100% 67% 0% 100% 12% 88% 50% 100% 0% 100% 75% 100%
vulpy 0% 0% 25% 0% 4% 50% 0% 0% 83% 50%
CWE Family Detection aggregate