F3 Score (strict): 37.2
F2 Score (strict): 39.4
Recall (strict): 35.2%
Precision (strict): 74.8%
Repos Scored: 24
Model: claude-haiku-4-5-20251001
Total Cost: $5.24
Avg Latency: 56s
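The headline F2 and F3 scores are consistent with the standard F-beta formula applied to the strict precision and recall reported above. The sketch below assumes that is how the dashboard derives them, which the page itself does not state; the function name `f_beta` is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline strict precision and recall from the summary, as fractions.
precision, recall = 0.748, 0.352

f2 = f_beta(precision, recall, beta=2)
f3 = f_beta(precision, recall, beta=3)
print(f"F2 = {100 * f2:.1f}, F3 = {100 * f3:.1f}")  # F2 = 39.4, F3 = 37.2
```

Because recall (35.2%) is much lower than precision (74.8%), raising beta pulls the score toward recall, which is why F3 comes out below F2.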
Per-Repository Breakdown (TP / FP / FN)
Per-Repository Scores
Repository F2 Recall Precision TP FP FN
damn-vulnerable-flask-application 48.4 46.7 57.1 7 5 8
damn-vulnerable-graphql-application 27.9 25.7 44.7 9 11 26
djangoat 28.2 24.7 66.4 12 6 38
dsvpwa 42.6 37.5 92.7 12 1 20
dsvw 54.8 49.4 97.4 13 0 14
dvblab 55.3 53.0 67.5 12 6 10
dvpwa 36.3 31.8 84.1 7 1 15
extremely-vulnerable-flask-app 36.4 32.1 78.8 9 3 19
flask-xss 32.8 28.6 81.8 8 2 20
insecure-web 70.9 66.7 95.2 6 0 3
intentionally-vulnerable-python-application 64.2 61.9 77.1 4 1 3
lets-be-bad-guys 57.2 52.8 86.7 13 2 11
pygoat 34.7 30.9 67.5 22 10 48
python-app 48.2 45.0 68.2 9 4 11
python-insecure-app 50.6 45.8 86.7 4 1 4
pythonssti 51.9 50.0 66.7 1 1 1
threatbyte 35.4 31.9 62.7 8 5 16
vampi 55.0 53.8 61.8 7 4 6
vfapi 57.7 55.6 69.4 5 2 4
vulnerable-api 49.9 45.2 87.8 6 1 8
vulnerable-flask-app 52.0 48.3 75.9 10 3 10
vulnerable-tornado-app 53.6 50.0 76.8 7 2 7
vulnpy 50.1 47.0 88.4 37 7 41
vulpy 21.2 17.9 80.1 10 2 44
Detection by Severity
critical: 76% (TP 62 / FP 0 / FN 20)
high: 46% (TP 102 / FP 1 / FN 119)
medium: 26% (TP 67 / FP 3 / FN 195)
low: 25% (TP 15 / FP 0 / FN 45)
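The percentage shown for each severity level matches recall, TP / (TP + FN), computed from the counts listed with it. A minimal check, assuming that is the intended definition (the page does not label the percentage):

```python
# (TP, FP, FN) per severity, copied from the section above.
counts = {
    "critical": (62, 0, 20),
    "high": (102, 1, 119),
    "medium": (67, 3, 195),
    "low": (15, 0, 45),
}


def recall(tp: int, fn: int) -> float:
    """Fraction of known vulnerabilities that were detected."""
    total = tp + fn
    return tp / total if total else 0.0


# FP is unused here: recall depends only on TP and FN.
recall_pct = {sev: round(100 * recall(tp, fn)) for sev, (tp, _fp, fn) in counts.items()}
print(recall_pct)  # {'critical': 76, 'high': 46, 'medium': 26, 'low': 25}
```

False positives stay near zero at every severity, so the gap between levels comes from missed findings: the medium and low levels have far more FN than TP.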
LLM Operational Metrics
Model & Prompt
Model: claude-haiku-4-5-20251001
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 36
Output: 4,888
Total: 243,089
Cost
Total: $5.24
Per Repo: $0.07
Per 100 LOC: $0.0261
Reliability
Success Rate: 100%
Timeouts: 0
JSON Repair Rate: 0%
Avg Latency: 55.9s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 0% 33% 0% 0% 100%
damn-vulnerable-graphql-application 0% 67% 0% 0% 0% 9% 100% 0% 0% 100% 0% 0%
djangoat 0% 100% 100% 17% 100% 0% 100% 8% 50% 50% 0% 100% 14%
dsvpwa 100% 0% 100% 0% 50% 10% 100% 33% 0% 100% 0% 67%
dsvw 100% 100% 100% 0% 0% 100% 100% 0% 100% 0% 0% 100% 0% 100% 50% 100%
dvblab 100% 50% 100% 0% 50% 0% 0% 100%
dvpwa 67% 22% 33% 0% 100% 40%
extremely-vulnerable-flask-app 100% 0% 33% 100% 0% 17% 0% 50% 100% 100% 20%
flask-xss 100% 50% 33% 100% 12% 0% 33% 0% 22%
insecure-web 100% 100% 0% 100% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 0% 100% 0% 0%
lets-be-bad-guys 100% 67% 100% 100% 100% 43% 100% 0% 0% 0% 0%
pygoat 0% 50% 33% 33% 67% 25% 14% 100% 0% 20% 100% 100% 0% 100%
python-app 100% 50% 100% 0% 17% 50% 0% 100% 0% 100%
python-insecure-app 100% 100% 0% 0% 0% 50%
pythonssti 100% 0%
threatbyte 100% 50% 50% 11% 0% 50% 0% 100% 100% 33%
vampi 100% 0% 0% 0% 100% 100% 100%
vfapi 0% 0% 0% 100%
vulnerable-api 0% 100% 0% 67% 50% 0% 100% 50% 100%
vulnerable-flask-app 100% 25% 100% 43% 0% 0% 100% 0% 100%
vulnerable-tornado-app 100% 100% 0% 0% 100% 0% 0% 100% 100%
vulnpy 100% 67% 19% 100% 88% 88% 100% 100% 62% 100% 75% 100%
vulpy 0% 0% 25% 0% 0% 100% 50% 0% 83% 25%
CWE Family Detection (aggregate)