F3 Score (strict): 51.7
F2 Score (strict): 53.8
Recall (strict): 49.9%
Precision (strict): 78.5%
Repos Scored: 23
Model: claude-sonnet-4-6
Total Cost: $16.60
Avg Latency: 367s
Per-Repository Breakdown (TP / FP / FN)
Per-Repository Scores
Repository F2 Recall Precision TP FP FN
damn-vulnerable-flask-application 70.3 68.9 77.3 10 3 5
damn-vulnerable-graphql-application 42.7 39.1 67.4 14 7 21
djangoat 37.7 34.0 66.3 17 9 33
dsvpwa 58.5 55.2 77.2 18 5 14
dsvw 77.7 74.1 96.9 20 1 7
dvblab 70.1 68.2 79.3 15 4 7
dvpwa 53.4 50.0 73.3 11 4 11
extremely-vulnerable-flask-app 62.2 57.1 97.1 16 0 12
flask-xss 44.0 39.3 84.6 11 2 17
insecure-web 70.9 70.4 73.2 6 2 3
intentionally-vulnerable-python-application 60.6 57.1 80.0 4 1 3
lets-be-bad-guys 67.0 62.5 93.7 15 1 9
pygoat 49.9 46.2 74.4 32 11 38
python-app 73.9 71.7 84.3 14 3 6
pythonssti 55.6 50.0 100.0 1 0 1
threatbyte 56.3 52.8 77.5 13 4 11
vampi 82.1 84.6 73.3 11 4 2
vfapi 79.8 85.2 63.9 8 4 1
vulnerable-api 69.7 69.0 73.6 10 4 4
vulnerable-flask-app 64.4 61.7 80.5 12 3 8
vulnerable-tornado-app 58.0 57.1 61.5 8 5 6
vulnpy 71.3 67.5 92.4 53 5 25
vulpy 36.3 32.7 65.0 18 10 36
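The dashboard does not state how its F scores are computed. The sketch below assumes the standard F-beta definition, (1 + β²)·P·R / (β²·P + R), applied to the displayed precision and recall percentages. The function name `f_beta` is hypothetical, not taken from the dashboard. Under that assumption the results agree with the displayed F2 and F3 values to within about 0.1, which is what rounding of the inputs would explain.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score, on the same scale as the inputs (percentages here).

    beta > 1 weights recall more heavily than precision.
    Assumption: the dashboard uses this textbook definition.
    """
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rows taken from the per-repository table (precision, recall in percent).
dsvw_f2 = f_beta(precision=96.9, recall=74.1, beta=2)   # table shows 77.7
vampi_f2 = f_beta(precision=73.3, recall=84.6, beta=2)  # table shows 82.1

# Headline figures: precision 78.5%, recall 49.9%.
overall_f2 = f_beta(precision=78.5, recall=49.9, beta=2)  # dashboard shows 53.8
overall_f3 = f_beta(precision=78.5, recall=49.9, beta=3)  # dashboard shows 51.7

print(f"dsvw F2={dsvw_f2:.2f}, vampi F2={vampi_f2:.2f}")
print(f"overall F2={overall_f2:.2f}, F3={overall_f3:.2f}")
```

Because F2 and F3 weight recall above precision, repositories with high precision but low recall (for example djangoat and vulpy) score well below their precision figure.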
Detection by Severity
critical: 93% (TP 75 / FP 0 / FN 6)
high: 64% (TP 140 / FP 0 / FN 79)
medium: 42% (TP 108 / FP 1 / FN 151)
low: 38% (TP 22 / FP 0 / FN 36)
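The percentage shown for each severity is consistent with a recall-style detection rate, TP / (TP + FN), rounded to the nearest whole percent. The dashboard does not spell this out, so it is treated as an assumption in the following minimal sketch. The names `severity_counts` and `detection_rate` are illustrative.

```python
# (TP, FN) counts per severity, copied from the Detection by Severity panel.
severity_counts = {
    "critical": (75, 6),
    "high": (140, 79),
    "medium": (108, 151),
    "low": (22, 36),
}


def detection_rate(tp: int, fn: int) -> float:
    """Share of known vulnerabilities that were found, in percent.

    Assumption: the panel reports TP / (TP + FN); false positives are ignored.
    """
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


rates = {sev: round(detection_rate(tp, fn)) for sev, (tp, fn) in severity_counts.items()}
print(rates)
```

The rounded values reproduce the panel's 93%, 64%, 42%, and 38%.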
LLM Operational Metrics
Model & Prompt
Model: claude-sonnet-4-6
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 10
Output: 5,709
Total: 232,970
Cost
Total: $16.60
Per Repo: $0.29
Per 100 LOC: $0.0831
Reliability
Success Rate: 81%
Timeouts: 10
JSON Repair Rate: 0%
Avg Latency: 366.9s
CWE Family Heatmap (recall by repository)
Repository | Broken Access Co.. | Code Injection /.. | Command / OS Inj.. | Denial of Service | Hardcoded Creden.. | HTTP Header Inje.. | Insecure Deseria.. | Missing Authenti.. | Open Redirect | Other | Path Traversal | Security Misconf.. | Sensitive Data E.. | SQL Injection | Server-Side Requ.. | XPath Injection | Cross-Site Scrip.. | XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 100% 67% 0% 50% 100%
damn-vulnerable-graphql-application 100% 100% 0% 0% 17% 27% 100% 0% 40% 100% 100% 0%
djangoat 50% 100% 100% 67% 100% 14% 100% 8% 50% 100% 0% 100% 14%
dsvpwa 100% 100% 100% 100% 100% 20% 100% 33% 0% 100% 100% 67%
dsvw 100% 100% 100% 100% 100% 100% 100% 0% 100% 50% 50% 100% 100% 100% 100% 100%
dvblab 100% 75% 100% 0% 50% 0% 0% 100%
dvpwa 33% 22% 67% 100% 100% 80%
extremely-vulnerable-flask-app 100% 0% 100% 100% 100% 33% 0% 50% 100% 100% 60%
flask-xss 0% 50% 33% 100% 12% 100% 33% 0% 56%
insecure-web 100% 100% 33% 0% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 0% 100% 0% 0%
lets-be-bad-guys 100% 67% 100% 100% 100% 57% 100% 0% 0% 0% 75%
pygoat 60% 100% 67% 89% 100% 75% 36% 100% 33% 20% 100% 100% 0% 100%
python-app 100% 100% 100% 100% 67% 100% 0% 100% 50% 100%
pythonssti 100% 0%
threatbyte 100% 50% 50% 33% 100% 50% 0% 100% 100% 100%
vampi 100% 0% 100% 100% 80% 100% 100%
vfapi 100% 100% 0% 100%
vulnerable-api 100% 100% 0% 100% 50% 50% 100% 50% 100%
vulnerable-flask-app 50% 50% 100% 71% 0% 75% 100% 0% 0%
vulnerable-tornado-app 100% 100% 0% 60% 100% 0% 0% 100% 0%
vulnpy 100% 67% 6% 100% 100% 88% 100% 100% 54% 100% 92% 100%
vulpy 0% 0% 38% 0% 14% 0% 50% 17% 100% 50%
CWE Family Detection (aggregate)