F3 Score (strict): 47.7
F2 Score (strict): 49.9
Recall (strict): 45.6%
Precision (strict): 79.0%
Repos Scored: 19
Model: claude-opus-4-6
Total Cost: $22.41
Avg Latency: 763s
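The F2 and F3 headline scores can be related to the precision and recall figures with the standard F-beta formula. The sketch below assumes the dashboard uses that standard definition (the page itself does not state the formula or define "strict" matching); `f_beta` is an illustrative helper name, not part of any tool shown here. Fed the rounded headline values, it lands within about 0.1 of the reported 49.9 and 47.7, a gap that could plausibly come from rounding or from averaging per-run scores.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta: weighted harmonic mean where recall counts beta times as much as precision."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Headline values from the dashboard, expressed as fractions.
precision, recall = 0.790, 0.456

f2 = 100 * f_beta(precision, recall, beta=2)
f3 = 100 * f_beta(precision, recall, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")
```

Because beta is greater than 1, both scores sit closer to recall (45.6%) than to precision (79.0%), and F3 sits closer still than F2.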
Per-Repository Breakdown TP / FP / FN
Per-Repository Scores
Repository                                      F2  Recall  Precision  TP  FP  FN
damn-vulnerable-flask-application             77.0    77.8       74.3  12   4   3
damn-vulnerable-graphql-application           46.8    44.3       60.8  16  10  20
djangoat                                      46.7    42.0       84.0  21   4  29
extremely-vulnerable-flask-app                57.3    53.6       79.0  15   4  13
flask-xss                                     47.6    42.9       85.7  12   2  16
insecure-web                                  74.5    77.8       63.6   7   4   2
intentionally-vulnerable-python-application   73.7    71.4       87.5   5   1   2
lets-be-bad-guys                              78.7    75.0       98.2  18   0   6
pygoat                                        52.8    49.3       74.5  34  12  36
python-app                                    83.6    83.3       84.7  17   3   3
python-insecure-app                           78.9    75.0      100.0   6   0   2
threatbyte                                    61.0    59.7       67.3  14   7  10
vampi                                         67.8    69.2       62.7   9   5   4
vfapi                                         85.1    88.9       72.7   8   3   1
vulnerable-api                                73.2    71.4       81.2  10   2   4
vulnerable-flask-app                          69.4    68.3       74.7  14   5   6
vulnerable-tornado-app                        65.7    64.3       72.1   9   4   5
vulnpy                                        73.6    71.4       84.6  56  10  22
vulpy                                         53.2    48.1       91.6  26   2  28
Detection by Severity
critical: 88% (TP 57 / FP 0 / FN 8)
high: 68% (TP 129 / FP 1 / FN 61)
medium: 50% (TP 110 / FP 0 / FN 109)
low: 41% (TP 19 / FP 0 / FN 27)
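The severity percentages appear to be recall, that is TP / (TP + FN), rounded to a whole percent. The following sketch recomputes them from the counts shown above under that assumption; the dictionary layout and the `recall_pct` helper are illustrative names, not taken from the dashboard's own code.

```python
# TP / FP / FN counts per severity, copied from the dashboard.
severity_counts = {
    "critical": {"tp": 57, "fp": 0, "fn": 8},
    "high": {"tp": 129, "fp": 1, "fn": 61},
    "medium": {"tp": 110, "fp": 0, "fn": 109},
    "low": {"tp": 19, "fp": 0, "fn": 27},
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage: share of known findings that were detected."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


# Round to whole percents, as the dashboard displays them.
recalls = {
    severity: round(recall_pct(counts["tp"], counts["fn"]))
    for severity, counts in severity_counts.items()
}

for severity, pct in recalls.items():
    print(f"{severity}: {pct}%")
```

False positives do not enter this calculation, which is why the near-zero FP counts per severity have no effect on the displayed percentages.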
LLM Operational Metrics
Model & Prompt
Model: claude-opus-4-6
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 7
Output: 4,608
Total: 176,970
Cost
Total: $22.41
Per Repo: $0.49
Per 100 LOC: $0.1228
Reliability
Success Rate: 64%
Timeouts: 26
JSON Repair Rate: 0%
Avg Latency: 763.4s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 75% 67% 0% 75% 100%
damn-vulnerable-graphql-application 100% 100% 100% 0% 17% 18% 100% 0% 60% 100% 100% 100%
djangoat 50% 100% 100% 83% 100% 14% 100% 23% 50% 50% 25% 100% 14%
extremely-vulnerable-flask-app 100% 0% 100% 100% 100% 17% 0% 50% 100% 100% 60%
flask-xss 0% 100% 33% 100% 25% 100% 33% 0% 44%
insecure-web 100% 100% 33% 100% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 0% 100% 0% 100%
lets-be-bad-guys 100% 100% 100% 100% 100% 57% 100% 33% 100% 0% 100%
pygoat 80% 100% 67% 78% 100% 75% 23% 100% 0% 20% 100% 100% 0% 100%
python-app 100% 100% 100% 100% 100% 100% 50% 100% 100% 0%
python-insecure-app 100% 100% 0% 100% 100% 50%
threatbyte 100% 100% 100% 22% 100% 50% 50% 100% 100% 100%
vampi 100% 0% 100% 50% 40% 100% 100%
vfapi 100% 100% 0% 100%
vulnerable-api 100% 100% 0% 100% 50% 50% 100% 50% 100%
vulnerable-flask-app 100% 75% 100% 57% 0% 50% 100% 0% 0%
vulnerable-tornado-app 100% 100% 0% 40% 100% 100% 0% 100% 100%
vulnpy 100% 100% 6% 100% 62% 88% 100% 100% 100% 100% 100% 100%
vulpy 50% 0% 88% 0% 41% 50% 50% 0% 100% 50%
CWE Family Detection aggregate