F3 Score (strict): 51.0
F2 Score (strict): 53.0
Recall (strict): 49.1%
Precision (strict): 77.4%
Repos Scored: 24
Model: gemini-3.1-pro-preview
Total Cost: $27.24
Avg Latency: 170s
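The headline F2 and F3 scores are consistent with the standard F-beta formula applied to the headline precision and recall. The following is a minimal sketch under that assumption; the function name `f_beta` is illustrative and is not taken from the scoring tool itself.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline figures from the summary above (strict matching), as fractions.
precision, recall = 0.774, 0.491

# Scale to the 0-100 range the dashboard uses and round to one decimal.
f2 = round(100 * f_beta(precision, recall, 2), 1)
f3 = round(100 * f_beta(precision, recall, 3), 1)
print(f2, f3)
```

With these inputs the sketch reproduces the reported 53.0 (F2) and 51.0 (F3). Both scores sit closer to recall than to precision because beta values above 1 weight recall more heavily.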
Per-Repository Breakdown TP / FP / FN
Per-Repository Scores
Repository F2 Recall (%) Precision (%) TP FP FN
damn-vulnerable-flask-application 59.9 57.8 75.9 9 2 6
damn-vulnerable-graphql-application 42.4 39.1 65.1 14 7 21
djangoat 32.8 29.3 65.8 15 9 35
dsvpwa 58.8 55.2 79.6 18 4 14
dsvw 62.0 60.5 69.2 16 8 11
dvblab 61.7 59.1 75.0 13 4 9
dvpwa 46.7 43.9 65.0 10 5 12
extremely-vulnerable-flask-app 53.9 50.0 79.5 14 4 14
flask-xss 46.8 42.9 75.7 12 4 16
insecure-web 64.0 63.0 72.8 6 2 3
intentionally-vulnerable-python-application 73.5 71.4 83.3 5 1 2
lets-be-bad-guys 65.8 61.1 95.7 15 1 9
pygoat 44.3 40.5 72.9 28 11 42
python-app 67.9 65.0 83.2 13 3 7
python-insecure-app 41.7 37.5 75.0 3 1 5
pythonssti 53.7 50.0 83.3 1 0 1
threatbyte 44.2 41.7 59.0 10 7 14
vampi 74.6 76.9 69.3 10 5 3
vfapi 67.2 66.7 69.9 6 3 3
vulnerable-api 69.3 66.7 82.3 9 2 5
vulnerable-flask-app 58.9 55.0 82.6 11 2 9
vulnerable-tornado-app 59.3 54.8 90.0 8 1 6
vulnpy 87.4 86.3 92.8 67 6 11
vulpy 38.8 34.6 77.4 19 5 35
Detection by Severity
critical: 88% (TP 72 / FP 0 / FN 10)
high: 61% (TP 134 / FP 0 / FN 87)
medium: 50% (TP 131 / FP 0 / FN 131)
low: 30% (TP 18 / FP 0 / FN 42)
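The severity percentages match TP / (TP + FN) computed from the counts listed with them. The sketch below assumes that is how the dashboard derives them; the names `detection_rate` and `severity_counts` are illustrative only.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Share of known findings that were detected: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


# (TP, FN) counts copied from the severity breakdown above.
severity_counts = {
    "critical": (72, 10),
    "high": (134, 87),
    "medium": (131, 131),
    "low": (18, 42),
}

# Express each rate as a whole-number percentage, as the dashboard does.
rates = {
    name: round(100 * detection_rate(tp, fn))
    for name, (tp, fn) in severity_counts.items()
}
print(rates)
```

This yields 88, 61, 50 and 30 for critical, high, medium and low, which agrees with the figures shown.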
LLM Operational Metrics
Model & Prompt
Model: gemini-3.1-pro-preview
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 56,142
Output: 4,315
Total: 437,119
Cost
Total: $27.24
Per Repo: $0.38
Per 100 LOC: $0.1358
Reliability
Success Rate: 100%
Timeouts: 0
JSON Repair Rate: 0%
Avg Latency: 169.8s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 100% 100% 0% 75% 100%
damn-vulnerable-graphql-application 50% 100% 0% 0% 17% 27% 100% 0% 60% 100% 100% 50%
djangoat 0% 100% 100% 17% 100% 29% 100% 8% 100% 0% 25% 100% 14%
dsvpwa 100% 50% 100% 100% 100% 50% 100% 67% 0% 100% 100% 67%
dsvw 100% 100% 100% 100% 100% 100% 100% 0% 100% 100% 50% 100% 100% 100% 100% 100%
dvblab 100% 75% 100% 0% 38% 0% 100% 100%
dvpwa 67% 22% 100% 100% 100% 80%
extremely-vulnerable-flask-app 100% 0% 67% 100% 100% 33% 0% 0% 100% 100% 60%
flask-xss 0% 50% 33% 100% 25% 0% 33% 0% 44%
insecure-web 100% 100% 0% 0% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 0% 100% 0% 100%
lets-be-bad-guys 100% 100% 100% 100% 100% 29% 100% 33% 0% 0% 100%
pygoat 60% 75% 33% 33% 100% 25% 18% 100% 0% 20% 100% 100% 0% 100%
python-app 100% 50% 100% 100% 50% 100% 0% 100% 50% 100%
python-insecure-app 50% 100% 0% 0% 0% 50%
pythonssti 100% 0%
threatbyte 100% 100% 50% 33% 100% 50% 0% 100% 100% 67%
vampi 100% 0% 100% 0% 100% 100% 100%
vfapi 100% 100% 100% 100%
vulnerable-api 100% 100% 0% 100% 50% 0% 100% 50% 100%
vulnerable-flask-app 100% 50% 100% 57% 0% 0% 100% 0% 100%
vulnerable-tornado-app 100% 100% 0% 20% 100% 100% 100% 100% 100%
vulnpy 100% 67% 100% 100% 100% 100% 100% 100% 38% 100% 92% 100%
vulpy 0% 0% 62% 0% 36% 100% 0% 0% 100% 50%
CWE Family Detection aggregate