F3 Score (strict): 28.4
F2 Score (strict): 30.7
Recall (strict): 26.3%
Precision (strict): 92.7%
Repos Scored: 24
Model: grok-4.20-reasoning-latest
Total Cost: $16.82
Avg Latency: 34s
Per-Repository Breakdown TP / FP / FN
Per-Repository Scores
Repository F2 Recall Precision TP FP FN
damn-vulnerable-flask-application 40.6 35.6 93.3 5 0 10
damn-vulnerable-graphql-application 22.6 19.1 91.1 7 1 28
djangoat 19.2 16.0 100.0 8 0 42
dsvpwa 69.1 65.6 87.5 21 3 11
dsvw 46.2 40.7 100.0 11 0 16
dvblab 46.1 40.9 97.2 9 0 13
dvpwa 26.9 22.7 100.0 5 0 17
extremely-vulnerable-flask-app 42.9 38.1 90.1 11 1 17
flask-xss 21.3 17.9 100.0 5 0 23
insecure-web 61.0 55.6 100.0 5 0 4
intentionally-vulnerable-python-application 56.0 52.4 78.3 4 1 3
lets-be-bad-guys 27.9 23.6 100.0 6 0 18
pygoat 11.5 9.5 100.0 7 0 63
python-app 40.0 35.0 94.4 7 0 13
python-insecure-app 46.6 41.7 100.0 3 0 5
pythonssti 55.6 50.0 100.0 1 0 1
threatbyte 22.9 19.4 82.2 5 1 19
vampi 43.5 38.5 94.4 5 0 8
vfapi 60.5 55.6 94.4 5 0 4
vulnerable-api 45.9 40.5 100.0 6 0 8
vulnerable-flask-app 29.2 25.0 88.9 5 1 15
vulnerable-tornado-app 43.5 38.1 100.0 5 0 9
vulnpy 35.2 34.2 94.2 27 5 51
vulpy 10.5 8.6 86.1 5 1 49
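The F2 column can largely be reproduced from the Recall and Precision columns using the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below assumes the dashboard uses that definition, which the page itself does not state; `f_beta` is a hypothetical helper name, and the inputs are the percentages shown in the table. Not every row reproduces exactly from the rounded values displayed (vulnpy, for example, differs noticeably), so treat this as an approximation of the scoring rather than its actual implementation.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; inputs and output are percentages (0-100)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # no precision and no recall: define the score as 0
    return (1 + b2) * precision * recall / denom


# Precision and recall copied from the damn-vulnerable-flask-application row.
print(round(f_beta(93.3, 35.6, beta=2), 1))  # 40.6

# Headline precision and recall from the summary at the top of the page.
print(round(f_beta(92.7, 26.3, beta=2), 1))  # 30.7
```

With beta = 2 or 3, recall is weighted more heavily than precision, which is why the scores sit close to the low recall figures despite precision above 90%.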
Detection by Severity
critical: 56% (TP 46 / FP 0 / FN 36)
high: 31% (TP 68 / FP 0 / FN 153)
medium: 13% (TP 34 / FP 0 / FN 228)
low: 3% (TP 2 / FP 0 / FN 58)
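The per-severity percentages match recall computed as TP / (TP + FN) from the counts listed with them. A minimal sketch, assuming that is how the dashboard derives them; `recall_pct` and the `counts` dictionary are illustrative names, not part of the tool:

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; returns 0.0 when there are no positives."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


# (TP, FN) per severity, copied from the Detection by Severity section.
counts = {
    "critical": (46, 36),
    "high": (68, 153),
    "medium": (34, 228),
    "low": (2, 58),
}

for severity, (tp, fn) in counts.items():
    print(f"{severity}: {recall_pct(tp, fn):.0f}%")
```

Run as written, this prints 56%, 31%, 13%, and 3%, matching the figures shown for critical, high, medium, and low.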
LLM Operational Metrics
Model & Prompt
Model: xai/grok-4.20-reasoning-latest
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 110,646
Output: 2,042
Total: 112,688
Cost
Total: $16.82
Per Repo: $0.23
Per 100 LOC: $0.0838
Reliability
Success Rate: 100%
Timeouts: 0
JSON Repair Rate: 0%
Avg Latency: 34.5s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 0% 67% 0% 0% 100%
damn-vulnerable-graphql-application 0% 67% 0% 0% 0% 18% 100% 0% 0% 100% 100% 0%
djangoat 0% 0% 100% 17% 100% 0% 100% 0% 50% 50% 0% 100% 14%
dsvpwa 100% 50% 100% 100% 100% 50% 100% 67% 0% 100% 100% 67%
dsvw 100% 100% 0% 0% 0% 100% 0% 0% 100% 0% 0% 100% 0% 100% 25% 100%
dvblab 0% 25% 100% 0% 25% 0% 0% 100%
dvpwa 33% 11% 67% 0% 100% 0%
extremely-vulnerable-flask-app 100% 0% 67% 100% 100% 33% 0% 0% 100% 100% 0%
flask-xss 0% 50% 0% 100% 12% 0% 33% 0% 22%
insecure-web 100% 100% 0% 0% 100% 100%
intentionally-vulnerable-python-application 0% 100% 100% 0% 100% 0% 0%
lets-be-bad-guys 0% 33% 100% 100% 100% 14% 100% 0% 0% 0% 0%
pygoat 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 50% 0% 0% 0%
python-app 100% 50% 100% 0% 17% 50% 0% 100% 0% 100%
python-insecure-app 50% 100% 0% 0% 0% 50%
pythonssti 100% 0%
threatbyte 0% 50% 0% 11% 0% 50% 0% 100% 100% 0%
vampi 50% 0% 0% 0% 40% 0% 100%
vfapi 0% 0% 0% 100%
vulnerable-api 0% 100% 0% 67% 50% 0% 100% 50% 0%
vulnerable-flask-app 50% 25% 100% 14% 0% 0% 100% 0% 0%
vulnerable-tornado-app 100% 100% 0% 0% 100% 0% 0% 100% 100%
vulnpy 0% 67% 0% 25% 0% 12% 0% 33% 0% 0% 8% 0%
vulpy 0% 0% 0% 0% 0% 0% 0% 0% 50% 0%
CWE Family Detection aggregate