F3 Score (strict): 21.3
F2 Score (strict): 23.3
Recall (strict): 19.7%
Precision (strict): 83.7%
Repos Scored: 21
Model: grok-3
Total Cost: $4.90
Avg Latency: 34s
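F2 and F3 are F-beta scores, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which treat recall as beta times as important as precision. Assuming the dashboard uses this standard definition, the short Python sketch below reproduces both headline scores from the rounded strict precision (83.7%) and recall (19.7%). The function name `f_beta` is illustrative, not taken from the benchmark's code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline strict figures from the dashboard, expressed as fractions.
precision, recall = 0.837, 0.197
print(f"F2 = {100 * f_beta(precision, recall, 2):.1f}")  # F2 = 23.3
print(f"F3 = {100 * f_beta(precision, recall, 3):.1f}")  # F3 = 21.3
```

The same function reproduces the per-repository F2 column from the precision and recall columns to within rounding, for example 55.6 for pythonssti (precision 100.0, recall 50.0).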
Per-Repository Breakdown (TP / FP / FN chart)
Per-Repository Scores
Repository                         F2    Recall (%)  Precision (%)  TP  FP  FN
damn-vulnerable-flask-application  35.5  31.1        82.2           5   1   10
djangoat                           11.3  9.3         73.8           5   2   45
dsvpwa                             69.1  65.6        87.5           21  3   11
dsvw                               52.5  46.9        100.0          13  0   14
dvblab                             49.0  43.9        93.0           10  1   12
dvpwa                              12.9  10.6        100.0          2   0   20
flask-xss                          17.0  14.3        74.4           4   1   24
insecure-web                       70.9  66.7        95.2           6   0   3
lets-be-bad-guys                   43.8  39.6        76.0           10  3   14
pygoat                             9.2   7.6         49.4           5   4   65
python-app                         33.9  30.0        70.9           6   2   14
python-insecure-app                47.1  41.7        100.0          3   0   5
pythonssti                         55.6  50.0        100.0          1   0   1
threatbyte                         27.8  23.6        94.4           6   0   18
vampi                              53.8  50.0        77.1           6   2   6
vfapi                              71.3  66.7        100.0          6   0   3
vulnerable-api                     47.9  42.9        91.7           6   1   8
vulnerable-flask-app               26.6  23.3        62.5           5   3   15
vulnerable-tornado-app             33.3  28.6        100.0          4   0   10
vulnpy                             6.2   5.1         66.7           4   2   74
vulpy                              11.2  9.3         86.7           5   1   49
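The table reports TP, FP and FN per repository. A common way to combine such counts is micro-averaging: sum each count across repositories, then compute precision and recall from the totals. The Python sketch below does this with the counts copied from the table; `COUNTS` and `pooled_scores` are illustrative names. The dashboard does not say how its strict headline metrics are aggregated, and the pooled recall produced here (about 24.0%) does not match the 19.7% headline, so read this as an example of the technique rather than as the dashboard's own calculation.

```python
# Per-repository (TP, FP, FN) counts copied from the table above.
COUNTS = {
    "damn-vulnerable-flask-application": (5, 1, 10),
    "djangoat": (5, 2, 45),
    "dsvpwa": (21, 3, 11),
    "dsvw": (13, 0, 14),
    "dvblab": (10, 1, 12),
    "dvpwa": (2, 0, 20),
    "flask-xss": (4, 1, 24),
    "insecure-web": (6, 0, 3),
    "lets-be-bad-guys": (10, 3, 14),
    "pygoat": (5, 4, 65),
    "python-app": (6, 2, 14),
    "python-insecure-app": (3, 0, 5),
    "pythonssti": (1, 0, 1),
    "threatbyte": (6, 0, 18),
    "vampi": (6, 2, 6),
    "vfapi": (6, 0, 3),
    "vulnerable-api": (6, 1, 8),
    "vulnerable-flask-app": (5, 3, 15),
    "vulnerable-tornado-app": (4, 0, 10),
    "vulnpy": (4, 2, 74),
    "vulpy": (5, 1, 49),
}


def pooled_scores(counts):
    """Micro-average: sum TP/FP/FN over repositories, then take ratios."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall


tp, fp, fn, p, r = pooled_scores(COUNTS)
print(tp, fp, fn)  # 133 26 421
print(f"precision {100 * p:.1f}%, recall {100 * r:.1f}%")  # precision 83.6%, recall 24.0%
```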
Detection by Severity
critical: 49% (TP 36 / FP 0 / FN 37)
high: 29% (TP 56 / FP 0 / FN 140)
medium: 16% (TP 37 / FP 0 / FN 196)
low: 4% (TP 2 / FP 0 / FN 51)
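Each severity percentage is consistent with recall, TP / (TP + FN): the share of known findings at that severity that the model detected. Assuming that definition, this Python sketch reproduces all four figures from the listed counts; `SEVERITY` and `detection_rate` are illustrative names.

```python
# (TP, FN) per severity level, from "Detection by Severity" above.
SEVERITY = {
    "critical": (36, 37),
    "high": (56, 140),
    "medium": (37, 196),
    "low": (2, 51),
}


def detection_rate(tp: int, fn: int) -> float:
    """Recall: fraction of ground-truth findings that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


for level, (tp, fn) in SEVERITY.items():
    # Prints critical: 49%, high: 29%, medium: 16%, low: 4%
    print(f"{level}: {100 * detection_rate(tp, fn):.0f}%")
```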
LLM Operational Metrics
Model & Prompt
Model: xai/grok-3
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (average per run)
Input: 15,856
Output: 1,369
Total: 17,535
Cost
Total: $4.90
Per Repo: $0.08
Per 100 LOC: $0.0279
Reliability
Success Rate: 81%
Timeouts: 0
JSON Repair Rate: 0%
Avg Latency: 34.3s
CWE Family Heatmap (recall by repository)
Repository | Broken Access Co.. | Code Injection /.. | Command / OS Inj.. | Denial of Service | Hardcoded Creden.. | HTTP Header Inje.. | Insecure Deseria.. | Missing Authenti.. | Open Redirect | Other | Path Traversal | Security Misconf.. | Sensitive Data E.. | SQL Injection | Server-Side Requ.. | XPath Injection | Cross-Site Scrip.. | XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 0% 33% 0% 0% 100%
djangoat 0% 50% 100% 17% 0% 0% 0% 0% 0% 0% 0% 0% 14%
dsvpwa 100% 50% 100% 100% 100% 50% 100% 67% 0% 100% 100% 67%
dsvw 100% 100% 100% 100% 0% 100% 100% 0% 100% 0% 0% 67% 0% 100% 25% 100%
dvblab 0% 25% 100% 0% 25% 0% 0% 100%
dvpwa 0% 11% 0% 0% 100% 0%
flask-xss 0% 50% 0% 0% 0% 0% 0% 0% 44%
insecure-web 100% 100% 0% 100% 100% 100%
lets-be-bad-guys 100% 67% 0% 100% 100% 43% 100% 0% 0% 0% 0%
pygoat 0% 25% 33% 0% 100% 25% 0% 100% 0% 20% 50% 0% 0% 0%
python-app 100% 50% 100% 0% 0% 50% 0% 100% 0% 100%
python-insecure-app 100% 0% 0% 0% 0% 50%
pythonssti 0% 100%
threatbyte 0% 50% 50% 11% 0% 50% 0% 100% 100% 0%
vampi 100% 0% 0% 0% 60% 100% 100%
vfapi 50% 100% 0% 60%
vulnerable-api 0% 100% 0% 33% 50% 0% 100% 50% 100%
vulnerable-flask-app 0% 25% 100% 0% 0% 0% 100% 0% 0%
vulnerable-tornado-app 100% 0% 0% 0% 50% 0% 0% 100% 100%
vulnpy 0% 33% 0% 0% 0% 0% 0% 33% 0% 0% 0% 0%
vulpy 0% 0% 12% 0% 4% 0% 0% 0% 67% 0%
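Each heatmap cell reads as recall within one repository and one CWE family. Assuming a cell is the fraction of that repository's ground-truth findings in the family that were detected, and that a cell is left blank when the repository has no findings in that family (which would explain why rows list different numbers of values), it could be computed as in this Python sketch. The repository names, families and records below are hypothetical and are not taken from the dashboard.

```python
from collections import defaultdict

# Hypothetical ground-truth findings: (repository, CWE family, detected?).
# Illustrative only; these records are not dashboard data.
FINDINGS = [
    ("repo-a", "SQL Injection", True),
    ("repo-a", "SQL Injection", False),
    ("repo-a", "Path Traversal", True),
    ("repo-b", "SQL Injection", False),
]


def family_recall(findings):
    """Map (repository, family) to recall; pairs with no findings get no cell."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for repo, family, detected in findings:
        totals[(repo, family)] += 1
        hits[(repo, family)] += int(detected)
    return {key: hits[key] / totals[key] for key in totals}


for (repo, family), rate in sorted(family_recall(FINDINGS).items()):
    print(f"{repo:8} {family:16} {100 * rate:.0f}%")
```

In this sketch ("repo-b", "Path Traversal") never appears in the result, which corresponds to an empty heatmap cell rather than a 0% cell.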
CWE Family Detection (aggregate)