F3 Score (strict): 38.1
F2 Score (strict): 39.8
Recall (strict): 36.6%
Precision (strict): 61.8%
Repos Scored: 24
Model: Qwen3.5-397B-A17B
Total Cost: $3.18
Avg Latency: 77s
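The F2 and F3 scores weight recall more heavily than precision. As a rough cross-check, the sketch below applies the standard F-beta formula to the headline precision and recall. It assumes the dashboard uses the textbook definition; the helper name `f_beta` is hypothetical, and how "strict" matching or per-run averaging feeds into the reported figures is not stated on this page.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; precision and recall must share a scale (here: percent)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values copied from the summary above.
precision, recall = 61.8, 36.6

f2 = f_beta(precision, recall, beta=2)  # recall weighted 2x relative to precision
f3 = f_beta(precision, recall, beta=3)  # recall weighted 3x relative to precision

print(f"F2 = {f2:.2f}, F3 = {f3:.2f}")
```

Under that assumption, both values land within about 0.1 of the reported 39.8 and 38.1; the small gap may come from rounding or from averaging across runs.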
[Figure: Per-Repository Breakdown (TP / FP / FN)]
Per-Repository Scores
Repository F2 Recall Precision TP FP FN
damn-vulnerable-flask-application 47.4 44.4 65.8 7 3 8
damn-vulnerable-graphql-application 35.7 36.2 46.4 13 23 22
djangoat 28.0 24.7 62.5 12 8 38
dsvpwa 39.6 35.4 78.3 11 4 21
dsvw 60.1 58.0 77.9 16 6 11
dvblab 63.7 59.1 93.5 13 1 9
dvpwa 32.2 28.8 61.6 6 4 16
extremely-vulnerable-flask-app 30.1 26.2 75.9 7 2 21
flask-xss 32.8 28.6 80.9 8 2 20
insecure-web 61.0 59.3 70.8 5 2 4
intentionally-vulnerable-python-application 60.0 57.1 75.6 4 1 3
lets-be-bad-guys 46.5 44.4 57.5 11 8 13
pygoat 41.0 40.0 46.2 28 32 42
python-app 35.3 33.3 46.4 7 7 13
python-insecure-app 49.6 45.8 73.3 4 1 4
pythonssti 55.6 50.0 100.0 1 0 1
threatbyte 37.9 34.7 62.2 8 5 16
vampi 46.8 50.0 37.1 6 11 6
vfapi 64.6 74.1 45.0 7 9 2
vulnerable-api 54.4 50.0 85.8 7 1 7
vulnerable-flask-app 44.0 41.7 56.7 8 6 12
vulnerable-tornado-app 48.4 45.2 68.0 6 3 8
vulnpy 59.2 54.5 90.8 42 4 36
vulpy 20.5 17.9 50.3 10 10 44
Detection by Severity
critical: 77% (TP 63 / FP 1 / FN 19)
high: 49% (TP 109 / FP 6 / FN 112)
medium: 29% (TP 77 / FP 0 / FN 185)
low: 7% (TP 4 / FP 0 / FN 56)
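The per-severity percentages appear to be recall, that is TP / (TP + FN). The sketch below recomputes them from the counts listed above; the dictionary layout and the `recall_pct` helper are illustrative assumptions, not part of the benchmark tooling.

```python
# (TP, FP, FN) per severity, copied from the figures above.
severity_counts = {
    "critical": (63, 1, 19),
    "high": (109, 6, 112),
    "medium": (77, 0, 185),
    "low": (4, 0, 56),
}


def recall_pct(tp: int, fn: int) -> float:
    """Share of ground-truth findings that were detected, in percent."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


# Recall per severity, rounded to whole percent for comparison with the page.
recomputed = {
    severity: round(recall_pct(tp, fn))
    for severity, (tp, _fp, fn) in severity_counts.items()
}

for severity, pct in recomputed.items():
    print(f"{severity}: {pct}%")
```

Rounded to whole percent, these match the displayed 77%, 49%, 29%, and 7%, which supports reading the percentage as recall rather than precision.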
LLM Operational Metrics
Model & Prompt
Model: together_ai/Qwen/Qwen3.5-397B-A17B
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 43,965
Output: 4,943
Total: 121,929
Cost
Total: $3.18
Per Repo: $0.05
Per 100 LOC: $0.0159
Reliability
Success Rate: 96%
Timeouts: 0
JSON Repair Rate: 17%
Avg Latency: 76.7s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 0% 33% 0% 100% 100%
damn-vulnerable-graphql-application 100% 100% 100% 0% 17% 18% 100% 0% 60% 100% 100% 0%
djangoat 0% 100% 100% 50% 100% 0% 0% 8% 50% 50% 0% 100% 14%
dsvpwa 100% 0% 100% 0% 50% 0% 50% 33% 0% 100% 100% 67%
dsvw 100% 100% 100% 0% 0% 100% 100% 0% 100% 50% 0% 100% 0% 100% 25% 100%
dvblab 100% 75% 100% 0% 38% 0% 0% 100%
dvpwa 67% 11% 67% 0% 100% 0%
extremely-vulnerable-flask-app 100% 0% 67% 100% 0% 0% 0% 0% 100% 100% 0%
flask-xss 0% 100% 33% 100% 12% 0% 33% 0% 22%
insecure-web 100% 100% 0% 0% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 0% 100% 0% 0%
lets-be-bad-guys 100% 67% 100% 100% 100% 57% 100% 0% 0% 0% 0%
pygoat 0% 75% 67% 44% 67% 50% 23% 100% 33% 20% 100% 100% 0% 100%
python-app 100% 50% 100% 0% 17% 100% 0% 100% 0% 100%
python-insecure-app 100% 0% 0% 0% 100% 0%
pythonssti 100% 0%
threatbyte 100% 50% 50% 22% 0% 50% 0% 100% 100% 0%
vampi 100% 0% 0% 0% 60% 100% 100%
vfapi 100% 0% 100% 100%
vulnerable-api 100% 100% 0% 33% 50% 0% 100% 50% 100%
vulnerable-flask-app 50% 50% 100% 29% 0% 25% 100% 0% 100%
vulnerable-tornado-app 100% 100% 100% 0% 100% 0% 0% 100% 100%
vulnpy 100% 67% 6% 100% 12% 88% 0% 100% 38% 100% 75% 100%
vulpy 100% 0% 38% 0% 4% 0% 0% 0% 83% 25%
CWE Family Detection (aggregate)