F3 Score (strict): 46.6
F2 Score (strict): 48.3
Recall (strict): 45.0%
Precision (strict): 68.3%
Repos Scored: 24
Model: kimi-k2.5
Total Cost: $2.17
Avg Latency: 140s
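The headline F2 and F3 scores are consistent with the standard F-beta formula applied to the headline precision and recall. A minimal sketch, assuming that is how the dashboard derives them (the function name `f_beta` is illustrative and not taken from the benchmark's code):

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline strict precision and recall from the summary, as fractions.
precision, recall = 0.683, 0.450

f2 = round(100 * f_beta(precision, recall, beta=2), 1)
f3 = round(100 * f_beta(precision, recall, beta=3), 1)
print(f2, f3)  # 48.3 46.6
```

Because beta is greater than 1 in both cases, the low recall (45.0%) pulls the scores down more than the higher precision (68.3%) lifts them, and F3 sits below F2.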
[Chart: Per-Repository Breakdown, TP / FP / FN]
Per-Repository Scores
Repository                                     F2  Recall  Precision  TP  FP  FN
damn-vulnerable-flask-application            50.7    46.7       77.8   7   2   8
damn-vulnerable-graphql-application          39.7    37.1       55.1  13  11  22
djangoat                                     35.1    32.0       60.9  16  12  34
dsvpwa                                       56.1    52.1       81.1  17   4  15
dsvw                                         65.5    61.7       86.7  17   3  10
dvblab                                       60.1    57.6       75.2  13   5   9
dvpwa                                        51.3    48.5       70.2  11   5  11
extremely-vulnerable-flask-app               47.0    42.9       81.6  12   3  16
flask-xss                                    44.2    40.5       74.5  11   5  17
insecure-web                                 65.1    66.7       64.1   6   4   3
intentionally-vulnerable-python-application  73.2    71.4       85.7   5   1   2
lets-be-bad-guys                             59.3    57.0       71.9  14   6  10
pygoat                                       37.3    34.8       57.2  24  21  46
python-app                                   57.8    58.3       55.9  12   9   8
python-insecure-app                          51.8    50.0       62.4   4   3   4
pythonssti                                   55.6    50.0      100.0   1   0   1
threatbyte                                   46.1    44.5       54.4  11   9  13
vampi                                        69.8    71.8       70.4   9   5   4
vfapi                                        79.2    96.3       46.7   9  10   0
vulnerable-api                               55.3    52.4       74.6   7   3   7
vulnerable-flask-app                         61.8    60.0       70.8  12   5   8
vulnerable-tornado-app                       54.9    52.4       70.9   7   3   7
vulnpy                                       71.6    68.0       91.4  53   5  25
vulpy                                        26.8    23.5       65.6  13   7  41
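For offline analysis, the rows above can be parsed and re-sorted by any column. The sketch below is illustrative only: `RepoScore` and `parse_row` are hypothetical names, and it assumes each row is whitespace-separated in the column order Repository, F2, Recall, Precision, TP, FP, FN.

```python
from dataclasses import dataclass


@dataclass
class RepoScore:
    name: str
    f2: float
    recall: float
    precision: float
    tp: int
    fp: int
    fn: int


def parse_row(line: str) -> RepoScore:
    """Parse one whitespace-separated row: name, F2, recall, precision, TP, FP, FN."""
    name, f2, recall, precision, tp, fp, fn = line.split()
    return RepoScore(name, float(f2), float(recall), float(precision),
                     int(tp), int(fp), int(fn))


# Three rows copied from the table above.
rows = [
    "dsvw 65.5 61.7 86.7 17 3 10",
    "vulpy 26.8 23.5 65.6 13 7 41",
    "vfapi 79.2 96.3 46.7 9 10 0",
]

scores = [parse_row(r) for r in rows]
by_f2 = sorted(scores, key=lambda s: s.f2, reverse=True)
print([s.name for s in by_f2])  # ['vfapi', 'dsvw', 'vulpy']

# Recall recomputed from the raw counts of the first row (dsvw).
dsvw = scores[0]
recomputed = round(100 * dsvw.tp / (dsvw.tp + dsvw.fn), 1)
print(recomputed)  # 63.0, versus 61.7 in the Recall column
```

The recomputed recall does not always match the published column exactly. One possible explanation, which is an assumption and not stated on the page, is that the percentage columns are averaged over several runs while the counts are not.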
Detection by Severity
critical: 83% (TP 68 / FP 0 / FN 14)
high: 56% (TP 123 / FP 3 / FN 98)
medium: 35% (TP 92 / FP 2 / FN 170)
low: 28% (TP 17 / FP 0 / FN 43)
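The severity percentages match recall, TP / (TP + FN), rounded to a whole number. A short sketch that reproduces them from the counts shown (the helper name `recall_pct` is illustrative):

```python
# (TP, FP, FN) per severity, copied from the panel above.
counts = {
    "critical": (68, 0, 14),
    "high": (123, 3, 98),
    "medium": (92, 2, 170),
    "low": (17, 0, 43),
}


def recall_pct(tp: int, fn: int) -> int:
    """Recall as a whole-number percentage; 0 when there are no positives."""
    total = tp + fn
    return round(100 * tp / total) if total else 0


detection = {sev: recall_pct(tp, fn) for sev, (tp, _fp, fn) in counts.items()}
print(detection)  # {'critical': 83, 'high': 56, 'medium': 35, 'low': 28}
```

False positives do not enter the calculation, which is why the critical and low tiers differ so widely despite both having FP 0.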
LLM Operational Metrics
Model & Prompt
Model: kimi-k2.5
Prompt Version: sha256:828b00245b42
Prompt Label: default-v1
Token Usage (avg per run)
Input: 20,086
Output: 5,029
Total: 131,193
Cost
Total: $2.17
Per Repo: $0.03
Per 100 LOC: $0.0108
Reliability
Success Rate: 100%
Timeouts: 0
JSON Repair Rate: 0%
Avg Latency: 140.0s
CWE Family Heatmap (recall by repository)
Repository Broken Access Co.. Code Injection /.. Command / OS Inj.. Denial of Service Hardcoded Creden.. HTTP Header Inje.. Insecure Deseria.. Missing Authenti.. Open Redirect Other Path Traversal Security Misconf.. Sensitive Data E.. SQL Injection Server-Side Requ.. XPath Injection Cross-Site Scrip.. XML External Ent..
damn-vulnerable-flask-application 100% 100% 100% 0% 67% 0% 50% 100%
damn-vulnerable-graphql-application 100% 100% 0% 100% 17% 18% 100% 0% 20% 100% 100% 100%
djangoat 0% 100% 100% 50% 100% 29% 0% 23% 50% 0% 25% 100% 14%
dsvpwa 100% 0% 100% 100% 50% 40% 50% 0% 0% 100% 100% 67%
dsvw 100% 100% 100% 100% 0% 100% 100% 0% 100% 50% 50% 100% 100% 100% 50% 100%
dvblab 100% 50% 100% 0% 25% 100% 0% 100%
dvpwa 67% 22% 67% 100% 100% 60%
extremely-vulnerable-flask-app 50% 0% 33% 100% 100% 17% 0% 0% 100% 100% 20%
flask-xss 0% 50% 0% 100% 12% 0% 33% 0% 89%
insecure-web 100% 100% 0% 100% 100% 100%
intentionally-vulnerable-python-application 100% 100% 100% 0% 100% 0% 0%
lets-be-bad-guys 100% 33% 100% 100% 100% 43% 100% 0% 0% 0% 100%
pygoat 40% 75% 67% 44% 67% 25% 9% 100% 0% 20% 100% 100% 0% 100%
python-app 100% 100% 100% 100% 33% 100% 0% 100% 0% 100%
python-insecure-app 100% 100% 0% 0% 0% 50%
pythonssti 100% 0%
threatbyte 0% 50% 100% 33% 100% 50% 0% 100% 100% 67%
vampi 100% 0% 0% 50% 100% 100% 100%
vfapi 100% 100% 100% 100%
vulnerable-api 0% 100% 0% 33% 50% 0% 100% 50% 100%
vulnerable-flask-app 50% 25% 100% 57% 0% 75% 100% 0% 0%
vulnerable-tornado-app 100% 100% 0% 20% 100% 0% 0% 100% 100%
vulnpy 100% 67% 6% 100% 100% 100% 100% 100% 69% 100% 92% 100%
vulpy 0% 0% 25% 50% 0% 0% 0% 0% 83% 50%
CWE Family Detection (aggregate)