Detection Quality Benchmark Methodology
Live results: the latest approved numbers (precision, recall, time-to-detect, per family) are published on the dralvia.tech detection quality page. This page explains how those numbers are produced.
Why we publish this
A detection company that does not publish its accuracy is asking you to take its word for it. We would rather show the numbers and the method behind them. This page documents exactly how Dralvia measures detection quality, so the results can be reproduced and checked rather than merely asserted.
Principles
- Every number is regenerable. The results come from a benchmark harness anyone on our team can re-run. No hand-typed stats.
- Fresh, live ground truth only. We measure on URLs first seen in the last N hours that are still serving a real page at scan time. Naive benchmarks are skewed because a large share of feed URLs are already dead by scan time, and many of those still answer with an HTTP 200 for a parked, suspended, default-server, or blank page. Counting those as missed detections would understate real quality, so we use a content-aware liveness check rather than a status code alone: a URL counts only if the page still serves real, non-parked content. The check uses generic deadness signals only, never our own detector, so it cannot bias the recall number.
- Methodology is published with the numbers. Honesty is the brand.
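To make the content-aware liveness idea concrete, here is a minimal sketch in Python. It is not Dralvia's actual check: the marker phrases, the `MIN_VISIBLE_CHARS` threshold, and the names `visible_text` and `classify_liveness` are all assumptions chosen for illustration. It only shows how a page answering HTTP 200 can still be classified as dead using generic signals, without calling any detector.

```python
import re
from typing import Optional, Tuple

# Hypothetical marker phrases, for illustration only; a real list would be
# longer and tuned against observed parked and suspended pages.
DEADNESS_MARKERS = {
    "parked": ("this domain is parked", "domain is for sale", "buy this domain"),
    "suspended": ("account suspended", "this account has been suspended"),
    "default_server": ("welcome to nginx", "apache2 ubuntu default page", "it works!"),
}
MIN_VISIBLE_CHARS = 50  # assumed cut-off below which a page counts as blank


def visible_text(html: str) -> str:
    """Strip script/style blocks and tags, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def classify_liveness(status: Optional[int], html: str) -> Tuple[bool, str]:
    """Return (is_live, reason). `status` is None when the host did not answer."""
    if status is None:
        return False, "dead_host"
    if status >= 400:
        return False, "http_error"
    text = visible_text(html).lower()
    # Generic deadness signals only: no detector verdict is consulted here.
    for reason, markers in DEADNESS_MARKERS.items():
        if any(marker in text for marker in markers):
            return False, reason
    if len(text) < MIN_VISIBLE_CHARS:
        return False, "blank"
    return True, "live"
```

Because the function takes an already-fetched status and body, it can be exercised on stored responses. Substring markers are crude and can misfire on live pages that happen to contain one of the phrases, so a production check would likely need stronger signals.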
What we measure
- Precision and recall, overall and per family (phishing, malware download, wallet drainer, skimmer), on live positives and a benign control set.
- Time-to-detect: the delta between when a URL first appears in a public feed and when Dralvia reaches a verdict. This is reported as a median and a p90.
- False-positive rate on a benign control set of well-known, known-good sites, extended with rotating long-tail benign URLs.
- Median scan latency for a standard scan.
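The metrics above follow standard definitions. The sketch below shows one way to compute them in Python, under stated assumptions: the `Outcome` record and function names are hypothetical, the p90 uses the nearest-rank percentile definition (this page does not say which definition the harness uses), and per-family precision is left out because it would require the scanner's predicted family, which is not modelled here.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Outcome:
    family: str          # e.g. "phishing"; "benign" for control URLs
    is_malicious: bool   # ground-truth label
    flagged: bool        # scanner verdict


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def quality_metrics(outcomes: List[Outcome]) -> Dict[str, Optional[float]]:
    tp = sum(o.is_malicious and o.flagged for o in outcomes)
    fn = sum(o.is_malicious and not o.flagged for o in outcomes)
    fp = sum(not o.is_malicious and o.flagged for o in outcomes)
    tn = sum(not o.is_malicious and not o.flagged for o in outcomes)
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


def recall_by_family(outcomes: List[Outcome]) -> Dict[str, Optional[float]]:
    """Recall computed separately for each malicious family."""
    families = sorted({o.family for o in outcomes if o.is_malicious})
    return {
        family: quality_metrics([o for o in outcomes if o.family == family])["recall"]
        for family in families
    }


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def time_to_detect_summary(deltas_seconds: List[float]) -> Dict[str, float]:
    """Median and p90 of (verdict time - feed first-seen time), in seconds."""
    return {
        "median": median(deltas_seconds),
        "p90": percentile_nearest_rank(deltas_seconds, 90),
    }
```

Returning `None` for an empty denominator, rather than 0 or 1, keeps a run with no live positives from looking like a perfect or a failed one.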
Ground truth
- Positives: fresh, labeled malicious URLs from public threat feeds (for example URLhaus), selected within a recent first-seen window and liveness-checked at scan time.
- Controls: a benign control corpus of known-good sites plus rotating long-tail benign URLs, so a signal that also fires on legitimate traffic is caught as a false positive.
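Selecting positives within a recent first-seen window could look roughly like the sketch below. The `FeedEntry` record, the `select_fresh` name, and the de-duplication rule are assumptions for illustration; the entries are plain in-memory records, not the output of any particular feed's API, and the window length corresponds to the N hours mentioned under Principles.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class FeedEntry:
    url: str
    family: str
    first_seen: datetime  # timezone-aware, UTC


def select_fresh(entries: List[FeedEntry], now: datetime, window_hours: int) -> List[FeedEntry]:
    """Keep URLs whose earliest sighting falls inside the last `window_hours`."""
    cutoff = now - timedelta(hours=window_hours)
    seen = set()
    fresh = []
    # Walk oldest-first so the earliest sighting of each URL decides its fate:
    # a URL first seen before the window stays excluded even if re-listed later.
    for entry in sorted(entries, key=lambda e: e.first_seen):
        if entry.url in seen:
            continue
        seen.add(entry.url)
        if cutoff <= entry.first_seen <= now:
            fresh.append(entry)
    return fresh
```

Keeping the earliest sighting also matters for time-to-detect, since that metric is measured from feed first-seen.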
How a run works
- Load fresh positives and benign controls.
- Run the content-aware liveness check on every URL and drop anything not serving a real page at scan time (dead host, HTTP error, parked, suspended, default-server, or blank page). Each dropped URL is recorded with its reason, so the run shows exactly how much dead ground truth was excluded.
- Scan each live URL, using the full URL including its path (most malicious pages live at a path, not the bare domain), through the same scan path customers use.
- Compute precision, recall, time-to-detect, false-positive rate, and latency.
- Store the full run (inputs, verdicts, timings) so it is replayable.
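The steps above can be sketched as a small loop. This is an illustrative outline, not the real harness: `run_benchmark` and `dump_run` are hypothetical names, and the liveness check and scanner are passed in as callables so the sketch makes no claim about how either is implemented. It shows the two properties the steps emphasise: every dropped URL is recorded with its reason, and the whole run is serialised so it can be replayed.

```python
import json
import time
from typing import Callable, Dict, List, Tuple


def run_benchmark(
    urls: List[str],
    check_liveness: Callable[[str], Tuple[bool, str]],
    scan: Callable[[str], str],
) -> Dict[str, list]:
    """Liveness-check each URL, scan the live ones, and record everything."""
    run: Dict[str, list] = {"inputs": list(urls), "dropped": [], "results": []}
    for url in urls:
        is_live, reason = check_liveness(url)
        if not is_live:
            # Keep the reason so the run shows how much dead ground truth was excluded.
            run["dropped"].append({"url": url, "reason": reason})
            continue
        started = time.perf_counter()
        verdict = scan(url)  # the full URL is passed, path included
        latency_ms = (time.perf_counter() - started) * 1000
        run["results"].append({"url": url, "verdict": verdict, "latency_ms": latency_ms})
    return run


def dump_run(run: Dict[str, list]) -> str:
    """Serialise the run (inputs, drops, verdicts, timings) for later replay."""
    return json.dumps(run, indent=2, sort_keys=True)
```

A usage example with stand-in callables:

```python
def fake_liveness(url):
    return (False, "parked") if "parked" in url else (True, "live")

def fake_scan(url):
    return "malicious" if "/login" in url else "clean"

run = run_benchmark(
    ["http://a.example/login", "http://parked.example/", "http://b.example/"],
    fake_liveness,
    fake_scan,
)
print(dump_run(run))
```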
Publishing
Runs are generated automatically, but a result becomes public only after human review (the same human-approval discipline we apply before anything is published). The latest approved numbers are served from /api/public/benchmark/summary and surfaced on the Dralvia homepage. Until a run is approved, no accuracy figure is shown.
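The approval gate can be illustrated with a short sketch. The `Run` record, the `approved` flag, and the response shape are all hypothetical; this page does not describe what /api/public/benchmark/summary actually returns. The sketch only shows the rule that unapproved runs never contribute a public figure.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    run_id: str
    finished_at: datetime
    approved: bool                      # set only after human review
    summary: dict = field(default_factory=dict)


def latest_approved(runs: List[Run]) -> Optional[Run]:
    """Most recent run that passed human review, or None if there is none."""
    approved = [run for run in runs if run.approved]
    return max(approved, key=lambda run: run.finished_at) if approved else None


def public_summary(runs: List[Run]) -> dict:
    """Hypothetical public payload: no accuracy figure until a run is approved."""
    run = latest_approved(runs)
    if run is None:
        return {"status": "pending_review"}
    return {"status": "approved", "run_id": run.run_id, **run.summary}
```

With this rule, a newer unapproved run does not displace the last approved one; the public view changes only when a reviewer signs off.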
Honest limits
- These are point-in-time measurements on the sampled fresh ground truth, not a guarantee of future detection on every possible threat.
- Time-to-detect depends on how quickly a URL appears in a public feed; we report the delta from feed first-seen, which is the part we can measure.