Parser statistics
The email parser logs every parse attempt into ParseTrainingLog:
which parser fired, what was extracted, what was missing. From this,
TravStats builds a Parser Stats overview that shows how reliably the
parser performs per airline and where a custom template would pay
off.
Where you see it
Parser → Parse Logs (the tab is admin-only). The overview shows:
- Total number of parse attempts
- Overall hit rate — percentage of parses where a template (built-in or user) fired (rather than only the Ollama fallback or the generic regex extractor)
- Per-airline table with:
  - Number of attempts
  - Hit rate (built-in / user template match rate)
  - Top-5 missing fields (what the templates fail to extract most often, e.g. aircraft, gate, seat)
The table is sorted descending by attempt count — airlines you’ve parsed most often appear at the top.
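The aggregation behind such an overview can be sketched in a few lines of Python. This is an illustration only, not TravStats' actual implementation: the record keys `airline`, `template_hit` and `missing_fields` are hypothetical names chosen for the example, and the real ParseTrainingLog schema may differ.

```python
from collections import Counter, defaultdict


def parser_stats(logs, top_n=5):
    """Aggregate attempts, hit rate and most-missed fields per airline.

    `logs` is an iterable of dicts with the hypothetical keys
    `airline`, `template_hit` and `missing_fields`.
    """
    attempts = Counter()
    hits = Counter()
    missing = defaultdict(Counter)

    for log in logs:
        # Mails without a detected airline are grouped under "Unknown".
        airline = log.get("airline") or "Unknown"
        attempts[airline] += 1
        if log.get("template_hit"):
            hits[airline] += 1
        missing[airline].update(log.get("missing_fields", []))

    rows = [
        {
            "airline": airline,
            "attempts": count,
            "hit_rate": hits[airline] / count,
            "common_missing": [f for f, _ in missing[airline].most_common(top_n)],
        }
        for airline, count in attempts.items()
    ]
    # Most-parsed airlines first, as in the overview table.
    rows.sort(key=lambda row: row["attempts"], reverse=True)

    total = sum(attempts.values())
    overall = sum(hits.values()) / total if total else 0.0
    return {"total": total, "overall_hit_rate": overall, "airlines": rows}
```

Calling `parser_stats(logs)` returns the total attempt count, the overall hit rate as a fraction between 0 and 1, and one row per airline.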
What the hit rate tells you
| Hit rate | Meaning |
|---|---|
| 100 % | The airline has a perfectly-working built-in or user template — no action needed |
| 70–95 % | Template fires most of the time, but occasionally the airline tweaks its format. A user-template refresh or a built-in patch is due |
| 30–70 % | Template fires unreliably — either the airline has multiple email formats (old / new) and only one is covered, or the template is stale |
| 0–30 % | Practically only Ollama / regex fallback. A user template would pay off |
| Unknown row | Mails where the airline detection didn’t fire at all: sender domain unknown, or subject pattern doesn’t match |
Hit rate is not the parse success rate. A mail that no template matched is still parsed by Ollama or the generic regex layer and typically still lands on the review screen with correct data. Hit rate measures only whether a deterministic template fired.
Reading the common-missing-fields column
The commonMissingFields column shows the five most frequently
missing fields per airline. Example:
| Airline | Total | Hits | Hit rate | Missing |
|---|---|---|---|---|
| Lufthansa | 142 | 138 | 97 % | aircraft, gate, seat |
| Ryanair | 38 | 36 | 95 % | aircraft, seat, terminal, pnr |
| Air Baltic | 12 | 0 | 0 % | flightNumber, dep, arr, departureTime, airline |
Reads as:
- Lufthansa: template fires reliably, but the aircraft type is usually missing (LH mails rarely include it — Aviationstack fills it in during enrichment)
- Ryanair: template fires, but PNR / terminal placement varies in their mails
- Air Baltic: no template (no match), Ollama or regex fallback does the rest. If you fly Air Baltic often, a user template would pay off
Anonymised JSONL export
For maintainers and bug reports there’s a download:
Parser → Parse Logs → Export JSONL (admin only, rate-limited) or programmatically:

    curl -fsS -H "Authorization: Bearer $ADMIN_TOKEN" \
      https://travstats.example.com/api/v1/admin/parse-logs/export \
      --output parse-logs.jsonl

What the export contains (max 50,000 rows):
- Subject (hashed, not the original)
- Detected airline / template, hit flag
- Which fields were found / missed
- Parser confidence
- Anonymised body: detected PII (emails, IPs, JWTs, UUIDs) replaced with markers (<redacted:email> etc.)
- Timestamp
What the export does not contain:
- Original email body
- Passenger names, PNRs, booking refs (stripped before logging)
- Personally identifying IDs
The format is useful for filing a bug report along the lines of “look, the Lufthansa template started missing fields after date X” — without sharing real booking data.
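The marker substitution applied to the anonymised body can be pictured roughly as follows. This is a minimal sketch and not the TravStats implementation: the regular expressions are deliberately simple, and only the `<redacted:email>` marker is documented above. The `jwt`, `uuid` and `ip` marker names are assumptions made for the example.

```python
import re

# Specific patterns (JWT, UUID) run before the more general ones.
# Only "<redacted:email>" is a documented marker; the other labels are assumed.
PATTERNS = [
    ("jwt", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    ("uuid", re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    )),
    ("email", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("ip", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),  # IPv4 only
]


def redact(text):
    """Replace each detected PII span with a <redacted:...> marker."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"<redacted:{label}>", text)
    return text
```

A production redactor would need broader coverage (IPv6 addresses, for instance), so treat the patterns as placeholders.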
Training data (recording mode)
Alongside the passive statistics there’s an active recording mode (see User templates):
- Parser → My Templates → Record New lets you upload an example email (.eml, .txt, .msg) or a boarding-pass image
- You annotate the fields in the web view, and TravStats derives a template from it (deriveTemplateFromAnnotation)
- The derived template is stored in ParserTemplate and used on the next match against an email from the same sender
Training uploads are stored in the trainingData DB table (20 MB
limit per file, accepted formats: .eml, .txt, .msg, .jpg,
.png, .pdf). They’re not part of the public statistics — those
show only production parse results.
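If you prepare training files with a script, a pre-flight check along these lines can catch rejected uploads early. It is a sketch based only on the limits stated above. The function name `check_training_upload` is hypothetical, and whether TravStats counts 20 MB as 20 × 1024 × 1024 bytes or 20,000,000 bytes is an assumption.

```python
from pathlib import Path

# Assumption: 20 MB is interpreted as 20 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024
ACCEPTED_SUFFIXES = {".eml", ".txt", ".msg", ".jpg", ".png", ".pdf"}


def check_training_upload(filename, size_bytes):
    """Return (ok, reason) for a candidate training upload."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ACCEPTED_SUFFIXES:
        return False, f"unsupported format: {suffix or '(none)'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds the 20 MB limit"
    return True, "ok"
```

The server remains the authority on what is accepted; the check only mirrors the documented limits.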
Limitations
- No history over time: the statistics aggregate the last 10,000 logs as a rolling window. If your Lufthansa hit rate drops from 100 % to 60 %, you’ll only see it if you check the table regularly. A trend chart is on the roadmap
- User templates feed in too: if you record a user template for KL, KL appears in the table afterwards with a high hit rate. That makes the aggregate uninterpretable when you’re the only user on the instance; on multi-user setups it doesn’t matter