Email parser
The email parser is the headline feature of TravStats. Forward a booking confirmation, watch the flight appear on your globe.
This page covers what happens between “I forwarded the email” and “the flight is on the map”.
Two ways to feed the parser
1. Paste the body
Flights → Import → Email opens a textarea. Paste the email body (plain text or HTML) directly; this works with any email client. Hit Parse.
2. Upload the original file
The same screen accepts .eml files (RFC 5322; most modern
clients can export this format) and Outlook .msg files. Upload, then hit Parse.
This gives higher accuracy than a pasted body for HTML-rich confirmations,
because nothing gets lost in copy-paste.
Both paths go through the same parser pipeline. Use whichever is faster for the email client you have open.
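For the .eml path, the sender and subject come from the message headers, which RFC 5322 separates from the body with the first empty line. Long headers may be folded onto continuation lines that start with whitespace. The sketch below shows that split under simplifying assumptions: a single-part message, no MIME decoding (real confirmations are usually multipart with encoded parts), and repeated headers overwriting one another. The function name splitEml is hypothetical and not taken from the TravStats codebase.

```typescript
// Hypothetical helper: split a raw single-part RFC 5322 message into
// headers and body. MIME multipart decoding is deliberately out of scope.
interface ParsedEml {
  headers: Map<string, string>; // lower-cased header name -> unfolded value
  body: string;
}

function splitEml(raw: string): ParsedEml {
  // RFC 5322 uses CRLF line endings; normalise so the blank-line search is simple.
  const text = raw.replace(/\r\n/g, "\n");
  const sep = text.indexOf("\n\n");
  const headerBlock = sep === -1 ? text : text.slice(0, sep);
  const body = sep === -1 ? "" : text.slice(sep + 2);

  const headers = new Map<string, string>();
  let lastName: string | null = null;
  for (const line of headerBlock.split("\n")) {
    if (/^[ \t]/.test(line) && lastName !== null) {
      // Folded continuation line: append it to the previous header's value.
      headers.set(lastName, (headers.get(lastName) ?? "") + " " + line.trim());
      continue;
    }
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // not a header line; skip it
    lastName = line.slice(0, colon).trim().toLowerCase();
    // Repeated headers (e.g. Received) overwrite earlier ones in this sketch.
    headers.set(lastName, line.slice(colon + 1).trim());
  }
  return { headers, body };
}
```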
Recommendation: Ollama is the primary parser
The recommended setup is Ollama-first. When TravStats has an
Ollama instance configured (default: bundled sidecar on
http://ollama:11434), the LLM is tried before the built-in
regex templates, because:
- Multi-flight bookings: round trips, connections and group bookings work consistently. Regex templates often catch only the first leg, miss connections, or fail entirely on multi-PNR itineraries.
- Coverage: works for any airline, not just the eight European carriers TravStats ships templates for.
- Resilience: copes with redesigned emails, HTML-heavy layouts, alt-text-only fields, and edge cases that break regex.
The bundled docker-compose.yml includes Ollama as a sidecar with no
extra setup. Most users should leave it on.
The detection pipeline
When you hit Parse, the backend goes through this sequence:
1. Extract the email body → raw text + raw HTML
2. Pick a strategy in this order:
   a. User template with confidence ≥ 80 % → run user-template engine
   b. Ollama configured & reachable → run Ollama LLM (recommended)
   c. Built-in regex template matches → run template engine
   d. Generic regex extractor → last-resort fallback
3. Run regex post-processing on the result → fill PNR, gate, terminal, seat
4. Return extracted flights to the review screen

You don’t pick the strategy. Detection runs automatically, and the
review screen tells you which one fired (look for the parserUsed
badge: user-template, ollama, or regex).
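The strategy order can be sketched as a small selection function. This is an illustration, not TravStats' implementation: the internal strategy names, the PipelineSignals shape, and the rule that any strategy returning zero flights hands over to the next one are assumptions (this page only documents the fall-through-on-empty behaviour for Ollama). Mapping both regex strategies onto the regex badge is likewise an assumption, and the post-processing in step 3 is left out.

```typescript
// Hypothetical internal strategy names, plus the three parserUsed badge values.
type Strategy = "user-template" | "ollama" | "builtin-template" | "generic-regex";
type ParserUsed = "user-template" | "ollama" | "regex";

interface PipelineSignals {
  userTemplateConfidence: number | null; // best user-template match, 0-100
  ollamaReachable: boolean;
  builtInTemplateMatches: boolean;
}

// Applicable strategies, in the documented priority order.
function candidateStrategies(s: PipelineSignals): Strategy[] {
  const order: Strategy[] = [];
  if (s.userTemplateConfidence !== null && s.userTemplateConfidence >= 80) order.push("user-template");
  if (s.ollamaReachable) order.push("ollama");
  if (s.builtInTemplateMatches) order.push("builtin-template");
  order.push("generic-regex"); // last resort, always available
  return order;
}

// Assumption: both regex strategies surface as the "regex" badge.
function badgeFor(strategy: Strategy): ParserUsed {
  return strategy === "user-template" || strategy === "ollama" ? strategy : "regex";
}

// Try strategies in order; the first one that returns any flights wins.
function runPipeline<F>(
  s: PipelineSignals,
  run: (strategy: Strategy) => F[],
): { flights: F[]; parserUsed: ParserUsed | null } {
  for (const strategy of candidateStrategies(s)) {
    const flights = run(strategy);
    if (flights.length > 0) return { flights, parserUsed: badgeFor(strategy) };
  }
  return { flights: [], parserUsed: null };
}
```

Modelling the order as a list rather than nested if/else keeps the fallback rule in one loop, so an unreachable Ollama and an Ollama that returns nothing are handled the same way.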
Built-in regex airline templates (the fallback)
When Ollama is not configured, is unreachable, or returns no flights, the parser falls back to regex templates for eight European carriers. All of them cover the standard booking-confirmation layout these airlines have used for the past few years.
| IATA | Sender domains | Subject patterns matched |
|---|---|---|
| LH | @lufthansa.com, @miles-and-more.com, @lufthansa.de | “Buchungsbestätigung”, “Lufthansa booking confirmation”, “Ihre Buchung” |
| LH-old | (subject only, for the older Lufthansa format) | “Buchungsdetails” |
| LX | @swiss.com, @newsletter.swiss.com | “Buchungsbestätigung”, “Your Swiss booking” |
| OS | @austrian.com, @newsletter.austrian.com | “Austrian booking”, “Ihre Buchung bei Austrian” |
| SN | @brusselsairlines.com | “Brussels Airlines booking” |
| FR | @ryanair.com, @info.ryanair.com | “Ryanair booking”, “Your booking confirmation” |
| U2 | @easyjet.com, @email.easyjet.com | “easyJet confirmation”, “Your easyJet booking” |
| EW | @eurowings.com, @newsletter.eurowings.com | “Eurowings buchung”, “Eurowings booking” |
| W6 | @wizzair.com, @info.wizzair.com | “Wizz Air booking”, “Buchungsbestätigung Wizz” |
Detection combines three signals: sender domain, subject pattern, and an HTML fingerprint inside the body. Most legitimate confirmations match more than one signal, which is what makes detection reliable.
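A minimal sketch of how the three signals could be combined, assuming each rule carries lists of sender domains, subject regexes and HTML fingerprint substrings, and that the rule matching the most signals wins. The DetectionRule field names, the one-signal minimum and the tie-break (earlier rule wins) are assumptions for illustration; they do not describe the JSON template format.

```typescript
// Hypothetical detection rule; field names are illustrative only.
interface DetectionRule {
  iata: string;
  senderDomains: string[];    // e.g. "lufthansa.com" (without the "@")
  subjectPatterns: RegExp[];  // no /g flag, so .test() carries no lastIndex state
  htmlFingerprints: string[]; // substrings expected somewhere in the HTML body
}

// Extract the domain of a From header such as "Lufthansa <noreply@lufthansa.com>".
function senderDomain(from: string): string {
  const m = /@([^>\s]+)/.exec(from.toLowerCase());
  return m ? m[1] : "";
}

// Count how many of the three signals a rule matches (0 to 3).
function countSignals(rule: DetectionRule, from: string, subject: string, html: string): number {
  let hits = 0;
  if (rule.senderDomains.includes(senderDomain(from))) hits++;
  if (rule.subjectPatterns.some((p) => p.test(subject))) hits++;
  if (rule.htmlFingerprints.some((f) => html.includes(f))) hits++;
  return hits;
}

// Pick the rule with the most matching signals; ties go to the earlier rule.
function detectAirline(rules: DetectionRule[], from: string, subject: string, html: string): string | null {
  let best: { iata: string; hits: number } | null = null;
  for (const rule of rules) {
    const hits = countSignals(rule, from, subject, html);
    if (hits > 0 && (best === null || hits > best.hits)) best = { iata: rule.iata, hits };
  }
  return best ? best.iata : null;
}
```

A one-signal minimum keeps subject-only rules such as LH-old usable; a stricter threshold would trade recall for fewer false positives.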
Templates live in
backend/src/services/parsers/templates/airlines/
as JSON files. Each one declares the field-extraction rules
(regex / XPath-style selectors / structural patterns). You don’t
edit these unless you’re contributing a new built-in template
upstream.
What gets extracted
For every flight in the booking, the parser extracts:
- Flight number (normalised: lh401, LH 401 and LH401 all become LH401)
- Date of the flight, in the airport’s local timezone
- Departure / arrival airport (IATA code, resolved against the seeded airport database for full details)
- Scheduled departure / arrival times (a local time plus IANA zone pair; see the v1.2.0 contract change)
- Booking reference / PNR, if present
- Aircraft type, if the airline includes it in the email (rare; AirLabs/Aviationstack enrichment fills this in afterwards)
- Passenger names (multiple for a group booking; saved as coPassengers)
- Seat / class / cabin, if the email was sent at the check-in stage
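The flight-number normalisation and the local-time-plus-zone pair from the list above can be sketched as follows. The regex is an assumption about what counts as a flight number (a two-character IATA designator such as LH, U2 or W6, then one to four digits), and the LocalScheduledTime field names are hypothetical; the actual v1.2.0 contract is not reproduced here. The zone check relies on Intl.DateTimeFormat throwing a RangeError for unknown time zones.

```typescript
// Assumed designator shapes: two letters, or one letter plus one digit in
// either order (covers LH, U2, W6). Then 1 to 4 digits, optional whitespace between.
const FLIGHT_NUMBER = /^([A-Z]{2}|[A-Z]\d|\d[A-Z])\s*(\d{1,4})$/;

// Hypothetical normaliser: "lh401", "LH 401" and "LH401" all become "LH401".
function normalizeFlightNumber(raw: string): string | null {
  const m = FLIGHT_NUMBER.exec(raw.trim().toUpperCase());
  return m ? m[1] + m[2] : null;
}

// Hypothetical shape for a scheduled time: wall-clock time at the airport
// plus the IANA zone it is expressed in. Field names are illustrative.
interface LocalScheduledTime {
  local: string;    // e.g. "2024-05-01T07:05" (no UTC offset)
  timeZone: string; // e.g. "Europe/Berlin"
}

// Intl.DateTimeFormat throws a RangeError for unknown zones, which makes it
// a dependency-free validity check.
function isValidIanaZone(zone: string): boolean {
  try {
    new Intl.DateTimeFormat("en-US", { timeZone: zone });
    return true;
  } catch {
    return false;
  }
}
```

Keeping the wall-clock time and the zone side by side, instead of converting to UTC at parse time, preserves exactly what the email said; conversion can happen later against the zone.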
Multi-leg bookings produce multiple flight rows: a round trip becomes two flights, and a connection becomes one flight per leg. With Ollama active, this works reliably across all airlines. With the regex-only fallback, multi-leg detection is the weakest spot: the parser may catch only the first leg or miss the return entirely. If you frequently book multi-flight itineraries, run Ollama.
You can always delete unwanted rows on the review screen before saving.
How Ollama is used
When Ollama is configured (default: bundled sidecar at
http://ollama:11434, model gemma3:12b), the parser sends the
cleaned email text plus a structured-extraction prompt and parses
the JSON response back into flight rows.
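A sketch of that round trip, assuming Ollama's /api/generate endpoint with format set to "json" and streaming disabled (both are standard Ollama request options). The prompt wording, the ExtractedFlight schema and the function names are hypothetical; TravStats' real prompt and field set are not shown on this page. Returning an empty list on any failure mirrors the fallback to regex templates described further down.

```typescript
// Hypothetical output schema; the real prompt and fields may differ.
interface ExtractedFlight {
  flightNumber: string;
  date: string; // YYYY-MM-DD, local to the departure airport
  from: string; // IATA code
  to: string;   // IATA code
}

// Request body for /api/generate. format: "json" asks Ollama for well-formed
// JSON output; stream: false returns one response object instead of chunks.
function buildOllamaRequest(emailText: string, model = "gemma3:12b") {
  const prompt =
    "Extract every flight from this booking email. Reply with JSON of the form " +
    '{"flights":[{"flightNumber":"","date":"","from":"","to":""}]}. ' +
    "Include every leg of round trips and connections.\n\n" +
    emailText;
  return { model, prompt, stream: false, format: "json" };
}

function isExtractedFlight(x: unknown): x is ExtractedFlight {
  if (typeof x !== "object" || x === null) return false;
  const f = x as Record<string, unknown>;
  return ["flightNumber", "date", "from", "to"].every((k) => typeof f[k] === "string");
}

// Ollama returns the model's text in the "response" field; parse it
// defensively and drop rows that do not match the expected shape.
function parseOllamaResponse(raw: unknown): ExtractedFlight[] {
  const text = (raw as { response?: unknown } | null)?.response;
  if (typeof text !== "string") return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return [];
  }
  if (typeof parsed !== "object" || parsed === null) return [];
  const flights = (parsed as { flights?: unknown }).flights;
  return Array.isArray(flights) ? flights.filter(isExtractedFlight) : [];
}

// An empty result (unreachable, HTTP error, or no flights) is the caller's
// cue to fall back to the regex templates. Needs a runtime with global fetch.
async function extractWithOllama(emailText: string, baseUrl = "http://ollama:11434"): Promise<ExtractedFlight[]> {
  try {
    const res = await fetch(`${baseUrl}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildOllamaRequest(emailText)),
    });
    return res.ok ? parseOllamaResponse(await res.json()) : [];
  } catch {
    return [];
  }
}
```

Validating every row matters because format: "json" only guarantees well-formed JSON, not that the model followed the schema in the prompt.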
Accuracy on the project’s benchmark suite (mixed multi-airline
emails): ~95–100 % with gemma3:12b and near 100 % on
multi-flight bookings. That is substantially better than regex on
multi-leg itineraries, because the LLM tracks itineraries semantically
rather than by pattern-matching.
Ollama setup is covered on the Ollama page. If Ollama isn’t running, the parser silently falls back to the built-in regex templates (eight carriers) and then the generic regex extractor. For an airline without a built-in template, and with no Ollama, the parser typically returns “no flight detected”; in that case, record a user template or enter the flight manually.
When the parse misses
Common symptoms and what to do:
| Symptom | Cause | Fix |
|---|---|---|
| No flight extracted at all | Carrier without a built-in regex template, and Ollama not configured | Enable Ollama (bundled, free; strongly recommended). Otherwise, record a user template or enter the flight manually |
| Only one flight from a multi-leg booking | Regex-only parsing without Ollama; templates often only catch the first leg | Enable Ollama — multi-flight emails are exactly where regex falls short. With Ollama, all legs come through |
| Aircraft type missing | Airline doesn’t include it in the email body | Manually edit, or enable AirLabs/Aviationstack enrichment to auto-fill on save |
| Wrong flight extracted from a multi-leg booking | Parser (regex layer) picked the wrong leg as the primary flight | Review screen lets you delete the unwanted ones; or enable Ollama which tracks itineraries semantically |
| Codeshare flight number (e.g. AC987 operated by LH) | Operator vs. marketing carrier mismatch | The flight-lookup API resolves codeshares — set an AirLabs / Aviationstack key |
| All fields blank | Email is OCR’d from a forwarded photo, or HTML is heavily styled with everything in alt-text | Try uploading the original .eml instead of pasted body. Ollama copes with this better than regex |
| Date off by one day | Email gives the departure time in the arrival airport’s timezone, or vice versa | Edit on the review screen; the schedule API will fix it on the next enrichment pass |
Recording a parse decision
Every parse, successful or not, gets logged. Follow the logs with (the sh -c wrapper makes the wildcard expand inside the container rather than on the host):
docker exec travstats-app sh -c 'tail -f /app/data/logs/parser*.log'
Each log line records:
- Which template fired (or that none did)
- Which fields were extracted
- Whether the LLM (Ollama) was invoked
- Latency for each stage
Use it when investigating “why didn’t this parse” or when building a user template for a new carrier.
Bulk-importing many emails at once
For backfilling years of old confirmations, pasting one email at a time is tedious. Two options:
- Multi-paste — the email parser textarea accepts multiple emails at once, separated by blank lines. The parser detects each independently and produces a single review screen for everything.
- Dedicated mail-fetch endpoint (advanced): /api/v1/parse-email accepts a JSON body with email content. Combine it with a script that walks an IMAP mailbox and POSTs each message. The REST API reference covers the request shape and a sample script.
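A minimal backfill sketch for the endpoint route, assuming the messages have already been exported as .eml files into one directory (walking an IMAP mailbox directly would need an IMAP client library). The JSON field name content, the bearer-token header and the base URL are placeholders, not the documented request shape; check the REST API reference and adjust buildParseEmailRequest before use. Requires Node 18+ for the built-in fetch.

```typescript
import { readdir, readFile } from "node:fs/promises";
import { join } from "node:path";

interface PostSpec {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Placeholder request shape: the "content" field and the Bearer auth scheme
// are assumptions, not taken from the TravStats REST API reference.
function buildParseEmailRequest(emailText: string, token: string): PostSpec {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    body: JSON.stringify({ content: emailText }),
  };
}

// POST every .eml file in a directory, one at a time, and log the HTTP status.
async function importDirectory(dir: string, baseUrl: string, token: string): Promise<void> {
  const files = (await readdir(dir)).filter((f) => f.toLowerCase().endsWith(".eml")).sort();
  for (const file of files) {
    const text = await readFile(join(dir, file), "utf8");
    const res = await fetch(`${baseUrl}/api/v1/parse-email`, buildParseEmailRequest(text, token));
    console.log(`${file}: HTTP ${res.status}`);
  }
}
```

Sending one request at a time is a conservative choice, since each parse may wait on the local Ollama model.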
Either way, the review screen is the choke point: you confirm before anything is saved.
Privacy
The full email body, including passenger names, PNRs and payment amounts if present, is processed inside your TravStats container. With the regex templates or the bundled Ollama sidecar, nothing leaves your network, because both run locally (Ollama as a sidecar in the same Docker network).
The only path where email content leaves your hardware is if you’ve
deliberately pointed OLLAMA_URL at a hosted/external Ollama
service. TravStats doesn’t recommend that, and the bundled
docker-compose default is the local sidecar. The
Ollama page covers what to
expect.
The marketing site uses canned, fictional samples for its parser demo, precisely so that you never have to expose a real email to decide whether the parser works for you. See it at travstats.de under “How it works”.