Email parser

The email parser is the headline feature of TravStats. Forward a booking confirmation, watch the flight appear on your globe.

This page covers what happens between “I forwarded the email” and “the flight is on the map”.

Flights → Import → Email opens a textarea. Paste the email body (plain text or HTML) directly — works for any email client. Hit Parse.

The same screen accepts .eml files (RFC 5322; most modern clients export this) and Outlook .msg files. Upload, then hit Parse. For HTML-rich confirmations this is more accurate than pasting the body, because nothing gets lost in copy-paste.

Both paths go through the same parser pipeline. Use whichever is faster for the email client you have open.
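
As an illustration of why an .eml upload keeps more information than a pasted body, here is a minimal sketch that pulls both the plain-text and the HTML part out of a raw RFC 5322 message using Python's standard library. It is not TravStats' own extraction code; the function names `extract_bodies` and `build_sample` are made up for this example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_bodies(raw_eml: bytes) -> tuple[str, str]:
    """Return (plain_text, html) from raw RFC 5322 bytes.

    A part that is missing from the message comes back as an empty string.
    """
    msg = message_from_bytes(raw_eml, policy=policy.default)
    plain_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    plain = plain_part.get_content() if plain_part is not None else ""
    html = html_part.get_content() if html_part is not None else ""
    return plain, html


def build_sample() -> bytes:
    """Build a small multipart/alternative message to try the extractor on."""
    msg = EmailMessage()
    msg["From"] = "booking@example.com"
    msg["Subject"] = "Your booking confirmation"
    msg.set_content("Flight LH401 FRA-JFK")
    msg.add_alternative("<p>Flight <b>LH401</b> FRA-JFK</p>", subtype="html")
    return msg.as_bytes()


if __name__ == "__main__":
    text, html = extract_bodies(build_sample())
    print(text.strip())
    print(html.strip())
```

A pasted body usually carries only one of the two parts, which is the information loss the upload path avoids.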

Recommendation: Ollama is the primary parser

The recommended setup is Ollama-first. When TravStats has an Ollama instance configured (default: bundled sidecar on http://ollama:11434), the LLM is tried before the built-in regex templates, because:

  • Multi-flight bookings — round trips, connections, group bookings — work consistently. Regex templates often catch only the first leg, miss connections, or fail entirely on multi-PNR itineraries.
  • Coverage: works for any airline, not just the eight European carriers TravStats ships templates for.
  • Resilience: copes with redesigned emails, HTML-heavy layouts, alt-text-only fields, and edge cases that break regex.

The bundled docker-compose.yml includes Ollama as a sidecar with no extra setup. Most users should leave it on.

When you hit Parse, the backend goes through this sequence:

1. Extract the email body → raw text + raw HTML
2. Pick a strategy in this order:
   a. User template with confidence ≥ 80 % → run user-template engine
   b. Ollama configured & reachable → run Ollama LLM (recommended)
   c. Built-in regex template matches → run template engine
   d. Generic regex extractor → last-resort fallback
3. Run regex post-processing on the result → fill PNR, gate, terminal, seat
4. Return extracted flights to the review screen

You don’t pick the strategy yourself. Detection runs automatically, and the review screen tells you which one fired (look for the parserUsed badge: user-template, ollama, or regex).
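
The selection order in step 2 can be sketched as a simple priority chain. This is a simplified illustration, not the backend's code: `ParseContext`, `pick_strategy`, and the internal strategy names are hypothetical, and the mapping of both regex strategies onto the single `regex` badge is an assumption based on the three badge values listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseContext:
    # Confidence of the best-matching user template (0-100), or None if none matched.
    user_template_confidence: Optional[float]
    # True when an Ollama instance is configured and answered a health check.
    ollama_reachable: bool
    # True when one of the built-in airline templates matched the email.
    builtin_template_matches: bool


# Assumed mapping from internal strategy name to the parserUsed badge.
BADGE = {
    "user-template": "user-template",
    "ollama": "ollama",
    "regex-template": "regex",
    "generic-regex": "regex",
}


def pick_strategy(ctx: ParseContext) -> str:
    """Walk the strategies in priority order and return the first that applies."""
    if ctx.user_template_confidence is not None and ctx.user_template_confidence >= 80:
        return "user-template"
    if ctx.ollama_reachable:
        return "ollama"
    if ctx.builtin_template_matches:
        return "regex-template"
    return "generic-regex"


if __name__ == "__main__":
    ctx = ParseContext(user_template_confidence=None, ollama_reachable=True,
                       builtin_template_matches=True)
    print(pick_strategy(ctx), "->", BADGE[pick_strategy(ctx)])
```

The sketch leaves out one behaviour described on this page: when Ollama runs but returns no flights, the parser continues down to the regex templates.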

Built-in regex airline templates (the fallback)

When Ollama is not configured, or it’s unreachable, or it returns no flights, the parser falls back to regex templates for eight European carriers. All cover the standard booking-confirmation layout these airlines have used for the past few years.

| IATA | Sender domains | Subject patterns matched |
| --- | --- | --- |
| LH | @lufthansa.com, @miles-and-more.com, @lufthansa.de | “Buchungsbestätigung”, “Lufthansa booking confirmation”, “Ihre Buchung” |
| LH-old | (subject-only, for older Lufthansa format) | “Buchungsdetails” |
| LX | @swiss.com, @newsletter.swiss.com | “Buchungsbestätigung”, “Your Swiss booking” |
| OS | @austrian.com, @newsletter.austrian.com | “Austrian booking”, “Ihre Buchung bei Austrian” |
| SN | @brusselsairlines.com | “Brussels Airlines booking” |
| FR | @ryanair.com, @info.ryanair.com | “Ryanair booking”, “Your booking confirmation” |
| U2 | @easyjet.com, @email.easyjet.com | “easyJet confirmation”, “Your easyJet booking” |
| EW | @eurowings.com, @newsletter.eurowings.com | “Eurowings buchung”, “Eurowings booking” |
| W6 | @wizzair.com, @info.wizzair.com | “Wizz Air booking”, “Buchungsbestätigung Wizz” |

Detection uses all three signals (sender domain, subject regex, HTML fingerprint inside the body). Most legitimate confirmations match multiple signals, which is what makes detection reliable.
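
To make the three-signal idea concrete, here is a minimal sketch that checks one email against one template and reports which signals matched. The sender domains and subject patterns are taken from the Lufthansa row above; everything else is an assumption for illustration, in particular the `TemplateSignals` structure, the `html_fingerprint` value, and the function name. How TravStats actually weighs the signals is not documented here.

```python
import re
from dataclasses import dataclass


@dataclass
class TemplateSignals:
    iata: str
    sender_domains: tuple
    subject_pattern: re.Pattern
    html_fingerprint: str  # hypothetical marker string expected inside the HTML body


LH = TemplateSignals(
    iata="LH",
    sender_domains=("lufthansa.com", "miles-and-more.com", "lufthansa.de"),
    subject_pattern=re.compile(
        r"Buchungsbestätigung|Lufthansa booking confirmation|Ihre Buchung",
        re.IGNORECASE,
    ),
    html_fingerprint="lufthansa-booking",  # invented value, for illustration only
)


def matched_signals(template: TemplateSignals, sender: str, subject: str, html: str) -> list:
    """Return the names of the signals that match, in a fixed order."""
    hits = []
    # Accept both "user@host" and "Name <user@host>" sender forms.
    domain = sender.rsplit("@", 1)[-1].strip(" >").lower()
    if domain in template.sender_domains:
        hits.append("sender")
    if template.subject_pattern.search(subject):
        hits.append("subject")
    if template.html_fingerprint in html:
        hits.append("html")
    return hits


if __name__ == "__main__":
    print(matched_signals(LH, "Lufthansa <noreply@lufthansa.com>",
                          "Ihre Buchung ABC123", "<html></html>"))
```

Returning the list of matched signals, rather than a single yes/no, keeps the decision about how many signals are enough separate from the matching itself.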

Templates live in backend/src/services/parsers/templates/airlines/ as JSON files. Each one declares the field-extraction rules (regex / XPath-style selectors / structural patterns). You don’t edit these unless you’re contributing a new built-in template upstream.

For every flight in the booking, the template engine pulls:

  • Flight number (normalised — lh401, LH 401, LH401 all become LH401)
  • Date of the flight, in the airport’s local timezone
  • Departure / arrival airport (IATA code; resolved against the seeded airport database for full info)
  • Scheduled departure / arrival times (local-time-and-IANA-zone pair, see the v1.2.0 contract change)
  • Booking reference / PNR if present
  • Aircraft type if the airline includes it in the email (rare; AirLabs/Aviationstack enrichment fills this in afterwards)
  • Passenger names (multiple if a group booking; saved as coPassengers)
  • Seat / class / cabin if the email reached the check-in stage
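
The flight-number normalisation in the first bullet can be sketched as follows. This is a minimal sketch, not the template engine's code: the regex, the optional trailing suffix letter, and the function name are assumptions. It assumes a two-character airline designator (so U2 and W6 work) followed by one to four digits.

```python
import re
from typing import Optional

# Two-character designator, optional space or hyphen, 1-4 digits, optional suffix letter.
_FLIGHT_RE = re.compile(r"^\s*([A-Za-z0-9]{2})\s*-?\s*(\d{1,4})([A-Za-z]?)\s*$")


def normalise_flight_number(raw: str) -> Optional[str]:
    """Normalise 'lh401', 'LH 401', 'LH401' to 'LH401'; return None if it does not parse."""
    match = _FLIGHT_RE.match(raw)
    if match is None:
        return None
    designator, number, suffix = match.groups()
    # A designator made only of digits (e.g. input "1234") is not an airline code.
    if not any(ch.isalpha() for ch in designator):
        return None
    return f"{designator.upper()}{number}{suffix.upper()}"


if __name__ == "__main__":
    for sample in ("lh401", "LH 401", "LH401", "U2 8001"):
        print(sample, "->", normalise_flight_number(sample))
```

The sketch keeps the digits exactly as written; whether leading zeros are stripped is not stated on this page, so it is left out.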

Multi-leg bookings produce multiple flight rows — round trips become two flights, connections become as many as there are legs. With Ollama active, this works reliably across all airlines. With the regex-only fallback, multi-leg detection is the regex layer’s weakest spot — the parser may catch only the first leg, or miss the return entirely. If you frequently book multi-flight itineraries, run Ollama.

You can always delete unwanted rows on the review screen before saving.

When Ollama is configured (default: bundled sidecar at http://ollama:11434, model gemma3:12b), the parser sends the cleaned email text plus a structured-extraction prompt and parses the JSON response back into flight rows.
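
A rough sketch of that round trip, using Ollama's `/api/generate` endpoint with `stream` disabled and `format` set to `json`. The URL and model are the defaults named above; the prompt wording, the `flights` JSON shape, and all function names are assumptions for illustration and are not TravStats' real prompt or schema.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # bundled sidecar default
MODEL = "gemma3:12b"

# Hypothetical prompt; the real structured-extraction prompt is not shown on this page.
PROMPT = (
    "Extract every flight from the booking email below. "
    'Reply with JSON only, shaped as {"flights": [{"flightNumber": "...", '
    '"date": "YYYY-MM-DD", "from": "IATA", "to": "IATA"}]}.\n\n'
)


def build_payload(email_text: str) -> dict:
    """Request body for POST /api/generate (non-streaming, JSON-constrained output)."""
    return {"model": MODEL, "prompt": PROMPT + email_text, "stream": False, "format": "json"}


def flights_from_reply(reply_body: str) -> list:
    """Parse Ollama's reply; the model output is a JSON string in the 'response' field."""
    envelope = json.loads(reply_body)
    try:
        data = json.loads(envelope.get("response", ""))
    except json.JSONDecodeError:
        return []  # treated as "no flights", so a regex fallback could take over
    flights = data.get("flights", []) if isinstance(data, dict) else []
    if not isinstance(flights, list):
        return []
    return [f for f in flights if isinstance(f, dict)]


def extract_with_ollama(email_text: str, timeout: float = 120.0) -> list:
    """Send the cleaned email text to Ollama and return the extracted flight rows."""
    request = urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=json.dumps(build_payload(email_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return flights_from_reply(response.read().decode("utf-8"))
```

Returning an empty list on malformed model output mirrors the behaviour described earlier, where an Ollama run that yields no flights falls through to the regex templates.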

Accuracy on the project’s benchmark suite (mixed multi-airline emails): ~95–100 % with gemma3:12b, and near-100 % on multi-flight bookings. That is substantially better than regex on multi-leg itineraries, since the LLM tracks itineraries semantically rather than pattern-matching.

Ollama setup is covered on the Ollama page. If Ollama isn’t running, the parser silently falls back to the built-in regex templates (8 carriers) and a generic regex extractor. When the airline has no built-in template and Ollama is unavailable, the parser returns “no flight detected” and you fall back to a user template or manual entry.

Common symptoms and what to do:

| Symptom | Cause | Fix |
| --- | --- | --- |
| No flight extracted at all | Carrier without a built-in regex template, and Ollama not configured | Enable Ollama (bundled, free); strongly recommended. Or record a user template, or enter manually |
| Only one flight from a multi-leg booking | Regex-only parsing without Ollama; templates often only catch the first leg | Enable Ollama: multi-flight emails are exactly where regex falls short. With Ollama, all legs come through |
| Aircraft type missing | Airline doesn’t include it in the email body | Manually edit, or enable AirLabs/Aviationstack enrichment to auto-fill on save |
| Wrong flight extracted from a multi-leg booking | Parser (regex layer) picked the wrong leg as the primary flight | Review screen lets you delete the unwanted ones; or enable Ollama, which tracks itineraries semantically |
| Codeshare flight number (e.g. AC987 operated by LH) | Operator vs. marketing carrier mismatch | The flight-lookup API resolves codeshares; set an AirLabs / Aviationstack key |
| All fields blank | Email is OCR’d from a forwarded photo, or HTML is heavily styled with everything in alt-text | Try uploading the original .eml instead of the pasted body. Ollama copes with this better than regex |
| Date off by one day | Email gives the departure time in the arrival airport’s timezone, or vice versa | Edit on the review screen; the schedule API will fix it on the next enrichment pass |
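
The “date off by one day” case is easiest to see with a worked example. The sketch below takes a local time plus its IANA zone (the pair used for scheduled times) and shows the calendar date the same instant has when viewed from another zone. The function name and the sample flight time are made up for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def local_date_in(local_iso: str, zone: str, view_zone: str) -> str:
    """Interpret local_iso as wall-clock time in `zone`, return its date in `view_zone`."""
    aware = datetime.fromisoformat(local_iso).replace(tzinfo=ZoneInfo(zone))
    return aware.astimezone(ZoneInfo(view_zone)).date().isoformat()


if __name__ == "__main__":
    # A 23:30 departure from New York, viewed at the departure and arrival airports.
    departure = "2024-05-01T23:30"
    print(local_date_in(departure, "America/New_York", "America/New_York"))  # departure airport
    print(local_date_in(departure, "America/New_York", "Europe/Berlin"))     # arrival airport
```

A late-evening departure from New York is already the next calendar day in Frankfurt, so an email that prints the departure in the arrival airport's zone produces a date one day later than the flight's own local date.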

Every parse, successful or not, gets logged. Look in:

docker exec travstats-app tail -f /app/data/logs/parser*.log

Each log line records:

  • Which template fired (or that none did)
  • Which fields were extracted
  • Whether the Ollama LLM was invoked
  • Latency for each stage

Use it when investigating “why didn’t this parse” or when building a user template for a new carrier.

For backfilling years of old confirmations, paste-one-at-a-time is tedious. Two options:

  1. Multi-paste — the email parser textarea accepts multiple emails at once, separated by blank lines. The parser detects each independently and produces a single review screen for everything.
  2. Dedicated mail-fetch endpoint (advanced) — /api/v1/parse-email accepts a JSON body with email content. Combine with a script that walks an IMAP mailbox and POSTs each message. The REST API reference covers the request shape and a sample script.
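
For option 2, a minimal sketch of the IMAP walk might look like the following. The endpoint path comes from this page; the base URL, the JSON field name `content`, the absence of authentication, and the function names are all assumptions, so check the REST API reference for the real request shape before using anything like this.

```python
import imaplib
import json
import urllib.request

TRAVSTATS_URL = "http://localhost:3000"  # assumption: adjust to your instance
PARSE_ENDPOINT = "/api/v1/parse-email"


def build_parse_request(raw_message: bytes) -> urllib.request.Request:
    """Wrap one raw email in a JSON POST. The 'content' field name is hypothetical."""
    body = json.dumps({"content": raw_message.decode("utf-8", errors="replace")})
    return urllib.request.Request(
        TRAVSTATS_URL + PARSE_ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def backfill(host: str, user: str, password: str, mailbox: str = "INBOX"):
    """Walk a mailbox read-only and POST each message; yields (message number, HTTP status)."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select(mailbox, readonly=True)
        status, data = imap.search(None, "ALL")
        if status != "OK":
            return
        for num in data[0].split():
            status, parts = imap.fetch(num, "(RFC822)")
            if status != "OK" or not parts or not isinstance(parts[0], tuple):
                continue  # skip messages that could not be fetched
            with urllib.request.urlopen(build_parse_request(parts[0][1])) as response:
                yield num.decode(), response.status
```

Opening the mailbox with `readonly=True` keeps the backfill from marking old confirmations as read.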

Either way, the review screen is the choke point — you confirm before anything saves.

The full email body — including passenger names, PNRs, payment amounts if present — is processed inside your TravStats container. Nothing leaves your network with regex templates or Ollama, because both run locally (Ollama as a sidecar in the same Docker network).

The only path where email content leaves your hardware is if you’ve deliberately pointed OLLAMA_URL at a hosted/external Ollama service. TravStats doesn’t recommend that, and the bundled docker-compose default is the local sidecar. The Ollama page covers what to expect.

The marketing site uses pre-canned fictional samples for its parser demo precisely so no real email ever needs to be exposed to make a decision about whether the parser works for you. See it at travstats.de under “How it works”.