reporting, manual errors, pipelines

Reports that crash on Monday

gridmap team May 26, 2026 3 min read

Eight fifteen, Monday morning. The weekend refresh ran. The email arrives: the executive dashboard is wrong, the totals look strange, can someone take a look. By nine the data team is in a room together, scrolling through pipeline logs, looking for the broken step.

Almost every time, the root cause is the same. A mapping was edited late Friday by someone who needed to add one new value before going home. The edit looked fine. The downstream system that consumed it did not validate the file, so a row that should have been caught passed silently into the warehouse.

A specific version. The sales ops team maintains a mapping from raw account IDs to clean customer names in a shared Excel file. The warehouse pulls it nightly, joins it to transactions, writes a fact table. Most dashboards read from that fact table.

Friday at five, someone adds a new account. They forget the trailing tab. Excel saves the file as CSV (because the warehouse reads CSV, not XLSX). The row parses with one fewer column than expected. The customer name lands in the wrong column. Nothing flags it because both columns are strings.

Monday morning, the bad row joins to nothing. The dashboard shows revenue without a customer name. The total is correct. The breakdown is not. A VP sees a blank row. The email goes out at eight fifteen.

By eleven the team has found it. The fix takes ten seconds. The investigation took two hours. Three people are stressed. The dashboard is not trusted until the next refresh.

That whole sequence is the propagation of one trailing tab.

Why this propagates

The pipeline trusts the mapping. There is no boundary between the human editing the file and the system reading it. Engineers add validation to user-facing forms because users will type anything. They have not added the same validation to internal mapping files, because those feel safe. They are not safe. A mapping file is a form with no validation, edited by people under time pressure.

The three changes that stop it

Validate at the source. A row missing required fields cannot save. The error appears in front of the person who can fix it in the moment, not three days later when a report breaks.
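A minimal sketch of what "cannot save" could look like, assuming the mapping is a CSV with a fixed header. The column names (`account_id`, `customer_name`, `region`) and the `validate_mapping` helper are hypothetical; the post does not specify the file layout.

```python
import csv
import io

# Hypothetical layout for the mapping file; adjust to the real columns.
EXPECTED_COLUMNS = ["account_id", "customer_name", "region"]
REQUIRED = ["account_id", "customer_name"]


def validate_mapping(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file may be saved."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        return [f"header is {header}, expected {EXPECTED_COLUMNS}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {line_no}: {len(row)} columns, expected {len(EXPECTED_COLUMNS)}"
            )
            continue
        record = dict(zip(EXPECTED_COLUMNS, row))
        for field in REQUIRED:
            if not record[field].strip():
                problems.append(f"line {line_no}: {field} is empty")
    return problems
```

Run as a save hook or a pre-upload check, this puts the short row from the Friday story in front of the editor as `line N: 2 columns, expected 3` before the warehouse ever sees it.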

Version every change. When the report does break, "what changed since Friday" has a definite answer in two clicks instead of an hour of forensics across file copies.
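As an illustration of why versions make the question cheap, here is a sketch that compares two saved snapshots of the mapping. It assumes each version is available as a dictionary from account ID to customer name; the `diff_mappings` name and the snapshot shape are assumptions, not part of any particular tool.

```python
def diff_mappings(old: dict[str, str], new: dict[str, str]) -> dict[str, dict]:
    """Summarize what changed between two snapshots of the mapping."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    # For keys present in both versions, keep (old value, new value) pairs that differ.
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


friday_noon = {"A-1": "Acme", "A-2": "Globex"}
friday_evening = {"A-1": "Acme", "A-2": "Globex", "A-3": ""}

print(diff_mappings(friday_noon, friday_evening))
# {'added': {'A-3': ''}, 'removed': {}, 'changed': {}}
```

With snapshots like these, the Monday investigation starts from a one-line answer: one account was added, and its name is empty.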

Move to a schema contract. Read the mapping from an API, not a CSV. If the schema changes incompatibly, the pipeline fails immediately and loudly on Friday afternoon, not silently on Monday morning. gridmap's Lookup API works this way: typed, versioned, and paired with the same audit log the editor sees.
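To make "fails immediately and loudly" concrete, here is a sketch of a consumer that checks a contract before loading anything. The payload shape, the `schema_version` field, and every name in the block are hypothetical. This is not gridmap's Lookup API, only an illustration of the idea.

```python
from dataclasses import dataclass

# Hypothetical: the schema version this pipeline was written against.
SUPPORTED_SCHEMA_VERSION = 1


class ContractError(Exception):
    """Raised when the payload does not match the agreed schema."""


@dataclass(frozen=True)
class MappingEntry:
    account_id: str
    customer_name: str


def parse_mapping(payload: dict) -> list[MappingEntry]:
    """Turn an API payload into typed entries, or fail before anything is loaded."""
    version = payload.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ContractError(
            f"schema_version {version!r} is not supported "
            f"(expected {SUPPORTED_SCHEMA_VERSION})"
        )

    entries = []
    for index, raw in enumerate(payload.get("entries", [])):
        for field in ("account_id", "customer_name"):
            value = raw.get(field) if isinstance(raw, dict) else None
            # Reject missing, non-string, and blank values alike.
            if not isinstance(value, str) or not value.strip():
                raise ContractError(f"entry {index}: {field} must be a non-empty string")
        entries.append(MappingEntry(raw["account_id"], raw["customer_name"]))
    return entries
```

The design choice is that one bad entry rejects the whole load. A stale but correct fact table is easier to explain to a VP than a fresh one with a blank row.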

The cost of doing nothing

Two hours of crisis a week, fifty weeks a year, is 100 hours. Even at a modest senior engineering rate of about 70 euros an hour, that is seven thousand euros of pure firefighting per year, before counting whoever else gets pulled in. The pattern persists for years because no single person feels the full pain. The cost is spread across enough people that the math never gets done.
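The math, done once. The 70 euros an hour is the rate implied by the figures above (7,000 euros over 100 hours). The three-person case is an assumption borrowed from the Monday story, where three people end up in the room.

```python
def annual_firefighting_cost(hours_per_week, weeks_per_year, hourly_rate_eur, people=1):
    """Yearly cost in euros of recurring incident time."""
    return hours_per_week * weeks_per_year * hourly_rate_eur * people


print(annual_firefighting_cost(2, 50, 70))            # 7000
print(annual_firefighting_cost(2, 50, 70, people=3))  # 21000
```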

FAQ

Is this just an Excel problem? No. Any file-based mapping has the same boundary problem. The fix is the same regardless of format: read from an API or a versioned table.

Our team has a "no Friday changes" rule. Does that help? Slightly. The deeper problem is that any change carries the same risk, whatever day it is made. Pushing edits to Tuesday only changes which day of the week the dashboard breaks.

How do we know when this is fixed? The signal is not the absence of bugs. It is the time to diagnose. If "what changed" takes two minutes to answer, the boundary is in place.