Use a REST API for data mappings, not nightly Excel imports
Most analytics platforms read reference data from a file. A CSV in a folder. An Excel workbook on SharePoint. A flat extract from another system. A nightly job pulls the file, loads it into staging, and the pipeline takes over.
The pattern is comfortable. Files are simple, debuggable, and you can open them to look. The problem is that files have one persistent issue: they are always slightly out of date. And they have a class of failures that an API does not.
The latency cost
A user updates the customer mapping at 3pm. The nightly job ran at 2am. The new mapping is not in the warehouse until tomorrow morning. Anyone querying between 3pm and the next refresh sees yesterday's truth.
If that user updated the mapping because somebody asked an urgent question, the answer to that urgent question is still wrong.
An API removes the gap. The update goes live immediately. Downstream systems read at query time or get notified.
The failures only files produce
| Failure | File-based | API-based |
|---|---|---|
| Schema drift | New column silently shifts others | New field is a new field |
| Encoding | UTF-8 BOM breaks the first cell | JSON has a defined encoding |
| Lost update | Two desktops overwrite each other | Transactional writes |
| Audit | File modified time only | Per-change attribution |
| Permissions | Folder-level binary | Per consumer, per scope |
The most common failure in practice is schema drift. Someone adds a column to the export, the column order shifts, the downstream parser silently misaligns. The data still looks plausible. Nobody catches it until a report queries the wrong column three weeks later. An API contract makes that misalignment impossible. A new field appears as a new field, separately.
Building any of these protections inside a custom file pipeline takes real work. By the time you have done it, you have built an API. Cheaper to start there.
What you trade
Schema enforcement is the immediate win. Auth, retries, and monitoring follow naturally. The error handling around "file might not exist, might be empty, might have one extra column" largely disappears.
What you give up is the file-like access pattern. A pipeline that did many small lookups against a local CSV is now making network calls. Most pipelines absorb this. The exceptions are pipelines that did high-volume single-row lookups, which need batching or caching.
A four-step migration
Confirm parity. Compare API output to CSV output for the same point in time. Not approximately the same. Exactly. Spend a week reconciling differences.
Run shadow traffic. Each pipeline calls the API alongside the CSV read and writes the result to a parallel location. Pipeline logic unchanged; the API call is observation only.
Switch one pipeline at a time. Smallest blast radius first. Monitor for a week. If anything breaks, roll back. If nothing breaks, move on.
Decommission the CSV. Only when every pipeline reads from the API and no other system reads the CSV. Keep the file available for a month after the last switch. Someone has an ad-hoc script that reads it.
If your mapping system is gridmap, this is what the Lookup API is for. Single and batch queries, per-key access control, usage logging, and the same audit log the editor sees.
FAQ
How fast does the API need to be? For most analytics workloads, under 100 ms feels fine. Under 50 ms feels native. Beyond that, cache the full mapping in memory at pipeline start.
Can we run files and API in parallel forever? You can, and the maintenance will double. Pick a sunset date.
What about systems that cannot consume an API at all? A scheduled file export from the API gives the legacy consumer a file, while everything modern reads live. The file is downstream of the truth instead of in front of it.