From nightly CSV exports to a real-time mapping API: a practical migration

You inherited pipelines that read mapping data from CSV files. The mappings live somewhere: Excel, a small database, a vendor system that exports nightly. Each pipeline reads the CSV at the start of a run, joins it to transactions, and writes the result to the warehouse. Everything works. The team is comfortable.

Then somebody asks if the mappings can be updated in real time. The finance team wants to add a new customer at 10am and have the afternoon report use it. The current setup cannot do this. The CSV refreshes at midnight.

The migration is non-trivial. Most of the pipelines were built assuming the mapping is a file. The question is how to switch without breaking anything live.

The migration in four phases

Phase 1. Stand up the API. The mapping system probably already has one, or can be configured to expose one. The first job is to confirm the API returns the same content as the CSV, exactly, for the same point in time. Spend a week reconciling differences. Anything you cannot reconcile here will bite later.
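
The reconciliation itself can be a small script. The sketch below is one way to do it, assuming both sources can be loaded into dictionaries keyed by the same identifier. The column names and sample rows are invented for illustration, and the API snapshot is a literal here where a real run would fetch it.

```python
import csv
import io


def load_csv_mapping(csv_text, key_field):
    """Parse CSV text into {key: row_dict}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def reconcile(csv_rows, api_rows):
    """Compare two {key: row_dict} mappings and report every difference."""
    csv_keys, api_keys = set(csv_rows), set(api_rows)
    changed = {
        key: (csv_rows[key], api_rows[key])
        for key in csv_keys & api_keys
        if csv_rows[key] != api_rows[key]
    }
    return {
        "only_in_csv": sorted(csv_keys - api_keys),
        "only_in_api": sorted(api_keys - csv_keys),
        "changed": changed,
    }


# Hypothetical sample data standing in for the nightly export and an API snapshot.
csv_text = "customer_id,segment\nC1,retail\nC2,wholesale\nC3,retail\n"
api_rows = {
    "C1": {"customer_id": "C1", "segment": "retail"},
    "C2": {"customer_id": "C2", "segment": "enterprise"},
    "C4": {"customer_id": "C4", "segment": "retail"},
}

report = reconcile(load_csv_mapping(csv_text, "customer_id"), api_rows)
```

An empty report on all three keys, for a snapshot taken at the same point in time, is the exit criterion for this phase.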

Phase 2. Run shadow traffic. Each pipeline calls the API alongside the CSV and writes the API result to a parallel location. The pipeline logic is unchanged. The API call is observation only. The point is to validate that the API holds up under production access patterns, without committing to it.
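
A shadow read might look like the sketch below. The three callables (read_csv, read_api, write_shadow) are hypothetical hooks the pipeline would supply. The important property is that nothing on the API side can fail the run.

```python
import json
import logging

logger = logging.getLogger("shadow")


def read_with_shadow(read_csv, read_api, write_shadow):
    """Return the CSV mapping and record the API result on the side."""
    mapping = read_csv()  # the CSV is still the source of truth
    try:
        rows = read_api()
        record = {"ok": True, "matches_csv": rows == mapping, "rows": rows}
    except Exception as exc:  # observation only: never propagate
        record = {"ok": False, "error": str(exc)}
    try:
        write_shadow(json.dumps(record, sort_keys=True))
    except Exception:
        logger.exception("could not write shadow record")
    return mapping


# Hypothetical stand-ins for the real readers and the parallel location.
shadow_log = []
csv_mapping = {"C1": "retail"}


def failing_api():
    raise TimeoutError("API timed out")


result_ok = read_with_shadow(lambda: csv_mapping, lambda: {"C1": "retail"}, shadow_log.append)
result_fail = read_with_shadow(lambda: csv_mapping, failing_api, shadow_log.append)
```

Both calls return the CSV mapping. The difference only shows up in the shadow records, which is where it should.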

Phase 3. Switch one pipeline at a time. Smallest blast radius first. A reporting pipeline that affects one team. Switch it to read from the API. Monitor for a week. If anything breaks, switch back. If nothing breaks, move on. Order matters: smallest first, most observable next, most critical last. By the time you switch the critical pipeline, you have weeks of operational data.
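
One way to make "switch back" cheap is a per-pipeline source setting. This is a sketch under assumptions: the pipeline names and the MAPPING_SOURCE dictionary are made up, and a real setup would likely keep the setting in configuration and not in code.

```python
# Hypothetical per-pipeline setting; in practice this might live in config.
MAPPING_SOURCE = {
    "team_report": "api",    # switched first: smallest blast radius
    "ops_dashboard": "csv",
    "finance_close": "csv",  # most critical: switched last
}


def load_mapping(pipeline, read_csv, read_api, sources=MAPPING_SOURCE):
    """Read the mapping from whichever source this pipeline is configured for."""
    source = sources.get(pipeline, "csv")  # unknown pipelines stay on the file
    if source == "api":
        return read_api()
    if source == "csv":
        return read_csv()
    raise ValueError(f"unknown mapping source {source!r} for {pipeline}")


# Stand-in readers so the example runs without a file or a network.
from_csv = lambda: {"source": "csv"}
from_api = lambda: {"source": "api"}

switched = load_mapping("team_report", from_csv, from_api)
not_yet = load_mapping("finance_close", from_csv, from_api)
```

Rolling back a pipeline is then a one-line change to its entry, with no code change in the pipeline itself.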

Phase 4. Decommission the CSV export. Only when every pipeline reads from the API and no other system reads the CSV. Keep the CSV available for a month after the last switch. Somebody will have an ad-hoc script that still reads it, and you want them to discover the change gracefully.

What changes

Concern	File	API
Freshness	Daily	Live
Schema	None enforced	Hard contract
Errors	File missing	Transient errors with retry
Auth	Folder permissions	Service account / key
Audit	File modified time	Per-change
Monitoring	File exists	Latency, error rate, freshness

Most of the migration is about getting comfortable with the right side of that table. Once the API is in front of the data, the surrounding work (auth, retries, monitoring) is the same shape as any other API integration.

Practical things to plan for

Latency. A network call instead of a local file read. Most pipelines absorb this fine. The exceptions are pipelines that do many small lookups; those need batching, or a cache populated at the start of the run.
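
A sketch of the prefetch pattern: collect the distinct keys, fetch them in batches, and join against a local dictionary. The batch_lookup callable and the batch size are assumptions; substitute whatever batch call and limits the API actually documents.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def prefetch_mappings(keys, batch_lookup, batch_size=500):
    """Fetch every distinct key once, in batches, and return a local dict.

    batch_lookup is a hypothetical callable: list of keys -> {key: value}.
    """
    unique_keys = sorted(set(keys))
    cache = {}
    for batch in chunked(unique_keys, batch_size):
        cache.update(batch_lookup(batch))
    return cache


# Fake batch endpoint that records how often it is called.
calls = []


def fake_batch_lookup(batch):
    calls.append(list(batch))
    return {key: key.lower() for key in batch}


transaction_keys = ["A", "B", "A", "C", "B", "D", "E"]
cache = prefetch_mappings(transaction_keys, fake_batch_lookup, batch_size=2)
joined = [(key, cache[key]) for key in transaction_keys]
```

Seven transaction rows cost three API calls here, and the join itself never touches the network.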

Authentication. The CSV had no credentials. The API does. Set up service accounts before the migration; do not let credentials become the bottleneck for a pipeline switch.
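
On the pipeline side, a reasonable pattern is to read the credential from the environment and fail fast when it is missing. The variable name MAPPING_API_KEY and the bearer-token header are assumptions for illustration; use whatever scheme the API documents.

```python
import os


def load_api_key(env=os.environ, name="MAPPING_API_KEY"):
    """Return the API key from the environment, failing fast if it is absent."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; provision the service account first")
    return key


def auth_headers(api_key):
    # Bearer tokens are an assumption here, not a documented requirement.
    return {"Authorization": f"Bearer {api_key}"}


# A plain dict stands in for os.environ so the example is self-contained.
headers = auth_headers(load_api_key({"MAPPING_API_KEY": " example-key "}))
```

Failing at startup with a clear message is kinder than failing halfway through a run with a 401.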

Retries. The CSV was either there or it was not. The API can return a transient error. Add retry-with-backoff where it is missing. Many HTTP clients support it through configuration or a small wrapper.
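
Where the client does not provide it, retry-with-backoff is a few lines. The sketch below uses exponential backoff with jitter. TransientError is a placeholder for whatever the real client raises on a retryable failure, and the delay values are illustrative defaults.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for whatever the client raises on a retryable failure."""


def call_with_retry(func, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the pipeline
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries


# A lookup that fails twice and then succeeds.
attempts = {"count": 0}


def flaky_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return {"C1": "retail"}


delays = []  # capture the sleeps so the example runs instantly
result = call_with_retry(flaky_lookup, sleep=delays.append)
```

Only retry errors that are actually transient. A 4xx caused by a bad key or a bad request will not improve with waiting.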

Schema drift. The CSV was flat. The API may be richer. Decide early whether each pipeline consumes the richer structure or projects it back to the flat shape. The first option takes more work but unlocks new capabilities.
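
Projecting back to the flat shape can be done with a small column map. The record layout and column names below are invented for illustration; the real API response will differ.

```python
def flatten_record(record, columns):
    """Project a nested API record onto the flat columns the old CSV had.

    `columns` maps each CSV column name to a path of keys in the record.
    Missing paths become an empty string, matching an empty CSV cell.
    """
    flat = {}
    for column, path in columns.items():
        value = record
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        flat[column] = "" if value is None else str(value)
    return flat


# Hypothetical column map and API record.
CSV_COLUMNS = {
    "customer_id": ("id",),
    "segment": ("classification", "segment"),
    "region": ("address", "region"),
}
api_record = {
    "id": "C1",
    "classification": {"segment": "retail", "tier": 2},
    "history": [{"changed_by": "finance"}],
}

row = flatten_record(api_record, CSV_COLUMNS)
```

Fields outside the column map (tier and history in this example) are dropped, which keeps the downstream join identical to the CSV version.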

Monitoring. "Did the file exist" becomes "what is the latency and error rate". Set this up during the migration, not after.
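
A minimal sketch of the three numbers, assuming no metrics library is in place yet. In a real deployment these values would go to whatever monitoring system the team already runs. The last_changed_at timestamp is hypothetical; it assumes the API exposes when a mapping last changed.

```python
import time
from datetime import datetime, timezone


class LookupMetrics:
    """Collect latency and error counts for API calls made during one run."""

    def __init__(self):
        self.latencies = []
        self.errors = 0

    def observe(self, func, *args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

    def summary(self):
        calls = len(self.latencies)
        return {
            "calls": calls,
            "error_rate": self.errors / calls if calls else 0.0,
            "max_latency_s": max(self.latencies) if calls else 0.0,
        }


def freshness_seconds(last_changed_at, now=None):
    """Age of the newest mapping change, in seconds."""
    now = now or datetime.now(timezone.utc)
    return (now - last_changed_at).total_seconds()


def failing_lookup():
    raise ConnectionError("API unreachable")


metrics = LookupMetrics()
metrics.observe(lambda: {"C1": "retail"})
try:
    metrics.observe(failing_lookup)
except ConnectionError:
    pass
report = metrics.summary()
```

Emitting the summary at the end of every run gives you a baseline before the first pipeline is switched.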

If you are migrating to gridmap, the Lookup API is what handles all of this. Per-key access, single and batch lookups, usage logging tied to the same audit log the editor uses.

Realistic timeline

For a typical mid-sized company with twenty to thirty pipelines, expect three to six months. The work is mostly sequential and mostly boring. Each switch is one or two days plus a week of monitoring.

The payoff is that the next time the business asks for real-time data, the answer is yes, immediately, instead of "let me see what we can do".

FAQ

Do we have to switch every pipeline? Eventually, but the order is yours. Pipelines that genuinely do not need fresh data can stay on the CSV until they need a rewrite for some other reason.

What about systems that cannot consume an API at all? A scheduled file export from the API gives the legacy consumer a file, while everything modern reads live. The file is downstream of the truth instead of in front of it.
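
The export job can be small. The sketch below assumes the rows have already been fetched from the API as dictionaries; the field names and the output path are illustrative. It writes to a temporary file and renames it into place so the legacy consumer never sees a half-written file.

```python
import csv
import os
import tempfile


def export_mapping_to_csv(rows, fieldnames, path):
    """Write rows to `path` atomically so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            # extrasaction="ignore" drops API fields the old CSV never had.
            writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


# Hypothetical rows standing in for an API response.
rows = [
    {"customer_id": "C1", "segment": "retail", "tier": 2},
    {"customer_id": "C2", "segment": "wholesale", "tier": 1},
]
out_path = os.path.join(tempfile.mkdtemp(), "mappings.csv")
export_mapping_to_csv(rows, ["customer_id", "segment"], out_path)
```

Run it on whatever schedule the legacy consumer expects. The file keeps its old shape, but it is now generated from the API and no longer the source of record.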

How do we know the migration is done? When the file export is off and nothing has complained for thirty days. Until then, it is in progress.