The API you can’t inspect

Behind every dataset is a process that generated it. You never see that process directly. You reverse-engineer it from observation, and it doesn’t come with documentation.
Author: Matthew Gibbons

Published: 17 January 2026

You’ve integrated with a third-party API that ships without documentation. You know the drill: call the endpoint, study the responses, and build a mental model of the internal logic. You never get to read the source code. You infer behaviour from output.

Statistical modelling works the same way, just aimed at natural processes rather than services. Behind every dataset sits a data-generating process (DGP): the real-world mechanism that produced the observations. We never see it directly. We only see its output and reason backwards.

The analogy captures the right activity (reverse-engineering behaviour from observations), but the thing you're reverse-engineering is different in kind.

No versioning, no changelog

A well-maintained API has a version number, release notes, and deprecation warnings. When behaviour changes, someone tells you. A data-generating process has none of this. Customer behaviour shifts, markets move, seasons turn, and there’s no status page to tell you when it happens.

Consider a model that predicts daily order volume for an e-commerce platform. You train it on twelve months of data, it performs well, and you deploy it. Three months later, the predictions start drifting. Not catastrophically, just a slow, consistent undershoot. What happened?

Maybe a competitor launched. Maybe a marketing campaign shifted the customer mix. Maybe a supply chain disruption changed purchasing patterns. The DGP changed, and it didn’t file a ticket.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

months = np.arange(1, 19)
# True process: stable for 12 months, then gradual upward shift
baseline = 500
true_rate = np.where(months <= 12, baseline, baseline + 15 * (months - 12))
observed = rng.poisson(lam=true_rate)

# Model trained on months 1-12: predicts baseline rate forever
model_pred = np.full_like(months, baseline, dtype=float)

fig, ax = plt.subplots(figsize=(10, 5))
fig.patch.set_alpha(0)
ax.patch.set_alpha(0)

ax.axvspan(0.5, 12.5, alpha=0.10, color='#0072B2', label='Training period')
ax.plot(months, observed, color='#0072B2', marker='o', markersize=4, linewidth=1,
        alpha=0.7, label='Observed orders')
ax.plot(months, model_pred, color='#E69F00', linewidth=2, linestyle='--',
        label='Model prediction')

ax.annotate('Post-deployment', xy=(15, baseline - 30), fontsize=9, color='grey',
            ha='center')

ax.set_xlabel('Month')
ax.set_ylabel('Daily orders')
ax.set_title('The data-generating process shifted — the model didn\'t')
ax.set_xlim(0.5, 18.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.yaxis.grid(True, linestyle=':', alpha=0.4, color='grey')
ax.set_axisbelow(True)
ax.legend(loc='upper left', framealpha=0.0)
plt.tight_layout()
plt.show()
Line chart spanning 18 months. The first 12 months are shaded lightly in blue, labelled 'Training period', with an amber dashed line closely tracking blue data points. After month 12, the blue points begin trending upward while the amber line continues flat, creating a growing gap between prediction and observation. The region after month 12 is labelled 'Post-deployment'.
Figure 1: A model trained on 12 months of data (blue region) performs well during that period. After deployment, the data-generating process shifts — predictions (amber dashed line) begin to undershoot observations (blue points) as the gap between the model’s assumptions and reality grows.

In software terms, this is the equivalent of a dependency silently changing its behaviour between versions. Except there are no versions. You detect the drift from the data itself, by monitoring the gap between what the model expects and what actually arrives.

It returns a distribution, not a response

But the difference runs deeper. In theory, a well-behaved API returns the same response for the same request. In practice, hidden state (caches, rate limits, server-side A/B tests, etc.) can make even a deterministic service look unpredictable from the caller’s side. A data-generating process takes that further: it returns a distribution of responses. A spread of plausible outcomes, each with an associated probability.

When you ask “how many orders will we get tomorrow?”, the DGP doesn’t have a single answer. It has a range of answers, some more likely than others. The model’s job isn’t to nail the single right number. It’s to describe that range: its centre, its spread, its shape.

from scipy import stats

fig, ax = plt.subplots(figsize=(10, 5))
fig.patch.set_alpha(0)
ax.patch.set_alpha(0)

lam = 500
x = np.arange(450, 551)
pmf = stats.poisson.pmf(x, lam)

ax.bar(x, pmf, color='#0072B2', alpha=0.5, width=1.0, edgecolor='none')
ax.axvline(lam, color='#E69F00', linewidth=2, linestyle='--',
           label=f'Expected value (λ={lam})')

ax.set_xlabel('Daily orders')
ax.set_ylabel('Probability')
ax.set_title('The DGP returns a distribution, not a point')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.yaxis.grid(True, linestyle=':', alpha=0.4, color='grey')
ax.set_axisbelow(True)
ax.legend(loc='upper right', framealpha=0.0)
plt.tight_layout()
plt.show()
A probability mass function showing the distribution of daily orders centred around 500. The bars form a roughly bell-shaped distribution spanning from about 460 to 540. A dashed amber vertical line marks the expected value at 500. The distribution illustrates that any single day's order count is drawn from this spread of possibilities.
Figure 2: The data-generating process doesn’t return a single value. It returns a distribution of plausible outcomes. The model’s job is to describe that distribution — its centre, spread, and shape — not to predict a single number.

This is what makes data science different from integrating with a deterministic service. You’re not trying to figure out what the API does. You’re trying to figure out what it might do, and with what probability.
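One way to make "what it might do, and with what probability" concrete is a prediction interval. As a sketch, assuming the same Poisson(λ = 500) process from the figure above, scipy can turn the distribution into a range you'd expect tomorrow's count to fall in:

```python
from scipy import stats

# Assumed rate, matching the Poisson(lambda = 500) example above
lam = 500

# Central 95% prediction interval for tomorrow's order count:
# the DGP's answer is a range, not a number
lower = stats.poisson.ppf(0.025, lam)
upper = stats.poisson.ppf(0.975, lam)
print(f"95% of days should land between {lower:.0f} and {upper:.0f} orders")
```

The width comes straight from the Poisson variance: roughly λ ± 2√λ. Reporting that range, rather than a single number, is the honest answer to the question.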

Detecting change

If the DGP doesn’t announce when it changes, how do you know? The same way you’d detect a degraded dependency: monitoring.

In production systems, you watch latency percentiles, error rates, and throughput. When they drift outside expected bounds, something has changed. In data science, you do the same thing with your model’s predictions. You compare what the model expects to what actually arrives, and you watch for systematic discrepancies.

The simplest version of this is tracking the residuals (the gaps between predictions and observations) over time. If the residuals are centred around zero and show no trend, the model’s assumptions still hold. If they start drifting in one direction, the DGP has moved and the model hasn’t kept up.

cumulative_resid = np.cumsum(observed - model_pred)

fig, ax = plt.subplots(figsize=(10, 5))
fig.patch.set_alpha(0)
ax.patch.set_alpha(0)

ax.plot(months, cumulative_resid, color='#0072B2', linewidth=2)
ax.axhline(0, color='#E69F00', linewidth=1.5, linestyle='--', alpha=0.7)
ax.axvline(12, color='grey', linewidth=1, linestyle=':', alpha=0.6)
ax.annotate('DGP shifts', xy=(12.2, cumulative_resid.max() * 0.5),
            fontsize=9, color='grey')

ax.set_xlabel('Month')
ax.set_ylabel('Cumulative residual')
ax.set_title('Monitoring for drift: cumulative residuals over time')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.yaxis.grid(True, linestyle=':', alpha=0.4, color='grey')
ax.set_axisbelow(True)
plt.tight_layout()
plt.show()
Line chart showing cumulative residuals over 18 months. For the first 12 months, the line wanders near zero with small fluctuations. After month 12, the line begins a steady upward climb, indicating systematic underprediction. A vertical grey dotted line marks the point where the data-generating process shifted.
Figure 3: Cumulative residuals over time. During the training period, they fluctuate around zero — the model’s predictions are unbiased. After the DGP shifts at month 12, the cumulative residuals begin climbing steadily, signalling that the model is systematically underpredicting.

More sophisticated approaches exist (statistical tests for distributional drift, sliding-window comparisons, Bayesian change-point detection) but the principle is the same one you already apply to production monitoring. Watch the right signals, set thresholds, and respond when something moves.
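As a rough sketch of one such approach, a sliding-window comparison might ask whether a recent window of observations still looks like draws from the training distribution, using a two-sample Kolmogorov–Smirnov test. The rates and window sizes below are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical daily counts: a year of training data at one rate,
# then a recent window after an assumed shift in the DGP
training = rng.poisson(500, size=360)
recent = rng.poisson(540, size=90)

# Two-sample Kolmogorov-Smirnov test: do the two windows look like
# draws from the same distribution?
stat, p_value = stats.ks_2samp(training, recent)
drifted = p_value < 0.01
print(f"KS statistic {stat:.2f}, p-value {p_value:.2g}, drift detected: {drifted}")
```

The KS test is conservative on discrete counts, and a rate test or chi-squared comparison would be sharper for Poisson data, but the monitoring shape is the same either way: compare a recent window to a reference window, and alert when the discrepancy crosses a threshold.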

Models are hypotheses about the process

This framing changes what a model is. It's not just a function that maps inputs to outputs. It's a hypothesis about the data-generating process: a claim about what mechanism produced the data you observed. When you fit a linear regression, you're claiming that the relationship between your variables is approximately linear and that the noise is roughly constant. When you fit a Poisson model, you're claiming that events occur independently at a roughly constant rate.

These claims might be wrong. The relationship might be nonlinear. The rate might vary. The events might not be independent. That’s fine because all models are simplifications. The question isn’t whether the model is perfectly right. It’s whether the simplification is useful: does it capture enough of the structure to answer the questions you care about, and do the residuals tell you where it falls short?
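One cheap check of the Poisson claim in particular: a Poisson distribution has variance equal to its mean, so a dispersion index (sample variance over sample mean) far above 1 says the constant-rate hypothesis is too simple. The data below are simulated for illustration, with hypothetical promotion-day spikes that the claim ignores:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily order counts (invented for illustration): mostly
# ordinary days, plus promotion-day spikes the Poisson claim ignores
counts = np.concatenate([
    rng.poisson(500, size=330),  # ordinary days
    rng.poisson(900, size=35),   # promotion days
])

# Under a Poisson hypothesis, variance equals the mean, so the
# dispersion index should sit close to 1
dispersion = counts.var(ddof=1) / counts.mean()
print(f"dispersion index: {dispersion:.1f}")  # far above 1: the claim fails
```

Overdispersion like this doesn't tell you the fix, only that the hypothesis is wrong in a specific, measurable way. A negative binomial model, or a rate that varies by day type, would be the next hypotheses to try.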

This is reverse-engineering without the luxury of documentation. You propose a mechanism, check it against the data, and revise. The DGP never confirms you’ve got it right. You just accumulate evidence that your model is a reasonable approximation, until the process changes and you start again.


This article is part of a series drawn from Thinking in Uncertainty, a book that teaches data science to experienced software engineers.