Completeness > Outcome — Building the AI Era's Data Flywheel

Does the "data flywheel" that built Amazon and Google still work in the AI era? The mechanism is the same, but the fuel has changed. - The internet era ran on behavioral data (clicks). It told you what users did. - The AI era runs on judgment data (rejections). It tells you why experts said "no." General models see your final reports. They never see: - The 50 drafts your analyst killed. - The deal your trader walked away from. - The clause your lawyer struck out. That isn't waste. It is the only data that teaches a model how to think like your best people. Most firms let that reasoning evaporate. A few are quietly turning it into their next flywheel.

EaseFlows AI

Part One: Pre‑training, post‑training, and the 100,000 steps of data

Reading interviews and notes from teams working on frontier models, you can feel the center of gravity shifting. As systems get larger, many of the people closest to the research now see data as the main bottleneck, not yet another clever architecture or a slightly bigger cluster. Compute, algorithms, and data still matter together, but the sense is that the next real step change will come from fixing data, not from polishing the math a bit more.

It helps to separate the two stages in how these models learn. Pre‑training sets the model's basic ability: does it actually have the knowledge at all? Post‑training sets its performance: can it use what it already knows in the right way, at the right time, for the right user?

Think of it like a student taking a high‑stakes exam. Post‑training is exam technique: time management, reading the question carefully, and structuring answers logically. But pre‑training is whether the student actually showed up to class for the last ten years. If you skipped the lectures, no amount of test‑taking strategy will help. You cannot reason your way to knowledge you never absorbed in the first place.

In practice, pre‑training means letting the model read a huge amount of text and learn patterns from it. What makes current language models surprisingly capable is that, on the open internet and in published material, many domains already look like a very dense staircase of knowledge. From beginner blog posts to specialist papers, you can almost slice the path from novice to expert into tens of thousands of tiny steps, each only slightly harder than the last. The model does not just see the final conclusions; it also sees a lot of intermediate explanations, worked examples, and side notes that connect simple ideas to complex ones.
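To make that concrete, here is a minimal sketch of the supervision signal behind pre‑training: predict each next token from everything that came before it, over text that ranges from beginner to expert. The corpus lines and function names are purely illustrative, not any lab's actual training code.

```python
# Illustrative only: a toy "staircase" corpus and the next-token supervision
# signal that pre-training optimizes. A real system updates billions of
# parameters over trillions of tokens; this only shows the shape of the task.

corpus = [
    "a beginner blog post explains what a balance sheet is",
    "an intermediate note walks through working-capital adjustments",
    "a specialist paper models liquidity risk under stress scenarios",
]

def training_pairs(text: str):
    """Each word must be predicted from everything that came before it."""
    tokens = text.split()
    for i in range(1, len(tokens)):
        yield tokens[:i], tokens[i]   # (context so far, next token to predict)

for doc in corpus:
    for context, target in training_pairs(doc):
        # A real model would nudge its weights to make `target` more likely
        # given `context`; here we only enumerate the examples.
        pass
```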

Part Two: The invisible staircase

Picture this: you come home after a long day, and something feels off. Your partner is tidying the kitchen a bit too loudly. They say the plumber was a disaster, then mention that your mother called, that there was an important presentation, and that they never had lunch. You now have several strong signals and no clear sense of which event mattered most or how they fit together.

Later, when things are calmer, you walk through the day in order. A failed alarm led to a rushed morning and no breakfast. The plumber arrived an hour early, right when the time was blocked to finish slides for an afternoon presentation. While the plumber was still there, your mother called with an urgent family issue. By the time the leak was fixed and the call ended, there was no time for lunch before the presentation. The plumber's visit, which sounded like the headline, was just the first domino in a chain.

When you only hear fragments like "plumber", "your mother", "presentation", and "no lunch", you miss the timeline, the cause and effect, and how stress accumulates. You have the right nouns, but not the structure that turns them into a story. That is what incomplete data looks like: the events are present, but the middle steps that explain how one led to another are invisible. Many AI researchers worry about this missing middle layer when they talk about gaps in training data.

Now scale that problem up to a business context. A financial analyst might read dozens of documents for a single idea, quietly discard most of them because of subtle red flags or context cues, then converge on a recommendation. Those filters and judgments rarely get written down or fed back into any system. The model sees the raw inputs (the documents) and the final output (the recommendation), but not the messy reasoning in between. It is the same structure as the kitchen conversation: the outcomes are recorded, but the middle steps that carry the judgment and context are missing.

That is why AI-generated financial analysis looks impressive until you actually need to rely on it. Paste in numbers, get a polished report with charts in seconds. It looks right. But to an expert, it is hollow. The model has never seen the thousands of discarded documents, the quiet red flags, or the hard calls that separate signal from noise. It is mimicking the format, not the judgment. Basing real financial decisions on this is not augmenting expertise; it is just losing money with better fonts.

This disconnect explains the massive mood gap between developers and everyone else. As I discussed in "Why AI replaces senior devs before junior marketers", coding is unique because the entire problem‑solving process is naturally recorded. A git commit is a snapshot of thinking. Issue trackers log failures, and pull request reviews capture the "why" behind every change. This creates a dense, granular trail of attempts, mistakes, and corrections, giving AI models millions of examples of how experts iterate toward a solution.

Most other industries have nothing similar. They rely on final artifacts such as reports, contracts, and memos, which are all that a web-scale crawl will ever see. The real work, such as discarded drafts, verbal debates, and quick "gut check" decisions, happens in heads or hallways and disappears. Without that trace, AI is asked to learn a craft by looking only at finished products. The result is a model that can copy tone but not judgment. Until you find a way to capture the reasoning process itself, better algorithms are just polishing the surface.

Part Three: Judgment data, the new flywheel

The last technology shift offers a useful map. Amazon and Google did not win just by having better code; they won by building data flywheels. Every click and purchase improved their algorithms, which attracted more users, which generated more data. The service fed the data, and the data improved the service. That loop, not the software itself, became the moat.

The AI era is following the same script, but the fuel has changed. In the internet era, the “oil” was behavioral data: clicks, views, transactions. It told you what people did. In the AI era, the oil is judgment data: how experts weigh options, what they discard, and why they say no. It tells you why a decision was right for this context, and why other plausible paths were rejected.

This is where vertical businesses have a concrete opportunity. General models can read your annual report and public filings, but they never see the sequence of internal decisions that produced them. In a research team, for example, the model might see the final "buy" or "underweight" call, but not the ideas that were screened out, the comments like "management guidance is not credible", or the compliance review that reduced the position size. Judgment data lives in places like:

  • Screening tools where analysts tag tickers as “pass” or “reject” with a short reason.
  • Policy engines where underwriters override an automatic score and log why.
  • Draft layers where lawyers strike out clauses and annotate “litigation risk too high under local law”.

Those are all points where an expert is teaching the system what not to trust. Right now, most organisations treat them as exhaust, not assets.
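What does one of these judgments look like once it is captured instead of lost? Here is a rough sketch for the screening case above; the `JudgmentRecord` class and its field names are hypothetical, not a reference to any specific tool or standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical schema for a single expert judgment. The class and field
# names are illustrative, not part of any existing product.
@dataclass
class JudgmentRecord:
    source: str        # e.g. "screening_tool", "policy_engine", "draft_layer"
    subject: str       # the thing being judged: a ticker, a case id, a clause id
    decision: str      # "pass", "reject", "override", "strike"
    reason: str        # the expert's short rationale, in their own words
    context: dict = field(default_factory=dict)   # inputs visible at decision time
    decided_by: str = "unknown"
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A screening decision that would normally evaporate in chat or email:
record = JudgmentRecord(
    source="screening_tool",
    subject="TICKER-XYZ",            # placeholder identifier
    decision="reject",
    reason="management guidance is not credible",
    context={"note": "guidance revised down twice in two quarters"},
    decided_by="analyst_42",
)
```

The point is not this particular schema; it is that the "no" and the reason behind it survive as structured data instead of disappearing into chat and email.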

You can think of the strategic task as three concrete moves (a minimal code sketch follows the list):

  • Capture the decision traces: log not just the final choice, but which options were considered and killed, and on what grounds.
  • Preserve the negative examples: when an expert says “no” to a document, deal, clause, or trade, keep that as structured data instead of losing it in email.
  • Turn overrides into training signals: whenever a human corrects or adjusts what a basic model, rule, or score suggests, treat that as labeled feedback, not a one-off fix.
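Here is a minimal sketch of the third move, assuming a simple append-only JSONL log: every time an expert overrides a model's or rule's suggestion, the suggestion, the correction, and the stated reason are written down as one labeled example. The keys and file name are placeholders, not a specific vendor's schema.

```python
import json
from pathlib import Path

# Hypothetical converter: one expert override becomes one labeled example.
# The keys loosely mirror common preference-data layouts, but this is only
# an illustration of the idea, not a prescribed format.
def override_to_example(context: str, model_output: str,
                        expert_output: str, reason: str) -> dict:
    return {
        "prompt": context,          # what both the model and the expert saw
        "rejected": model_output,   # the suggestion the expert did not accept
        "chosen": expert_output,    # what the expert did instead
        "rationale": reason,        # the "why" that normally gets lost
    }

def append_example(example: dict, dataset_path: str = "judgment_data.jsonl") -> None:
    # Append-only log: every correction grows the dataset instead of vanishing.
    with Path(dataset_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# Example: an underwriter overrides an automatic score and logs why.
append_example(override_to_example(
    context="Policy application #1042; auto-score suggests: approve at standard premium",
    model_output="approve at standard premium",
    expert_output="approve with a 20% premium loading",
    reason="prior claims history not reflected in the score",
))
```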

Once you do this in a focused domain, you are no longer only using generic intelligence. You are building a dataset that encodes how your firm thinks about risk, quality, and edge. Over time, it behaves like a flywheel: every deal reviewed, contract negotiated, and research idea rejected becomes another update to a model that increasingly reflects your best people instead of a smart outsider.

Part Four: Owning your judgment flywheel

Right now, many teams are understandably in "AI shopping mode". When you are racing to ship features or fix immediate bottlenecks, grabbing the best available tool off the shelf is often the only sensible move. Survival comes first, and pragmatism beats purity. But as organizations move from scrambling for basic functionality to building long-term value, the calculation changes. At that stage, simply renting generic intelligence stops being enough to differentiate you from the next competitor buying the exact same API.

The compounding asset has shifted. In the internet era, it was behavioral data such as user clicks. In the AI era, it is captured reasoning: the explanations, corrections, and rejections that encode how your best people think.

The central question is not which model you use; it is what you do with your experts’ judgment. Treat it as disposable, and it disappears at the end of every meeting. Treat it as a dataset, and every override, every “no”, every raised eyebrow becomes fuel for the next, smarter system. Most firms still let that fuel burn off. A few have started bottling it.

Over time, this creates performance gaps because judgment data has a compounding structure. The more you capture, the better your systems get. The better your systems get, the more edge cases and corrections you observe, and those corrections become new training signals. It is the same flywheel pattern that shaped the last generation of technology companies, now available at the domain level to any firm that instruments its decision processes.

As you evaluate new models and tools, a simple strategic question cuts through the noise. Are you building a data asset that gets smarter with use, or renting intelligence that resets when the contract renews? Algorithms will keep improving, and compute will keep getting cheaper. The judgment data that makes AI truly expert in your context can only come from you.