Everyone's chasing the next foundation model.
But defensibility in AI today isn’t about who has 100B parameters.
It’s about who owns the messiest, most overlooked, most painful-to-collect datasets.
Think:
Real-world driving miles (Tesla)
3D surgical imaging data (Medivis)
Small freight operator logistics (Channel19)
Behavioral clickstreams (Klaviyo)
These aren't "off the shelf".
They're earned: through workflow integration, patient data collection, and relentless operational grind.
Friction is the moat.
The more annoying the data is to get, the more valuable it becomes.
The smartest investors evaluating AI startups today aren't asking, "What’s your model?" They're asking:
What proprietary signal improves with every user interaction?
What painful-to-replicate feedback loop are you capturing?
How hard would it be for a competitor to recreate your dataset?
Because the next $10B outcome?
It probably won’t come from another chatbot.
It’ll come from a startup logging signals no one else even notices, until suddenly it's the only one with the data that matters.
📥 I wrote more about this in today’s newsletter, linked below.
Would love to hear: **what's the weirdest dataset you wish you could own?**👇
This post was originally shared on LinkedIn.