Data has topped every Tech Trends report we've ever published. As everything gets more complex — more content, more channels, higher expectations from authors, more options for what can be done with AI — data infrastructure becomes even more central to the themes we cover from year to year.
In 2026, the stakes are higher than ever: AI runs on data, discovery depends on it, and revenue increasingly flows from whoever structures it best.
Hum and Silverchair asked contributors to our 2026 Publishing Tech Trends report how data will shape the year ahead. Here's what they had to say.
Metadata has become mission-critical
Metadata was once the concern of a small team in a back office. Not anymore. As AI systems become the primary way researchers discover and consume content, the quality and structure of your metadata have become the difference between being found and being invisible.
Jonathan Woahn (Cashmere) put it in competitive terms: "Metadata quality and structure will determine visibility, interoperability, and monetization. 'AI-ready' data — semantic, structured, and permissioned — becomes the differentiator between open-web noise and premium discovery."
Michael Di Natale (AACR) grounded it in operational reality: "None of the technology [that] publishers and their audiences are excited about can function without robust and accurate metadata." And John Challice (Hum) pushed the definition further, arguing that the metadata conversation needs to expand beyond content to include audience behavior and true engagement, not just clicks.
Large language models don't "read" articles the way humans do; they consume structure. Which means publishers aren't just distributing documents anymore. They're stewards of semantic architecture. Solutions like Alchemist Taxonomy are increasingly how publishers close that gap, turning unstructured archives into organized, searchable, AI-ready knowledge, and automatically tagging new content as it publishes.
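To make "semantic, structured, and permissioned" concrete, here is a minimal sketch of AI-ready article metadata using the real schema.org ScholarlyArticle vocabulary in JSON-LD. The vocabulary and property names are standard; the specific values (title, DOI, subject terms) are hypothetical placeholders, not from the report:

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "headline": "Example article title (placeholder)",
  "identifier": "https://doi.org/10.0000/example.doi",
  "datePublished": "2026-01-15",
  "author": [{ "@type": "Person", "name": "Jane Researcher" }],
  "publisher": { "@type": "Organization", "name": "Example Press" },
  "about": ["oncology", "immunotherapy"],
  "isAccessibleForFree": false,
  "license": "https://example.org/terms/ai-use"
}
```

A machine parsing this sees typed, unambiguous fields — authorship, subject, and usage terms — rather than prose it has to guess at, which is the practical difference between "open-web noise" and content an AI system can confidently attribute and license.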
The measurement gap is getting harder to ignore
Traditional metrics were built for a world where researchers searched, clicked, and downloaded. That world is changing fast, and the frameworks publishers rely on to demonstrate value may not survive the transition intact.
Nicholas Liu (Oxford University Press) identified the core shift: usage data needs to be reconfigured as it becomes less about human clickthroughs and more about whether your content was actually used to generate an AI response. Rachel Bock (Wiley) connected this directly to revenue risk: if COUNTER reports and citation counts can't capture how researchers use content when querying AI tools for synthesized answers, publishers will struggle to demonstrate ROI to librarians, justify pricing to institutions, and prove research impact to funders.
Jessica Miles (The Informed Frontier) sees a structural response already emerging: "I think we'll increasingly see machine-readable formats presented alongside narratives for humans - a tacit recognition of the increasing importance of 'AI readers.'"
This isn't a future problem. As AI-first search continues to rise, the gap between what publishers can measure and what's actually happening is widening in real time.
Quality over quantity (Finally!)
For years, the dominant data philosophy was simple: collect everything. Storage was cheap, and you never knew what you might need. That era is ending, and the shift has real implications for how publishers think about data strategy.
Adam Day (Clear Skies Ltd) was characteristically direct: "We're moving from an era of abundance to an era of excess. There will be rapid growth in data and the average quality of it will go down. The key is to put curation of quality data first. Data isn't a byproduct of business operations anymore. Data is business operations."
Teo Pulvirenti (ACS Publications) framed curation as the foundation of everything else: "The publishers who invested in robust governance frameworks — ensuring ethical use, clear ownership, and regulatory compliance — will not only shape technology trends but redefine business models." Natalie Jacobs (Emerald Publishing) made the point that data without narrative is just noise: it needs to be meaningful, constructed into a story, and connected to decision-making to have any value at all.
The governance stakes are real. James Butcher (Journalology) raised a tension worth watching: "I worry that researchers may be less willing to share raw research data knowing that AI companies can use it to generate wealth for their investors."
Trust in AI-mediated publishing will come under increasing scrutiny as AI reshapes how papers are produced, reviewed, and discovered, and publishers who have invested in transparent, ethical data governance will have a meaningful advantage.
The Bottom Line
Data isn't a new trend in publishing. What's new is the steep cost of getting it wrong.
The path forward is clearer than it might feel: Audit your metadata for AI-readiness. Rethink how you measure engagement beyond clicks. Build governance frameworks before you need them.
None of these are moonshots. They're the foundational moves that will separate publishers who thrive in an AI-mediated world from those who scramble to catch up.
Want the full picture? Download the 2026 Publishing Tech Trends Report from Hum and Silverchair.