CONTENT AS FUEL
When collections become training data.
A regional art museum we work with holds over 12,000 objects spanning five centuries. Their collection management system is thorough: provenance records, artist biographies, medium descriptions, historical context, exhibition history. When we disabled JavaScript and loaded their online collection page, we saw a search bar and nothing else. No object titles. No artist names. No descriptions. The entire collection was invisible to any system that cannot execute JavaScript, and that includes every major AI crawler operating today.
This is the paradox. Museums generate some of the most meticulously structured data of any organization. Artist names, dates, materials, dimensions, provenance chains, subject classifications, geographic origins. This is precisely the kind of factual, well-sourced information AI models use to construct answers. And most of it is locked behind interfaces AI cannot read.
The data AI wants already exists in your CMS
The largest institutions have recognized this. The Met provides open API access to metadata on over 470,000 artworks. The Smithsonian's Open Access program has released more than 11 million metadata records across 21 museums. Cleveland Museum of Art offers structured data on 64,000+ objects through a public API. Harvard Art Museums, the Getty, and the Rijksmuseum have followed similar paths.
These institutions didn't create new content for AI. They exposed existing data in machine-readable formats: REST APIs with JSON, downloadable CSV files, Linked Open Data endpoints, and IIIF-compliant image services. Collection records maintained for decades became raw material for a new kind of discoverability.
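To make that concrete, here is a minimal sketch of pulling a single record from the Met's public collection API. The endpoint, the example object ID, and the field names reflect that API as commonly documented, but treat them as assumptions to verify against the current API reference rather than a guaranteed contract.

```python
# Minimal sketch: fetching one machine-readable object record from an open
# collection API (the Met's public API is used here as the example).
# Endpoint, object ID, and field names are assumptions to verify.
import requests

BASE = "https://collectionapi.metmuseum.org/public/collection/v1"

def fetch_object(object_id: int) -> dict:
    """Return the JSON record for a single collection object."""
    resp = requests.get(f"{BASE}/objects/{object_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()

record = fetch_object(436535)  # arbitrary example object ID
for key in ("title", "artistDisplayName", "objectDate", "medium"):
    print(f"{key}: {record.get(key)}")
```

The point is not the specific API: it is that decades of catalogued metadata can be exposed as plain, structured records without writing a word of new content.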
But here's what matters for AI visibility: having an API is not the same as having collection content AI crawlers can find on your website. GPTBot and PerplexityBot don't call your API. They crawl your HTML. If your collection pages render through client-side JavaScript, those crawlers see an empty page.
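You can check this yourself in a few lines: fetch a collection page the way a non-JavaScript crawler would and see whether an object title appears in the raw HTML. The URL, expected title, and simplified user-agent string below are hypothetical placeholders; substitute your own.

```python
# Minimal sketch: approximate what a non-JavaScript crawler receives by
# fetching the raw HTML of a collection page and checking for an object title.
# PAGE_URL, EXPECTED_TITLE, and the user-agent string are placeholders.
import requests

PAGE_URL = "https://example-museum.org/collection/object/1234"  # hypothetical
EXPECTED_TITLE = "Portrait of a Lady"                           # hypothetical
CRAWLER_UA = "GPTBot"  # simplified user-agent string for illustration

html = requests.get(PAGE_URL, headers={"User-Agent": CRAWLER_UA}, timeout=10).text

if EXPECTED_TITLE in html:
    print("Title found in raw HTML: crawlers can read this page.")
else:
    print("Title missing from raw HTML: content is likely rendered by JavaScript.")
```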
Where collection data gets trapped
Most museum collection pages use a search-first architecture. A visitor types a query, JavaScript calls the database, results render dynamically. This works for human researchers. It creates a dead zone for AI crawlers.
Individual object pages often load descriptions, provenance, and exhibition history through JavaScript calls to the CMS backend. The raw HTML contains a template with placeholder elements and a script tag. Conductor's research found that ChatGPT and Perplexity crawl pages more frequently than Google, but unlike Googlebot, these AI crawlers do not execute JavaScript. They read raw HTML only.
Your most valuable content, the rich object records that distinguish your institution, is functionally nonexistent to the AI systems that shape how people discover cultural experiences.
Making collection content visible
The fix isn't rebuilding your website. It's ensuring collection data renders in the initial HTML crawlers receive. Three approaches work, each requiring a different level of effort.
Server-side rendering for collection pages. The most complete solution. Your web server queries the collection database and generates full HTML before sending it to the browser. Every object title, artist name, date, medium, and description exists in the page source. This requires development resources but produces pages fully visible to both AI crawlers and search engines.
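As a rough illustration of the pattern, here is a minimal Flask sketch: the server looks up the object record and writes it into the HTML before the page leaves the server. The route, template fields, and get_object_record helper are illustrative placeholders, not a drop-in for any particular CMS.

```python
# Minimal server-side rendering sketch with Flask: the object record is
# resolved on the server and embedded in the HTML, so crawlers see titles,
# artists, and descriptions in the page source. get_object_record() stands in
# for whatever query your collection database actually exposes.
from flask import Flask, abort, render_template_string

app = Flask(__name__)

OBJECT_PAGE = """
<html>
  <head><title>{{ obj.title }} | Example Museum</title></head>
  <body>
    <h1>{{ obj.title }}</h1>
    <p>{{ obj.artist }}, {{ obj.date }}</p>
    <p>{{ obj.medium }}</p>
    <p>{{ obj.description }}</p>
  </body>
</html>
"""

def get_object_record(object_id):
    """Placeholder for a query against the collection database."""
    demo = {1: {"title": "Winter Landscape", "artist": "Unknown artist",
                "date": "c. 1650", "medium": "Oil on panel",
                "description": "A snow-covered village scene."}}
    return demo.get(object_id)

@app.route("/collection/object/<int:object_id>")
def object_page(object_id):
    obj = get_object_record(object_id)
    if obj is None:
        abort(404)
    return render_template_string(OBJECT_PAGE, obj=obj)
```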
Static HTML generation for high-value objects. You don't need to server-render your entire collection overnight. Start with your most visited objects, permanent collection highlights, and works tied to current exhibitions. Generate static HTML pages with full metadata embedded. Even 200 well-structured object pages give AI substantial material to draw from.
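A sketch of that approach, assuming the collection system can export a CSV with columns such as object_id, title, artist, date, medium, and description (adjust the filename and column names to whatever your export actually contains):

```python
# Minimal sketch: generating static HTML pages for high-value objects from a
# CMS export. The CSV filename and column names are assumptions; map them to
# your collection management system's actual export format.
import csv
import html
from pathlib import Path

PAGE = """<html>
<head><title>{title} | Example Museum</title></head>
<body>
  <h1>{title}</h1>
  <p>{artist}, {date}</p>
  <p>{medium}</p>
  <p>{description}</p>
</body>
</html>"""

out_dir = Path("static_objects")
out_dir.mkdir(exist_ok=True)

with open("collection_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        fields = {k: html.escape(row.get(k, ""))
                  for k in ("title", "artist", "date", "medium", "description")}
        page = PAGE.format(**fields)
        (out_dir / f"object-{row['object_id']}.html").write_text(page, encoding="utf-8")
```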
Structured data markup on existing pages. Even if pages still load content via JavaScript, you can inject JSON-LD schema markup into the page head. VisualArtwork, Museum, Event, and ExhibitionEvent schema types exist for this purpose. Schema doesn't replace HTML rendering, but it provides AI with structured signals about what your collection contains.
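For example, a small helper like the one below can turn an existing object record into a VisualArtwork JSON-LD block ready to place in the page head. The field mappings are illustrative; adapt them to the properties your records actually hold.

```python
# Minimal sketch: building a VisualArtwork JSON-LD block from an object
# record. The resulting <script> tag can be injected into the page head even
# if the visible content still loads via JavaScript. Field mappings are
# illustrative assumptions.
import json

def visual_artwork_jsonld(obj: dict) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "VisualArtwork",
        "name": obj["title"],
        "creator": {"@type": "Person", "name": obj["artist"]},
        "dateCreated": obj["date"],
        "artMedium": obj["medium"],
        "description": obj["description"],
    }
    return ('<script type="application/ld+json">'
            + json.dumps(data, ensure_ascii=False)
            + "</script>")

print(visual_artwork_jsonld({
    "title": "Winter Landscape",
    "artist": "Unknown artist",
    "date": "c. 1650",
    "medium": "Oil on panel",
    "description": "A snow-covered village scene.",
}))
```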
Beyond objects: the content AI actually surfaces
Collection records are the foundation, but AI platforms rarely recommend a museum by citing a single object. They recommend institutions based on what kind of experience a visitor will have. That means the content surrounding your collection matters as much as the data itself: exhibition narratives that explain why a show matters, visitor-facing descriptions that answer practical questions, thematic essays connecting objects to broader cultural questions people ask AI about.
Harvard Art Museums used computer vision to generate plain-language descriptions of artworks, adding tags based on what AI "saw" rather than curatorial classification. The result was collection data matching how non-expert visitors actually search. That accessible language is what AI models draw on when constructing recommendations.
The institutional advantage
Cultural institutions have something most businesses would pay millions to create: deep, authoritative datasets built over decades by subject matter experts. The challenge is purely accessibility. The data exists. The curatorial rigor is real. The gap is between where that data lives (your CMS) and where AI looks for it (your rendered HTML).
Every object record, exhibition description, and educational resource that becomes machine-readable is a potential citation source. The institutions that translate their collections into AI-readable formats now will compound that advantage with every model update.