Does Vertex AI Use Your Schema Markup? Clarifying Data Needs for SEOs
Why your schema markup still matters for search, even if Vertex AI works differently: Clearing up the confusion between Schema.org for Google Search and data schemas within Vertex AI.
The rapid rise of Generative AI and platforms like Google Cloud’s Vertex AI is changing how businesses think about data and search. For SEO professionals and marketers accustomed to using Schema.org markup to enhance visibility in traditional Google Search, a critical question emerges: Does Vertex AI, Google’s powerful AI platform, directly use the Schema.org markup embedded on websites?
The short answer is no, not in the way Google Search does. This article clarifies the distinction and explains how each system uses “schemas.”
Table of Contents
- Does Google Vertex AI Rely on Schema Markup?
- How Vertex AI Actually Gets Its Data
- The Graph Concept as a Representation Model
- Tips for Navigating AI-Powered Search
- How Schema MarkUp Helps Search Engines Recognize Content
- SUMMARY: Explore Google Vertex AI and Useful Schema Markup
Does Google Vertex AI Rely on Schema Markup?
Many linked-data and cloud users want to know whether Vertex AI services directly consume or depend on the Schema.org markup found on websites. The question matters because how search works is changing: LLM integration is a reality, and the search engine result pages (SERPs) built around 10 blue links are being replaced.
What does “updating schemas” mean in Vertex AI?
Which schemas does that phrase refer to? Schema markup? No. A few key distinctions and some context are needed.
In the context of Vertex AI, “updating schemas” refers to modifying the structure or organization of data. This could involve adding new fields, changing data types, or altering relationships between data elements, typically within a database or data pipeline.
The “schemas” referred to in Google Cloud’s “Update a schema” documentation are NOT the same as the Schema.org markup used by Google Search on public web pages. That documentation relates to Google’s Generative AI App Builder, which is now part of Vertex AI Search and Conversation.
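To make “updating a schema” concrete, here is a minimal sketch in plain Python. It assumes a data store schema expressed in a JSON Schema style, and the field names and annotations (`searchable`, `retrievable`, `indexable`, `dynamicFacetable`) are illustrative assumptions, not a verified copy of any service's format. Whether a service accepts a particular change, such as a type change on an existing field, depends on that service's own rules.

```python
import copy

# Hypothetical data store schema in a JSON Schema style.
# Field names and annotations are illustrative assumptions.
current_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "title": {"type": "string", "searchable": True, "retrievable": True},
        "price": {"type": "string", "retrievable": True},
    },
}


def update_schema(schema: dict) -> dict:
    """Return a new schema with one added field and one changed data type."""
    updated = copy.deepcopy(schema)  # leave the original untouched
    # Add a new field intended for filtering and faceting.
    updated["properties"]["brand"] = {
        "type": "string",
        "indexable": True,
        "dynamicFacetable": True,
    }
    # Change an existing field's data type from string to number.
    updated["properties"]["price"]["type"] = "number"
    return updated


new_schema = update_schema(current_schema)
print(sorted(new_schema["properties"]))
```

Nothing here touches a web page: the schema describes the records you upload, which is the key difference from Schema.org markup.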
Table Demonstrating the Different Contexts and Purposes for Schemas
Criterion | Schema.org Markup (for Google Search) | Data Store Schemas (for Vertex AI Search / Gen App Builder) |
---|---|---|
Purpose | A standardized vocabulary (using formats like JSON-LD, Microdata, RDFa) embedded in the HTML of public web pages. | Defines the structure, data types, and indexing options for the specific data you upload into your private data store within the Vertex AI Search service. |
Goal | To help general web crawlers (like Googlebot) recognize the content and context of a web page to improve search results (e.g., enabling rich snippets, Knowledge Graph entries). | To tell Vertex AI Search how to index, filter, facet, search, and retrieve the documents you provide for your specific search application. |
Scope | Public web, standardized types (Product, Recipe, Event, etc.). | Private to your Google Cloud project and specific data store. User-defined or inferred based on uploaded data (often JSON). |
Consumption | Primarily by Google Search crawlers. | By the Vertex AI Search service to power the search/recommendation/conversational application you are building. |
What I understand is that the “schemas” in that documentation refer to the configuration of your specific data store within a Google Cloud service, not the public Schema.org vocabulary used by traditional Google Search.
So, how we prepare and ground our data for Vertex AI or Agentic AI matters.
How Vertex AI Actually Gets Its Data
So, if Vertex AI isn’t scraping your website’s Schema.org markup, how does it get information? As AI platforms become primary tools for search, structuring your content aids parsing and agentic understanding of the information presented. This is crucial for providing direct and clear answers in AI-driven search experiences.
Vertex AI primarily relies on structured data explicitly provided by the user through specific channels:
- APIs: Directly sending structured data (often JSON) via API calls.
- Google Cloud Storage (GCS): Uploading files (like JSON Lines, CSV) containing structured data.
- BigQuery: Connecting directly to tables containing your structured data.
- Vertex AI Feature Store: Ingesting pre-defined features.[1]
Within these methods, you typically provide structured data using fields like struct_data (using Google’s standard Protobuf Struct format) or json_data (providing a JSON string) within the Document objects you send to services like Vertex AI Search. The focus is on providing data formatted according to Vertex AI’s requirements, not embedding Schema.org on external websites.
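As a rough illustration of the two options, the sketch below builds the same hypothetical product record once as a nested object and once as a JSON string, then serializes both as JSON Lines. The camelCase keys `structData` and `jsonData` are assumed to be the JSON spellings of `struct_data` and `json_data`, and the record fields and ID are made up for the example.

```python
import json

# Hypothetical product record; the field names are illustrative only.
product = {"title": "Trail Running Shoe", "brand": "ExampleCo", "price": 89.5}

# Option 1: pass the record as a nested object
# (struct_data, assumed to appear as "structData" in JSON form).
doc_with_struct = {"id": "sku-001", "structData": product}

# Option 2: pass the same record as a JSON-encoded string
# (json_data, assumed to appear as "jsonData" in JSON form).
doc_with_json = {"id": "sku-001", "jsonData": json.dumps(product)}


def to_json_lines(documents: list[dict]) -> str:
    """Serialize documents as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(doc, sort_keys=True) for doc in documents)


payload = to_json_lines([doc_with_struct, doc_with_json])
print(payload)
```

Either way, the data is something you prepare and send yourself; no crawler is reading it from your site.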
While Vertex AI doesn’t directly consume your website’s Schema.org markup like Google Search does, understanding the underlying concepts of how AI systems process structured information helps clarify why both types of structured data are valuable in their respective contexts.
The Graph Concept as a Representation Model
Let’s take a look at how Graph Databases or Knowledge Graphs play a part in underlying AI technologies.
Both Google Search (when processing Schema.org and other signals) and Vertex AI services (when processing your uploaded data) need to recognize entities, facts, and relationships connecting them. The Google Knowledge Graph strives to organize this data.
In theory, Vertex AI uses graph concepts internally to model relationships within your private, structured data for better machine learning. Google Search, by contrast, uses schema markup externally, on the open web, to recognize the semantic meaning of public content. Search may leverage a standardized graph vocabulary (Schema.org) to “feed” its Knowledge Graph and enhance search results. Google may use the data from your schema markup to update, augment, or confirm information about an entity within the Knowledge Graph, but it doesn’t directly ingest it.
Both use graphs to represent structured information, but for fundamentally different audiences and objectives.
Components of structured information:
- Entities as Nodes: This relates to real-world things (your company, a specific product, a location) as points or nodes in a network. A node is a single point within the graph. The “entire known data graph of an entity” would encompass the node itself, all its properties, all its direct relationships (edges) to other nodes, and potentially those neighboring nodes as well.
- Facts as Properties & Edges: Information about these entities (e.g., a product’s price, a company’s address) can be seen as properties attached to these nodes. How entities relate to each other (e.g., ‘Product X’ is manufactured by ‘Company Y’, ‘Event Z’ takes place at ‘Location A’) are represented by connections or edges between the nodes.
- Numerical values as factual representations (Embeddings): In machine learning contexts related to graph embeddings or knowledge graph embeddings (for potential training or use within Vertex AI), nodes are frequently mapped to numerical vectors (embeddings). These dense numerical vectors encode information about the node’s properties and relationships within the graph structure.
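The node, property, and edge ideas above can be sketched with a tiny in-memory graph. This is only a teaching model with made-up class and method names. It does not describe how Vertex AI or Google Search store data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node_id: str, **properties) -> None:
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def entity_view(self, node_id: str) -> dict:
        """Return the node, its properties, its direct edges, and its neighbors."""
        touching = [e for e in self.edges if node_id in (e[0], e[2])]
        neighbors = sorted({n for s, _, t in touching for n in (s, t)} - {node_id})
        return {
            "node": node_id,
            "properties": self.nodes[node_id].properties,
            "edges": touching,
            "neighbors": neighbors,
        }


graph = Graph()
graph.add_node("Product X", price=49.0)
graph.add_node("Company Y", address="123 Example St")
graph.add_node("Event Z")
graph.add_node("Location A")
graph.add_edge("Product X", "manufacturedBy", "Company Y")
graph.add_edge("Event Z", "takesPlaceAt", "Location A")

view = graph.entity_view("Product X")
print(view)
```

The `entity_view` result corresponds to the “entire known data graph of an entity” described above: the node, its properties, its direct edges, and its neighbors.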
In traditional Google Search (and possibly in AI Overview answers), this entity information is often supplied by schema markup on a website. When Google and Bing draw data from a website, schema markup helps overcome content and entity ambiguity issues.
This graph-like way of representing structured information allows systems to recognize context and answer complex questions. Using semantic triples (subject, predicate, object) to explain data relationships is extremely helpful, and it is part of creating good content structure. I like to focus on continuing to build on the web’s core data infrastructure while increasing the capacity of a web application.
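To show what semantic triples look like, here is a simplified sketch that flattens a small, hypothetical JSON-LD snippet into (subject, predicate, object) tuples. It assumes every node carries an `@id` and ignores lists and other JSON-LD features, so it is not a full JSON-LD processor.

```python
import json

# Hypothetical JSON-LD of the kind embedded in a web page.
json_ld = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Product",
  "@id": "https://example.com/#product-x",
  "name": "Product X",
  "manufacturer": {
    "@type": "Organization",
    "@id": "https://example.com/#company-y",
    "name": "Company Y"
  }
}
""")


def to_triples(node: dict) -> list[tuple]:
    """Flatten a JSON-LD node into (subject, predicate, object) triples.

    Simplifying assumption: every node has an "@id"; lists are not handled.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "type" if key == "@type" else key
        if isinstance(value, dict):
            # A nested node becomes an edge to that node, plus its own triples.
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, predicate, value))
    return triples


triples = to_triples(json_ld)
for triple in triples:
    print(triple)
```

Each printed tuple is one fact: literal values read as properties of a node, and values that point at another `@id` read as edges between nodes.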
Table: Vertex AI vs. Google Search: Data Handling Comparison
Feature | Vertex AI | Google Search |
---|---|---|
Primary Data Source | Structured data explicitly provided by the user. | Content crawled from public websites. |
Data Input Method | Via APIs, Google Cloud Storage (GCS), BigQuery uploads. | Automated web crawling (Googlebot). |
Relevant “Schema” Type | Platform-specific configurations (e.g., defining structure for struct_data, json_data fields). | Schema.org markup (JSON-LD, Microdata, RDFa) embedded in web page HTML. |
Main Purpose of Using Data | To power specific AI applications (search, recommendations, etc.) built by the user on the platform. | To recognize and interpret public web content, enable Rich Results in SERPs, and feed the Knowledge Graph. |
How to prepare data for ingesting when using Vertex AI Agent Builder:
If you plan to import data from Cloud Storage with metadata, put a JSON file that contains the metadata into a Cloud Storage bucket whose location you provide during import.
“Structured data
Prepare your data according to the import method that you plan to use. If you plan to ingest media data, also see Structured media data.
You can import structured data from the following sources:
- Cloud Storage.
- Local JSON data.
- Third-party data sources.
When you import structured data from BigQuery or from Cloud Storage, you are given the option to import the data with metadata. (Structured with metadata is also referred to as enhanced structured data.)” – Vertex AI Agent Builder: Prepare data for ingesting
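A metadata file for a Cloud Storage import might be produced along these lines. This is a sketch under assumptions: the bucket name, document IDs, and metadata fields are invented, and the key names (`structData`, `content`, `mimeType`, `uri`) should be checked against the current import documentation before use.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: each pairs metadata with a file stored in a bucket.
records = [
    {
        "id": "doc-1",
        "structData": {"title": "Spring catalog", "category": "catalog"},
        "content": {
            "mimeType": "application/pdf",
            "uri": "gs://example-bucket/docs/spring-catalog.pdf",
        },
    },
    {
        "id": "doc-2",
        "structData": {"title": "Returns policy", "category": "policy"},
        "content": {
            "mimeType": "text/html",
            "uri": "gs://example-bucket/docs/returns.html",
        },
    },
]


def write_metadata_file(rows: list[dict], path: Path) -> Path:
    """Write one JSON object per line (JSON Lines) to the given path."""
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    return path


out_path = write_metadata_file(records, Path(tempfile.mkdtemp()) / "metadata.jsonl")
print(out_path.read_text(encoding="utf-8"))
```

The resulting file would then be uploaded to the bucket whose location you provide during import.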
Tips for Navigating AI-Powered Search
While the distinction between Vertex AI and Schema.org is key, it also helps to understand how it fits into the broader evolution of search. Here are related considerations for SEOs:
The future of AI SEO, or Generative Engine Optimization (GEO), or Answer Engine Optimization (AEO), lies in the strategic integration of AI into broader marketing and content strategies.
As AI continues to evolve, its technology and insights will become a central part of shaping more effective SEO campaigns. Keeping pace with AI-driven updates to search engine algorithms, and shifting from tactical execution to strategic oversight, means that SEO professionals need to adapt nonstop. A human with imagination, creativity, and a keen understanding of this evolving landscape must always be at the helm.
To help SEOs and businesses transition effectively, Hill Web Marketing recommends:
- Creating human-curated and edited content that is comprehensive, factual, sourced, AI-friendly, and built to convert. This is where human editors play a vital role in refining AI-generated content, ensuring accuracy, adding nuance, and aligning the output with specific goals.
- Focusing on niche expertise articles and using topic expert authors.
- Optimizing to satisfy user experience signals, contextual meaning and the intent behind a user’s search query.
- Checking your data quality, including steps to ensure data accuracy, completeness, relevance, and freshness. Go to the Data quality page in the Search for commerce console and check your “Critical Threshold” score. [2]
- Prioritizing credibility and trust by avoiding low-effort content and adding clear value to what is already on the web.
Gaining a strong data footprint in Google’s AI Overviews means adapting your strategies now to secure a competitive advantage and maintain SERP visibility.
What Can We Conclude About the Relationship Between Google Vertex AI and Schemas?
- Direct answer: Within my current experience, I cannot say that Google Vertex AI directly relies on Schema.org markup embedded on websites as a primary data source.
- Clarification of what we do know: Vertex AI heavily relies on structured data formats. This data is typically provided through dedicated channels (APIs, GCS, BigQuery, etc.). It is consumed in formats defined by Vertex AI services themselves.
- Schema markup’s indirect connection: Schema.org data could potentially be extracted from websites, processed, structured into a suitable format (e.g., BigQuery table), and then fed into Vertex AI; however, this is an indirect data pipeline, versus a native reliance within Vertex AI.
- Key takeaway: Focus on preparing and structuring data that aligns to the current specific requirements of the Vertex AI service being used. It is best to not assume it consumes or is reliant on web Schema.org markup.
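The indirect pipeline mentioned above could be sketched as follows: pull JSON-LD blocks out of a page's HTML, then flatten them into rows suitable for a table or a JSON Lines file. This uses only the Python standard library. The HTML, the chosen columns, and the class name are assumptions for illustration, and the sketch does not handle `@graph` arrays, lists of blocks, or malformed JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def to_rows(blocks: list[dict]) -> list[dict]:
    """Flatten each JSON-LD block into a row with a few hypothetical columns."""
    return [
        {"type": b.get("@type"), "name": b.get("name"), "url": b.get("url")}
        for b in blocks
    ]


html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Example Co", "url": "https://example.com"}
</script>
</head><body></body></html>
"""

parser = JsonLdExtractor()
parser.feed(html)
rows = to_rows(parser.blocks)
print("\n".join(json.dumps(row) for row in rows))
```

The rows would still need to be loaded into BigQuery or Cloud Storage and mapped to the target service's format, which is why this counts as an indirect route, not native consumption.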
Gaining “word” clarity: “Schema” is often used informally to refer to the structure you put inside struct_data or json_data. However, the actual field names for transmitting that data in the Vertex AI API are struct_data and json_data.
“Vertex AI uses tabular (structured) data to train a machine learning model to make predictions on new data.
You can import data either from your computer or from Cloud Storage in an available format (CSV or JSON Lines) with the labels (and bounding boxes, if necessary) inline. For more information on import file format, see Preparing your training data. If you want to split your dataset manually, you can specify the splits in your CSV or JSON Lines import file.” – Google AutoML Beginner’s Guide
Legacy schema terms and methods need clarifying as AI is rapidly evolving
Official Google Cloud documentation for Vertex AI Search and Conversation clearly outlines the use of struct_data and json_data for incorporating structured data into documents for indexing and search features (like filtering, faceting, boosting).
How Schema MarkUp Helps Search Engines Recognize Content
I had a great conversation with Jarno Van Driel about how Google uses schema markup to “recognize content” versus “understand content.”
Jarno is a true expert at putting the value and use of structured data markup in context. AI agents do not possess consciousness, subjective experience, emotions, or intentionality in the same way that humans do when we “think.” They operate based on complex algorithms, the data they have been trained on, and the rules that govern their functionality.
While AI can perform increasingly sophisticated processing, such as Google’s ability to “understand” or “recognize” content through schema markup or Vertex AI data models, we are talking about a level of interpretation and contextual awareness.
The term “intelligent agents” refers to the ability of Vertex AI agents to perform tasks effectively and autonomously within their defined parameters. Their “reasoning logic” describes the algorithmic processes they follow to arrive at conclusions or actions, which is different from human reasoning that involves consciousness and a broader understanding of how people find solutions, products, and services.
In short, AI agents do not “think” or “understand” in the human sense of the word, which involves consciousness, subjective experience, and genuine comprehension. Instead, they are sophisticated tools that can process information, make decisions, and perform tasks in ways that mimic certain aspects of human intelligence through complex algorithms and data analysis. They are designed to automate tasks, provide insights, and enhance interactions, but their operation is fundamentally different from human thought processes.
My guess is that Google uses schema markup to better process and interpret the content on a webpage, which goes beyond mere “text recognition.”
Why and when is schema markup still important?
Just because Vertex AI doesn’t directly rely on your website’s Schema.org doesn’t mean it’s unimportant! Schema markup remains critical for:
- Helpful to Google Search: Its key benefit is providing semantic meaning that Google’s core search engine consumes, and signals indicate that it continues to support visibility, rich snippets, and entity disambiguation.
- Potential Future Use: Vertex AI does not consume Schema.org markup today, but future integrations are possible, though speculative.
- Feeds the Knowledge Graph: It potentially influences how you appear in Knowledge Panels and other entity-focused search features.
- Good Data Practice: Implementing structured data (or schema markup) reflects good structured data practices, which might facilitate easier extraction and transformation for other systems (like Vertex AI pipelines) later.
While Google is getting better at content recognition without it, schema markup remains a powerful tool recommended by Google itself. Ignoring it means potentially missing out on significant visibility and CTR benefits in Google Search.
Your schema might confirm or add details like the official logo, founder information, contact details, location, product attributes, event details, etc., to the Knowledge Graph’s representation of that entity. When using Vertex AI, the best thing is to adhere to its specific requirements for structured data.
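For reference, the kind of entity details listed above (logo, founder, contact details, location) could be expressed in Schema.org JSON-LD roughly as shown below. All values are placeholders, and the helper function is a hypothetical convenience, not part of any library.

```python
import json

# Placeholder Organization markup using Schema.org types and properties.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
    "founder": {"@type": "Person", "name": "Jane Doe"},
    "contactPoint": {
        "@type": "ContactPoint",
        "telephone": "+1-555-0100",
        "contactType": "customer service",
    },
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Example City",
        "addressCountry": "US",
    },
}


def to_script_tag(data: dict) -> str:
    """Wrap JSON-LD in a script tag for embedding in a page's HTML."""
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = to_script_tag(organization)
print(snippet)
```

This markup is for the public web and Google Search. The same facts would need to be reshaped into the service's own document format before being sent to Vertex AI.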
Let’s wrap it up.
Table of Key Differences: Vertex AI Graph Concept vs. Google Search Schema
Feature | The Graph Concept in Vertex AI (Structured Data) | Schema Markup Consumption by Google Search |
---|---|---|
Primary Goal | Improve ML model performance on a specific dataset | Enhance search engine recognition and enable Rich Results |
Data Source | Internal structured data (BigQuery, SQL, CSV) | Public web page content (HTML) |
Representation | Often implicit graph traversal for features; explicit for GNNs | Explicit semantic markup using Schema.org vocabulary |
Consumer | ML models, Data Scientists within Vertex AI / GCP | Google Search engine, Google Knowledge Graph |
Scope | Internal to a specific ML project/pipeline | Public Web / Global Search Ecosystem |
Control | User controls data and processing within Vertex AI | Website owner provides markup; Google controls consumption |
Nature of Graph | Represents relationships within the provided dataset | Represents semantic meaning on a webpage for external use |
In my opinion, weak, vague, SEO’d, malformed, obsolete, or redundant schema markup may be a contributing factor to Google relying on it less in the future. I’d rather see Google announce that markup will be used to enrich AI Overviews (AIO) results.
Google’s Open Multimodal Model with Long Context & Vision seeks to reduce hallucinations by implementing in-context attribution techniques to minimize factual errors. Google may consider this a more trustworthy approach to data facts than manually influenced schema markup. [3]
“With the AI revolution sweeping everything in marketing and technology, I believe a better framing is systems of context and systems of truth. But the defining characteristic of the martech “stack” in an AI world is going to be context and the truth it’s wrapped around.” – Scott Brinker [4]
SUMMARY: Explore Google Vertex AI and Useful Schema Markup
Focus on preparing data according to the specific requirements of the Vertex AI service you’re using, while continuing to implement robust Schema.org markup on your website for its crucial role in Google Search visibility and entity search. The key takeaway is to leverage either or both methods to transform raw data into actionable insights for business growth.
As an AI integration consultant, Hill Web Marketing helps businesses sort through what effort, impact, and timing will make a worthwhile difference. We look for real results that matter to you.
Call us at 651-206-2410 to incorporate a semantic learning approach by Using Predictive Search to Match User Intent
Resources:
[1] https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview
[2] https://cloud.google.com/retail/docs/data-quality
[3] https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data
[4] https://chiefmartec.com/2025/02/meet-the-new-martech-stack-systems-of-context-and-systems-of-truth/