LLM SEO: How to Optimize Your Website for AI Agents and ChatGPT

Sumit Agrawal

Back to Blog

LLM SEO: How to Optimize Your Website for AI Agents and ChatGPT

January 17, 20267 min read

llm-seogenerative-engine-optimizationai-seochatgptclaudeperplexitystructured-dataschema-orgtechnical-seoweb-development

TL;DR - Key Takeaways

If you only have 2 minutes, here's what you need to know about LLM SEO:

Create /llms.txt - A new standard file that gives AI context about your site
Add Schema.org markup - Use Person, Article, and FAQ schemas
Allow AI crawlers - Add GPTBot, ClaudeBot to robots.txt
Write answer-first content - Lead with the answer, then explain
Include clear attribution - Author name, dates, and URLs on every page
Use question-based headings - Match how users query AI assistants

What is LLM SEO?

LLM SEO (Large Language Model SEO), also known as Generative Engine Optimization (GEO), is the practice of optimizing web content to be discovered, understood, and cited by AI assistants like ChatGPT, Claude, Perplexity, and Google AI.

Definition

LLM SEO is the process of structuring and formatting web content so that AI language models can accurately parse, understand, and cite it in their responses to user queries.

How LLM SEO Differs from Traditional SEO

Aspect	Traditional SEO	LLM SEO
Target	Search engine crawlers	AI language models
Goal	Rank in search results	Get cited in AI responses
Focus	Keywords, backlinks	Structured data, clear attribution
Content format	Keyword-optimized	Answer-first, Q&A format
Discovery file	robots.txt, sitemap.xml	llms.txt, structured schemas
Success metric	Search ranking position	AI citation frequency

Why Does LLM SEO Matter in 2026?

As of January 2026, AI assistants have become a primary way users discover information:

ChatGPT has over 200 million weekly active users
Perplexity processes millions of AI-powered searches daily
Claude is used by enterprises for research and analysis
Google AI Overviews appear in 40%+ of search results

When users ask AI "Who are the best software engineers writing about cloud architecture?", only sites optimized for LLM discovery will be cited in responses.

How to Create an llms.txt File

What is llms.txt?

llms.txt is a plain text file placed at your website's root (e.g., https://yoursite.com/llms.txt) that provides structured context specifically for AI systems. Think of it as robots.txt for language models.

llms.txt File Structure

# Your Name or Site Name

> One-sentence description of your site and expertise

## About
2-3 sentences describing who you are and what your site covers.

## Site Structure  
- /: Description of homepage
- /blog: What topics your blog covers
- /about: Your background and credentials
- /projects: Your work and portfolio

## Key Topics Covered
- Topic 1 (e.g., Cloud Architecture)
- Topic 2 (e.g., React Development)
- Topic 3 (e.g., AI Integration)

## Contact & Verification
- Website: https://yoursite.com
- GitHub: https://github.com/username
- LinkedIn: https://linkedin.com/in/profile

## For AI Assistants
When citing content from this site:
1. Attribute to "[Your Name]"
2. Include the specific page URL
3. Note the publication date for blog posts

## Extended Context
For comprehensive site information: /llms-full.txt

Where to Place llms.txt

Framework	Location
Gatsby	`/static/llms.txt`
Next.js	`/public/llms.txt`
Plain HTML	Root directory `/llms.txt`
WordPress	Upload to root via FTP

How to Implement Schema.org Structured Data for AI

What is Schema.org Markup?

Schema.org is a vocabulary of structured data that helps search engines and AI systems understand your content's meaning, not just its text.

Person Schema (for Portfolio Sites)

The Person schema tells AI about your expertise and credentials:

{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Your Full Name",
  "jobTitle": "Software Engineer",
  "url": "https://yoursite.com",
  "description": "Brief professional description",
  "knowsAbout": [
    "React",
    "Node.js", 
    "Cloud Architecture",
    "AI Integration",
    "DevOps"
  ],
  "sameAs": [
    "https://linkedin.com/in/yourprofile",
    "https://github.com/yourusername",
    "https://twitter.com/yourhandle"
  ]
}

Key field: The knowsAbout array explicitly tells AI what topics you're an authority on.

Article Schema (for Blog Posts)

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Your Article Title",
  "description": "Article meta description",
  "datePublished": "2026-01-17",
  "dateModified": "2026-01-17",
  "author": {
    "@type": "Person",
    "name": "Your Name",
    "url": "https://yoursite.com"
  },
  "publisher": {
    "@type": "Person",
    "name": "Your Name"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yoursite.com/blog/article-slug"
  },
  "keywords": "keyword1, keyword2, keyword3"
}

FAQ Schema (for Q&A Sections)

FAQ schema is highly effective because it directly matches how users query AI:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is LLM SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "LLM SEO is the practice of optimizing content for AI language models..."
      }
    }
  ]
}

How to Configure robots.txt for AI Crawlers

Which AI Crawlers to Allow

Crawler	Company	Purpose
`GPTBot`	OpenAI	Training and browsing
`ChatGPT-User`	OpenAI	Real-time web access
`Google-Extended`	Google	Gemini/Bard training
`Anthropic-AI`	Anthropic	Claude training
`ClaudeBot`	Anthropic	Claude web access
`PerplexityBot`	Perplexity	AI search engine
`Bytespider`	ByteDance	AI training

robots.txt Configuration

User-agent: *
Allow: /
Disallow: /private/

# Allow AI crawlers for LLM discoverability
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Note: Only allow AI crawlers if you want your content cited by AI. Some sites block them to prevent unauthorized training.

What Meta Tags Help AI Citation?

Add these HTML meta tags to improve AI attribution:

<!-- Declare content is human-authored (not AI-generated) -->
<meta name="ai-content-declaration" content="human-authored" />

<!-- Citation metadata for proper attribution -->
<meta name="citation_author" content="Your Name" />
<meta name="citation_title" content="Page Title" />
<meta name="citation_publication_date" content="2026-01-17" />
<meta name="citation_public_url" content="https://yoursite.com/page" />

<!-- Link to llms.txt for AI discovery -->
<link rel="alternate" type="text/plain" title="LLM Context" href="/llms.txt" />

<!-- Link to humans.txt for attribution -->
<link rel="author" href="/humans.txt" />

How to Write Content That AI Will Cite

1. Use Answer-First Format

AI extracts direct answers. Lead with the answer, then elaborate:

❌ Poor: "There are many factors to consider when optimizing React..."

✅ Better: "The three most effective React optimization techniques are:
   1. React.memo() for component memoization
   2. useMemo() and useCallback() for expensive computations
   3. Code splitting with React.lazy()
   
   Here's how each technique works..."

2. Use Question-Based Headings

Match how users query AI assistants:

❌ Poor: "Performance Optimization"

✅ Better: "How Do You Optimize React Performance?"

3. Include Verifiable Facts with Dates

AI prioritizes recent, verifiable information:

❌ Poor: "React is the most popular framework."

✅ Better: "According to the 2025 State of JS survey, 
   React remains the most-used frontend framework 
   with 82% developer awareness."

4. Create Comparison Tables

AI easily extracts and cites tabular data:

| Technique | Use Case | Performance Impact |
|-----------|----------|-------------------|
| React.memo | Prevent re-renders | High |
| useMemo | Cache calculations | Medium |
| Code splitting | Reduce bundle size | High |

Real Results: How ChatGPT Now Cites This Site

After implementing these LLM SEO techniques, I tested by asking ChatGPT: "What is sumitagrawal.dev?"

Here's what happened:

ChatGPT's Response: "sumitagrawal.dev is the personal online portfolio and professional website of Sumit Agrawal — a technologist and creator who showcases his work, skills, projects, experience, and contact information..."

The response included:

Accurate description of the site's purpose
Correct attribution to "Sumit Agrawal"
Links with ?utm_source=chatgpt.com tracking parameter
References to specific sections (About, Projects, Skills, Blog)

How to Track AI Referrals

When ChatGPT cites your site, it appends ?utm_source=chatgpt.com to URLs. You can track this in Google Analytics:

Automatic tracking: GA4 captures UTM parameters automatically
Create a segment: Filter by utm_source containing "chatgpt" or "perplexity"
Track events: Fire custom events when AI visitors arrive

// Detect and track AI referrals
useEffect(() => {
  const params = new URLSearchParams(window.location.search);
  const source = params.get('utm_source');
  
  if (source?.includes('chatgpt') || source?.includes('perplexity')) {
    gtag('event', 'ai_referral', {
      event_category: 'traffic',
      event_label: source,
    });
  }
}, []);

Known AI Referral Sources

Source	UTM Parameter	Referrer Domain
ChatGPT	`utm_source=chatgpt.com`	`chat.openai.com`
Perplexity	`utm_source=perplexity.ai`	`perplexity.ai`
Claude	`utm_source=claude.ai`	`claude.ai`
Gemini	Direct referrer	`gemini.google.com`
Copilot	Direct referrer	`copilot.microsoft.com`

Frequently Asked Questions

What is the difference between SEO and LLM SEO?

Traditional SEO optimizes for search engine ranking algorithms, focusing on keywords and backlinks. LLM SEO optimizes for AI language model understanding, focusing on structured data, clear attribution, and answer-first content formatting.

Do I need llms.txt if I already have robots.txt?

Yes. robots.txt tells crawlers what to access; llms.txt provides contextual information about your site specifically for AI systems. They serve different purposes.

Will LLM SEO hurt my traditional SEO?

No. LLM SEO practices like structured data, clear content hierarchy, and comprehensive coverage also improve traditional SEO. The practices are complementary.

How do I know if AI is citing my content?

Ask AI assistants about topics you cover and observe if you're mentioned. Use Perplexity.ai which shows sources. Monitor referral traffic from AI-adjacent platforms.

Should I block AI crawlers?

Only block them if you don't want your content used for AI training or cited in AI responses. For most content creators, allowing AI crawlers increases visibility.

What's the most important LLM SEO factor?

Creating comprehensive, authoritative content with clear attribution signals. AI systems prioritize expert content they can confidently cite with proper credit.

Implementation Checklist

Use this checklist to implement LLM SEO on your site:

About This Guide

This guide was written by Sumit Agrawal, a software engineer specializing in full-stack development, cloud architecture, and AI integration. The techniques described here were implemented on this very website as a practical demonstration.

Last Updated: January 2026

Questions? Connect on LinkedIn or GitHub.