First published: 7 June 2026 · Last updated: 7 June 2026
The biggest silent change in SEO between 2023 and 2026 is not algorithm updates or AI Overviews. It is that Google indexes less. Crawl frequency per URL has dropped across the long tail of the web, and the quality bar for inclusion in the index has risen. Sites that historically had 95 percent index coverage now sit at 70 to 80 percent, and the missing URLs are typically not technical errors but quality and crawl-priority decisions made by Google's indexing pipeline. This guide is the practitioner's view of what changed, why, how to diagnose the most common 2026 indexing failures, and the sequenced fixes that work.
For broader algorithm-update context, our Google algorithm guide covers the major updates from 2022 to 2026. For helpful-content recovery specifically, our HCU recovery deep-dive covers the post-March 2024 quality recalibration. This article complements both with the technical indexing layer: the GSC report patterns, the crawl-budget mechanics, and the per-status fix sequences.
What Actually Changed in Google's Indexing Pipeline
Three forces converged between 2024 and 2026 to produce the "crawl less, care more" pattern.
Force 1: Cost-of-crawling pressure. Google's compute and storage costs for the index grew faster than ad revenue from long-tail SERPs. The economic logic of indexing every URL stopped pencilling out. The 2024 to 2026 response was selective crawl: prioritise URLs likely to satisfy actual queries, deprioritise URLs that historically generate few impressions relative to what they cost to index.
Force 2: AI-era quality evaluation. With AI Overviews taking citation share at the top of SERPs, Google's index increasingly serves an "extract for AI" function rather than a "rank for blue links" function. The quality bar for inclusion shifted toward "extractable content with verifiable claims and entity authority", which is a higher bar than the prior "relevant and not spam" bar.
Force 3: AI-generated content saturation. The 2023 to 2025 explosion of AI-generated content forced Google to apply stricter quality filtering before indexing. Pages flagged by Google's AI-content-quality classifiers as low-effort or undifferentiated face reduced crawl priority and higher discovered-not-indexed rates. This affects legitimate human-authored thin content as well as actual AI-generated spam.
The net effect on the SEO practitioner. The technical baseline of "URL exists, returns 200, has unique content, gets indexed" is no longer reliable. Indexing is conditional on quality signals and crawl-priority economics, evaluated continuously. Pages that pass the technical baseline can still sit in Discovered-not-indexed for months.
The GSC Index Coverage Report: 2026 Edition
The Pages report (formerly Index Coverage) in Google Search Console is the single most important diagnostic surface. Each status in its 2026 taxonomy means something different in practice.
The two statuses that dominate 2026 indexing problems are Discovered-currently-not-indexed and Crawled-currently-not-indexed. They share a name structure but have categorically different root causes and fix sequences.
Discovered – Currently Not Indexed: The Crawl Priority Problem
"Discovered – currently not indexed" means Google knows the URL exists (it appeared in a sitemap, internal link, or external link) but has not yet crawled it. This is a crawl-priority decision, not a quality decision. The page has not yet been evaluated.
The 2026 root causes, in order of frequency.
Cause 1: Weak internal priority. The URL is internally linked from few or low-authority pages on the site. Google's crawl scheduler uses internal link signals as a proxy for which URLs the site itself considers important. URLs that are only linked from the sitemap or deep navigation pages get low crawl priority.
Cause 2: Site-wide low crawl budget allocation. The site as a whole has been allocated low crawl budget by Google's crawl scheduler, typically because past crawls returned low-value pages relative to crawl cost. New URLs on such sites wait longer.
Cause 3: URL pattern matches a known low-value template. Google's crawl scheduler learns site-level patterns. If your /blog/ URLs historically rank well, blog URLs get crawl priority. If your /tag/ or /category/ URLs historically rank poorly or generate duplicate content flags, new URLs in those patterns get deprioritised.
Cause 4: Content overlap with already-indexed pages. If the URL appears semantically similar to existing indexed URLs on the site, Google may defer crawling under the assumption the new URL is duplicate or near-duplicate.
Cause 5: Faceted navigation explosion. Sites generating tens of thousands of filter URLs (colour=red&size=L&brand=X) consume crawl budget on low-value combinations, leaving genuine new URLs at the back of the queue.
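To see how quickly filter URLs multiply, here is a rough sketch. The facet names and value counts are hypothetical, and the count assumes every facet is optional and takes at most one value per URL; real faceted navigation may allow multi-select, which grows faster still.

```python
from math import prod


def count_filter_urls(facet_sizes: dict[str, int]) -> int:
    """Count distinct filter URLs when each facet is optional and single-valued.

    Each facet contributes (values + 1) choices: one per value, plus "not set".
    Subtracting 1 excludes the unfiltered category page itself.
    """
    return prod(n + 1 for n in facet_sizes.values()) - 1


# Hypothetical facets on a single category page.
facets = {"colour": 12, "size": 8, "brand": 40}
print(count_filter_urls(facets))  # 13 * 9 * 41 - 1 = 4796
```

A single category with three modest facets already produces thousands of crawlable combinations; multiplied across categories, tens of thousands follow easily.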
The fix sequence for Discovered-not-indexed:
- Audit internal links to the URL. Use Screaming Frog or similar to find all internal links to the affected URL. If under 3 internal links from non-navigation pages, add 5 to 10 contextual internal links from relevant content.
- Check the URL pattern history. Group affected URLs by template (/blog/, /products/, /case-studies/). If a specific template dominates, audit that template for thin content or duplicate patterns.
- Eliminate index bloat. If the site has 50,000 indexable URLs but only 8,000 of meaningful intent, prune. Add noindex to category, tag, paginated-deep, faceted-deep, and parameter-noise URLs. The crawl budget that was wasted on these gets redirected.
- Manually request indexing for high-priority URLs. GSC URL Inspection tool, Request Indexing button. Limit to 10 to 20 priority URLs per day to avoid throttling.
- Wait 14 to 28 days for the next crawl cycle. Re-check status. If unchanged after 28 days, escalate to a per-page quality audit: the URL is probably failing on crawl priority and would likely fail the quality evaluation once crawled.
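The internal-link audit in the first step can be sketched as below. This is a minimal illustration, not a Screaming Frog feature: it assumes the crawl has already been reduced to (source, target) link pairs, and the function name, the example URLs, and the set of navigation pages are all hypothetical.

```python
from collections import defaultdict


def find_weakly_linked(edges, targets, nav_sources=frozenset(), min_links=3):
    """Return {url: contextual_inlink_count} for targets below min_links.

    edges: iterable of (source_url, target_url) pairs from a site crawl.
    nav_sources: source URLs treated as navigation pages and ignored.
    """
    sources = defaultdict(set)
    for src, dst in edges:
        if src != dst and src not in nav_sources:
            sources[dst].add(src)  # count each linking page once
    return {u: len(sources[u]) for u in targets if len(sources[u]) < min_links}


# Hypothetical crawl data.
edges = [
    ("/", "/blog/a"),
    ("/blog/b", "/blog/a"),
    ("/blog/c", "/blog/a"),
    ("/blog/d", "/blog/a"),
    ("/sitemap-page", "/blog/x"),
    ("/blog/a", "/blog/x"),
]
weak = find_weakly_linked(edges, ["/blog/a", "/blog/x"], nav_sources={"/", "/sitemap-page"})
print(weak)  # {'/blog/x': 1}
```

Here /blog/x has a single contextual link once navigation sources are excluded, so it is the candidate for the 5 to 10 additional contextual links.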
Crawled – Currently Not Indexed: The Quality Problem
"Crawled – currently not indexed" means Google crawled the URL, evaluated it, and decided not to include it in the index. This is a quality decision. The fix is at the page level, not the crawl-priority level.
The 2026 root causes, in order of frequency.
Cause 1: Thin or low-effort content. Page is under 300 words of unique content, or content is heavily templated with little differentiation from sister pages on the site. AI-content classifiers also flag low-effort content here.
Cause 2: Duplicate or near-duplicate of existing indexed page. Page is semantically very similar to an already-indexed URL on the same site or another site. Google chooses not to add a duplicate.
Cause 3: Failing quality signals (E-E-A-T). Page lacks named author, dates, citations, expertise signals. Google's quality classifiers flag the page as low-trust. See our E-E-A-T 2026 deep-dive for the signal hierarchy.
Cause 4: Poor user experience signals. Page fails Core Web Vitals (especially the new INP threshold), has aggressive interstitials, or has mobile usability problems. See our INP for SEO guide for the responsiveness layer.
Cause 5: Suspected AI-generated low-value content. Page exhibits patterns Google's AI classifiers flag as undifferentiated AI output: generic openings, lack of specific entities, predictable structures, no original data. Affects legitimate AI-assisted content as well as fully AI-generated content.
The fix sequence for Crawled-not-indexed:
- Per-page audit against quality bar. Word count under 800? Unique value-add unclear? Generic structure? If yes, rewrite.
- Add original data, named entities, and verifiable claims. Replace generic prose with specific examples, real numbers, named sources.
- Add proper authorship. Named author, datePublished, dateModified, sameAs links. See AEO content framework.
- Address technical UX failures. Fix INP, fix CLS, remove aggressive interstitials, fix mobile-usability errors in GSC.
- Internal link from authoritative pages. Build 3 to 5 internal links from your strongest pages to the affected URL.
- Request reprocessing. GSC URL Inspection > Request Indexing.
- Wait 28 to 60 days. Quality reassessment is slower than crawl-priority reassessment. Patience is required.
If the page is still Crawled-not-indexed after 60 days post-fix, the realistic conclusion is that the page should not exist as an indexable URL. Consolidate into a parent page, redirect, or remove.
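The authorship step in the fix sequence above can be expressed as schema.org Article markup. The sketch below builds it in Python; the headline, author name, profile URLs, and dates are placeholders, and markup on its own does not guarantee indexing.

```python
import json


def article_jsonld(headline, author_name, author_profiles, published, modified):
    """Build schema.org Article markup carrying the authorship signals."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published,  # ISO 8601 date
        "dateModified": modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": author_profiles,  # profile URLs that identify the author
        },
    }


# Placeholder values for illustration only.
markup = article_jsonld(
    headline="Example article title",
    author_name="Jane Example",
    author_profiles=["https://www.linkedin.com/in/jane-example"],
    published="2026-06-07",
    modified="2026-06-07",
)
print(f'<script type="application/ld+json">{json.dumps(markup, indent=2)}</script>')
```

The printed script element would go in the page head, alongside a visible byline and dates that match the markup.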
Index Bloat: The Hidden Driver of Indexing Failures
Index bloat is the most underdiagnosed cause of indexing problems on SG SMB sites. Bloat happens when the indexable URL count vastly exceeds the count of URLs with genuine search intent. The classic patterns are tag and date archives, deep pagination, and faceted-navigation or parameter URLs.
The pruning principle: every URL in your index should have a clear, distinct search intent that justifies its existence. URLs that exist only because the CMS generates them automatically (tag archives, date archives, paginated 5+) should be noindexed by default.
The expected outcome of comprehensive bloat pruning. Crawl budget that was being spent on low-value URLs gets redirected to your genuine content URLs. Discovered-not-indexed rates typically drop 30 to 60 percent within 28 days of a thorough prune. This is one of the highest-leverage technical SEO interventions available in 2026.
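A first pass at finding noindex candidates can be automated by matching URLs against low-intent templates. The sketch below is one way to do it; the path patterns, the facet parameter names, and the page-5 cutoff encoded as max_page=4 are assumptions that need adapting to the actual CMS.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical patterns; adjust to the site's real URL scheme.
BLOAT_PATH_PATTERNS = {
    "tag_archive": re.compile(r"^/tag/"),
    "date_archive": re.compile(r"^/\d{4}/\d{2}/?$"),
    "deep_pagination": re.compile(r"/page/(\d+)/?$"),
}
FACET_PARAMS = {"colour", "size", "brand"}


def bloat_reason(url, max_page=4):
    """Return a label if the URL matches a low-intent template, else None."""
    parts = urlsplit(url)
    query_keys = parse_qs(parts.query, keep_blank_values=True)
    if FACET_PARAMS.intersection(query_keys):
        return "faceted"
    for label, pattern in BLOAT_PATH_PATTERNS.items():
        match = pattern.search(parts.path)
        if not match:
            continue
        # Shallow pagination (pages 1 to max_page) is left indexable.
        if label == "deep_pagination" and int(match.group(1)) <= max_page:
            continue
        return label
    return None


for u in [
    "https://example.com/tag/shoes/",
    "https://example.com/blog/page/7/",
    "https://example.com/shoes?colour=red&size=L",
    "https://example.com/blog/indexing-guide",
]:
    print(u, "->", bloat_reason(u))
```

URLs that return a label are candidates for review, not automatic noindex; a tag page with real search demand is a legitimate exception.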
The Diagnostic Workflow
The reliable workflow we run on every indexing audit:
- Export the full status breakdown from the GSC Pages report. Group by status and note which statuses have abnormal counts.
- Crawl the site with Screaming Frog or Sitebulb, from both the sitemap and the homepage. Compare URL counts against the GSC indexable count.
- Group crawled URLs by template. Flag templates with low search intent (tag archives, faceted, paginated deep).
- Pull 10 Discovered-not-indexed and 10 Crawled-not-indexed URLs. Inspect each for the root-cause patterns above.
- Sequence the fixes: bloat prune first, internal-link strengthening second, per-page quality fixes third, request indexing last.
- Re-check after the fixes. GSC reflects crawl-budget reallocation in 14 to 28 days. Quality re-evaluation can take 60 days.
The most common mistake at the diagnostic stage is jumping straight to "request indexing" for every affected URL. This treats the symptom (URL not indexed) rather than the cause (poor crawl priority or low quality), and Google's manual reindex queue throttles aggressive use. The correct sequence is bloat prune first, then internal link, then per-page quality, then request indexing only for the priority URLs that remain stuck.
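The template-grouping step of the workflow can be sketched as follows. It assumes the GSC export has been loaded as (URL, status) rows and uses the first path segment as a rough template label; both the row format and the status strings are assumptions about how the data has been prepared.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumed status labels; match them to the strings in your own export.
DISCOVERED = "Discovered - currently not indexed"
CRAWLED = "Crawled - currently not indexed"


def template_of(url):
    """Use the first path segment as a rough template label."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return f"/{segments[0]}/" if segments else "/"


def status_by_template(rows):
    """Count (template, status) pairs from (url, status) rows."""
    return Counter((template_of(url), status) for url, status in rows)


# Hypothetical export rows.
rows = [
    ("https://example.com/tag/red/", DISCOVERED),
    ("https://example.com/tag/blue/", DISCOVERED),
    ("https://example.com/products/widget-a", CRAWLED),
    ("https://example.com/blog/post-1", "Indexed"),
]
counts = status_by_template(rows)
for (template, status), n in counts.most_common():
    print(f"{template:12} {status:36} {n}")
```

A template that dominates the Discovered rows points toward the bloat prune; one that dominates the Crawled rows points toward per-page quality work.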
How AI Crawlers Affect Google's Crawl Budget
A 2026 wrinkle. Sites are now crawled by Googlebot plus a fleet of AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Bytespider, OAI-SearchBot, and others). Total crawl traffic on most sites has increased significantly.
This does not directly affect Google's crawl budget allocation (the AI crawlers do not feed Google's index), but it affects it indirectly in two ways:
- Server load constraints. If your origin or CDN responds slowly under combined crawler load, Googlebot detects the slow response and throttles its own crawl rate to avoid impacting users.
- Resource budget on render-heavy sites. Sites that depend on JavaScript rendering for content delivery pay a higher per-crawl cost. Combined crawler load amplifies the cost.
The mitigation. Audit your robots.txt for selective crawler permissions (covered in our AI crawlers guide). Ensure your CDN handles crawler load efficiently. Pre-render or static-render content where possible to reduce per-crawl resource cost.
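A robots.txt audit of this kind can be scripted with Python's standard-library parser. The policy below is purely illustrative, not a recommendation about which crawlers to block. Python's matching rules are also simpler than Google's (it does not interpret path wildcards, for example), so treat the result as a first check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /search
"""


def crawler_access(robots_txt, agents, url):
    """Return {agent: allowed} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    ["Googlebot", "GPTBot", "ClaudeBot", "Bytespider"],
    "https://example.com/blog/post",
)
print(access)
```

Under this sample file, Googlebot and ClaudeBot fall through to the wildcard group and may fetch the blog URL, while GPTBot and Bytespider are blocked site-wide.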
The Render Budget: An Underdiagnosed Indexing Drag
Many SG sites built on JavaScript frameworks (React, Vue, Next.js with client-side rendering) hit render-budget problems before they hit crawl-budget problems. Google's two-stage indexing (initial HTML crawl, then JavaScript render in a separate queue) means JS-rendered content faces additional latency before indexing.
The 2026 reality: Google's render queue is more constrained than the crawl queue. Sites that depend on client-side rendering for primary content can sit in the render queue for days or weeks before the rendered content is indexed. The fix is server-side rendering (SSR), static site generation (SSG), or partial pre-rendering of critical content.
The diagnostic check: run a live test in the URL Inspection tool, then open "View tested page" and select the "Screenshot" tab. If the screenshot shows blank or partial content where the actual page has content, your JavaScript rendering is the indexing constraint. The fix is at the framework level, not at the GSC level.
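A quick complement to the screenshot check is to test whether key content is present in the raw HTML before any JavaScript runs. The sketch below does this on HTML strings with the standard-library parser; it only approximates the first (pre-render) stage and does not reproduce Google's renderer. The sample pages and the phrase are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_from_raw_html(raw_html, expected_phrases):
    """Return the expected phrases that do not appear in the un-rendered HTML."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [p for p in expected_phrases if p.lower() not in text]


# Hypothetical pages: a client-side-rendered shell and a server-rendered page.
CSR_SHELL = '<html><body><div id="root"></div><script>render("Blue Widget")</script></body></html>'
SSR_PAGE = "<html><body><h1>Blue Widget</h1><p>In stock</p></body></html>"

print(missing_from_raw_html(CSR_SHELL, ["Blue Widget"]))  # ['Blue Widget']
print(missing_from_raw_html(SSR_PAGE, ["Blue Widget"]))   # []
```

If the product name only exists inside a script, as in the first page, the primary content depends on the render queue.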
A Worked Example: Indexing Recovery on a SG Ecommerce Site
Concrete worked example. Client: SG ecommerce site, 12,000 indexable URLs claimed in sitemap, 4,200 actually indexed (35% coverage). Goal: 80% coverage within 90 days.
The audit findings:
- 8,400 URLs in Discovered-not-indexed. Predominantly faceted navigation URLs (colour, size, brand combinations) and paginated category pages.
- 1,800 URLs in Crawled-not-indexed. Predominantly thin product variant pages and near-duplicate product descriptions copied from manufacturer.
- Sitemap had 12,000 URLs, of which 7,800 were filter URLs that should not have been included.
The fix sequence over 90 days:
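- Week 1-2: Bloat prune. Removed 7,800 filter URLs from sitemap. Added noindex to faceted URLs. Added robots.txt exclusion for parameter combinations (Googlebot cannot read a noindex tag on a URL that robots.txt blocks it from fetching, so the two directives should not be applied to the same URLs at once). Result: indexable URL count dropped to 4,200, matching the actual unique-intent count.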
- Week 3-4: Internal link audit. Strengthened internal links to deep category and product URLs. Added contextual links from blog content. Average internal links per product URL rose from 1.8 to 4.6.
- Week 5-8: Per-page quality on Crawled-not-indexed. Rewrote 1,800 thin product pages with original copy, added schema, fixed CWV. Added named author for content pages.
- Week 9-12: Manual reindex requests for priority URLs. 200 priority product URLs and 50 priority category URLs requested via URL Inspection.
Outcome at 90 days: index coverage rose to 89% of the cleaned 4,200-URL sitemap. Organic traffic to indexed product pages rose 31% over the same period (mix of more URLs ranking and existing URLs ranking better). Crawl budget telemetry showed 3.2x more URLs crawled per day post-cleanup, despite the smaller indexable footprint.
This is the realistic profile of a 2026 indexing recovery. The work is mostly bloat-cleanup and per-page quality, not technical SEO trickery. The recovery is measurable but not instant; 90 days is the reasonable cycle for a site of this scale.
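The headline figures in this example follow from simple arithmetic, reproduced below using only the numbers stated above. Note that the before and after coverage percentages use different denominators (the original 12,000-URL sitemap versus the cleaned 4,200-URL sitemap), so they are not directly comparable.

```python
sitemap_urls = 12_000        # URLs in the original sitemap
indexed_before = 4_200       # URLs indexed at the start
filter_urls_removed = 7_800  # filter URLs pruned from the sitemap
coverage_after = 0.89        # reported coverage of the cleaned sitemap

coverage_before = indexed_before / sitemap_urls
cleaned_sitemap = sitemap_urls - filter_urls_removed
indexed_after = round(coverage_after * cleaned_sitemap)

print(f"{coverage_before:.0%}")  # 35%
print(cleaned_sitemap)           # 4200
print(indexed_after)             # 3738
```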
Frequently Asked Questions
How long should I wait before considering "Discovered – currently not indexed" a problem?
For a new URL on a healthy site, 14 to 28 days is normal. Beyond 28 days indicates a crawl-priority problem worth diagnosing. For a site with site-wide low crawl budget, the threshold extends to 45 to 60 days. The variable is the site's crawl-budget allocation, not the URL itself.
Can I force indexing by submitting URLs through the Indexing API?
Only for specific content types. Google's Indexing API officially supports JobPosting and BroadcastEvent (livestream) content. Use for other URL types is against guidelines and produces no indexing lift; some users have reported temporary indexing followed by removal. The supported channels for non-job/broadcast content are sitemap submission, internal linking, and URL Inspection request indexing.
Does removing thin or old content help indexing of new content?
Yes, indirectly. Pruning thin or low-value URLs reduces index bloat, which reallocates crawl budget toward URLs you do want indexed. The mechanism is not direct ("removed URL X causes URL Y to be indexed faster") but aggregate ("less wasted crawl budget improves overall site crawl rate"). Pruning is one of the highest-leverage interventions in 2026.
What is the relationship between indexing and AI Overview citation?
A page must be indexed to be cited in AI Overviews. Pages in Discovered-not-indexed or Crawled-not-indexed are invisible to AI synthesis. The indexing recovery work covered here is the prerequisite for the citation work covered in our SGE and AI Overviews guide. The two should be treated as sequential, not parallel.
Should I use canonical tags or noindex to handle near-duplicate content?
Use noindex for content you do not want in the index at all. Use canonical for variant URLs that should be consolidated to a single representative URL. Filter URLs in ecommerce: usually noindex (they should not be in the index). Print-friendly variants of an article: usually canonical (consolidate to the primary URL). Get this wrong and you signal contradictory intent to Google, which leads to inconsistent indexing.
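The decision rule can be written down as a small helper that emits one directive or the other, never both. This is an illustrative sketch; the function name and the "exclude" and "variant" labels are made up for the example.

```python
from html import escape


def head_tag(kind, primary_url=None):
    """Return the single head tag for a near-duplicate URL.

    kind="exclude": the URL should not be in the index at all.
    kind="variant": the URL should consolidate to primary_url.
    """
    if kind == "exclude":
        return '<meta name="robots" content="noindex">'
    if kind == "variant":
        if not primary_url:
            raise ValueError("a variant needs the primary URL to point at")
        return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'
    raise ValueError(f"unknown kind: {kind!r}")


print(head_tag("exclude"))                                 # ecommerce filter URL
print(head_tag("variant", "https://example.com/article"))  # print-friendly variant
```

Returning exactly one tag per URL keeps the two signals from contradicting each other.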
Why did my pages drop out of the index after a quality update?
The helpful content system (folded into Google's core updates from March 2024 onward) and ongoing quality recalibrations periodically re-evaluate previously indexed content. Pages that fall below the new quality bar move to Crawled-not-indexed. The fix is per-page quality work, not technical SEO. See our HCU recovery guide for the recovery sequence.
