When a sitemap actually helps — and when it doesn't
A sitemap helps Google discover URLs it might not find through crawling. It's most valuable for: large sites (1,000+ pages) where crawl budget matters, sites with pages that aren't well-linked internally, and new sites with few external backlinks. For a small site with strong internal linking, Googlebot will typically discover all pages through crawling anyway. There, a sitemap can speed up initial discovery, but it rarely decides whether a page gets indexed.
Submitting a sitemap in Google Search Console is the most reliable way to verify that Google has it. The console shows how many URLs Google has discovered vs. how many you submitted — a gap here indicates crawling or indexing issues worth investigating.
Which sitemap fields Google actually uses
| Tag | Google uses it? | Notes |
|---|---|---|
| <loc> | Yes (required) | The URL. Must be absolute, including the protocol, and must use the www or non-www form consistently |
| <lastmod> | Sometimes | Used to prioritize recrawl of recently changed pages, provided the dates are consistently accurate. Format: W3C Datetime, e.g. YYYY-MM-DD or a full timestamp with timezone |
| <changefreq> | Ignored | Google documented in 2023 that it ignores this field |
| <priority> | Ignored | Google documented in 2023 that it ignores this field |
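| <image:image> | Yes | Helps image search indexing when images are not otherwise discoverable (for example, loaded via JavaScript); not needed for images Google can already find in the page markup |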
| <video:video> | Yes | Helps Google understand video content; requires a thumbnail URL, title, description, and a content or player URL (duration is recommended) |
| <xhtml:link> | Yes | hreflang for multilingual sites — specify alternate language URLs here |
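
As a rough illustration of the table above, the sketch below writes only the two core fields Google reads, `<loc>` and `<lastmod>`, and leaves out `<changefreq>` and `<priority>`. The function name `build_sitemap` and the example.com URLs are assumptions made for this example, not part of any particular tool.

```python
from datetime import date
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as bytes.

    `pages` is an iterable of (absolute_url, last_modified) pairs, where
    last_modified is a datetime.date or None. (Hypothetical helper.)
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, last_modified in pages:
        parts = urlsplit(url)
        # <loc> must be absolute: reject anything without a protocol and host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute URL: {url!r}")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Only emit <lastmod> when a real modification date is known.
        if last_modified is not None:
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://www.example.com/", date(2024, 1, 15)),
        ("https://www.example.com/about", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

Omitting `<lastmod>` when the date is unknown is a deliberate choice here: a guessed or always-current date gives Google no useful signal.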
Sitemap index for large sites
A single sitemap file is limited to 50,000 URLs and 50 MB uncompressed. Sites above this threshold use a sitemap index file — an XML file that lists multiple sitemap files. Each sub-sitemap can cover a section of the site (e.g., one for blog posts, one for product pages, one for category pages). This also makes it easier to see in Search Console which sections are being indexed vs. which have coverage issues.
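A minimal sketch of that split, assuming the URLs are already collected in a list: it chunks them at the 50,000-URL limit and builds an index file pointing at each sub-sitemap. The helper names and the `sitemap-N.xml` file naming are hypothetical, and the sketch does not check the 50 MB size limit, which would need a separate check on the serialized output.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemap protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (bytes) listing the given sub-sitemap URLs."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Illustrative data: 120,001 product URLs on a placeholder domain.
    urls = [f"https://www.example.com/product/{n}" for n in range(120_001)]
    chunks = chunk_urls(urls)
    # Hypothetical naming scheme for the sub-sitemap files.
    names = [
        f"https://www.example.com/sitemap-{i}.xml"
        for i in range(1, len(chunks) + 1)
    ]
    print(build_sitemap_index(names).decode("utf-8"))
```

To get the per-section view in Search Console, the same index can instead list one file per section (say, one for blog posts and one for product pages), chunking only the sections that exceed the limit.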
