How to use wildcards in robots.txt?
The first thing you need to know is that you don’t need to append a wildcard to every string in your robots.txt. It is implied that if you block
/route-foo/, you want to block everything in this directory and do not need to include a wildcard (such as
The second thing you need to know is that there are actually two different types of wildcards supported by Google:
The * wildcard character matches any sequence of characters. This is useful whenever there are clear URL patterns that you want to disallow, such as filters and parameters.
The $ wildcard character is used to denote the end of a URL. This is useful for matching specific file types, such as .pdf.
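The two wildcards described above can be sketched in a few lines of Python. This is only an illustration of the matching semantics, not Google's actual implementation; the function names rule_to_regex and is_blocked and the sample paths are assumptions made for this example.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a compiled regular expression.

    '*' matches any sequence of characters, a trailing '$' marks the end
    of the URL, and everything else is treated as a literal character.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then put '.*' wherever the rule had a '*'.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # r"\Z" matches only at the very end of the string.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    # re.match anchors at the start only, so a rule without '$'
    # behaves as a prefix match (the implied trailing wildcard).
    return rule_to_regex(rule).match(path) is not None


# Hypothetical paths, for illustration only.
print(is_blocked("/route-foo/", "/route-foo/page.html"))    # True
print(is_blocked("/*.pdf$", "/docs/guide.pdf"))             # True
print(is_blocked("/*.pdf$", "/docs/guide.pdf?download=1"))  # False
```

The last call shows why $ matters: without it, /*.pdf would also match URLs that merely contain .pdf somewhere before a query string.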
Block search engines from accessing any URL that has a ? in it:
User-agent: *
Disallow: /*?
Block search engines from crawling any search results page URL (query?kw=):
User-agent: *
Disallow: /query?kw=*
Block search engines from crawling URLs in a common child directory:
User-agent: *
Disallow: /*/child/
Block search engines from crawling URLs in a specific directory that contain three or more dashes:
User-agent: *
Disallow: /directory/*-*-*-
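Before deploying rules like these, it can help to check them against a handful of URLs. The following is a rough sketch that approximates the wildcard matching with a regular expression; it is not an official tester, and the helper name matches and every sample path are hypothetical.

```python
import re


def matches(rule: str, path: str) -> bool:
    """Return True if a Disallow rule covers the given URL path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Literal pieces are escaped; each '*' becomes '.*'.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        pattern += r"\Z"
    # Matching from the start of the path gives prefix semantics.
    return re.match(pattern, path) is not None


# (rule, sample path, whether we expect the path to be blocked)
checks = [
    ("/*?", "/shoes?color=red", True),
    ("/*?", "/shoes", False),
    ("/query?kw=*", "/query?kw=boots", True),
    ("/*/child/", "/parent/child/page", True),
    ("/*/child/", "/child/page", False),
    ("/directory/*-*-*-", "/directory/a-b-c-d", True),
    ("/directory/*-*-*-", "/directory/a-b-c", False),
]

for rule, path, expected in checks:
    result = matches(rule, path)
    status = "ok" if result == expected else "UNEXPECTED"
    print(f"{status}: {rule!r} vs {path!r} -> blocked={result}")
```

Note that /*/child/ does not cover /child/page in this sketch, because the rule requires a second slash before child.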