How to use wildcards in robots.txt?


The first thing you need to know is that you don't need to append a wildcard to every string in your robots.txt. Blocking /route-foo/ implicitly blocks everything in that directory, so you do not need to include a wildcard (such as /route-foo/*).
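
For instance, this rule already blocks every URL under the directory, such as a hypothetical /route-foo/page.html, and adding a trailing wildcard would not change its behavior:

User-agent: *
Disallow: /route-foo/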

The second thing you need to know is that there are actually two different types of wildcards supported by Google:

Wildcard characters

The * wildcard character matches any sequence of characters. This is useful whenever there are clear URL patterns that you want to disallow, such as filters and parameters.
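
For instance, to block a faceted-navigation filter wherever it appears on the site (color= is a hypothetical parameter name used for illustration):

User-agent: *
Disallow: /*color=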

The $ wildcard character is used to denote the end of a URL. This is useful for matching specific file types, such as .pdf.
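
For example, to block crawling of every PDF on the site, combine the two characters: * matches any path and $ anchors the match at the end of the URL:

User-agent: *
Disallow: /*.pdf$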

Examples

Block search engines from accessing any URL that has a ? in it:

User-agent: *
Disallow: /*?

Block search engines from crawling search results pages (any URL beginning with /query?kw=):

User-agent: *
Disallow: /query?kw=*

The trailing * here is redundant: robots.txt rules already match any URL that begins with the rule path, so Disallow: /query?kw= behaves identically.

Block search engines from crawling URLs in a common child directory (for example, /*/child/ matches both /parent-a/child/ and /parent-b/child/):

User-agent: *
Disallow: /*/child/

Block search engines from crawling URLs in a specific directory whose paths contain three or more dashes:

User-agent: *
Disallow: /directory/*-*-*-
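
Wildcards also work in Allow rules, which lets you carve an exception out of a broad block. As a sketch, assuming a hypothetical page= pagination parameter, the following blocks every parameterized URL except pagination:

User-agent: *
Disallow: /*?
Allow: /*?page=

Google resolves conflicting rules by applying the most specific one, i.e. the rule with the longest matching path, so /*?page= overrides /*? for pagination URLs.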