###########################################################################
## Start New Site Disallows ##
###########################################################################
# Added Disallow on /notifyme.asp back since we still get occasional requests to this deleted page 03-25-2016 AB
# Added /mobileinfo.asp to the global disallow. AB 03-30-16
# Fixed typos for Yeti and Youdao. CB 02-23-2018

User-agent: *
Disallow: /cd/
Disallow: /checkout/
Disallow: /dvd/
Disallow: /demo/
Disallow: /giftcert/
Disallow: /login.asp
Disallow: /images/default_coverart.gif
Disallow: /javascript/tagcreator.js
Disallow: /iframe.asp
Disallow: /wishlist.asp
Disallow: /notifyme.asp
Disallow: /mobileinfo.asp

# https://aspiegel.com/petalbot
User-agent: PetalBot
Disallow: /

# http://www.a6corp.com/a6-web-scraping-policy/
User-agent: A6-Indexer/1.0
Disallow: /

User-agent: ADmantX
Disallow: /

User-agent: admedia
Disallow: /

# http://ahrefs.com/robot/
User-agent: AhrefsBot
Disallow: /

User-agent: betaBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: booglebot
Disallow: /

User-agent: BUbiNG
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Cliqzbot
Disallow: /

# http://ews.converasearch.com/crawl.htm
User-agent: ConveraCrawler
Disallow: /

User-agent: DAUM
Disallow: /

User-agent: diffbot
Disallow: /

# Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
User-agent: EasouSpider
Disallow: /

# http://ews.converasearch.com/crawl.htm
User-agent: ERACrawler
Disallow: /

# BiggerBetter exalead.com Commercial Data Collection
User-agent: exabot
Disallow: /

User-agent: fatbot
Crawl-delay: 20

User-agent: German Wikipedia Broken Weblinks Bot; contact: gifti@tools.wmflabs.org
Disallow: /

User-agent: grapeshot
Disallow: /

User-agent: heritrix
Disallow: /

# Alexa Spider
User-agent: ia_archiver
Allow: /default.asp
Disallow: /
Crawl-delay: 10

# http://www.nict.go.jp/en/univ-com/plan/crawl.html 11-07-2015 CB
User-agent: ICC-Crawler
Disallow: /

User-agent: IstellaBot
Disallow: /

# http://shoulu.jike.com/spider.html
# They didn't honor a 30-second crawl delay, so now they are blocked 08-21-2013 CB
User-agent: JikeSpider
Disallow: /

User-agent: linkdex
Disallow: /

User-agent: linkdexbot
Disallow: /

# http://www.majestic12.co.uk/bot.php?+ Disallow 12-02-2015 Chuck
User-agent: majestic
Disallow: /

User-agent: MJ12bot
Disallow: /

User-Agent: Mozilla/5.0+(compatible; Dow Jones Searchbot)
Disallow: /

User-agent: NTEntBot
Disallow: /

# Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)
User-agent: proximic
Disallow: /

# psbot/0.X (+http://www.picsearch.com/bot.html)
User-agent: psbot
Disallow: /

# https://rankactive.com/resources/rankactive-linkbot 03-27-2016 CB
User-agent: RankActiveLinkBot
Disallow: /

# https://www.semanticscholar.org/crawler PN 4/4/17
User-agent: SemanticScholarBot
Disallow: /

User-agent: SemrushBot
Disallow: /

# AB 6-26-19: added at Chuck's request. https://serpstatbot.com/
User-agent: serpstatbot
Disallow: /

User-agent: DeuSu
Disallow: /

User-agent: SeznamBot
Disallow: /

# ShopWiki/1.0 ( +http://www.shopwiki.com/wiki/Help:Bot )
User-agent: ShopWiki
Crawl-Delay: 5

User-agent: Slurp
Crawl-delay: 0.5

User-agent: SMTBot
Disallow: /

User-agent: soleobot
Disallow: /

User-agent: spbot
Crawl-delay: 10

# TurnitinBot (https://turnitin.com/robot/crawlerinfo.html)
User-agent: TurnitinBot
Disallow: /

# http://ews.converasearch.com/crawl.htm
User-agent: VSWCrawler
Disallow: /

# 05-19-2016 CB
User-agent: WikiDo
Disallow: /

User-agent: XoviBot
Disallow: /

User-agent: yacybot
Disallow: /

User-agent: yeti
Crawl-delay: 30

User-agent: centurybot
Disallow: /

User-agent: MauiBot
Disallow: /

####################################################
## Chinese Search Engines in Order of Marketshare ##
####################################################
# Increase Baidu from 10 to 30 secs 09-23-2018 CB
User-agent: baiduspider
Crawl-delay: 30

User-agent: 360Spider
Crawl-delay: 30

User-agent: sogou
Crawl-delay: 30

# soso does not appear to honor this delay. We're getting one page request per second.
User-agent: Sosospider
Crawl-delay: 30

User-Agent: Youdao
Crawl-delay: 120

## End of Chinese Search Engines

############################
## Russian Search Engines ##
############################
# According to Yandex, the user agent "Yandex" covers all variations of the bot. Doesn't seem to work.
User-agent: Yandex
Disallow: /

User-agent: Nigma
Disallow: /

User-agent: Mail.Ru
Disallow: /

User-agent: YYSpider
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: LinkpadBot
Disallow: /

User-agent: MegaIndex
Disallow: /