{"id":92721,"date":"2025-03-28T03:17:45","date_gmt":"2025-03-28T03:17:45","guid":{"rendered":"https:\/\/neclink.com\/index.php\/2025\/03\/28\/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance\/"},"modified":"2025-03-28T03:17:45","modified_gmt":"2025-03-28T03:17:45","slug":"open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance","status":"publish","type":"post","link":"https:\/\/neclink.com\/index.php\/2025\/03\/28\/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance\/","title":{"rendered":"Open source devs are fighting AI crawlers with cleverness and vengeance"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">AI web-crawling bots are the cockroaches of the internet, many software developers believe. Some devs have started fighting back in ingenious, often humorous ways.<\/p>\n<p class=\"wp-block-paragraph\">While any website might be targeted by bad crawler behavior \u2014 <a href=\"https:\/\/techcrunch.com\/2025\/01\/10\/how-openais-bot-crushed-this-seven-person-companys-web-site-like-a-ddos-attack\/\">sometimes taking down the site<\/a> \u2014 open source developers are \u201cdisproportionately\u201d impacted, <a href=\"https:\/\/thelibre.news\/foss-infrastructure-is-under-attack-by-ai-companies\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">writes<\/a> Niccol\u00f2 Venerandi, developer of a Linux desktop known as Plasma and owner of the blog LibreNews.<\/p>\n<p class=\"wp-block-paragraph\">By their nature, sites hosting free and open source (FOSS) projects share more of their infrastructure publicly, and they also tend to have fewer resources than commercial products.<\/p>\n<p class=\"wp-block-paragraph\">The issue is that many AI bots don\u2019t honor the Robots Exclusion Protocol robots.txt file, the tool that tells bots what not to crawl, originally created for search engine bots.<\/p>\n<p class=\"wp-block-paragraph\">In a \u201ccry 
for help\u201d <a href=\"https:\/\/xeiaso.net\/notes\/2025\/amazon-crawler\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">blog post<\/a> in January, FOSS developer Xe Iaso described how AmazonBot relentlessly pounded on a Git server website to the point of causing DDoS outages. Git servers host FOSS projects so that anyone who wants can download the code or contribute to it.<\/p>\n<p class=\"wp-block-paragraph\">But this bot ignored Iaso\u2019s robots.txt, hid behind other IP addresses, and pretended to be other users, Iaso said.<\/p>\n<p class=\"wp-block-paragraph\">\u201cIt\u2019s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more,\u201d Iaso lamented.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cThey will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over and over and over. Some of them will even click on the same link multiple times in the same second,\u201d the developer wrote in the post.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-enter-the-god-of-graves\">Enter the god of graves<\/h2>\n<p class=\"wp-block-paragraph\">So Iaso fought back with cleverness, building a tool called Anubis.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Anubis is <a href=\"https:\/\/xeiaso.net\/blog\/2025\/anubis\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a reverse proxy proof-of-work check <\/a>that must be passed before requests are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.<\/p>\n<p class=\"wp-block-paragraph\">The funny part: Anubis is the name of a god in Egyptian mythology who leads the dead to judgment.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cAnubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died,\u201d Iaso told TechCrunch. 
If a web request passes the challenge and is determined to be human, <a href=\"https:\/\/git.xeserv.us\/xe\/anubis-test\/src\/branch\/main\/README.md\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a cute anime picture <\/a>announces success. The drawing is \u201cmy take on anthropomorphizing Anubis,\u201d says Iaso. If it\u2019s a bot, the request gets denied.<\/p>\n<p class=\"wp-block-paragraph\">The wryly named project has spread like the wind among the FOSS community. Iaso <a href=\"https:\/\/github.com\/TecharoHQ\/anubis\/tree\/main?tab=readme-ov-file\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">shared it on GitHub<\/a> on March 19, and in just a few days, it collected 2,000 stars, 20 contributors, and 39 forks.\u00a0<\/p>\n<figure class=\"wp-block-image is-resized\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXf0-VBDUoXI5znG0aUDI5sVwf34ttyTmKmXCq6XdBA9VNkIbCcn-Fhzc4tXy2X4403oEP0lXLLoYos8ciIdlYueztletdZ5x934DW3GvAX9VnrBvZe9HncYX250z5PBQVgMQhqSrA?key=yzfwZbQgZvQ4cUpbT_6aMTdK\" alt=\"\" style=\"width:857px;height:auto\"\/><\/figure>\n<h2 class=\"wp-block-heading\" id=\"h-vengeance-as-defense-nbsp\">Vengeance as defense\u00a0<\/h2>\n<p class=\"wp-block-paragraph\">The instant popularity of Anubis shows that Iaso\u2019s pain is not unique. 
In fact, Venerandi shared story after story:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Founder CEO of <a href=\"https:\/\/drewdevault.com\/2025\/03\/17\/2025-03-17-Stop-externalizing-your-costs-on-me.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">SourceHut Drew DeVault described<\/a> spending \u201cfrom 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale,\u201d and \u201cexperiencing dozens of brief outages per week.\u201d<\/li>\n<li class=\"wp-block-list-item\">Jonathan Corbet, a famed FOSS developer who runs Linux industry news site LWN, warned that his site was <a href=\"https:\/\/mastodon.social\/@AndresFreundTec\/113868582630760229\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">being slowed by DDoS-level traffic <\/a>\u201cfrom AI scraper bots.\u201d<\/li>\n<li class=\"wp-block-list-item\">Kevin Fenzi, the sysadmin of the enormous Linux Fedora project, <a href=\"https:\/\/www.scrye.com\/blogs\/nirik\/posts\/2025\/03\/15\/mid-march-infra-bits-2025\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">said the AI scraper bots<\/a> had gotten so aggressive, he had to block the entire country of Brazil from access.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Venerandi tells TechCrunch that he knows of multiple other projects experiencing the same issues. 
One of them \u201chad to temporarily ban all Chinese IP addresses at one point.\u201d\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Let that sink in for a moment \u2014 that developers \u201ceven have to turn to banning entire countries\u201d just to fend off AI bots that ignore robots.txt files, says Venerandi.<\/p>\n<p class=\"wp-block-paragraph\">Beyond weighing the soul of a web requester, other devs believe vengeance is the best defense.<\/p>\n<p class=\"wp-block-paragraph\">A few days ago on <a href=\"https:\/\/news.ycombinator.com\/item?id=43422413\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Hacker News<\/a>, user <a href=\"https:\/\/news.ycombinator.com\/item?id=43432682\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">xyzal<\/a> suggested loading robots.txt-forbidden pages with \u201ca bucket load of articles on the benefits of drinking bleach\u201d or \u201carticles about positive effect of catching measles on performance in bed.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cThink we need to aim for the bots to get _negative_ utility value from visiting our traps, not just zero value,\u201d xyzal explained.<\/p>\n<p class=\"wp-block-paragraph\">As it happens, in January, an anonymous creator known as \u201cAaron\u201d released a tool called <a href=\"https:\/\/go.skimresources.com\/?id=111346X1569483&amp;xs=1&amp;url=https:\/\/zadzmo.org\/code\/nepenthes\/&amp;xcust=2-1-2592071-1-0-0-0-0&amp;sref=https:\/\/www.pcworld.com\/article\/2592071\/one-rebels-malicious-tar-pit-trap-is-driving-ai-scrapers-insane.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Nepenthes<\/a> that aims to do exactly that. 
It traps crawlers in an endless maze of fake content, a goal that the dev admitted to <a href=\"https:\/\/arstechnica.com\/tech-policy\/2025\/01\/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Ars Technica<\/a> is aggressive if not downright malicious. The tool is named after a carnivorous plant.<\/p>\n<p class=\"wp-block-paragraph\">And Cloudflare, perhaps the biggest commercial player offering several tools to fend off AI crawlers, last week released a similar tool called AI Labyrinth.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s intended to \u201cslow down, confuse, and waste the resources of AI Crawlers and other bots that don\u2019t respect \u2018no crawl\u2019 directives,\u201d Cloudflare described <a href=\"https:\/\/blog.cloudflare.com\/ai-labyrinth\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">in its blog post<\/a>. Cloudflare said it feeds misbehaving AI crawlers \u201cirrelevant content rather than extracting your legitimate website data.\u201d<\/p>\n<p class=\"wp-block-paragraph\">SourceHut\u2019s DeVault told TechCrunch that \u201cNepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells, but ultimately Anubis is the solution that worked\u201d for his site.<\/p>\n<p class=\"wp-block-paragraph\">But DeVault also issued a public, heartfelt plea for a more direct fix: \u201cPlease stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. 
I am begging you to stop using them, stop talking about them, stop making new ones, just stop.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Since the likelihood of that is zilch, developers, particularly in FOSS, are fighting back with cleverness and a touch of humor.<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/03\/27\/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI web-crawling bots are the cockroaches of the internet, many software developers believe. Some devs have started fighting back in ingenious, often humorous ways. While<\/p>\n","protected":false},"author":1,"featured_media":92722,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-92721","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/92721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/comments?post=92721"}],"version-history":[{"count":0,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/92721\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media\/92722"}],"wp:attachment":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/medi
a?parent=92721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/categories?post=92721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/tags?post=92721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}