So, I woke up this morning to my VPS (8 gigabytes of RAM, 4 cores, my services running, my sites living happily there) screaming.
A few alerts in my inbox. What is it this time?
After digging a bit, the culprits popped up. Hungry bots acting at night. Yes, these three.
But what did I do yesterday? Yesterday I installed Gitea to pull my private repos off GitHub and self-host them, anticipating the funny days to come.
After moving them, I also moved the only public one I have on my personal profile.
And I kept it public in Gitea. Like it was on GitHub. Nothing fancy.
That was the beginning of the nightmare.
Anthropic's bots alone made more than 100k requests to my machine, all to that single public repo. Meta added 54k more from 20+ different IPs. OpenAI threw in another 19k. Total: 179k requests in 5 hours, between midnight and 5 AM.
8 human visitors. 179k bot requests. That's 22,371 bot requests for every single real person in five night hours.
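If you want to get the same kind of numbers out of your own logs, here is a minimal sketch. It assumes nginx's combined log format (user agent as the last quoted field) and that the crawlers identify themselves with the substrings ClaudeBot, GPTBot and meta-externalagent; both are assumptions, so check them against your own access log. The function names are made up for this example.

```python
import re
from collections import Counter

# Assumed user-agent substrings; verify the exact tokens in your own logs.
BOT_TOKENS = {
    "Anthropic": "ClaudeBot",
    "OpenAI": "GPTBot",
    "Meta": "meta-externalagent",
}

# In the combined log format the user agent is the last quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_bot_requests(lines):
    """Return a Counter mapping company name to the number of matching requests."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for company, token in BOT_TOKENS.items():
            if token.lower() in user_agent:
                counts[company] += 1
                break
    return counts


def bots_per_human(bot_requests, human_visitors):
    """Whole number of bot requests per human visitor."""
    return bot_requests // human_visitors
```

With the rounded total, bots_per_human(179_000, 8) gives 22,375; the 22,371 above presumably comes from the unrounded count.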
I have seen this before. It is a common thing in the VPS world if you self-host. But never with this virulence. And only a few hours after publishing my project. And all three at the same time.
They consumed 20 GB of my bandwidth in one night. My CPU went from 3% to 54%. They walked 630 git commits, downloading every version of every file that ever existed in that public repo. Source code, deployment scripts, configuration files. Some personal data (fine to be there, the repo is for that), downloaded over 4,000 times in JSON and 400 times in PDF. Efficient!
The .env file
There's more. OpenAI's GPTBot specifically targeted my .env file, the one that usually holds secrets, API keys, passwords (not in my case). I committed it once by mistake and removed it in the next commit, but that was enough for them. It downloaded the raw content 302 times across different commit hashes. 512 bytes each time. The actual file. Not the HTML page. The raw one.
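Removing a file in a later commit does not remove it from history, so it stays reachable through any commit that contains it. A rough sketch for spotting this kind of download in an access log follows. It assumes combined-format log lines and that raw files are served under a path shaped like .../raw/commit/<sha>/<path>; that URL shape is an assumption about the Gitea setup, and raw_env_hits is a name invented for this example.

```python
import re

# Request part of a combined-format log entry: "METHOD path HTTP/x.y"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
# Assumed URL shape for a raw file at a given commit: .../raw/commit/<sha>/<path>
RAW_COMMIT_RE = re.compile(r"/raw/commit/([0-9a-f]{7,64})/(.+)$")


def raw_env_hits(lines, filename=".env"):
    """Return (hit count, set of commit hashes) for raw downloads of `filename`."""
    hits = 0
    commits = set()
    for line in lines:
        request = REQUEST_RE.search(line)
        if not request:
            continue
        path = request.group(1).split("?", 1)[0]  # drop any query string
        raw = RAW_COMMIT_RE.search(path)
        if raw and raw.group(2) == filename:
            hits += 1
            commits.add(raw.group(1))
    return hits, commits
```

If this returns anything for a file that ever held real secrets, rotate them; deleting the file in a new commit is not enough.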
Anthropic's ClaudeBot read my documentation 2,300 times. Everything, systematically. Commit by commit.
They also probed /_edit/ and /_delete/ URLs, endpoints that require authentication. Those attempts came back unauthorized, this time.
And Meta? They distributed their 54k requests across 20+ IPs from the same /24 subnet. Bravo. That's not crawling. I'll let you name it yourselves.
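To check whether a pile of client IPs really comes from one /24, you can group them by network with Python's standard ipaddress module. A small sketch, where group_by_subnet is a name made up for this example and the addresses in the test data are documentation ranges, not Meta's:

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(ips, prefix=24):
    """Group IPv4 address strings by the network of the given prefix length."""
    groups = defaultdict(set)
    for ip in ips:
        # strict=False masks the host bits, so 203.0.113.57/24 becomes 203.0.113.0/24
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(network)].add(ip)
    return dict(groups)
```

One subnet with 20+ distinct addresses in the result means a per-IP rate limit will barely slow them down; blocking or limiting the whole range is the more practical option.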
The irony
I discovered all of this using Anthropic's own product (Claude Code). Their AI helped me investigate the attack their crawler was performing on my server. In real time. Thanks Claude.
The takeaway
These are three of the richest companies on the planet, systematically scraping personal servers without consent, consuming resources I pay for, downloading credential files, and reading every piece of documentation, all to feed their models and sell me my own code next year.
I blocked them: robots.txt, nginx user-agent blocking, rate limiting, fail2ban. The traffic dropped to zero in minutes.
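For the robots.txt part, here is a minimal sketch that builds the file and checks it with Python's standard urllib.robotparser. The agent tokens are assumptions (the names these crawlers are commonly reported to use), and build_robots_txt and is_allowed are helper names invented for this example, not my exact setup.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; adjust to whatever shows up in your logs.
BLOCKED_AGENTS = ["ClaudeBot", "GPTBot", "meta-externalagent"]


def build_robots_txt(agents):
    """Return robots.txt content that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    return "\n".join(blocks)


def is_allowed(robots_txt, user_agent, url="/"):
    """Check whether `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(build_robots_txt(BLOCKED_AGENTS))
```

Keep in mind that robots.txt is only a request and a crawler is free to ignore it, which is why the server-side layers (user-agent blocking, rate limiting, fail2ban) matter.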
If you self-host anything in 2026 (a Gitea instance, Forgejo, a blog, a wiki, and above all code on any of these new GitHub alternatives), block AI crawlers on day one. Not day two. Because they will find you within hours, and they will eat everything.
P.S.: If this happened to me with one single public repo, I cannot even imagine the nightmare GitHub has been living through these months.
Salud!