Hosting a Bluesky (bsky) Relay Server

I had been looking for a new project to work on, and hosting a Bluesky (bsky) relay server seemed interesting. As a daily Bluesky user, I figured it would be a good use of my spare resources.

Here's Bluesky's documentation on what a relay is and where it fits in:

Federation Architecture | Bluesky
The AT Protocol is made up of a bunch of pieces that stack together. Federation means that anyone can run the parts that make up the AT Protocol themselves, such as their own server.

I provisioned an Ubuntu 24.04 VM with the following specs:
16GB RAM (DDR3)
8 CPU cores (Intel Xeon E5-2650 v2)
160GB SSD space

For the most part, I followed the guide below from Bryan Newbold to get it set up. It's a great guide and explains the steps clearly:

A Full-Network Relay for $34 a Month | bryan newbold
This is an update to a Summer 2024 blog post. At the time, atproto relays required a cache of the full network on local disk to validate data structures. With the Sync v1.1 updates, relays don’t need all that disk I/O. What impact does that have on hosting setup and operating costs? Turns out the d…

Some changes I made from the guide:
I changed the line RELAY_REPLAY_WINDOW=2h to RELAY_REPLAY_WINDOW=12h
Instead of installing Caddy, I used an existing nginx reverse proxy I had already set up on the network

After installing it and getting it set up, I confirmed that my relay was up and running.
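
For anyone who wants to script that kind of check, here is a minimal Python sketch. It is not how I verified my own relay, just one way it could be done. The hostname is a placeholder, and the /xrpc/_health path is an assumption about what the relay exposes, so adjust both to match your setup.

```python
import urllib.request

# Hypothetical hostname; replace with your own relay's public address.
RELAY_HOST = "relay.example.com"


def health_url(host: str) -> str:
    """Build the health-check URL (the /xrpc/_health path is an assumption)."""
    return f"https://{host}/xrpc/_health"


def check_relay(host: str, timeout: float = 10.0) -> bool:
    """Return True if the relay answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError both subclass OSError, so this covers
        # DNS failures, refused connections, timeouts, and HTTP error codes.
        return False
```

Calling check_relay(RELAY_HOST) from a cron job or monitoring script would give a simple up/down signal through the reverse proxy.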

At the time of writing, I have 1151 Personal Data Servers (PDS) indexed and interfacing with my relay, with millions of events per day. Of course, since most Bluesky users are on the servers run by the Bluesky company itself, most events come from those, but I've seen events from about 74% of those 1151 PDS hosts.
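
For scale, 74% of 1151 works out to roughly 850 hosts. A share like that could be computed from a list of host records, as in the sketch below. It assumes each record is a dict with a "hostname" and an optional integer "seq" (the last sequence number seen from that host), and that a seq above zero means at least one event has arrived. Both the record shape and that interpretation are assumptions for illustration, and the sample data is made up.

```python
from typing import Iterable


def active_share(hosts: Iterable[dict]) -> tuple[int, int, float]:
    """Return (active, total, share) for a list of host records.

    A host counts as active when its "seq" field is greater than zero.
    The "seq" field name and its meaning are assumptions.
    """
    hosts = list(hosts)
    total = len(hosts)
    active = sum(1 for h in hosts if h.get("seq", 0) > 0)
    share = active / total if total else 0.0
    return active, total, share


# Made-up sample records, not real relay output.
sample = [
    {"hostname": "pds-a.example", "seq": 1200},
    {"hostname": "pds-b.example", "seq": 0},
    {"hostname": "pds-c.example", "seq": 57},
    {"hostname": "pds-d.example"},
]

active, total, share = active_share(sample)
print(f"{active}/{total} hosts active ({share:.0%})")  # 2/4 hosts active (50%)
```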

After running for 5 days, it's definitely interesting to see the usage trends. Resource utilization really goes down during the US night hours. It's never too high, but there are noticeable dips.

CPU usage starts dipping around 7-9PM Central time and consistently picks back up around 7-9AM Central time
Inbound network traffic is usually around 3MB/s during peak hours and drops to around 1.5MB/s during off hours. Outbound is fairly consistent at 100-180KB/s
Disk IO fluctuates between 10-18MB/s
RAM usage is fairly stable but grows slightly every day
Storage usage usually fluctuates from about 75GB during off hours to 120GB during peak hours. The DB seems to do a good job of cleaning itself up.
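
To put the inbound numbers in perspective, here is a back-of-the-envelope estimate of daily transfer. It assumes an even 12 hour peak / 12 hour off-peak split, which is only an approximation based on when the CPU dips start and end, and it uses decimal units (1GB = 1000MB).

```python
SECONDS_PER_HOUR = 3600


def daily_transfer_gb(peak_mb_s: float, off_mb_s: float,
                      peak_hours: float = 12.0) -> float:
    """Estimate daily transfer in GB from peak and off-peak rates in MB/s.

    peak_hours is an assumed split of the day; the rest counts as off-peak.
    """
    off_hours = 24.0 - peak_hours
    total_mb = (peak_mb_s * peak_hours + off_mb_s * off_hours) * SECONDS_PER_HOUR
    return total_mb / 1000  # decimal GB


# Inbound rates from the observations above: ~3MB/s peak, ~1.5MB/s off hours.
inbound_gb = daily_transfer_gb(3.0, 1.5)
print(f"estimated inbound: {inbound_gb:.1f} GB/day")  # estimated inbound: 194.4 GB/day
```

Under those assumptions that comes to just under 200GB of inbound traffic per day, or somewhere near 6TB per month, which is worth checking against any bandwidth cap on the host.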

Overall, it's been a fun and interesting project that I'll keep up with, and I'm excited to see how resource utilization changes as Bluesky keeps growing.