queer.party's infrastructure at present is... janky. The plan is to make it non-janky by moving to a new server with simpler internal infrastructure. This will happen on the 28th of January at 8PM GMT.
The plan for this change is documented on my Notion site, which will live-update as progress occurs: https://maffsie.notion.site/Server-move-35173c78fa5d42e89dca4ff14daf4951
At present, queer.party is hosted on one physical server, across three VMs, and is dependent on a fourth VM. The intention here was to use Docker Swarm to cluster things and make scaling and maintenance easier, but that didn't really work out. The result of this is occasional outages due to the dependent fourth VM - a firewall appliance.
I don't like how it's architected at present, because there are more points of failure than there need to be. The server host is also raising the price for the server hosting queer.party anyway, so what better time to move to a new server?
What's the plan?
- The contents of queer.party's CDN (roughly 172GB) will be synced to the new server, and will keep syncing until the old server is offline
- On the 28th of January, 2022, at 8PM GMT, queer.party will go offline
- The final CDN sync will occur
- The database behind queer.party will be fully copied over
- The firewall in front of queer.party will be configured to point to the new server
- queer.party will come back online on the new server
- DNS entries will be updated to point to the new server directly
- Once no traffic can reach the old server, it will be shut down, and the disks securely erased by my own paw.
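The repeated CDN sync in the steps above boils down to comparing what each server holds and copying only what differs. Here's a rough sketch of that idea in Python, not the actual tooling I'll use: the `plan_sync` function and the listings are hypothetical, and it assumes each object has some fingerprint (an ETag or checksum) that changes when its content changes.

```python
from typing import Dict, List, Tuple


def plan_sync(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (keys to copy, keys to delete) so that `new` ends up matching `old`.

    Both arguments map an object key to a content fingerprint.
    These listings are hypothetical inputs, not real queer.party data.
    """
    # Copy anything missing from the new server or whose fingerprint differs.
    to_copy = sorted(key for key, tag in old.items() if new.get(key) != tag)
    # Delete anything on the new server that no longer exists on the old one.
    to_delete = sorted(key for key in new if key not in old)
    return to_copy, to_delete


old_listing = {"media/a.png": "e1", "media/b.png": "e2", "media/c.png": "e3"}
new_listing = {"media/a.png": "e1", "media/b.png": "stale", "media/old.png": "e9"}

to_copy, to_delete = plan_sync(old_listing, new_listing)
print(to_copy)    # ['media/b.png', 'media/c.png']
print(to_delete)  # ['media/old.png']
```

Each sync pass only moves what changed since the last one, which is why the final sync during the downtime window should be much shorter than the initial 172GB copy.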
What could go wrong?
- Keeping the CDN data in sync between the two servers prior to the final move might be an IO-intensive operation, so this document will be updated once the sync is set up to indicate whether the performance is acceptable. I might use MinIO's replication functionality to accomplish this, but I don't remember how to configure it.
- Configuring the redirect on the firewall might be annoying, but I'll figure it out
- The status page will need to be updated to check the new control backplane for service health
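For the status page item, the check itself is conceptually simple: ask each service for a response and treat a 2xx answer as healthy. A minimal sketch, assuming plain HTTP health endpoints; the function names and the `example.invalid` URLs are made up for illustration, and the real status page may work quite differently.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return 0


def check_services(
    endpoints: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True if its endpoint answered with a 2xx status."""
    return {name: 200 <= fetch(url) < 300 for name, url in endpoints.items()}


# Hypothetical endpoints, checked with a fake fetcher so no network is needed.
endpoints = {
    "web": "https://new-server.example.invalid/health",
    "cdn": "https://new-server.example.invalid/cdn/health",
}
fake_responses = {
    "https://new-server.example.invalid/health": 200,
    "https://new-server.example.invalid/cdn/health": 503,
}
results = check_services(endpoints, fetch=lambda url: fake_responses.get(url, 0))
print(results)  # {'web': True, 'cdn': False}
```

Passing the fetcher in as a parameter means the same check can be pointed at the old or new server, or tested without touching the network at all.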
What's actually going to change?
- Web traffic will go through one load-balancer (Traefik 2) instead of through a firewall+load-balancer (OPNsense+HAProxy) then another load-balancer (Traefik 1)
- All queer.party services will run on a Docker Swarm cluster on bare metal instead of in VMs, so there will be fewer layers between the network, the hardware, and the actual applications.
- The increased resources mean I might be able to add extra services (alternative front-ends, Elasticsearch, etc). We will see.
Why that date and time?
It gives me enough time to do the migration before the end of the month, because I don't want to pay next month's server bill when I'm not gonna be using the server.