colourful words and phrases

techy tangents and general life chatter from a tired sysadmin

RCA – Remote Media service degradation


Terminology

  • Remote -- something which originated outside of queer.party and came in due to federation
  • Media -- a catch-all for photos, videos, audio, or any other kind of content that you can attach to a toot
  • Backport -- taking code from a newer version and putting it into an existing version
  • URI -- same as URL but it makes me feel cooler to use it
  • URL -- an internet address, usually HTTP/HTTPS

Summary of issue

queer.party citizens were unable to view the avatars or profile banners of remote Fediverse citizens, and custom emoji or attached media in remote toots may not have been visible.

Timeline of events

Initial cause

On Thursday (9th July, 2020) at around 13:30 BST, queer.party was upgraded from Mastodon version 3.1.3 to version 3.1.5.

Database migrations

Shortly after the upgrade to 3.1.5, I realised that the timelines were not updating, and checked on the back-end to discover some database errors. It seems that Mastodon, in either 3.1.4 or 3.1.5, included some changes to the database which required migrations to be run, and either I missed the notice to run them, or there was no notice. For approximately half an hour, timelines were not being updated.

Missing avatars

I and several queer.party citizens noticed that a small number of users seemed to not have a profile picture or banner. After realising the problem was on queer.party's side, I investigated.

Mastodon version 3.1.4 included a change which, for most deployments, should have been seamless - remote avatars, profile banners, custom emoji and media should now be stored in a root directory named 'cache' - to keep local and remote content separate.

For queer.party, somehow, this change was not seamless, and in the case of avatars, profile banners and custom emoji, an incorrect path was being used to attempt to access the content. In some cases, this was a missing 'cache/' in the URI, in others this was an erroneous 'cache/' where one should not have been.
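The two failure modes can be illustrated with a small sketch. This is not Mastodon's code (Mastodon is written in Ruby). The function name, the stored_under_cache flag, the example host and the example paths are all made up for illustration, and the only detail taken from the above is the 'cache/' prefix.

```python
from urllib.parse import urlparse

CACHE_PREFIX = "cache/"  # root directory for remote content since 3.1.4


def diagnose_media_url(url: str, stored_under_cache: bool) -> str:
    """Compare the prefix in a generated URL with where the file really lives.

    `stored_under_cache` is a hypothetical flag: True if the file is actually
    stored under 'cache/', False if it is still at the old location.
    """
    path = urlparse(url).path.lstrip("/")
    url_has_cache = path.startswith(CACHE_PREFIX)
    if stored_under_cache and not url_has_cache:
        return "missing cache/"
    if url_has_cache and not stored_under_cache:
        return "erroneous cache/"
    return "ok"


# Hypothetical URLs, for illustration only.
print(diagnose_media_url("https://media.example/accounts/avatars/1.png", True))
print(diagnose_media_url("https://media.example/cache/accounts/avatars/1.png", False))
print(diagnose_media_url("https://media.example/cache/accounts/avatars/1.png", True))
```

Either mismatch ends the same way for the person looking at a timeline: the link points somewhere the file is not, and the picture does not load.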

Goofing it up some more

The documentation for Mastodon version 3.1.4 indicated that, while not necessary, administrators could run the command tootctl upgrade storage-schema to transition existing remote content to the new directory. Figuring this might somehow be necessary for queer.party, I ran this command.

It was not necessary.

- Voice-Over Narrator, "Arrested Development"

Beginning to fix the problem

Over the weekend, I dug through Mastodon's source code to try to identify where things might be going wrong. This was mostly fruitless. However, I did identify that the code which handled the tootctl upgrade storage-schema command was broken for me in the 3.1.4 and 3.1.5 releases, and that it was fixed in the development branch. I backported the fix, which allowed the command to work.

Realising the goof

Unfortunately, due to a bug in the original release code, a large number of avatars and profile banners, as well as some custom emoji, had been marked as transitioned to the new directory, though in reality the operation had been failing. This had caused the disappearing profile picture problem to accelerate, and now many remote avatars were broken.
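As a rough illustration of that mismatch, here is a minimal sketch of a consistency check: find every record which claims to have been transitioned but whose file is not actually under 'cache/'. The record layout (a dict with "path" and "migrated" keys) and the function name are hypothetical, and this is not how Mastodon itself tracks the transition.

```python
import tempfile
from pathlib import Path


def find_falsely_migrated(records, storage_root):
    """Return the records marked as migrated whose file is not under 'cache/'."""
    broken = []
    for record in records:
        new_location = Path(storage_root) / "cache" / record["path"]
        if record["migrated"] and not new_location.is_file():
            broken.append(record)
    return broken


with tempfile.TemporaryDirectory() as root:
    # One file really was moved; the other was only marked as moved.
    moved = Path(root) / "cache" / "avatars" / "a.png"
    moved.parent.mkdir(parents=True)
    moved.write_bytes(b"")
    records = [
        {"path": "avatars/a.png", "migrated": True},
        {"path": "avatars/b.png", "migrated": True},
    ]
    print([r["path"] for r in find_falsely_migrated(records, root)])
```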

Fixing it, finally

Through trial-and-error testing (because hey, media was already broken at this point, I couldn't make it worse) I managed to identify that the S3_ALIAS_HOST configuration value, which tells Mastodon what address it should use when linking to content, could be misused to fix the problem.
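To show why this one value has so much leverage, here is a minimal sketch of a link being built from an alias host and a storage path. It assumes HTTPS, the function name and hosts are made up, and it is not Mastodon's actual link-building code. The value I ended up using is not shown here.

```python
def media_link(alias_host: str, storage_path: str) -> str:
    """Join the configured alias host and a file's storage path into a link."""
    return "https://" + alias_host.strip("/") + "/" + storage_path.lstrip("/")


# Hypothetical hosts and path, for illustration only.
path = "cache/accounts/avatars/1.png"
print(media_link("media.example", path))
print(media_link("files.example", path))
```

Because every generated link starts with this value, changing it changes where every piece of media is looked for in one go.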

Pounded in the Butt by the Hyperfixations Caused by my Undiagnosed ADHD

While doing this, I grew frustrated with how I was managing the internal configuration for Mastodon, and I wrote some new scripts to aid me in simplifying the configuration. While this does have a net positive effect, it did result in some additional downtime while I worked on this.

It's all coming together

The backported patch from the development branch of Mastodon, in conjunction with the configuration change, eventually resolved the weird, intermittent issues.

Un-goofing what's left

A large number of remote avatars and profile banners are still broken as a result of the earlier upgrade command. These are currently being re-downloaded fresh in the background.