Update on Unintended Outage the Morning of Sept 1st

A quick apology and update regarding our hours of downtime on the morning of September 1st, 2025.

If you attempted to access the Adocasts site in the early morning hours (eastern time) of September 1st, 2025, you were likely met with a 502 error screen. I first want to deeply apologize for this downtime. Looking at our traffic, the downtime lasted about 6 hours and unfortunately happened while I was sleeping. I'd like to take a moment to look back at what happened, because it was a bit of a rough weekend overall.

Saturday, August 30th, 2025

In the late hours of Saturday, August 30th, I released a major refactor and redesign of the Adocasts site. This too had about an hour of unintended downtime, and I apologize for that as well. This hour of downtime was caused by a couple of issues:

The updated code was using newer NodeJS features, and my local NodeJS version was newer than what was installed on our server. I missed this, so the code went out and immediately ran into issues. Once I pinpointed the mismatch, I updated NodeJS on our server to the latest LTS.
As our site was redeploying to spin back up, our server ran out of available disk space. It had been close to max for a while, but it was managed and mostly a non-issue until now. I immediately provisioned a larger server, giving us more available disk space and memory to boot.
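
A version mismatch like the first issue can be caught at boot instead of surfacing as runtime errors. Below is a minimal sketch of a startup guard, not what Adocasts actually runs: the helper names and the MINIMUM_NODE value are hypothetical, and the only platform detail assumed is that NodeJS exposes its own version as process.versions.node.

```typescript
// Parse "22.3.0" or "v22.3.0" into numeric parts; anything unparsable becomes 0.
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
}

// True when `current` is the same as or newer than `minimum` (major.minor.patch).
function satisfiesMinimum(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const left = a[i] ?? 0;
    const right = b[i] ?? 0;
    if (left !== right) return left > right;
  }
  return true;
}

// Hypothetical minimum for illustration; use whatever version the code really needs.
const MINIMUM_NODE = "22.0.0";

// At startup, under NodeJS, the guard could look like this:
//   if (!satisfiesMinimum(process.versions.node, MINIMUM_NODE)) {
//     throw new Error(`NodeJS ${MINIMUM_NODE} or newer is required`);
//   }
```

The "engines" field in package.json is a declarative way to record the same requirement, though npm only warns about it unless engine-strict is enabled.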

Monday, September 1st, 2025

Cut to the issue this morning, September 1st. Part of this refactor was the introduction of dynamic OG (Open Graph) images generated with Puppeteer. Once generated, these images are cached with a long lifetime by Cloudflare. When a cached image is requested, Cloudflare serves it and the request never reaches our server.

However, the code running on our server was less than perfect and it resulted in instances where the Puppeteer browser was left lingering. As OG images were requested, these lingering browser instances compounded and our server ran out of memory overnight.

I awoke this morning, immediately noticed the server had run out of memory, and began debugging why. Listing the running processes with htop (run as TERM=xterm-256color htop) showed that multiple running Puppeteer instances were the culprit. I couldn't get into our sudo account to kill the processes because, with so little memory left, it couldn't start a shell, so I opted to restart the server itself in order to kill them.

As the server was rebooting, I fixed the error handling to ensure the Puppeteer browser process closed correctly. I also added the --disable-dev-shm-usage flag, a Chromium setting that helps on resource-limited machines like ours by writing its shared-memory files to temporary disk space (/tmp) instead of shared memory (/dev/shm). These two changes have been a huge help; our memory usage has been minimal and stable ever since.
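
For anyone curious what that kind of fix can look like, here is a minimal sketch rather than the actual Adocasts code. The renderOgImage name and the small PageLike, BrowserLike, and Launch types are hypothetical; they only mirror the Puppeteer methods used (launch, newPage, goto, screenshot, close) so the example stands on its own. The key idea is closing the browser in a finally block so it shuts down even when rendering throws.

```typescript
// Minimal shapes of the Puppeteer objects this sketch touches (assumed, for
// illustration). With Puppeteer installed, puppeteer.launch is the function
// meant to be passed in as `launch`, possibly with a cast.
interface PageLike {
  goto(url: string): Promise<unknown>;
  screenshot(options: { type: "png" }): Promise<Uint8Array>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

type Launch = (options: { args: string[] }) => Promise<BrowserLike>;

// Hypothetical helper: render the page at `url` to a PNG and always clean up.
async function renderOgImage(launch: Launch, url: string): Promise<Uint8Array> {
  const browser = await launch({
    // Have Chromium use /tmp instead of /dev/shm for its shared-memory files.
    args: ["--disable-dev-shm-usage"],
  });

  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.screenshot({ type: "png" });
  } finally {
    // Runs on success and on failure, so no browser process is left lingering.
    await browser.close();
  }
}
```

Taking the launch function as a parameter is just a design choice for this sketch; it keeps the cleanup logic easy to exercise with a fake browser, without starting Chromium.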

Again, I want to deeply apologize for the downtime we've had this weekend. I've implemented these changes to ensure this specific issue doesn't happen again. I'll continue to monitor our systems closely to prevent any future disruptions.

To all our Adocasts Plus subscribers, I sincerely apologize for the disruption. As a token of my appreciation for your patience, you can use the coupon code SEP1 to get 50% off your next month's subscription. I value your membership and am working hard to ensure a seamless experience.

I'll be working on a way to apply these discounts automatically so that any future make-goods won't require any action on your part.
