media.info

Multiple emails today: an apology

By James Cridland
Posted 23 April 2015, 12.10pm edt
Jim H.
Remove the ads and support us: GO PRO




Sometimes things go wrong, and today was a fine example of something going wrong.

Today we sent a very small minority of our users - we estimate fewer than ten people, and we've heard from three so far - many duplicate copies of their daily newsletter. One person reports 85; others report more.

When notified, we reacted quickly: we killed the email service, and suspended the website altogether for five minutes while we checked what was going on.

Email newsletters for the rest of the day will be slightly delayed as we bring the system back online.

We're hugely apologetic to those people who were affected. Since they unsubscribed (naturally!) we're not able to apologise directly, so we've posted a link to this post on Twitter and other services.

Users who were affected: if you'd be able to send us a screenshot of the emails you were sent, we'd be happy to give you a complimentary Pro account by way of compensation.

Incident details

Between 10.16am and 12.22pm, one script on media.info was occasionally running very slowly. This script normally takes about twenty seconds to complete, and we run it once a minute. It performs a number of tasks, including sending 50 newsletter emails.

The fault was caused by Amazon SES, the email service we use, which was running slowly and erratically. Amazon report:

5:37 AM PDT Between 2:22 AM PDT and 4:40 AM PDT we experienced elevated API error rates in the EU-WEST-1 Region. This affected the SendEmail/SendRawEmail APIs as well as calls made to the SMTP endpoint. The issue has been resolved and the service is now operating normally.

Our script looked a little like this...

for each email we have {
    send the mail
    mark it as being sent in the database
}

Amazon SES's issues meant that, at "send the mail" above, the script waited a long time for acknowledgement that the mail had been sent, and eventually gave up. So the email was sent, but it was never marked as being sent.

Additionally, because the script was taking so long to run, another copy of it started up at the beginning of the next minute, which meant that multiple copies of the script were running at the same time.
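To illustrate the failure described above, here is a minimal Python sketch. It is not our production script: SlowMailer, run_once and the example address are hypothetical names made up for this illustration, and the timeout is simulated rather than coming from Amazon SES.

```python
class SlowMailer:
    """Hypothetical stand-in for a mail API that delivers the message
    but times out before acknowledging it."""

    def __init__(self):
        self.delivered = []

    def send(self, address):
        self.delivered.append(address)  # the mail really does go out
        raise TimeoutError("no acknowledgement received")


def run_once(queue, sent, mailer):
    """One run of the original loop: send first, mark as sent afterwards."""
    for address in queue:
        if address in sent:
            continue
        try:
            mailer.send(address)
        except TimeoutError:
            continue  # gave up waiting, so the "mark as sent" step is skipped
        sent.add(address)


mailer = SlowMailer()
sent = set()
for _ in range(3):  # three runs, one per minute
    run_once(["reader@example.com"], sent, mailer)

print(len(mailer.delivered))  # 3 copies delivered
print(sent)                   # set(): nothing was ever marked as sent
```

Every run sees the email as unsent and sends it again, which is how one newsletter can turn into dozens.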

Mitigation

We've amended the script to read...

for each email we have {
    try to mark it as being sent in the database {
        if that succeeds, send the mail
    }
}

... so that should Amazon SES fail in this way again, the email will already have been marked as sent in the database, and users will never get more than one copy.
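The mark-then-send order can be sketched in Python as follows. This is an illustration under assumptions, not our actual code: the newsletters table, its sent column, and the claim and run_once functions are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def claim(conn, newsletter_id):
    """Flip sent from 0 to 1 in a single UPDATE.
    Returns True only for the caller that actually made the change."""
    with conn:  # commits on success
        cur = conn.execute(
            "UPDATE newsletters SET sent = 1 WHERE id = ? AND sent = 0",
            (newsletter_id,),
        )
    return cur.rowcount == 1


def run_once(conn, send_mail):
    rows = conn.execute(
        "SELECT id, address FROM newsletters WHERE sent = 0"
    ).fetchall()
    for newsletter_id, address in rows:
        if claim(conn, newsletter_id):  # mark first...
            try:
                send_mail(address)      # ...then send
            except TimeoutError:
                pass  # already marked as sent, so it is never retried


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE newsletters (id INTEGER PRIMARY KEY, address TEXT, sent INTEGER)"
)
conn.execute("INSERT INTO newsletters VALUES (1, 'reader@example.com', 0)")

delivered = []


def flaky_send(address):
    """Simulated mail call: delivers, then times out waiting for the acknowledgement."""
    delivered.append(address)
    raise TimeoutError("no acknowledgement received")


for _ in range(3):
    run_once(conn, flaky_send)

print(len(delivered))  # 1
```

The trade-off in this sketch is that if the send genuinely fails after the row has been marked, that one email is skipped rather than retried: one email too few instead of 85 too many.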

We've also amended this script so that it cannot run concurrently.
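One common way to stop a once-a-minute job overlapping with itself is a lock file. The Python sketch below shows that general technique, not necessarily the one we used: try_lock and the lock file name are hypothetical, and fcntl.flock is only available on Unix-like systems.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock,
    or None if another run already holds it."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file location, in a fresh temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "newsletter-job.lock")

first = try_lock(lock_path)   # this minute's run gets the lock
second = try_lock(lock_path)  # the next run starts while the first is still going

if second is None:
    print("previous run still in progress; exiting")

first.close()                 # closing the file releases the lock
third = try_lock(lock_path)   # a later run can now proceed
third.close()
```

Because the operating system releases the lock when the process exits, a crashed run does not leave a stale lock behind to block later runs.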
