Postmortems

Monzo (new UK bank) struggles to transfer money

some customers were left with money in their Monzo account that didn’t actually leave the sending bank

(Monzo is only four years old and has held a full banking licence for only two.)

Root cause: buggy software in equipment run by a third-party intermediary. Monzo deemed the intermediary’s response inadequate and plans to bring the function in-house. (Sounds like a use-after-free bug.)

Our Gateway provider found a bug that caused one of their systems to get stuck in a state where it corrupts all messages passing through it. They deployed a fix for this on Thursday the 13th of June.

The bug was in a computer program the Gateway uses to translate payment messages between two formats. When the program was operating under load, the system tried to clear memory it believed to be unused (a process known as garbage collection).

But because it was using an unsafe method to access memory, the code ended up reading memory that had already been cleared away, causing it not to know how to translate the date field in payment messages.
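The failure mode described above can be imitated in a memory-safe language. This is an analogy only: the Gateway's actual language, code, and message format are not public, so `BufferPool`, the pipe-delimited message, and the date value below are all hypothetical. The sketch keeps a view onto a buffer after that buffer has been handed back to a pool and reused, so a later read of the date field sees whatever overwrote it.

```python
class BufferPool:
    """Hypothetical pool that recycles a single fixed-size buffer."""

    def __init__(self, size: int) -> None:
        self._free = [bytearray(size)]

    def acquire(self) -> bytearray:
        return self._free.pop()

    def release(self, buf: bytearray) -> None:
        # From here on the pool assumes nobody is still using buf.
        self._free.append(buf)


def load(buf: bytearray, message: bytes) -> None:
    # Same-length slice assignment: overwrites in place, never resizes.
    buf[: len(message)] = message


pool = BufferPool(32)

buf = pool.acquire()
load(buf, b"2019-06-13|GBP|100.00")  # illustrative message, not a real format
date_view = memoryview(buf)[:10]     # a view onto the buffer, not a copy
safe_copy = bytes(date_view)         # copy taken while the buffer is still ours
pool.release(buf)                    # "freed" while date_view still points at it

buf2 = pool.acquire()                # the same memory is handed out again
load(buf2, b"\x00" * 21)             # the next user overwrites it
stale_date = bytes(date_view)        # the old view now reads the new contents

print(safe_copy)
print(stale_date)
```

Copying the field out before releasing the buffer, as `safe_copy` does, avoids the stale read. Under load, buffers are recycled faster, which is one plausible reason such a bug would only show up when the system is busy.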

75% of inbound payments succeed in real-time. With Stand-In mode disabled, the Gateway begins sending Monzo payments.

25% of inbound payments are credited and then reversed. As the Gateway is continuing to corrupt our responses to inbound messages, the Hub ignores them and reverses these payments.
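The credit-then-reverse behaviour can be sketched as a receiver that refuses any response whose date field it cannot parse. The function names, the `YYYY-MM-DD` date format, and the outcome labels are assumptions made for illustration; the real Faster Payments message format and the Hub's actual logic are not described in the postmortem excerpt.

```python
from datetime import datetime
from typing import Optional


def parse_response_date(raw: bytes) -> Optional[datetime]:
    """Return the parsed date, or None if the field is corrupted."""
    try:
        return datetime.strptime(raw.decode("ascii"), "%Y-%m-%d")
    except (UnicodeDecodeError, ValueError):
        return None


def hub_outcome(response_date: bytes) -> str:
    """Hypothetical Hub rule: an unreadable response is ignored,
    so the payment it acknowledged is reversed."""
    if parse_response_date(response_date) is None:
        return "reversed"
    return "settled"


print(hub_outcome(b"2019-06-13"))
print(hub_outcome(b"\x00" * 10))
```

The receiving bank has already credited the customer by the time its response is sent, which is why a corrupted response leads to a visible credit followed by a reversal rather than to a payment that never appears.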

Via the discussion on HN:
https://news.ycombinator.com/item?id=20231845

From the HN discussion:

It happens all the time, and they just don’t tell you. We get automated notices when banks connect and disconnect from the Faster Payments network, something happens every few days. Not always this length, but occasionally.

Just yesterday a major high street bank stopped sending payments for an hour, and was telling customers on Twitter that there were no problems.

Hell, the central system (what I called the Hub in this article) had a 12 hour split brain meltdown last July which had banks emailing each other spreadsheets back and forth for two weeks afterwards.