A firmware bug in disk drives causes correlated failures, which can defeat redundancy schemes such as RAID, because drives put into service together tend to fail together. Larger disks, with their longer rebuild times, are more vulnerable.
In its official 2020 statement, Western Digital, the manufacturer and owner of SanDisk, acknowledged the firmware issue in its SAS SSDs. However, the company shared no concrete details and referred customers to the responsible OEMs.
The SanDisk SSD “death bug” renders affected SSDs, with capacities from 200 GB to 1.6 TB, unusable after 40,000 hours of powered-on use, which is approximately 4 years and 206 days. So it might not be an imminent danger to users who have only recently put the drives into service in their servers or RAID arrays. However, if the bug is not dealt with in time, the SSD fails completely and all the data on it is lost beyond recovery.
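The 40,000-hour figure is easier to reason about as calendar time. The following is a minimal sketch, assuming 365-day years and a drive that is powered on continuously. The function names and the 35,000-hour sample reading are hypothetical, chosen only for illustration, and are not part of any vendor tool.

```python
# Threshold quoted in the text above; everything else here is illustrative.
FAILURE_THRESHOLD_HOURS = 40_000


def hours_to_years_days(hours, days_per_year=365):
    """Split an hour count into whole years, whole days, and leftover hours."""
    total_days, leftover_hours = divmod(hours, 24)
    years, days = divmod(total_days, days_per_year)
    return years, days, leftover_hours


def hours_remaining(power_on_hours, threshold=FAILURE_THRESHOLD_HOURS):
    """Hours left before the threshold is reached (never negative)."""
    return max(threshold - power_on_hours, 0)


if __name__ == "__main__":
    print(hours_to_years_days(FAILURE_THRESHOLD_HOURS))  # (4, 206, 16)
    # Hypothetical drive that has already logged 35,000 hours of use.
    print(hours_remaining(35_000))  # 5000
```

Because drives bought and installed together accumulate hours at the same rate, every member of such an array would reach zero remaining hours at nearly the same moment, which is why this bug undermines redundancy.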
with links to Hacker News’ own outage