So, in the long lifecycle: enterprise and data center NVMe SSDs take a long time to get to market. Generally it's a year-plus lifecycle, and in that there's quality and reliability testing, there's validation, there's hardware screening, there's SSD controller power-on. I mean, all this stuff happens.

03:12 S1: And a lot of the hardware issues get weeded out. I'm not going to say that they can't exist, but things like capacitor failures or resistor failures or ASIC failures just aren't that common. Now, media failures, next in order of increasing prevalence. Most enterprise drives are designed to actually withstand media failures. Most enterprise drives today have something like an onboard XOR or RAID engine (some vendors call it fail-in-place), so basically enterprise SSDs are able to withstand not only specific block or page failures, but also entire die failures. So, typically, this is not a very common failure mode, although sometimes there's a bad lot of NAND or a bad firmware that causes more media failures than is normal. It's not to say that it can't happen, but it is more rare than the No. 1 cause of SSD failures: firmware issues.

04:12 S1: And one, it's just because NVMe and all SSD firmware is extremely complex: moving data around, doing garbage collection, keeping track of the logical-to-physical mapping of the SSD. SSD firmware has become really, really complex, and the majority of failures we see, over 50%, are actually just some sort of firmware issue. In this case, it's interesting because most of the time there's nothing actually wrong with the drive, and upon a graceful reset (and we'll talk a little bit about some of the features in NVMe coming to help with this) you can kind of bring the drive back to life.
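To make the onboard XOR or RAID engine from the talk concrete, here is a minimal Python sketch of XOR parity across dies. It is an illustration of the general technique only, not any vendor's implementation; the function names `xor_parity` and `rebuild_missing` and the byte-string "dies" are hypothetical.

```python
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length byte chunks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving, parity):
    """Recover the one missing chunk from the survivors plus the parity chunk."""
    return xor_parity([*surviving, parity])


# Hypothetical stripe: one equal-length chunk per die.
dies = [b"die0data", b"die1data", b"die2data"]
parity = xor_parity(dies)

# Pretend die 1 failed entirely; rebuild its chunk from the others.
rebuilt = rebuild_missing([dies[0], dies[2]], parity)
print(rebuilt == dies[1])
```

Because XOR is its own inverse, XOR-ing the parity with every surviving chunk leaves exactly the lost chunk, which is why a single die failure per stripe can be tolerated in this scheme.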
Basically, in a lot of uses of enterprise drives, even with one and three drive-writes-per-day class drives being the mainstream drives today, endurance failures are not very common, just because, one, they're understood very well and, two, most customers just are not using that much endurance. We'll talk a little bit about that when I look at some case studies. I wrote the model at /endurance based off some Python scripts that basically do this: monitoring the write amplification and projecting the endurance.
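As a rough illustration of what monitoring write amplification and projecting endurance can look like, here is a minimal Python sketch. It is not the speaker's model: the `DriveSample` fields, the function names, and the example numbers are all hypothetical, and it assumes rated endurance is simply DWPD × capacity × 365 × warranty years, with the observed average write rate continuing unchanged.

```python
from dataclasses import dataclass


@dataclass
class DriveSample:
    """Hypothetical counters; names are illustrative, not NVMe log fields."""
    capacity_tb: float       # usable capacity in TB
    rated_dwpd: float        # rated drive writes per day (e.g. 1 or 3)
    warranty_years: float    # period the DWPD rating applies to
    host_tb_written: float   # total host writes so far, in TB (assumed > 0)
    nand_tb_written: float   # total NAND writes so far, in TB
    power_on_days: float     # days in service (assumed > 0)


def write_amplification(s):
    """NAND writes divided by host writes."""
    return s.nand_tb_written / s.host_tb_written


def rated_tbw(s):
    """Total host TB the drive is rated for over its warranty period."""
    return s.rated_dwpd * s.capacity_tb * 365 * s.warranty_years


def observed_dwpd(s):
    """Average drive writes per day actually consumed so far."""
    return s.host_tb_written / s.power_on_days / s.capacity_tb


def projected_years_left(s):
    """Years until rated TBW is reached at the observed average write rate."""
    daily_tb = s.host_tb_written / s.power_on_days
    remaining_tb = max(rated_tbw(s) - s.host_tb_written, 0.0)
    return remaining_tb / daily_tb / 365


# Made-up example: a 3.84 TB, 1 DWPD drive after two years in service.
sample = DriveSample(3.84, 1.0, 5.0, 700.8, 1401.6, 730.0)
print(f"WAF: {write_amplification(sample):.2f}")
print(f"Observed DWPD: {observed_dwpd(sample):.2f}")
print(f"Projected years left: {projected_years_left(sample):.1f}")
```

This sketch reports write amplification alongside the projection but does not feed it back into the rating; a workload whose write amplification is well above what the vendor assumed would wear the NAND faster than the host-write projection suggests.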