What’s not to love about solid state drives (SSDs)? They are faster than conventional hard disk drives (HDDs), more compact, have no moving parts, are immune to magnetic fields, and can withstand more shocks and vibration than conventional magnetic platter disks. And, they are becoming available in larger and larger capacities while their cost comes down.
If you’ve upgraded an older computer with an SSD, you no doubt instantly saw the benefits. Your computer booted in less time, your applications loaded faster, and even when you ran out of memory, and apps and data had to be swapped to disk, it felt like everything was much snappier.
We’re now seeing SSDs with capacities that used to be reserved for HDDs and at prices that no longer make our eyes water. 500 GB SSDs are now affordable (under $100), and 1 TB drives are reasonably priced ($100 to $150). Even 2 TB SSDs fall into a budget range for putting together a good performance desktop system ($300 to $400).
We’ve written a number of times on this blog about SSDs, and considered the best uses for SSDs compared to HDDs. We’ve also written about the future of SSDs and how we use them in our data centers and whether we plan on using more in the future.
In this post we’re going to consider the issue of SSD reliability. For all their merits, can SSDs be trusted with your data and will they last as long or longer than if you were using an HDD instead? You might have read that SSDs are limited to a finite number of reads and writes before they fail. What’s that all about?
The bottom-line question is: do SSDs fail? Of course they do, as do all drives eventually. The important questions we really need to be asking are 1) do they fail faster than HDDs, and 2) how long can we reasonably expect them to last?
Backing Up Is Great To Do
Of course, we’re a data storage and backup company, so you know what we’re going to say right off. No matter which storage medium you use, we recommend that you always have a backup copy of your data. Even if the disk is reliable and in good condition, it won’t do you any good if your computer is stolen, consumed by a flood, or lost in a fire or other act of nature. You might have heard that water damage is the most common computer accident, and few computer components can survive a thorough soaking, especially when powered.
SSD Reliability Factors to Consider
Generally, SSDs are more durable than HDDs in extreme and harsh environments because they don’t have moving parts such as actuator arms. SSDs can withstand accidental drops and other shocks, vibration, extreme temperatures, and magnetic fields better than HDDs. Add to that their small size and lower power consumption, and you can understand why they’re a great fit for laptop computers and mobile applications.
First, let’s cover the basics. Almost all types of today’s SSDs use NAND flash memory. NAND isn’t an acronym like a lot of computer terms. Instead, it’s a name that’s derived from its logic gate called “NOT AND.” (For the curious, a NAND gate is a logic gate that produces an output that is false only if all its inputs are true. Digital systems employing logic circuits take advantage of the ability of a series of NAND gates to express any Boolean function.)
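For readers who want to see that last point in action, here is a minimal Python sketch. The helper names (nand, not_, and_, or_, xor_) are ours, chosen for illustration. The code models only the logic, not how flash memory is actually built.

```python
def nand(a: bool, b: bool) -> bool:
    """NAND gate: the output is false only when both inputs are true."""
    return not (a and b)


# Each of the gates below is built from nothing but nand().
def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))


def xor_(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


# Check every input combination against Python's own Boolean operators.
for a in (False, True):
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
        assert xor_(a, b) == (a != b)
    assert not_(a) == (not a)
```

Since NOT, AND, and OR can each be expressed with NAND alone, any Boolean function can be too.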
The term following NAND, flash, refers to a non-volatile solid state memory that retains data even when the power source is removed. NAND storage has specific properties that affect how long it will last. Once data has been written to a NAND cell (also known as programming), that data must be erased before new data can be written to the same cell. NAND is programmed and erased by applying a voltage to send electrons through an insulator. The location of those electrons (and their quantity) determines when current will flow between a source and a sink (called a voltage threshold), which in turn determines the data stored in that cell (the 1s and 0s). Each write and erase sends electrons through the insulator and back, and the insulator starts to wear. The exact number of cycles an individual cell can tolerate varies by NAND design. Eventually, the insulator wears to the point where it may have difficulty keeping the electrons in their correct (programmed) location, which makes it increasingly difficult to determine whether the electrons are where they should be or have migrated on their own.
This means that flash memory cells can only be programmed and erased a limited number of times. This is measured in P/E cycles, which stands for program/erase cycles.
P/E cycles are an important measurement of SSD reliability, but there are other factors to consider as well. The three to know are P/E cycles, TBW (terabytes written), and MTBF (mean time between failures).
The SSD manufacturer will have these specifications available for their products and they can help you understand how long your drive can be expected to last and whether a particular drive is suited to your application.
P/E cycles — A solid-state-storage program-erase cycle is a sequence of events in which data is written to a solid-state NAND flash memory cell, then erased, and then rewritten. How many P/E cycles an SSD can endure varies with the technology used, from roughly 500 to 100,000 P/E cycles.
TBW — Terabytes written is the total amount of data that can be written to an SSD before it is likely to fail. For example, here are the TBW warranties for the popular Samsung 860 EVO SSD: 150 TBW for 250 GB model, 300 TBW for 500 GB model, 600 TBW for 1 TB model, 1,200 TBW for 2 TB model and 2,400 TBW for 4 TB model. Note: these models are warrantied for 5 years or TBW, whichever comes first.
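To put the TBW figures quoted above in everyday terms, here is a rough Python sketch. It turns a TBW rating into two numbers: how many full-drive writes per day the rating allows over the warranty period, and how many years it would take to reach the rating at a given daily write volume. The function names and the 50 GB per day workload are our own assumptions for illustration, not manufacturer figures, and we treat 1 TB as 1,000 GB.

```python
# TBW ratings quoted above for the Samsung 860 EVO: capacity in GB -> rated TBW.
TBW_RATINGS = {250: 150, 500: 300, 1000: 600, 2000: 1200, 4000: 2400}
WARRANTY_YEARS = 5


def drive_writes_per_day(capacity_gb: float, tbw: float,
                         years: float = WARRANTY_YEARS) -> float:
    """Full-drive writes per day that would use up the TBW rating in `years`."""
    capacity_tb = capacity_gb / 1000
    return tbw / (capacity_tb * years * 365)


def years_to_reach_tbw(tbw: float, gb_written_per_day: float) -> float:
    """Years of writing `gb_written_per_day` before the TBW rating is reached."""
    return tbw * 1000 / gb_written_per_day / 365


for capacity_gb, tbw in TBW_RATINGS.items():
    dwpd = drive_writes_per_day(capacity_gb, tbw)
    print(f"{capacity_gb} GB model: {tbw} TBW, about {dwpd:.2f} drive writes per day")

# Hypothetical workload: 50 GB written every day to the 1 TB model.
print(f"{years_to_reach_tbw(600, 50):.1f} years to reach 600 TBW at 50 GB/day")
```

Under those assumptions every model in the list works out to about a third of a full-drive write per day for five years, and the hypothetical 50 GB per day workload would take decades to reach the 1 TB model’s rating.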
MTBF — MTBF (mean time between failures) is a measure of how reliable a hardware product or component is over its expected lifetime. It is expressed in hours between failures. For many components the figure is in the thousands or tens of thousands of hours, while storage drives are typically rated far higher. For example, a hard disk drive may have a mean time between failures of 300,000 hours, while an SSD might have 1.5 million hours.
This doesn’t mean that your SSD will last that many hours. What it means is that, given a sample set of that model of SSD, failures will occur at a certain rate. A 1.2 million hour MTBF means that if each drive is used an average of 8 hours a day, a sample of 1,000 SSDs would be expected to have one failure every 150 days, or a little more than twice a year.
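The arithmetic behind that example is short enough to write out. The Python sketch below uses a function name of our own choosing and assumes, as the example does, that failures are spread evenly over the fleet’s combined operating hours.

```python
def days_between_failures(mtbf_hours: float, drives: int,
                          hours_per_day: float) -> float:
    """Expected days between failures across a fleet of identical drives."""
    fleet_hours_per_day = drives * hours_per_day  # combined operating hours per day
    return mtbf_hours / fleet_hours_per_day


days = days_between_failures(mtbf_hours=1_200_000, drives=1000, hours_per_day=8)
failures_per_year = 365 / days

print(f"One failure every {days:.0f} days")          # 1,200,000 / 8,000 = 150
print(f"About {failures_per_year:.1f} failures per year")
```

The same function shows why MTBF is a fleet-level number: with a single drive used 8 hours a day, the result is 150,000 days, which says nothing useful about how long one particular drive will last.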
There are a number of different types of SSD, and advancements to the technology continue at a brisk pace. Generally, SSDs are based on four different NAND cell technologies:
- SLC (Single Level Cell) — one bit per cell
- MLC (Multi-Level Cell) — two bits per cell
- TLC (Triple Level Cell) — three bits per cell
- QLC (Quad Level Cell) — four bits per cell
When one bit is stored (SLC), it’s not necessary to keep close tabs on electron locations, so a few electrons migrating isn’t much of a concern. Because only a 1 or a 0 is being stored, it’s necessary only to accurately determine if voltage flows or not.
MLC stores two bits per cell, so more precision is needed (determining voltage threshold is more complex). It’s necessary to distinguish among 00, 01, 10 or 11. Migrating electrons have more of an impact, so the insulator cannot be worn as much as with SLC.
This trend continues with TLC, where three bits are stored, giving eight possible combinations from 000 through 111. Migrating electrons have more effect than in MLC, which further reduces tolerable insulator wear.
QLC stores four bits (16 possible combinations of 1s and 0s). With QLC, migrating electrons have the most significant effect. Tolerable insulator wear is further reduced.
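The pattern across the four cell types is simple: each extra bit doubles the number of distinct charge states the drive has to tell apart in a single cell. Here is a small Python sketch of that relationship. The names are ours, for illustration only.

```python
# Bits stored per cell for each NAND cell technology described above.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def states(bits: int) -> int:
    """Number of distinct states a cell must hold to store `bits` bits."""
    return 2 ** bits


def bit_patterns(bits: int) -> list[str]:
    """Every bit pattern a cell with `bits` bits can represent."""
    return [format(i, f"0{bits}b") for i in range(states(bits))]


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {states(bits)} states")

print(bit_patterns(2))  # the four MLC patterns: 00, 01, 10, 11
```

The more states that have to fit in one cell, the smaller the margin between neighboring states, which is why a few migrating electrons matter more in QLC than in SLC.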
QLC is a good fit for read-centric workloads because reading data causes negligible wear on NAND cells, while writing data (programming and erasing) wears them much more. When a lot of data is written and rewritten, the insulator wears more quickly. If a NAND cell can tolerate that wear, it is well suited to mixed read/write workloads. The less wear a NAND cell can tolerate, the better suited it is to read-centric workloads and applications.
Each subsequent NAND technology stores one more bit per cell. The fewer bits per NAND cell, the faster, more reliable, and more energy efficient the technology is, and also the more expensive. An SLC SSD would technically be the most reliable SSD as it can endure more writes, while a QLC SSD is the least reliable. If you’re selecting an SSD for an application where it will be written more than read, then the selection of NAND cell technology could be a significant factor in your decision. If your application is general computer use, it likely will matter less to you.
How Reliability Factors Affect Your Choice of SSD
How important these factors are to you depends on how the SSD is used. The right question to ask is how a drive will perform in your application. There are different performance and reliability criteria depending on whether the SSD will be used in a home desktop computer, a data center, or an exploration vehicle on Mars.
Manufacturers sometimes specify the type of application workload for which an SSD is designed, such as write-intensive, read-intensive or mixed-use. Some vendors allow the customer to select the optimal level of endurance and capacity for a particular SSD. For instance, an enterprise user with a high-transaction database might opt for a higher number of drive writes at the expense of capacity. Or a user operating a database that does infrequent writes might choose a lower drive writes number and a higher capacity.
Signs of SSD Failure
SSDs will eventually fail, but there usually are advance warnings of when that’s going to happen. You’ve likely encountered the dreaded clicking sound that emanates from a dying HDD. An SSD has no moving parts, so we won’t get an audible warning that an SSD is about to fail us. Instead, you should watch for a number of indicators that your SSD is nearing its end of life, and take action by replacing that drive with a new one.
1) Errors Involving Bad Blocks
Much like bad sectors on HDDs, there are bad blocks on SSDs. This is typically a scenario where the computer attempts to read or save a file, but it takes an unusually long time and ends in failure, so the system eventually gives up with an error message.
2) Files Cannot Be Read or Written
There are two ways in which a bad block can affect your files: 1) the system detects the bad block while writing data to the drive, and thus refuses to write the data, or 2) the system detects the bad block after the data has been written, and thus refuses to read that data.
3) The File System Needs Repair
An error message saying that the file system needs repair can appear simply because the computer was not shut down properly, but it also could be a sign of an SSD developing bad blocks or other problems.
4) Crashing During Boot
A crash during the computer boot is a sign that your drive could be developing a problem. You should make sure you have a current backup of all your data before it gets worse and the drive fails completely.
5) The Drive Becomes Read-Only
Your drive might refuse to write any more data and allow only reads. Fortunately, you can still get your data off the disk.
So, How Reliable is an SSD?
Let’s go back to the two questions we asked above.
Question 1: Do SSDs fail faster than HDDs?
Answer: That depends on the technology of the drives and how they’re used. HDDs are better suited for some applications and SSDs for others. SSDs can be expected to last as long or longer than HDDs in most general applications.
Question 2: How long can we reasonably expect an SSD to last?
Answer: An SSD should ideally last as long as its manufacturer expects it to last (e.g. five years), provided that the use of the drive is not excessive for the technology it employs (e.g. using a QLC drive in an application with a high number of writes). Consult the manufacturer’s recommendations to ensure that how you’re using the SSD matches its best use.
SSDs are a different breed of animal than HDDs, and they have their strengths and weaknesses relative to other storage media. The good news is that their strengths (speed, durability, size, power consumption, and so on) are backed by pretty good overall reliability.
SSD users are far more likely to replace their storage drive because they’re ready to upgrade to a newer technology, higher capacity, or faster drive, than having to replace the drive due to a short lifespan. Under normal use we can expect an SSD to last years. If you replace your computer every three years, as most users do, then you probably needn’t worry about whether your SSD will last as long as your computer. What’s important is whether the SSD will be sufficiently reliable that you won’t lose your data during its lifetime.
As we saw above, if you’re paying attention to your system, you will usually be given warning of an impending drive failure, and you can replace the drive before the data becomes unreadable.
It’s good to understand how the different SSD technologies affect their reliability, and whether it’s worth it to spend extra money for SLC over MLC or QLC. However, unless you’re using an SSD in a specialized application with more writes than reads as we described above, just selecting a good quality SSD from a reputable manufacturer should be enough to make you feel confident that your SSD will have a useful life span.
Keep an eye out for any signs of failure or bad blocks, and, of course, be sure to have a solid backup plan no matter what type of drive you’re using.