A case for QLC SSDs in the data center

Today, HDDs are the go-to storage solution for most data centers because of their lower cost and power footprint compared to alternatives like TLC flash. But while HDDs have been growing in capacity, they haven’t been growing in I/O performance; in other words, the bandwidth per TB (BW/TB) of HDDs has been dropping. This has forced data center engineers to meet their storage performance needs by shifting hot (frequently accessed) data to a TLC flash tier or by overprovisioning storage.

QLC flash as a technology has been around since 2009. Adoption has been slow because it has historically been offered at lower drive capacity points (less than 32TB), and its high cost and limited write endurance did not make it an attractive alternative to TLC in the data center. In the meantime, HDD densities have kept growing without any significant increase in throughput. As more data is stored on a given drive, the need for I/O grows proportionally, so the continued densification of HDD capacity has led to a consistent decline in BW/TB. This has negatively affected a portion of hot workloads and left bytes stranded on HDDs.

QLC flash occupies a unique space in the performance spectrum between HDDs and TLC SSDs. It can service workloads that still depend on performance in the 10 MB/s/TB range, i.e., the range that 16-20TB HDDs used to deliver. It also suits workloads that issue large batch I/Os and do not need very high performance, but still require 15-20 MB/s/TB and use TLC flash today.

QLC flash, introduced as a tier above HDD, can meet write performance requirements with sufficient headroom in endurance specifications. The workloads being targeted are read-bandwidth-intensive, with infrequent and comparatively low write bandwidth requirements. Since the bulk of power consumption in any NAND flash media comes from writes, we expect our workloads to consume less power on QLC SSDs. The advent of the 2Tb QLC NAND die, along with 32-die stacks becoming mainstream, illustrates how rapidly QLC flash density is scaling at both the NAND package level and the drive level. We expect QLC SSD density to scale much higher than TLC SSD density in both the near term and the long term. This will meaningfully increase server- and rack-level byte density and help lower per-TB acquisition and power costs at both the drive and server level.
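To make the BW/TB trend concrete, here is a minimal sketch that computes bandwidth per TB for a few drive generations. The capacities and sustained-throughput figures are assumed, round numbers for illustration, not vendor specifications or Meta measurements; the point is simply that HDD throughput stays roughly flat while capacity grows, so MB/s per TB keeps falling, while a large QLC SSD lands well above that range.

#include <stdio.h>

/* Illustrative only: assumed capacities and sustained throughputs,
 * not vendor specs. Shows how MB/s per TB falls as HDDs densify. */
int main(void) {
    struct drive { const char *name; double capacity_tb; double throughput_mbs; };
    struct drive drives[] = {
        { "16TB HDD",       16.0,  250.0 },  /* assumed sustained throughput */
        { "30TB HDD",       30.0,  280.0 },  /* capacity ~2x, throughput barely moves */
        { "128TB QLC SSD", 128.0, 6000.0 },  /* assumed large-block read throughput */
    };
    for (int i = 0; i < 3; i++) {
        printf("%-15s %7.0f MB/s over %5.1f TB -> %5.1f MB/s per TB\n",
               drives[i].name, drives[i].throughput_mbs, drives[i].capacity_tb,
               drives[i].throughput_mbs / drives[i].capacity_tb);
    }
    return 0;
}

With these assumed numbers, the 16TB HDD delivers roughly 15 MB/s/TB while the 30TB HDD drops to roughly 9 MB/s/TB, which is the “bytes stranded on HDDs” effect described above.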

QLC at Meta

Meta’s storage teams have started working closely with partners like Pure Storage, utilizing their DirectFlash Module (DFM) and DirectFlash software solution to bring reliable QLC storage to Meta. We are also working with other NAND vendors to integrate standard NVMe QLC SSDs into our data centers. While QLC is lower in cost than TLC today, it is not yet price-competitive enough for broader deployment. Still, the gains in power efficiency are material, and the use cases mentioned above are expected to benefit greatly from them. Given that HDDs continue to get colder as their density increases (decreasing BW/TB) and that NAND cost structures are improving with technology advancements, we believe that adding a QLC tier is the right path forward.

Hardware considerations for adopting QLC

While the E1.S form factor has served our TLC deployments well, it is not ideal for scaling our QLC roadmap because its size limits the number of NAND packages per drive. The industry-standard U.2 15mm form factor is still prevalent across SSD suppliers, and it could let us scale to 512TB of capacity per drive. E3 does not bring additional value over U.2 at the moment, and the split in market adoption across the four E3 variants makes it less attractive. Pure Storage’s DFMs allow scaling up to 600TB with the same NAND package technology, and designing a server to support DFMs allows the same drive slot to also accept U.2 drives. This strategy gives us the most benefit in cost competitiveness, schedule acceleration, power efficiency, and vendor diversity.

The primary benefit of QLC drives is byte density at the drive and server level, along with the associated power efficiency. Within Meta, the byte density target for the QLC-based server is 6x that of the densest TLC-based server we ship today. Even though the expected BW/TB of QLC is lower than that of TLC, the byte density of the QLC server calls for a more performant CPU, faster memory, and a faster network subsystem to take full advantage of the media’s capabilities.
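To see why a denser server pushes on the CPU, memory, and network subsystem, here is a small sizing sketch. The server capacity and per-TB bandwidth target below are hypothetical placeholders, not Meta’s actual figures; the calculation only shows how aggregate bandwidth scales with byte density.

#include <stdio.h>

/* Hypothetical server-level sizing. The capacity and MB/s/TB figures are
 * placeholders, not Meta targets; they illustrate how aggregate bandwidth
 * grows with byte density. */
int main(void) {
    double server_capacity_tb = 2000.0; /* assumed QLC server capacity, ~6x a dense TLC server */
    double target_mbs_per_tb  = 10.0;   /* assumed read-bandwidth target per TB */

    double aggregate_mbs  = server_capacity_tb * target_mbs_per_tb;  /* MB/s the server must sustain */
    double aggregate_gbps = aggregate_mbs * 8.0 / 1000.0;            /* rough network equivalent in Gbit/s */

    printf("%.0f TB at %.0f MB/s/TB -> %.0f MB/s (~%.0f Gbit/s) aggregate read bandwidth\n",
           server_capacity_tb, target_mbs_per_tb, aggregate_mbs, aggregate_gbps);
    return 0;
}

Even with a per-TB target far below TLC levels, multiplying it by a 6x-denser byte count lands in the tens of GB/s per server, which is what drives the CPU, memory, and network requirements.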

Adapting our storage software for QLC

Adapting Meta’s existing storage software to QLC has presented some interesting challenges. As discussed above, our QLC systems are very dense, and we are targeting QLC SSDs as a higher-performance medium than HDDs. This raises throughput expectations beyond those of any single server we have run before. Scaling such high throughput across CPU cores and sockets requires careful placement of data and of the compute that processes the I/O; we need to minimize data touchpoints and be able to separate I/O by type.

For Pure Storage’s solution, the software stack uses the Linux userspace block device driver (ublk) over io_uring to expose the storage as a regular block device and to enable zero copy (eliminating data copies), while talking to their userspace FTL (DirectFlash software) in the background. For other vendors’ drives, the stack uses io_uring to interact directly with the NVMe block device.

Further, QLC SSDs have a significant delta between read and write throughput: read throughput can be 4x or more of write throughput. What’s more, the typical read-heavy use cases are latency sensitive, so we need to make sure the I/O delivering this massive read bandwidth does not get serialized behind writes. This requires building and carefully tuning rate controllers and I/O schedulers.
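As a rough illustration of the io_uring path, here is a minimal sketch using liburing to submit a direct read against an NVMe block device. This is not Meta’s or Pure Storage’s actual stack; the device path, queue depth, and read size are assumptions, and the only point it makes is the general pattern of giving latency-sensitive reads their own ring so they are not queued behind bulk writes.

/* Minimal liburing sketch: submit one direct read on a dedicated "read" ring.
 * Not production code; device path and sizes are hypothetical.
 * Build with: gcc -o qread qread.c -luring */
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define READ_SIZE (128 * 1024)   /* assumed large-batch read size */

int main(void) {
    struct io_uring read_ring;   /* dedicated ring for reads; writes would use a separate ring */
    if (io_uring_queue_init(256, &read_ring, 0) < 0) {
        perror("io_uring_queue_init");
        return 1;
    }

    int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);  /* hypothetical device path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *buf;
    if (posix_memalign(&buf, 4096, READ_SIZE)) {          /* O_DIRECT needs aligned buffers */
        return 1;
    }

    struct io_uring_sqe *sqe = io_uring_get_sqe(&read_ring);
    io_uring_prep_read(sqe, fd, buf, READ_SIZE, 0);        /* read READ_SIZE bytes at offset 0 */
    io_uring_submit(&read_ring);

    struct io_uring_cqe *cqe;
    if (io_uring_wait_cqe(&read_ring, &cqe) == 0) {
        printf("read completed: %d bytes\n", cqe->res);
        io_uring_cqe_seen(&read_ring, cqe);
    }

    io_uring_queue_exit(&read_ring);
    free(buf);
    close(fd);
    return 0;
}

Keeping reads and writes on separate rings (or separate submission queues per core) is one straightforward way to apply independent rate control to each I/O type, which is the kind of tuning described above.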

 
