NetScaler’s Evolution in Hardware SSL Offload
NetScaler’s MPX and SDX hardware form factors have long benefited from SSL/TLS offload and acceleration, in which cryptographic processing is moved from the CPU to dedicated hardware in the form of crypto cards or chipsets.
The Cavium Era
Early on, Cavium (now part of Marvell) N2 or N3 cards featured prominently in many series, including (but not limited to) the following series and their models:
- MPX/SDX 14xxx series
- MPX/SDX 22xxx, 24xxx series
- MPX 25xxx series
- MPX/SDX 8400, 8600, 8800 series
- MPX/SDX 115xx and 17500 series
- MPX 80xx, 8200 series
- MPX 5500, 5650 series
- MPX 7500 series
- MPX FIPS 9700, 10500, 12500, 140xx, 15500 series
The Coleto Era
In late 2015, NetScaler began transitioning to its next hardware generation, which moved away from Cavium cards to cryptographic acceleration and offloading built into the Intel platform’s “Platform Controller Hub” (PCH) chipset, dubbed Intel QuickAssist Technology (QAT) hardware v1. Per documentation, this new platform used the Coleto Creek chipset family, featured in the following NetScaler series and their models:
- MPX 5900
- MPX/SDX 89xx
- MPX/SDX 15xxx
- MPX/SDX 26xxx
The Lewisburg Era
In recent years, NetScaler introduced its latest generation of hardware following the same integrated chipset-accelerated cryptographic offloading architecture using the Lewisburg chipset family, also known as the C620 chipset series, which supports the first few generations of Xeon Scalable processors. As per documentation, the following models use this chipset (specifically the C627 variant):
- MPX/SDX 91xx
- MPX/SDX 16xxx
To date, the Lewisburg chipset has proven so powerful that NetScaler has been able to streamline its hardware platforms onto a mere two models while leaving the previous platform generation largely in the dust.
Intel QAT has since moved on to a hardware v2 iteration that handles the cryptographic acceleration within the processor itself, similar in concept to embedded GPU capabilities. This enhancement is found predominantly on Xeon Scalable 4th generation (Sapphire Rapids) processors and later. As of this writing, this iteration is not believed to be available on any NetScaler platform but is likely being worked on.
NetScaler VPX SSL Processing — Scale with Constraints
With the hardware-accelerated SSL/TLS offloading history lesson out of the way, we can move on to the NetScaler VPX, NetScaler’s virtual appliance form factor for popular hypervisors and public clouds. Except for VPX instances on the SDX hardware platform, VPXs handle SSL/TLS offloading in software through their Packet Processing Engines (PPEs). Depending on the workload throughput and the SSL/TLS cipher and key strength, this can command significant CPU resources from the underlying host. Nonetheless, the VPX has proven to be an exceptionally versatile platform that has enabled flexible and cost-effective deployments of NetScalers within on-premises hypervisors and public clouds since 2009.
At the time of this writing, VPX can scale to 100 Gbps with the right license and resource allocation. That is, however, system throughput. SSL/TLS throughput (based on 2048-bit keys per the datasheet) comes in at only 30 Gbps on the largest model. Even then, mileage may vary as VPX performance is heavily dependent on the underlying hardware, which can vary from platform to platform, unlike the consistency of the hardware form factor. Newer TLS/DTLS protocols and stronger ciphers can also impact maximum SSL/TLS throughput in software or hardware, adding another vector of possible SSL throughput strain on the VPX in very large deployments.
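To make the gap between the two ceilings concrete, here is a minimal Python sketch. It assumes a deliberately simple linear model in which the system ceiling (100 Gbps) and the SSL/TLS ceiling (30 Gbps) apply independently; the function name, the `tls_fraction` parameter, and the model itself are illustrative assumptions, not anything taken from the NetScaler datasheet.

```python
def max_total_gbps(tls_fraction: float,
                   system_cap: float = 100.0,
                   tls_cap: float = 30.0) -> float:
    """Highest total throughput before either ceiling binds.

    Assumes (for illustration only) that TLS traffic is a fixed fraction
    of total traffic and that the two ceilings are independent.
    """
    if not 0.0 <= tls_fraction <= 1.0:
        raise ValueError("tls_fraction must be between 0 and 1")
    if tls_fraction == 0.0:
        return system_cap  # no TLS traffic, only the system ceiling applies
    # Total traffic at which the TLS share alone reaches the TLS ceiling.
    tls_bound_total = tls_cap / tls_fraction
    return min(system_cap, tls_bound_total)


print(max_total_gbps(0.5))  # half the traffic is TLS
print(max_total_gbps(1.0))  # all traffic is TLS
```

Under this model, the more of the workload that is encrypted, the sooner the SSL/TLS ceiling rather than the system ceiling becomes the limit. Real VPX behavior will depend on the underlying host, ciphers, and key sizes.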
NetScaler VPX Gains Hardware SSL Offload
NetScaler VPXs with 14.1 firmware running on eligible server hardware on-premises with either ESXi or KVM hypervisors can now benefit from hardware-assisted cryptographic offloading.
In early 2024, Intel and NetScaler began quietly marketing a significant feature enhancement, introduced in late 2023 in NetScaler firmware 14.1 build 8.50: the ability to add hardware cryptographic acceleration to NetScaler VPX instances. The fact that this notable feature was delivered in software further speaks to the advantages of NetScaler’s software-defined approach: its ability to optimize for commodity hardware capabilities rather than being restricted by proprietary ASICs. Per testing performed in the solutions brief, TLS 1.2 transaction rates improved significantly. This remarkable development is presently available for VMware ESXi and KVM hypervisors. Some key notes from documentation and supplemental investigation:
- Relies on SR-IOV to gain access to the cryptographic offload and acceleration chips
- Presently, it is supported only on hypervisor hosts whose hardware is based on the Lewisburg PCH chipset (C620 series), with exceptions (more on this later)
- Requires an Intel-provided driver to be installed onto the hypervisor
- Support for Intel QAT Adapter PCIe cards (Intel QAT 8960 and 8970) is unknown at the time of this writing
- Update June 2024: While NetScaler has not confirmed they will support them, we have successfully tested the Intel QAT 8970 on ESXi 8 and NetScaler VPX 14.1.
- Specialized crypto accelerator PCIe cards based on the C620 series are supported (the NetScaler MPX/SDX 9100 and 16000 series leverage C627-based add-on cards)
- No confirmed support for the Intel QAT hardware v2 (processor-embedded) acceleration that accompanies Sapphire Rapids Xeon Scalable 4th Gen+ processors (C740 chipset), although the focus of Intel’s marketing material on the Scalable processor architecture may suggest otherwise. NetScaler is likely working with Intel on integrating with this new iteration via software enhancement (another win for the software-defined architecture of NetScaler)
- Requires Intel processors — the acceleration is unavailable on VPXs deployed on AMD systems as the chipset is understandably Intel-only
- Cannot presently be enabled on public cloud VPXs
- With SR-IOV, features such as live migration (vMotion) are not possible. Mind you, it is recommended to exempt VPXs from live migration (at least automated live migration such as DRS) to begin with, making this more of a minor inconvenience for host maintenance
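The notes above amount to a prerequisite checklist, which can be sketched in Python as follows. This is only an illustration of the reasoning: the `VpxHost` class, its field names, and the `qat_offload_blockers` function are hypothetical, and representing firmware "14.1 build 8.50" as the tuple `(14, 1, 8, 50)` is an assumption made for easy comparison. It is not a NetScaler or Intel tool.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Minimum firmware named in the article: 14.1 build 8.50 (tuple form is an assumption).
MIN_FIRMWARE = (14, 1, 8, 50)
SUPPORTED_HYPERVISORS = {"esxi", "kvm"}


@dataclass
class VpxHost:
    """Hypothetical description of a VPX deployment; field names are illustrative."""
    hypervisor: str                   # e.g. "esxi", "kvm"
    cpu_vendor: str                   # e.g. "intel", "amd"
    qat_chipset_present: bool         # QAT-capable Lewisburg (C620 series) chipset or card
    sriov_enabled: bool               # SR-IOV available to pass QAT through to the VPX
    intel_qat_driver_installed: bool  # Intel-provided driver on the hypervisor
    public_cloud: bool
    firmware: Tuple[int, int, int, int]


def qat_offload_blockers(host: VpxHost) -> List[str]:
    """Return the checklist items the host fails; an empty list means none were found."""
    blockers = []
    if host.public_cloud:
        blockers.append("not available on public cloud VPXs")
    if host.hypervisor.lower() not in SUPPORTED_HYPERVISORS:
        blockers.append("hypervisor must be ESXi or KVM")
    if host.cpu_vendor.lower() != "intel":
        blockers.append("requires an Intel platform")
    if not host.qat_chipset_present:
        blockers.append("no QAT-capable Lewisburg (C620 series) hardware")
    if not host.sriov_enabled:
        blockers.append("SR-IOV is not enabled")
    if not host.intel_qat_driver_installed:
        blockers.append("Intel-provided driver missing on the hypervisor")
    if host.firmware < MIN_FIRMWARE:
        blockers.append("firmware older than 14.1 build 8.50")
    return blockers


host = VpxHost("esxi", "intel", True, True, True, False, (14, 1, 8, 50))
print(qat_offload_blockers(host))  # prints []
```

An empty result only means none of the listed prerequisites is obviously missing. It says nothing about whether a given configuration is officially supported, and the loss of live migration under SR-IOV still applies.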
Chipset QAT Eligibility
A note on the Lewisburg (C620 and C620A series) chipsets: This chipset series supports the Xeon Scalable 1st Generation (Skylake), 2nd Generation (Cascade Lake), and 3rd Generation (Ice Lake or Cooper Lake) processors. However, not all chipsets in the series contain the integrated Intel QAT technology. Per the Intel ARK database, chipsets from the C625 upward include integrated Intel QAT capabilities. Some highlights on the QAT capabilities per Intel’s C620 datasheet:
- C625 – 1 QAT endpoint and 20 Gbps crypto throughput
- C626 – 2 QAT endpoints and 40 Gbps crypto throughput
- C627/A – 3 QAT endpoints and 100 Gbps crypto throughput
- C628 – 3 QAT endpoints and 100 Gbps crypto throughput – increased efficiency and clock speed
- C629/A – 3 QAT endpoints and 100 Gbps crypto throughput – increased efficiency and clock speed
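For quick reference, the list above can be captured as a small lookup table. The sketch below is in Python, and the names `QatCapability`, `C620_QAT`, and `qat_capability` are made up for illustration. The figures are copied from the list, and any chipset not in the table is treated as having no integrated QAT.

```python
from typing import NamedTuple, Optional


class QatCapability(NamedTuple):
    endpoints: int    # number of QAT endpoints
    crypto_gbps: int  # crypto throughput in Gbps, as listed above


# Values transcribed from the list above; "A" variants share their base figures.
C620_QAT = {
    "C625": QatCapability(1, 20),
    "C626": QatCapability(2, 40),
    "C627": QatCapability(3, 100),
    "C627A": QatCapability(3, 100),
    "C628": QatCapability(3, 100),
    "C629": QatCapability(3, 100),
    "C629A": QatCapability(3, 100),
}


def qat_capability(chipset: str) -> Optional[QatCapability]:
    """Look up a chipset name (case-insensitive); None means no integrated QAT listed."""
    return C620_QAT.get(chipset.strip().upper())


print(qat_capability("c627"))
print(qat_capability("C621"))  # below C625, so not in the table
```

A table like this is only as current as the datasheet it was copied from, so it is best treated as a starting point when reading a server vendor's specification sheet.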
Depending on the model range, many Intel-based servers dating back several years can likely tap into the Intel QAT technology to improve their NetScaler VPX performance. By leveraging eligible QAT chipsets instead of relying on software SSL processing, SSL/TLS throughput is increased and demand on the host CPU is reduced. Customers should consult their server vendor’s documentation to confirm which chipsets are included on their servers.
Wrapping Up
Ferroque’s NetScaler specialists look forward to validating this new feature on chipsets and PCIe accelerator cards and, hopefully, being in a position to recommend its implementation to our customers.
Our sincere thanks to Chris Smauz, Lena Yarovaya, Saravanakumar Annamalaisami, and Venkat Swaminathan from NetScaler for their contributions to this article.
Michael Shuster
Michael is Ferroque's founder and a noted Citrix authority, overseeing operations and service delivery while keeping a hand in the technical cookie jar. He is a passionate advocate for end-user infrastructure technology, with a rich history designing and engineering solutions on Citrix, NetScaler, VMware, and Microsoft tech stacks.