May 31 2023 Update: Microsoft has increased the soft limit on handles on Azure Files from 2,000 to 10,000. The article has not been revised to reflect the implications for the example use case, but the logic for scaling out on multiple shares remains the same, just more relevant to 10K+ user scenarios.
Recently, we had a Citrix DaaS project that required moving workloads from on-premises to Azure. One of the challenges was the profile solution for the customer: FSLogix hosted on Azure Files, as Azure Files was the only certified Azure storage solution for the customer at the time.
As part of a solid design, we wanted to provide a proper configuration for the profile solution. As FSLogix is becoming the de facto standard in the majority of enterprise engagements, we need to make sure that the proper configuration policies are in place, as well as an Azure Files storage configuration that ensures the scalability of the solution, being mindful of any limits or vendor guidance. The purpose of this blog is to explain and show you a proper Azure Files storage configuration for the requirements presented by the customer.
Now, let’s look at the requirements.
- Total number of users: 2,000
- Peak logon times: 8:00 AM – 9:00 AM
- Logoff times: varies
- Number of Multi-session VDAs: 500
- User density: ~ 10 users per VM
This is the information we gathered from the customer. The next thing to look at is the Azure Files limitations, documented in the Microsoft article Azure Files scalability and performance targets | Microsoft Learn. The limits we found to be restrictive as of the beginning of March 2023 are as follows:
- Handles: 2,000* (per file share root/file); this is not a hard limit, but performance may degrade past this point
- Size: 100 TiB (premium storage account)
- IOPS: 100,000 (for 100 TiB provisioned)
*Increased to 10,000 May 31, 2023
As you can see, the size and IOPS limits are fairly straightforward, but handles were the one limit we needed more clarification on. So we reached out to Microsoft to get a clear answer about handles and how that limit impacts our design decisions for the customer. In a nutshell, the handles limit applies per folder root or file. When one user logs in, we have one handle at the share root and one handle per file (in this case, one handle for that user's profile container). When a second user logs in, we have two handles at the file share root, one handle for that user's profile container, and so on. Based on this, the 2,000-handle limit will be reached at the share root level before any individual file level, as each user holds only one handle per file (VHDX container).
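The counting rule above can be expressed as a quick back-of-the-envelope model (a sketch only; the 2,000 figure was the soft limit at the time of writing and has since been raised to 10,000):

```python
# Handle-count model for FSLogix profile containers on Azure Files:
# every logged-on user holds one handle on the share root plus one on
# their own VHDX, so the root reaches the limit first.
ROOT_HANDLE_SOFT_LIMIT = 2_000   # soft limit at the time; raised to 10,000 in May 2023

def root_handles(logged_on_users: int) -> int:
    return logged_on_users          # one root handle per active session

def handles_on_profile_vhdx() -> int:
    return 1                        # each container is opened by its owner only

def share_has_headroom(logged_on_users: int, limit: int = ROOT_HANDLE_SOFT_LIMIT) -> bool:
    # The share is effectively capped at one concurrent user per root handle.
    return root_handles(logged_on_users) < limit

print(share_has_headroom(1_999), share_has_headroom(2_000))  # True False
```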
Based on this, we need to scale our Azure Files shares so as not to exceed the handles limit. Theoretically, based on these numbers, we could have a single file share for all 2,000 users; however, this leaves no room for growth against the handles limit, as well as other limits we may exceed. A greater challenge is how to intelligently distribute users across Azure Files storage accounts, which we will get to later.
The next limit is the total aggregate storage size needed for the customer's anticipated footprint. As this engagement will host both profiles and Office data within a single container, we need to size them accordingly. With this configuration, we are designing for 30 GiB per container, which means we need a minimum of 60 TiB of share capacity in total.
Now, let's look at the IOPS. On average, a single user will consume 50 IOPS per logon/logoff event against the file share, plus 10 IOPS in steady state. Not included here are any considerations for Cloud Cache, which is very write intensive on the VDA's local storage, or the new VHDX compacting feature. This puts us at the maximum if we use a single Azure Files share for 2,000 users, per the limits mentioned above.
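Using the figures above (50 IOPS per logon/logoff event, 10 IOPS steady state), a rough demand model for a single 2,000-user share looks like this:

```python
STEADY_IOPS_PER_USER = 10   # sustained consumption per active session
LOGON_IOPS_PER_USER = 50    # transient, during a logon/logoff event

def peak_share_iops(total_users: int, concurrent_logons: int) -> int:
    """Worst-case IOPS against one share: users already in steady state
    plus users currently logging on or off."""
    steady = (total_users - concurrent_logons) * STEADY_IOPS_PER_USER
    storm = concurrent_logons * LOGON_IOPS_PER_USER
    return steady + storm

# If all 2,000 users were to log on at once, demand would hit the
# 100,000 IOPS ceiling of a fully provisioned 100 TiB share:
print(peak_share_iops(2_000, 2_000))  # 100000
```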
Taking all this into consideration, the following represents the proposed storage design, per Microsoft testing and recommendations.
- Number of storage accounts: Two (2) premium storage accounts, to allow for growth and avoid storage resource provider limits per storage account as the environment scales.
- Number of file shares: One (1) per storage account (total of two).
- Number of users per file share: 1,000 user profiles, evenly distributed across the two file shares, to allow for future growth.
- Provisioned quota per share: 30 TiB, which accounts for 1,000 user profiles of up to 30 GiB each. Alternatively, the dynamic VHDX feature can be configured so the customer pays only for a smaller provisioned size and increases the share quota as profiles grow. If you take that option, make sure you don't hit the IOPS limits, as IOPS and throughput scale with provisioned capacity.
- IOPS available per user: ~33 IOPS (a 30 TiB share provides roughly 33,000 IOPS). This covers the average FSLogix requirement (steady-state consumption is ~10 IOPS, with 50 IOPS for logon/logoff events). Additional IOPS for logon/logoff storms can be covered by storage account bursting, but this should be monitored in case a sustained peak load exceeds burst allowances.
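The design above can be checked with a short calculation. The baseline IOPS formula used here (assumed: roughly 3,000 + 1 IOPS per provisioned GiB for premium shares) reflects Microsoft's published targets at the time of writing; verify it against current documentation before relying on it.

```python
USERS = 2_000
SHARES = 2
PROFILE_GIB = 30
GIB_PER_TIB = 1024

users_per_share = USERS // SHARES              # 1,000 users per share
required_gib = users_per_share * PROFILE_GIB   # 30,000 GiB (~29.3 TiB)
provisioned_gib = 30 * GIB_PER_TIB             # round up to a 30 TiB quota

# Assumed premium-share baseline formula: 3,000 + 1 IOPS per provisioned GiB
baseline_iops = 3_000 + provisioned_gib        # ~33,700 IOPS
iops_per_user = baseline_iops / users_per_share  # ~33.7 IOPS per user

print(users_per_share, provisioned_gib, baseline_iops)
```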
Another consideration is whether to split the container into separate Office and profile containers. If you intend to replicate the profile, it is prudent to consider doing so to reduce the amount of data being replicated and the associated costs. Is it critical to your users that their cached Office data be available in a DR scenario or during an outage of a local file share, or can it be re-created with minor inconvenience? If it can be recreated in the event of an outage, double the number of storage accounts and file shares.
In conclusion, we have defined an Azure Files storage configuration sized at 1,000 users per share. This is fairly easy to remember and allows for growth within a single Azure Files share. The next thing to consider is how to distribute the users across the two shares. The easiest way is to split users alphabetically and assign them to shares accordingly. This requires additional grouping of users based on username, plus different policies for the FSLogix path. It also assumes a relatively even distribution of usernames across the alphabet, which may not hold in certain countries/cultures. The other option is to use scripting for user distribution across multiple FSLogix shares, as outlined in the Spreading users over multiple file shares with FSLogix Profile Containers – JAMES-RANKIN.COM article.
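As a third option, a deterministic hash of the username can spread users roughly evenly across shares without maintaining alphabetical groups; this is the general idea behind the scripted approach linked above. A minimal sketch follows (the storage account names and UNC paths are purely illustrative):

```python
import hashlib

# Hypothetical share paths -- substitute your own storage accounts.
SHARES = [
    r"\\storacct1.file.core.windows.net\profiles1",
    r"\\storacct2.file.core.windows.net\profiles2",
]

def share_for_user(username: str) -> str:
    # MD5 is used here only for stable, even bucketing -- not for security.
    # Lower-casing makes the mapping case-insensitive, so "JDoe" and
    # "jdoe" always land on the same share.
    digest = hashlib.md5(username.lower().encode()).hexdigest()
    return SHARES[int(digest, 16) % len(SHARES)]

# The result is stable across logons, so the chosen path could be fed
# into the per-user FSLogix profile location at logon time.
print(share_for_user("jdoe"))
```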
Please note that the above script will not work if you are trying to use Cloud Cache, and an alternate method for user distribution will be required.
At this time, I want to extend a big THANKS to the Microsoft product management and consulting teams, specifically to Danny Contreras and Robb Shaw, for their help and recommendations for this article.
Update September 2023
Finally, I've got some time to update this blog. Now that Microsoft has raised the handles limit to 10,000, for customers at scale (10,000+ users) IOPS has become the critical design factor. Let's take the example of a customer with 10,000 users. In this example, we will use a single container for both profile and Office data. You can choose to separate them, but the same logic applies.
Now, let's review our limits.
- Handles: 10,000 (per file share root/file)
- Size: 100 TiB (premium storage account)
- IOPS: 100,000 (for 100 TiB provisioned)
We can do the calculations in the same order as in the previous example. For 10,000 users, the per-root handle limit will be reached before the per-file handle limit. Theoretically, based on the number of handles per root, we could have a single Azure Files SMB share.
If we take Microsoft's default profile container size of 30 GB, we will need 300 TB of provisioned storage. So, based on size alone, we will need three Azure Files SMB shares of 100 TB each.
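The size math above, as a quick check:

```python
import math

USERS = 10_000
PROFILE_GB = 30          # default container size used in this example
SHARE_CAPACITY_TB = 100  # premium file share size limit

total_tb = USERS * PROFILE_GB / 1_000                  # 300.0 TB required
shares_needed = math.ceil(total_tb / SHARE_CAPACITY_TB)  # 3 shares
print(total_tb, shares_needed)  # 300.0 3
```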
The last limit is IOPS, and what I wanted to accomplish in this blog is to lay out several options so customers can decide based on their steady-state and logon/logoff requirements. We know that we need 50 IOPS per logon/logoff event and 10 IOPS for steady state.
Let's look at the requirements when it comes to IOPS. We know from the size calculation that we need three SMB shares, and that configuration gives us about 30 IOPS per user on average. Depending on the logon/logoff storms, this can be sufficient for most customers. For your reference, here are quick average IOPS calculations for different numbers of users per 100 TiB share:
- 5000 users -> 20 IOPS per user
- 4000 users -> 25 IOPS per user
- 3000 users -> 33 IOPS per user (this is the number I feel most comfortable with)
- 2000 users -> 50 IOPS per user
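These per-user figures fall out of dividing the 100,000 IOPS ceiling of a fully provisioned 100 TiB share by the number of users placed on it:

```python
# Average IOPS available per user on one fully provisioned premium share.
MAX_SHARE_IOPS = 100_000  # ceiling for a 100 TiB premium file share

def avg_iops_per_user(users_on_share: int) -> int:
    return MAX_SHARE_IOPS // users_on_share

for n in (5_000, 4_000, 3_000, 2_000):
    print(n, avg_iops_per_user(n))
# 5000 -> 20, 4000 -> 25, 3000 -> 33, 2000 -> 50
```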
Hope this helps in your next FSLogix implementation.
Zeljko Macanovic is a leading expert on Citrix technologies, with over three decades of experience working with Citrix and Microsoft platforms. During his tenure at Citrix, Zeljko sat on the CCAT board and contributed to the development of various Citrix Consulting standards and methodology refinements. At Ferroque, Zeljko serves as a senior architect driving delivery and oversight across our consulting engagements.