Warnings:
"Azure Stack HCI" has been rebranded as "Azure Local".
ECC RAM might now be a hard requirement, where it wasn't before. That means it may no longer be possible to use the MS-01 workstations to build a cluster. I need to do some testing on this, and I will update this post when I do.
If you haven't already, please check out the other parts of the series first:
Creating an HCI Cluster (finally!)
Alright, we're finally ready to talk about the process of building the cluster. Stack HCI version 23H2 introduced a whole new cluster creation process. It is now done entirely in the Azure Portal. This is a huge departure from the previous method, which required a lot of manual steps to be performed on the servers themselves. But now, all you have to do is click a few buttons in the portal.
Let's deploy the cluster in the portal and walk through the deployment steps.
1 - Basics
This is the first tab you'll see. You'll need to specify which Subscription & Resource group to use for the various Azure Stack HCI resources that will be created. It is important that you pick the same Resource Group that you used for your servers in Part 2 of this series. Otherwise, your servers won't show up as selectable.
You'll need to type a name for your cluster. Then, you'll need to pick an Azure Region for your cluster. Note that Azure Stack HCI only supports a small subset of Azure Regions. You'll also need to create a new Key Vault to be used by the cluster. It will be used for storing things like cryptographic keys, local admin credentials, and BitLocker recovery keys. As far as I know, there is currently no way to select your own Key Vault in the portal.
Lastly, you'll need to select each server that you want to include in the cluster. If the servers have been prepared and connected to Azure Arc, as described in Part 2, then they will show up here as selectable. You'll need to select all of your servers and click "Validate Selected Servers." If the validation returns a green check mark, as shown in the screenshot, then you can advance to the next tab.
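On the region point above, you can also check the list outside the portal by asking the resource provider. This is a minimal sketch, assuming the Az PowerShell module is installed and you have already signed in with Connect-AzAccount. Treating the locations registered for the "clusters" resource type as the supported regions is my assumption, not an official list:

```powershell
# Assumes: Az.Resources module installed and Connect-AzAccount already run.
$provider = Get-AzResourceProvider -ProviderNamespace 'Microsoft.AzureStackHCI'

# Each resource type carries its own list of locations; pick the cluster type.
$provider.ResourceTypes |
    Where-Object { $_.ResourceTypeName -eq 'clusters' } |
    Select-Object -ExpandProperty Locations |
    Sort-Object
```

If the region you had in mind is not in the output, it is worth double-checking before you get to this tab.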
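If a server does not show up as selectable, a quick check is whether Azure Arc still sees it as connected in the resource group. Here's a rough sketch, assuming the Az.ConnectedMachine module is installed and you are signed in. The resource group name is a placeholder I made up:

```powershell
# Assumes: Az.ConnectedMachine module installed and Connect-AzAccount already run.
$resourceGroup = 'rg-hci-lab'   # hypothetical name; use the group from Part 2

# Each Arc-enabled server should be listed with a Status of 'Connected'.
Get-AzConnectedMachine -ResourceGroupName $resourceGroup |
    Select-Object Name, Location, Status
```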
2 - Configuration
This screen is pretty self-explanatory. We're not going to use a pre-existing Template Spec to build our cluster, so we'll pick "New Configuration."
3 - Networking
This tab is where we define all of the networking settings for our cluster.
As specified in Part 2, I'll need to pick "No switch for storage" as well as "Group management and compute traffic." Next, I can configure the "Intents" for each grouping of traffic. I'll also specify which network adapter(s) the intents will be associated with. You'll see all of this on screenshot 1.
For each intent you can configure various network settings, as shown on screenshot 2. Unless you have a reason to change these, I would suggest you leave them alone.
Finally, on screenshot 3 you'll see that we need to specify some IP addresses for the cluster to use. The cluster will need a total of 6 IP addresses from your management network.
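Before typing the addresses in, it can help to sanity check that the range you set aside really holds 6 of them. This is a small sketch, assuming you express the reservation as an IPv4 start and end address. The helper function name and the sample addresses are mine, not anything from the portal:

```powershell
# Convert a dotted IPv4 address into a single number so two addresses can be subtracted.
function ConvertTo-IPv4Number {
    param([Parameter(Mandatory)][string]$Address)
    $number = [long]0
    foreach ($octet in [System.Net.IPAddress]::Parse($Address).GetAddressBytes()) {
        $number = ($number * 256) + $octet
    }
    return $number
}

# Hypothetical range; replace with the start and end addresses you plan to enter.
$startIp = '192.168.1.50'
$endIp   = '192.168.1.55'

$available = (ConvertTo-IPv4Number $endIp) - (ConvertTo-IPv4Number $startIp) + 1
if ($available -ge 6) {
    "Range holds $available addresses, enough for the 6 the cluster needs."
} else {
    "Range holds only $available addresses; the cluster needs 6."
}
```

This only counts addresses. It does not check that they are unused or that they sit in the same subnet as your management network.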
4 - Management
This tab contains several settings. First, we must specify a name for a new Custom Location. A Custom Location can be thought of like your own custom Azure Region that represents your new cluster. When you go to deploy a workload on your Stack HCI cluster, you pick this Custom Location instead of a normal Azure Region.
Next, you'll need to either create a new Storage Account or pick an existing one. It will be used as the cluster witness. A witness matters most when your cluster has an even number of nodes: if the nodes lose contact with each other, neither half holds a majority of votes, which can lead to "split-brain" issues. The witness provides the tie-breaking vote. The amount of storage needed for the witness is next to nothing, less than 1 KB.
After that, you'll need to specify details of your Active Directory domain and provide a couple of credentials. The first credential is the deployment account that you created in the AD prep phase. The second is the local administrator credential for your cluster servers. All of this information will be used to automatically add your servers to the domain and build the failover cluster.
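To make the witness point concrete, here is the vote arithmetic for a 2-node cluster like mine. This is a deliberately simplified sketch: it models a plain static majority and ignores the dynamic quorum behavior that real failover clusters also have. The function name is made up for illustration:

```powershell
# Simplified static-majority model; real failover clusters also use dynamic quorum.
function Test-QuorumSurvives {
    param(
        [Parameter(Mandatory)][int]$NodeCount,
        [bool]$HasWitness = $false,
        [int]$FailedNodes = 1
    )
    $totalVotes     = $NodeCount + [int]$HasWitness                # the witness adds one vote
    $votesNeeded    = [math]::Floor([double]$totalVotes / 2) + 1   # strict majority
    $votesRemaining = $totalVotes - $FailedNodes
    return ($votesRemaining -ge $votesNeeded)
}

Test-QuorumSurvives -NodeCount 2                      # False: 1 of 2 votes left, 2 needed
Test-QuorumSurvives -NodeCount 2 -HasWitness $true    # True: 2 of 3 votes left, 2 needed
```

Without the witness, losing either node (or the link between them) leaves nobody with a majority. With it, the surviving node plus the witness still make 2 out of 3.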
5 - Security
Another pretty self-explanatory tab. I'd suggest just keeping the recommended settings, unless you have a specific reason to change those.
6 - Advanced
Azure Stack HCI must create some required "infrastructure" volumes on your cluster storage. Optionally, you can also choose to create "workload" volumes on the cluster storage. If you choose this option, then 1 workload volume will be created per server in your cluster. Since I have 2 servers, it will create 2 workload volumes. I'll discuss more about volumes in a later article.
7 - Validation
I'm skipping the tab for Tags, because there's no need to discuss those. We'll jump directly to the Validation tab.
Right away, some initial steps will automatically kick off. There are 6 steps as shown in screenshot 1. These steps will create the cluster object in Azure, assign permissions to the Azure Resource Group, create a new Storage Account and configure the existing Key Vault to send its audit logs to this new Storage Account. Lastly, it will assign permissions to said Key Vault, and create some secrets as well. When all of these show as succeeded you can click on the button for "Start validation."
The validation takes about 10-15 minutes and runs through multiple checks, as you can see on screenshot 2. If the validation shows all green check marks, then we are finally allowed to create the deployment for our new Azure Stack HCI cluster.
The Deployment
Once you successfully submit the deployment, get ready to wait. This process can take multiple hours, depending on the configuration of your cluster. In the case of my small 2-node cluster, it took a little over 2 hours to complete.
This process has many, many steps. I couldn't fit them all in one screenshot, so I had to split them up into the 2 screenshots above.
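If you'd rather not keep the portal tab open for hours, you can poll the deployment state from PowerShell instead. This is a sketch, assuming the portal submits the cluster build as a regular ARM deployment into the same resource group, and that the Az module is installed and signed in. The resource group name is a placeholder:

```powershell
# Assumes: Az.Resources module installed and Connect-AzAccount already run.
$resourceGroup = 'rg-hci-lab'   # hypothetical name; use your own resource group

# Show the most recent deployments and whether they are Running, Succeeded, or Failed.
Get-AzResourceGroupDeployment -ResourceGroupName $resourceGroup |
    Sort-Object Timestamp -Descending |
    Select-Object -First 5 DeploymentName, ProvisioningState, Timestamp
```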
The created resources
Above you'll see a screenshot of the OU that we created in Active Directory. At this point, it should have the following resources:
The deployment user account that we created during the AD prep phase
A unique computer account for each server in your cluster
A failover cluster computer object which represents your cluster
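You can list the same objects without opening Active Directory Users and Computers. Here's a minimal sketch, assuming the RSAT ActiveDirectory module is available on a domain-joined machine. The distinguished name below is a made-up example, so swap in the OU you created during the AD prep phase:

```powershell
# Assumes: RSAT ActiveDirectory module, run from a domain-joined machine.
$ouPath = 'OU=HCI,DC=contoso,DC=com'   # hypothetical DN; use the OU from the AD prep phase

# Lists everything under the OU. The OU itself also appears, as 'organizationalUnit'.
Get-ADObject -SearchBase $ouPath -Filter * |
    Sort-Object ObjectClass, Name |
    Format-Table Name, ObjectClass
```

You should see one user object and one computer object per server, plus the computer object for the cluster itself.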
The screenshot above shows the Azure Resource Group after the cluster deployment has finished. At this point, it should have the following resources:
Azure Stack HCI - this is the cluster itself
Machine - Azure Arc - one of these per server in your cluster
Custom location
Resource bridge
This is a VM that gets created on your cluster
Key Vault
Storage Accounts
One for the cluster witness
One for Key Vault audit logs
Storage Paths - one of these per server in your cluster (as long as you picked the option to create workload volumes)
Above you'll see a diagram I created that shows all of the resources that you should have in Azure after a successful deployment. There are a couple of resources listed here that must be manually added later (Kubernetes, Logical Networks, VM Images) but we'll get to those in a later article.
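To compare your own resource group against the list above, you can group its contents by resource type. This is a sketch, assuming the Az module is installed and signed in. The resource group name is a placeholder:

```powershell
# Assumes: Az.Resources module installed and Connect-AzAccount already run.
$resourceGroup = 'rg-hci-lab'   # hypothetical name; use your own resource group

# One row per resource type, with how many of each exist in the group.
Get-AzResource -ResourceGroupName $resourceGroup |
    Group-Object ResourceType |
    Sort-Object Name |
    Select-Object Count, Name
```

The counts for the Arc machines and storage paths should match the number of servers in your cluster.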
Security Changes
Just wanted to put a quick note here, mentioning that your servers will have some security changes applied to them during the cluster creation.
One, Remote Desktop will be disabled. However, you can use a remote PowerShell session to re-enable it. Use the following commands to do so:
$ip = "<IP address of the Azure Stack HCI server>"
# Get-Credential must be wrapped in parentheses so it runs first and its result is passed in
Enter-PSSession -ComputerName $ip -Credential (Get-Credential)
# Run this inside the remote session
Enable-ASRemoteDesktop
Two, the local administrator username will be changed. The local administrator account will now have a username of ASBuiltInAdmin.
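Going back to the Remote Desktop change, here is a quick way to confirm whether RDP is reachable before and after you re-enable it. This is a sketch that assumes RDP is listening on its default port, 3389:

```powershell
$ip = '<IP address of the Azure Stack HCI server>'

# TcpTestSucceeded is True when something is listening on the RDP port.
Test-NetConnection -ComputerName $ip -Port 3389 |
    Select-Object ComputerName, RemotePort, TcpTestSucceeded
```

If the remote PowerShell session itself refuses to connect by IP address, a likely cause is that the address is not in your client's WinRM TrustedHosts list.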
HCI Updates
Pretty soon after the deployment, the Azure Portal should warn you that the HCI cluster is out of date and needs an update. These updates are fully integrated into Azure Update Manager, as you can see in the screenshot above. Just like with the deployment of the cluster, the updates are completely automated and initiated from the portal. The updates also take a long time. For my small cluster, applying the latest update took exactly 2 hours.
In order to deploy the updates safely, Azure will automatically configure Cluster-Aware Updating (CAU). After CAU is enabled, another failover cluster computer object will appear in the Active Directory OU, as you can see in the screenshot above.
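If you want to confirm that CAU was really set up, you can query the CAU clustered role directly. Here's a minimal sketch, assuming the Failover Clustering RSAT tools (which include the ClusterAwareUpdating module) are installed on the machine you run it from. The cluster name is a placeholder:

```powershell
# Assumes: Failover Clustering RSAT tools (ClusterAwareUpdating module) are installed.
$clusterName = 'hci-cluster01'   # hypothetical name; use the cluster name from the Basics tab

# Returns the CAU role settings as name/value pairs; errors out if the role is not configured.
Get-CauClusterRole -ClusterName $clusterName
```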
Wrap Up
That wraps up everything required to deploy the cluster. In upcoming parts of this series, I will discuss deploying workloads to our cluster. This will include AKS on HCI, Azure Virtual Desktop on HCI, and some good old VMs as well.