This series of blog posts will discuss Microsoft's recommended method for doing CI/CD within their Azure Data Factory (ADF) service.
For the purposes of this series, let's assume we have 3 different ADF environments: development, test, and production.
The first step would be to deploy all the necessary ADF resources into all 3 environments. This includes the ADF instance itself, as well as any supporting resources used by ADF, such as ADF managed virtual networks, Key Vaults, Storage Accounts, etc. You can deploy these resources with whatever method you prefer: Bicep, Terraform, or anything else. Deployment of these resources is not the point of this series.
The second step would be creating the CI/CD workflow that will handle the promotion of the entities that you create within ADF. These entities include things like pipelines, datasets, data flows, and more. This is the process that we will outline in this series.
The 1,000-foot view of this process is as follows:
Connect your development ADF instance to a Git repo
Make changes and updates inside your development ADF instance
The development ADF instance "publishes" its entities as Azure Resource Manager (ARM) Templates, which are placed into the Git repo
Finally, deploy these ARM Templates to the other ADF instances, such as test and production.
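That last step usually comes down to reusing the same ARM Template with different parameter values for each environment. Here is a minimal Python sketch of the idea. It assumes the parameter file has the standard ARM deployment-parameters shape and, for simplicity, that `factoryName` is the only value you need to override; the factory names are hypothetical.

```python
import copy
import json

# Hypothetical factory names for the target environments.
TARGET_FACTORIES = {
    "test": "adf-contoso-test",
    "prod": "adf-contoso-prod",
}


def parameters_for_environment(dev_parameters: dict, factory_name: str) -> dict:
    """Return a copy of the dev ARM parameter file with factoryName overridden.

    Assumes factoryName is the only parameter that differs between environments.
    """
    params = copy.deepcopy(dev_parameters)  # leave the dev parameters untouched
    params["parameters"]["factoryName"]["value"] = factory_name
    return params


if __name__ == "__main__":
    # Illustrative ARM deployment parameter file, trimmed to a single parameter.
    dev_parameters = {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {"factoryName": {"value": "adf-contoso-dev"}},
    }
    for env, factory_name in TARGET_FACTORIES.items():
        print(env, json.dumps(parameters_for_environment(dev_parameters, factory_name)))
```

In a real factory the parameter file typically also carries connection strings and other linked service values, so expect to override more than one parameter per environment.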
You can, of course, do this manually. In the development ADF Studio, go to Manage > ARM Template > Export ARM Template. Then, in another environment's ADF Studio, go to Manage > ARM Template > Import ARM Template. But what's the fun in that? Nobody wants to repeat this manual process every time they need to promote changes to a different environment.
As you can see, only the development ADF instance is connected to a Git repo. Do not connect the other ADF instances to a Git repo. The whole idea behind this workflow is that (a) you make changes ONLY to your development ADF instance, (b) those changes get captured as code in your Git repo, and then (c) you deploy that code to the other environments. By doing it this way, you should never be making manual changes to the test or production ADF instances, as the code will be making all of the changes for you.
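If your infrastructure code defines all three factories, you may want a guard that enforces this rule. Below is a minimal Python sketch, assuming each factory is represented by its `properties` object and that a Git connection shows up as a non-empty `repoConfiguration` property (the property used by the `Microsoft.DataFactory/factories` resource type). The input dictionary is hypothetical.

```python
def git_connected_environments(factories: dict) -> list:
    """Return the sorted names of environments whose factory has a repoConfiguration."""
    return sorted(env for env, props in factories.items() if props.get("repoConfiguration"))


def check_only_dev_is_git_connected(factories: dict, dev_env: str = "dev") -> None:
    """Raise ValueError unless the dev factory is the one and only Git-connected factory."""
    connected = git_connected_environments(factories)
    if connected != [dev_env]:
        raise ValueError(f"Expected only {dev_env!r} to be Git-connected, found {connected}")


if __name__ == "__main__":
    # Hypothetical factory `properties` objects, e.g. taken from your IaC definitions.
    factories = {
        "dev": {"repoConfiguration": {"type": "FactoryGitHubConfiguration"}},
        "test": {},
        "prod": {},
    }
    check_only_dev_is_git_connected(factories)
    print("OK: only dev is connected to Git")
```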
You'll need to do some initial setup and configuration when you connect the development ADF instance to a Git repo:
Choose a "Collaboration Branch":
Any changes you make within the Dev ADF will be stored as regular JSON files in this branch
Any feature branches that you create should be based off this branch
Choose a "Publish Branch":
ADF will read the JSON files from the collaboration branch, and it will then generate the ARM Templates and store them here in the publish branch
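If you deploy the development factory with Bicep or ARM, the collaboration branch is part of the factory's `repoConfiguration`. The publish branch defaults to `adf_publish`, and, per Microsoft's source control documentation, it can be changed by adding a `publish_config.json` file to the root folder of the collaboration branch. The Python sketch below only builds both pieces as data; the account and repository names are hypothetical, and only a subset of the GitHub configuration fields is shown.

```python
import json


def github_repo_configuration(account: str, repository: str,
                              collaboration_branch: str = "main",
                              root_folder: str = "/") -> dict:
    """repoConfiguration for a GitHub-connected (development) factory.

    Only a subset of the available fields is shown here.
    """
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account,
        "repositoryName": repository,
        "collaborationBranch": collaboration_branch,  # where ADF saves entity JSON files
        "rootFolder": root_folder,  # folder in the repo that holds those files
    }


def publish_config_json(publish_branch: str = "adf_publish") -> str:
    """Contents of publish_config.json in the collaboration branch's root folder."""
    return json.dumps({"publishBranch": publish_branch}, indent=4)


if __name__ == "__main__":
    # Hypothetical GitHub account and repository names.
    print(json.dumps(github_repo_configuration("contoso", "adf-dataplatform"), indent=4))
    print(publish_config_json())
```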
Let's dig into the process in more detail. The step numbers below line up with the screenshots found above.
1. While using ADF Studio on the development ADF instance, create a new feature branch. Using branches is a standard best practice; you typically do not want to make changes directly to your collaboration branch.
2. Once you have created your new branch, you can begin making changes in ADF Studio: create a new linked service, create a new pipeline, or whatever else you need. In the background, ADF automatically saves all of your changes as JSON files in your feature branch.
3. Debug and test your new resources in ADF Studio. Do your pipelines run correctly? Do your linked services connect properly?
4. Once you are confident that your changes work correctly, you can use ADF Studio to create a pull request. In this example, we are using GitHub, so ADF will automatically open the GitHub web interface and start the pull request process for you. You will need to complete the form on GitHub to officially open your pull request.
5. Using GitHub, work with your team to get the necessary approvals on your pull request. Once you have met all the requirements, you can merge the pull request. This updates the collaboration branch with the latest changes from your feature branch.
6. Back in ADF Studio, click the "Publish" button. ADF reads the latest code from the Collaboration Branch, builds ARM Templates from that code, and stores them in the Publish Branch.
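To see what the Publish step produced, you can inspect the generated template in the Publish Branch, typically `ARMTemplateForFactory.json` alongside `ARMTemplateParametersForFactory.json`. Below is a minimal Python sketch that groups the factory's child resources by type. It assumes resource names follow the `[concat(parameters('factoryName'), '/<name>')]` pattern seen in ADF-generated templates; the sample resources are made up, and very large factories may also get linked templates, which this sketch ignores.

```python
import re
from collections import defaultdict

# Captures the entity name from expressions such as
# "[concat(parameters('factoryName'), '/CopySalesData')]".
_ENTITY_NAME = re.compile(r"'/([^']+)'\)\]$")


def entities_by_type(template: dict) -> dict:
    """Group the resources of an ADF-generated ARM template by short type name."""
    grouped = defaultdict(list)
    for resource in template.get("resources", []):
        # "Microsoft.DataFactory/factories/pipelines" -> "pipelines"
        short_type = resource["type"].rsplit("/", 1)[-1]
        match = _ENTITY_NAME.search(resource["name"])
        grouped[short_type].append(match.group(1) if match else resource["name"])
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical excerpt; in practice, json.load() the file from the publish branch.
    sample_template = {
        "resources": [
            {"type": "Microsoft.DataFactory/factories/linkedServices",
             "name": "[concat(parameters('factoryName'), '/SalesSqlDb')]"},
            {"type": "Microsoft.DataFactory/factories/pipelines",
             "name": "[concat(parameters('factoryName'), '/CopySalesData')]"},
        ]
    }
    for resource_type, names in sorted(entities_by_type(sample_template).items()):
        print(f"{resource_type}: {', '.join(names)}")
```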
Note: Step 6 is manual. Every time your PR is merged, you must go into ADF and click the "Publish" button to generate the ARM Templates. You can automate this if you want: Microsoft has an NPM package called ADFUtilities (published as @microsoft/azure-data-factory-utilities) which can be used to automate Step 6. See this link for more information.
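As a rough illustration, Microsoft's documentation for that package shows a `package.json` script along the lines of `"build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"`, which a CI job then invokes as `npm run build export <rootFolder> <factoryId> <outputFolder>`. The Python sketch below only assembles that command so a build step could run it with `subprocess`; the subscription, resource group, factory, and folder names are hypothetical, and you should check the package's current documentation for the exact arguments.

```python
import shlex
import subprocess


def factory_resource_id(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Build the Azure resource ID of a data factory."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )


def export_command(root_folder: str, factory_id: str, output_folder: str = "ArmTemplate") -> list:
    """Arguments for generating ARM Templates from the collaboration branch.

    Assumes package.json defines a "build" script pointing at
    node_modules/@microsoft/azure-data-factory-utilities/lib/index.
    """
    return ["npm", "run", "build", "export", root_folder, factory_id, output_folder]


def run_export(root_folder: str, factory_id: str) -> None:
    """Run the export; requires Node.js and a prior `npm install`."""
    subprocess.run(export_command(root_folder, factory_id), check=True)


if __name__ == "__main__":
    fid = factory_resource_id("00000000-0000-0000-0000-000000000000", "rg-data-dev", "adf-contoso-dev")
    # Print the command instead of running it, so the sketch works without Node.js.
    print(shlex.join(export_command("./adf", fid)))
```

Running this in CI after each merge to the collaboration branch is what would replace the manual "Publish" click.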
So, that's part 1 of this process. As you can probably guess, part 2 involves using the ARM Templates from the Publish Branch in order to deploy changes to the remaining ADF environments. Continue to part 2.