Protect your Azure Resources from accidental deletion

Last week I messed up big time. I was doing Azure Storage Account migrations via a PowerShell script, and part of that script was the deletion of the old resource group. That group was meant to be empty every time because I was moving resources out of it.

At one point the script threw an exception and the last storage account was not moved out of the resource group, but the last line of the script still deleted that resource group. And with it went the storage account. OOOOPSSS!
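
One way to guard against this kind of failure is to make the cleanup step conditional on the move step succeeding. The snippet below is only a sketch of that idea, not the migration script itself: `Invoke-MoveThenCleanup` is a hypothetical helper, and the two script blocks stand in for the real move and delete commands.

```powershell
# Hypothetical helper: run the cleanup step only if the move step succeeded.
function Invoke-MoveThenCleanup {
    param(
        [Parameter(Mandatory)] [scriptblock] $Move,
        [Parameter(Mandatory)] [scriptblock] $Cleanup
    )

    try {
        # Treat non-terminating cmdlet errors as terminating inside this function
        $ErrorActionPreference = 'Stop'
        & $Move | Out-Null
    }
    catch {
        Write-Warning "Move failed, cleanup skipped: $($_.Exception.Message)"
        return $false
    }

    # Only reached when the move step did not throw
    & $Cleanup | Out-Null
    return $true
}

# Usage: the move fails, so the cleanup block never runs
Invoke-MoveThenCleanup `
    -Move    { throw "storage account move failed" } `
    -Cleanup { Write-Host "deleting old resource group" }
```

With a failing move block the function writes a warning and returns `$false`, and the delete step is never reached.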

We had backups of data, but that accident made us think twice about our practices. So as part of a response to that accident we decided to use RM locks on all resources.

Azure Resource Manager lets you lock a subscription, a resource group or an individual resource to prevent users from accidentally deleting critical resources. A lock is either CanNotDelete or ReadOnly. Read more about locks in the official documentation. A subscription-level lock covers all resources and groups within that subscription, but for our purposes this was too coarse: remove that one lock and all of your resources are open to accidental deletion. So I settled on adding locks per Resource Group.

We have a few hundred different resources in about 50 resource groups. I did not fancy going through each of them manually and adding locks. So I wrote a PowerShell script to do that: find every resource group and, if it has no lock already, create a deletion lock.

You need the latest version of Azure PowerShell (currently v5.2.0) to run this script:

if ([string]::IsNullOrEmpty($(Get-AzureRmContext).Account)) 
{
    # only login when we need it
    # https://stackoverflow.com/a/46271580/809357    
    Login-AzureRmAccount
}
Select-AzureRmSubscription -SubscriptionId "<SubscriptionId>"

$rgs = Get-AzureRmResourceGroup; 

if(!$rgs)
{ 
    Write-Output "No resource groups in your subscription"; 
} 
else
{ 
    foreach($resourceGroup in $rgs)
    { 
        $groupName = $resourceGroup.ResourceGroupName
        $locks = Get-AzureRmResourceLock -ResourceGroupName $groupName

        if ($null -eq $locks) {
            # need to add lock
            $lockName = "$groupName-DoNotDeleteGroup"
            Write-Host "Adding new DoNotDelete lock on $groupName resource Group. Lock name is $lockName"
            New-AzureRmResourceLock -LockName $lockName -LockLevel CanNotDelete -ResourceGroupName $groupName -LockNotes "Automatically added by a script" -Force 
        }
    } 
}   
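
Once a group is locked, a deliberate deletion needs one extra step: removing the lock first. A minimal sketch, assuming the lock was created by the script above and so follows its `<groupName>-DoNotDeleteGroup` naming convention; the group name is a placeholder:

```powershell
# Assumption: the lock name follows the convention used by the script above
$groupName = "<ResourceGroupName>"
$lockName  = "$groupName-DoNotDeleteGroup"

# Remove the lock first, otherwise the delete request is rejected
Remove-AzureRmResourceLock -LockName $lockName -ResourceGroupName $groupName -Force

# Now the group can be deleted on purpose
Remove-AzureRmResourceGroup -Name $groupName -Force
```

That extra step is the whole point: deleting a group now takes two explicit commands instead of one.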

Azure Automation

This worked nice and easy. But our Azure account is constantly changing: we create new stuff every week, and I would need to remember to run this script from time to time. This is where Azure Automation Runbooks come in.

Azure Automation is a way to run repetitive management tasks on your Azure resources. You can do all sorts of things with it, for example run scheduled OS updates on your VMs. This is perfect for my little task of locking resources.

It is virtually free to run. You get 500 minutes per month for free and pay £0.001 per minute after that. I don’t think I’ll ever exceed the free tier.

There are a ton of tutorials about getting started with Automation, so I’ll leave that for you to discover.

In short, I’ve created an Automation Account and a Runbook that runs on a schedule. When you create the Automation Account, make sure you say Yes to “Create Azure Run as Account”. This will save you a lot of headaches later: it sets up an application in your AD with a service principal, which allows your scripts to access your Azure resources without entering credentials.
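
The schedule can be set up in the portal, but it can also be scripted. The following is a rough sketch rather than the setup actually used here: every name is a placeholder, and the daily interval is just an assumption about how often you would want the locks checked.

```powershell
# All names below are placeholders, not values from the real setup
$resourceGroup     = "<AutomationResourceGroup>"
$automationAccount = "<NameOfYourAutomationAccount>"
$runbookName       = "<RunbookName>"
$scheduleName      = "DailyLockCheck"   # hypothetical schedule name

# Create a schedule that fires once a day, starting tomorrow at midnight
New-AzureRmAutomationSchedule `
    -ResourceGroupName $resourceGroup `
    -AutomationAccountName $automationAccount `
    -Name $scheduleName `
    -StartTime (Get-Date).Date.AddDays(1) `
    -DayInterval 1

# Link the runbook to that schedule
Register-AzureRmAutomationScheduledRunbook `
    -ResourceGroupName $resourceGroup `
    -AutomationAccountName $automationAccount `
    -RunbookName $runbookName `
    -ScheduleName $scheduleName
```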

For the Runbook, my script ended up like this:

# Autogenerated bit for authentication
$connectionName = "AzureRunAsConnection"
try
{
    # Get the connection "AzureRunAsConnection "
    $servicePrincipalConnection=Get-AutomationConnection -Name $connectionName         

    "Logging in to Azure..."
    Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $servicePrincipalConnection.TenantId `
        -ApplicationId $servicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint 
}
catch {
    if (!$servicePrincipalConnection)
    {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    } else{
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}
#End of autogenerated authentication script

$rgs = Get-AzureRmResourceGroup; 

if(!$rgs)
{ 
    Write-Output "No resource groups in your subscription"; 
} 
else
{ 
    foreach($resourceGroup in $rgs)
    { 
        $groupName = $resourceGroup.ResourceGroupName
        $locks = Get-AzureRmResourceLock -ResourceGroupName $groupName

        if ($null -eq $locks) {
            # need to add lock
            $lockName = "$groupName-DoNotDeleteGroup"
            Write-Host "Adding new DoNotDelete lock on $groupName resource Group. Lock name is $lockName"
            New-AzureRmResourceLock -LockName $lockName -LockLevel CanNotDelete -ResourceGroupName $groupName -LockNotes "Automatically added by a script" -Force 
        }
    } 
}   

I executed this, only to find a bunch of errors:

New-AzureRmResourceLock : AuthorizationFailed : The client '#ClientId#' with object id 
'#PrincipalId#' does not have authorization to perform action 
'Microsoft.Authorization/locks/write' over scope '/subscriptions/#SubscriptionId#/resourceGroups/ResourceName/providers/Microsoft.Authorization/locks/MyGroup-DoNotDeleteGroup'.

Take note of the PrincipalId and SubscriptionId: you’ll need them later.

It turned out that the default account created for you along with the Automation Account has the Contributor role, and that role does not have permission to modify resource locks.
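
If you want to see this for yourself, you can dump the built-in role definition. A small sketch, assuming you are already logged in with a subscription selected; as far as I know the `NotActions` list contains an entry along the lines of `Microsoft.Authorization/*/Write`, which is what blocks `Microsoft.Authorization/locks/write`:

```powershell
# Fetch the built-in Contributor role definition
$contributor = Get-AzureRmRoleDefinition -Name "Contributor"

# What the role allows
$contributor.Actions

# What is carved out of that allowance; look for Microsoft.Authorization entries here
$contributor.NotActions
```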

So I had to create a new role, give it the correct permission and assign the service principal to that role. I did that in PowerShell so it is easier to reproduce next time:

if ([string]::IsNullOrEmpty($(Get-AzureRmContext).Account)) 
{
    # only login when we need it
    # https://stackoverflow.com/a/46271580/809357    
    Login-AzureRmAccount
}
Select-AzureRmSubscription -SubscriptionId "<SubscriptionId>"

# get application
$application = Get-AzureRmADApplication -DisplayNameStartWith "<NameOfYourAutomationAccount>"

# Get service Principal. Take ID from the error message
$servicePrincipal = Get-AzureRmADServicePrincipal -ObjectId "<PrincipalId>"

## Create new role https://docs.microsoft.com/en-us/powershell/module/azurerm.resources/new-azurermroledefinition?view=azurermps-5.2.0
$role = Get-AzureRmRoleDefinition -Name "Virtual Machine Contributor" # get existing role and use the object as a template
$role.Id = $null
$role.Name = "Resource Locks Operator"
$role.Description = "Can add and delete resource locks"
$role.Actions.RemoveRange(0,$role.Actions.Count)
$role.Actions.Add("Microsoft.Authorization/locks/*")
$role.AssignableScopes.Clear()
$role.AssignableScopes.Add("/subscriptions/<SubscriptionId>") # give these permissions over the entire subscription
New-AzureRmRoleDefinition -Role $role

# now assign this new role to the principal
New-AzureRMRoleAssignment -RoleDefinitionName "Resource Locks Operator" -ServicePrincipalName $servicePrincipal.ApplicationId -Scope "/subscriptions/<SubscriptionId>" 

# double check that principal has that role
Get-AzureRmRoleAssignment -ObjectId "<PrincipalId>"

After I executed this script, my runbook ran without errors and did what it was supposed to do.