Azure SQL Elastic Pools offer some great benefits for SQL Databases such as DTU based Performance and on-demand scaling up and down of compute/memory/storage via the Azure Portal. However, one thing that is not native out of the box, is scaling of storage on an elastic pool, in response to storage pressure. Recently I was exposed to this where an elastic pool reached its storage limit due to large amounts of data load happening, which caused the pool to run out of space and which would then quickly cause an outage as databases couldn’t allocate anymore space.

So, lets fix that and create our own….

To do this, we are going to leverage a couple of Azure features

  • Azure Monitor Alerts
  • Azure Automation Accounts & Runbooks

How do we do this?

What we want to happen is that when an Elastic Pool hits a certain limit of Data Space Allocated Percentage that an Azure Monitor Alert is triggered, which executes an Azure Automation Runbook, which runs some powershell to scale the elastic pool. How I have structured this is the following.

  1. At 80% Data Space Allocated, send me an email but don’t scale (so I get chance to validate it)
  2. At 90% Data Space Allocated, send me an email but this time scale the pool.
  3. At 95% Data Space Allocated, repeat the scale as before. This is a safety net in case the pool grows very quickly.

And part of this is also to have a hard limit of elastic pool size, so that I don’t get a situation where runaway storage growth needs me to take out a loan to pay the bill. All of this is easily doable with Azure Runbooks.

Lets Test

You’ll need some pre-reqs for this:

  1. Azure SQL Server
  2. Azure SQL Elastic Pool
  3. Azure Automation Account

If you don’t have these you can create some quickly with the below powershell, replacing the vars with what you want.

# Install Pre-Req Modules
Install-Module Az.Resources, Az.Sql, Az.ManagedServiceIdentity, Az.Automation

# Define our variables
$location = "australiaeast"
$rsgName = "myResourceGroup"
$sqlServerName = "mySQLServer"
$sqlServeElasticPoolName = "myElasticPool"
$sqlServerUser = "mysSQLAdminName"
$sqlServerPass = "SuperSecretPassword123!"
$automationAccountName = "myAutomationAccount"

# Create Resource Group
New-AzResourceGroup -Name $rsgName -Location $location

# Create our Credential Object
[securestring]$secStringPassword = ConvertTo-SecureString $sqlServerPass -AsPlainText -Force
[pscredential]$sqlCreds = New-Object System.Management.Automation.PSCredential ($sqlServerUser, $secStringPassword)

# Create SQL Server and Elastic Pool
$sqlServer = New-AzSQLServer -ServerName $sqlServerName -ResourceGroupName $rsgName -Location $location -ServerVersion "12.0" -SqlAdministratorCredentials $sqlCreds
$sqlElasticPool = New-AzSqlElasticPool -ServerName $sqlServerName -ResourceGroupName $rsgName -ElasticPoolName $sqlServeElasticPoolName -Edition "Standard" -Dtu 50 -DatabaseDtuMin 10 -DatabaseDtuMax 50 -StorageMB 51200

# Create Automation Account
New-AzAutomationAccount -Name $automationAccountName -ResourceGroupName $rsgName -Location $location -AssignSystemIdentity

Create your Azure Automation Runbook

Go to your Azure Automation Account, either an existing one or the one you created above, and create a Runbook.

create a runbook

Name it and ensure its running Powershell and 5.1

create a runbook

Once its created, open it and click Edit at the top, and then paste in the following Powershell Code. You can also get this from my GitHub Repo here. Once pasted in click Save and then Publish. If you don’t click Publish then the Runbook won’t execute.

create a runbook

<#
    .SYNOPSIS
    Automatically grow the allocated storage of an Azure SQL Elastic Pool from an Azure Runbook

    .DESCRIPTION
    When executed from an Azure Monitor Runbook Trigger, this script will expand the elastic pool by
    its specified amount, up to a hard limit.
    It also won't scale if it doesn't meet a scaling threshold, initially defined as 85%

    Created by Andy Roberts (andyr8939@gmail.com)

    .PARAMETER WebhookData
    Raw data passed to the runbook from Azure Monitor

    .PARAMETER poolIncreaseMB
    How much space to add to the pool on each scale event.  Initially set at 50 GB.

    .PARAMETER poolIncreaseLimitMB
    A hard limit of how big the pool can increase too.  Initially set at 1 TB.
#>

param
(
[Parameter (Mandatory=$false)]
[object] $WebhookData,
[Parameter (Mandatory= $false)]
[String] $poolIncreaseMB = "51200", # 50GB
[Parameter (Mandatory= $false)]
[String] $poolIncreaseLimitMB = "1048576" # 1TB
)

$connectionName = "AzureRunAsConnection"
try
{
    # Get the connection "AzureRunAsConnection "
    $servicePrincipalConnection=Get-AutomationConnection -Name $connectionName         

    "Logging in to Azure..."
    Connect-AzAccount `
        -ServicePrincipal `
        -TenantId $servicePrincipalConnection.TenantId `
        -ApplicationId $servicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint 
}
catch {
    if (!$servicePrincipalConnection)
    {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    } else{
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}

Write-Output "This runbook was started from webhook $WebhookName."
# Collect Webhook Data and convert to use with JSON
$WebhookBody = (ConvertFrom-Json -InputObject $WebhookData.RequestBody)

# Obtain the WebhookBody containing the AlertContext and any details we need
$AlertContext = [object] ($WebhookBody.data).context
$SubId = $AlertContext.subscriptionId
$ResourceGroupName = $AlertContext.resourceGroupName
$ResourceType = $AlertContext.resourceType
$ResourceName = $AlertContext.resourceName
$status = ($WebhookBody.data).status
$threshold = $WebhookBody.data.context.condition.allOf.threshold

# Write results to output for logging
Write-output "Subscription - $SubId"
Write-output "RSG Name - $ResourceGroupName"
Write-output "Resource Type - $ResourceType"
Write-output "Resrouce Name - $ResourceName"
Write-output "Status - $status"
Write-output "Threshold - $threshold"

# Only scale if over 85% allocated, to give chance to manually clean up on earlier 80% trigger with just email
if ($threshold -le 85) {
    Write-Output "Threshold is $threshold which does not require scaling.  Email only sent."
    Write-Output "Scaling Occurs once threshold is over 90%"
    Exit   
}

# As its over 85% threshold, prepare to scale
# Get Resources for SQL and Elastic Pool
$sqlServer = Get-AzSqlServer -ResourceGroupName $ResourceGroupName
$elasticPool = Get-AzSqlElasticPool -ElasticPoolName $ResourceName -ResourceGroupName $ResourceGroupName -ServerName $sqlserver.ServerName

# Add 50GB to the current size
$newElasticPoolStorageGB = $elasticpool.StorageMB + $poolIncreaseMB # Add 50GB

# Check new size and if it will exceed 1TB as a hard limit to prevent runaway scale
if ($newElasticPoolStorageGB -ge $poolIncreaseLimitMB) {
    $failError = "Elastic Pool Size for $ResourceName will exceed its allowed upper limit of $($poolIncreaseLimitMB /1024) GB so cannot scale"
    Write-Output $failError
    throw $failError
}

Write-Output "Current Elastic Pool Size for $ResourceName is $($elasticpool.StorageMB /1024) GB"
Write-Output "Scaling Elastic Pool $ResourceName to $($newElasticPoolStorageGB /1024) GB"

# Scale Elastic Pool to new size
$elasticpool | Set-AzSqlElasticPool -StorageMB $newElasticPoolStorageGB

Write-Output "New Elastic Pool Size for $ResourceName is $($newElasticPoolStorageGB /1024) GB"

Now you had added your Runbook, you need to ensure that the Automation Account knows about the AZ.Sql Module so it can actually run the powershell cmdlets in the runbook.

In your Automation Account, select Modules and Add a Module.

add module

Select Browse from Gallery and then click to Browse the Gallery. A nice feature here is that you don’t need to know the module name, just type the Powershell cmd you want and it will tell you the module. So for example type in Get-AzSQLServer and it will show you the Az.Sql Module. Select it and then when it comes back to the Add a Module screen, select 5.1 as the Runtime Version and Import. This can take a few minutes but as we are finished here so you can move on.

add module

Create your SQL Elastic Pool Alert

Now you have your Automation Runbook, its time to configure the Alert Rule. Go to your SQL Elastic Pool, Alerts and then Create Alert Rule.

create an alert rule

The metric you want to monitor is Data Space Allocated Percent, and the values you use are up to you, but I like to create an initial one at 80%. Also the period depends on your individual needs, but for a demo, I set it to 1 minute so it quickly notices the change.

alert rule

Next you need to crete an Action Group. This is what you are going to action, surprisingly, so it’s executing a runbook, triggering an email etc. So give it a sample name.

action group

Under Action Type, select Automation Runbook.

action group

To find the Runbook you created earlier, change the Runbook Source to User and then find the Automation Account and Runbook you Created earlier. You don’t need to Configure Parameters at this stage, but if you wanted to add in custom policies like how many GB to scale and the upper limit, you can do this here.

action group runbook

I also like to add in additional actions here, such as sending myself an email when it happens, but this is totally up to you.

Repeat all of the above Alert Rules for as many levels as you want. I typically have rules trigger at 80%, 90% and 95%, but you will end up with a rule looking like this.

completed alert rule

Time to expand that pool

Its now time to put this all together and see how it functions. You’ll need to add some data to your SQL Elastic Pool to push it over the threshold you set. I like to use these sample StackOverflow Databases for this purpose but you can use whatever suits.

80% Allocated - Don’t Grow

On my first run, I pushed Data Space Allocated to 80%. The Alert Rule on the Elastic Pool triggered the Runbook (yey) but it didn’t grown the pool. This was because I had the threshold in the parameters set to only scale at 90%. This is perfect, as my action group sent me an email so I can investigate if I need to.

no scale

90% Allocated - Lets Grow

But for the demo perspective, lets assume it was valid and now it suddenly hits 90%. So the 90% Alert Rule triggers the runbook again.

scale success

It works! We can see above that the runbook saw the threshold from the alert was 90%, which was over my 85% threshold. So because of that it knew it needed to scale and added 50 GB to the pool, meaning it was now 100 GB!

Protect with a limit

Like we put in earlier, we want a hard limit to stop run away growth of the pool. To illustrate this here is an example where I changed my thresholds so it would hit the limit in one go. This causes the runbook to throw an exception (as by default all runbooks will complete) and fail, highlighting the limit.

limit

That’s all there is to it. You can adjust and change that with thresholds to suit your needs really easily. It was a fun one for me to do as it solved a fairly common problem and got me more familar with Azure Automation Runbooks at the same time.

Andy