SqlAzure and a Best Practices way to deal with the Required Retries on Connections

Introduction

If you’ve started using SqlAzure for your SqlServer with your Azure application, you’ve probably discovered that you get a reasonable number of connection failures.  The advice from the Azure team is add retry logic to all your connections to SqlAzure. There is a long discussion posted by the Azure team here.

The key paragraph states the problem as follows:

The Problem
One of the things that SQL Azure does to deliver high availability is it sometimes closes connections. SQL Azure does some pretty cool stuff under the covers to minimize the impact, but this is a key difference in SQL Azure development vs. SQL Server development.

Basically, what this means is that you must be able to deal with connections failing when you call SqlAzure.  Something that all of probably should have been doing forever, but because most of the time SqlServer is running on your local LAN and the likelihood if a connection failing was next to zero unless something else was going terribly wrong.  Certainly not something we had to do on regular basis.  To emphasize that even more, most of the controls built into asp.net that open connections to sqlserver don’t even do this and that’s from Microsoft itself.

The solution proposed in the thread mentioned above basically has you add tons of code to everyplace you access a connection object.  Personally, I don’t like that because I have hundreds if not thousands of places I open connections and inserting tens of thousands of lines of extra new untested code is a little scary.

So, what to do?

Fortunately, another team at Microsoft, known as the Windows Server AppFabric Customer Advisory Team published a general purpose solution using Extension Methods and some darn clever coding wrote a great article and published code including azure examples that solves this problem very elegantly without requiring a lot of changes to your existing code base.

In this article I plan on giving an example and publishing a sample project that uses this code with SqlAzure to solve the connection retry problem.  My goal here is not to simply restate what they published but to simply have a very simple concrete example of using their library.


Design Goal

We have two goals.

  1. Change as little code as possible
  2. Log Connection Errors when they happen With Locations
  3. Make sure not to trap errors that are NOT connection related such as bad column names

 

Incorrect Code

So, this is what the original code looks like that will fail because it does not have connection retry logic:

 

public static int UsersIdFromUserNameNoConnectionRetry(string userName)
{
var retUsersId = 0;
const string sql =
@"SELECT Id FROM Users
WHERE Username = @Username"
;

using (var sqlConnection = new SqlConnection(
ConfigurationManager.ConnectionStrings["CRStorageWebConnectionString"].
ConnectionString))
{
sqlConnection.Open();
using (var sqlCommand = new SqlCommand(sql, sqlConnection))
{
sqlCommand.Parameters.Add("@Username", SqlDbType.NVarChar).Value = userName;
using (var reader = sqlCommand.ExecuteReader())
{
while (reader.Read())
{
retUsersId = reader.IsDBNull(0) ? 0 : reader.GetInt32(0);
}
}
}
}
return retUsersId;
}

So, if there is a connection, an exception will get thrown and will need to be caught, but then the method will not have done it’s job.

 

Correct Code With Retries

So, now take a look at the revised code after the library is setup and used.  The setup is non-trivial, but you just have to do that once and then you can simply fix all your other code with very few changes.  Below is the new code with connection retry logic built in.

 

public static int UsersIdFromUserName(string userName)
{
var retUsersId = 0;
const string sql =
@"SELECT Id FROM Users
WHERE Username = @Username"
;

using (var sqlConnection =
new ReliableSqlConnection(
ConfigurationManager.ConnectionStrings["CRStorageWebConnectionString"].
ConnectionString,
new RetryUtils("", "UsersIdFromUserName").GetRetryPolicy()))
{
sqlConnection.Open();
using (var sqlCommand = new SqlCommand(sql, sqlConnection.Current))
{
sqlCommand.Parameters.Add("@Username", SqlDbType.NVarChar).Value = userName;
using (var reader = sqlCommand.ExecuteReader())
{
while (reader.Read())
{
retUsersId = reader.IsDBNull(0) ? 0 : reader.GetInt32(0);
}
}
}
}
return retUsersId;
}
 
There are basically two changes. 
 
  1. The first is instead of create an SqlConnection(…) we are creating a ReliableSqlConnection(…).  The ReliableSqlConnection takes an extra parameter which basically wraps the retry logic used, as well as labelling this connection so when it fails, it gets logged.  In the log, there will be the comment “UsersIdFromUserName” so we know what method threw the retry.
  2. The second is slight change we when we create the SqlCommand, we have to add the property .Current to it so we know we are talking about the currently executing connection.  There may be a default way to handle this but I could not figure it out.
That’s it!  you are new connection safe for retrying failed connections.
 

The Setup Pieces

 
First, download the library from the article mention above which can be found here in Microsoft’s Code Gallery
 
 

The project has all kinds of stuff in it that build quite nicely, run and test under vs2010.  All I’m interested in is the ado.net piece and the retry logic around that.  I actually used the 1.2 version, however I now see there is a 1.3 version with some improvements.  I would post my project but I don’t want to post it with old code so I’ll just tell you the steps I went through so you can do the same.

Build the class library

Compile the project and make sure you have the dll from the library lincluded in your actual visual studio project.  The library you want is Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.

 

image

 

Update Your Web.config

Add the config section below to your web.config file.

<configSections>
<section name="RetryPolicyConfiguration"
type="Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration.RetryPolicyConfigurationSettings,
Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling"
/>
</configSections>

<RetryPolicyConfiguration defaultPolicy="FixedIntervalDefault" defaultSqlConnectionPolicy="FixedIntervalDefault"
defaultSqlCommandPolicy="FixedIntervalDefault" defaultStoragePolicy="IncrementalIntervalDefault"
defaultCommunicationPolicy="IncrementalIntervalDefault">
<add name="FixedIntervalDefault" maxRetryCount="10" retryInterval="100" />
<add name="IncrementalIntervalDefault" maxRetryCount="10" retryInterval="100" retryIncrement="50" />
<add name="ExponentialIntervalDefault" maxRetryCount="10" minBackoff="100" maxBackoff="1000" deltaBackoff="100" />
</RetryPolicyConfiguration>

This defines a new section in your web.config, then implements it with several different retry policies include a default one which seems quite reasonable to me.

If you remember, in the implementation section above, we used a public class called RetryUtils.  This class is actually one I invented as a convenience class to minimize the code I have to update on each use of the ReliableConnection Object.  Just to refresh your memory, the implementation is this:

using (var sqlConnection =
new ReliableSqlConnection(
ConfigurationManager.ConnectionStrings["CRStorageWebConnectionString"].
ConnectionString,
new RetryUtils("", "UsersIdFromUserName").GetRetryPolicy()))

The actual code fo rthe class RetryUtils is below here.  You’ll have to stick this someplace in your project.

namespace Utils
{

public class RetryUtils
{
public string Username { get; set; }
public string CallersName { get; set; }

public RetryUtils(string username, string callersName)
{
Username = username;
CallersName = callersName;
}

public RetryUtils()
{
Username = string.Empty;
CallersName = string.Empty;
}

public RetryPolicy<SqlAzureTransientErrorDetectionStrategy> GetRetryPolicy()
{
var retryPolicy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>
(RetryPolicy.DefaultClientRetryCount, RetryPolicy.DefaultMinBackoff,
RetryPolicy.DefaultMaxBackoff,
RetryPolicy.DefaultClientBackoff);

retryPolicy.RetryOccurred += RetryPolicyRetryOccurred;

return retryPolicy;
}

void RetryPolicyRetryOccurred(int currentRetryCount, Exception lastException, TimeSpan delay)
{
GeneralUtils.GetLog4NetAllDataContext().AddLog4NetAll(new Log4NetAll
{
Date = DateTime.UtcNow,
EllapsedTime = 0,
ExceptionStackTrace = lastException.StackTrace,
Message = "RetryCount: " + currentRetryCount + " delay: " + delay.Seconds,
ExceptionMessage = lastException.Message,
Logger = "",
Level = "Error",
UserName = Username,
PartitionKey = Username,
RowKey = Guid.NewGuid().ToString()
});
}
}
}
This code actually has the nice callback delegate that does the logging when a retry actually occurs.  I’m not including my implementation of GeneralUtils.GetLog4NetAllDataContext().AddLog4NetAll, but you can pretty much guess what it does.  It simply logs the retry with all the details of what happened.  My implementation sticks this in an Azure Table, but that’s really for another post.
 

Non Connection Related Errors

 
Remember, our second design criteria is that we should only fail on errors that are connection related and not things like data column not found.  With no additional work, this library takes care of this for us.  Actually, in the release notes for the 1.3 release (which I have not used yet) say they have improved that feature by adding additional codes not to fail on.  That is, the last thing you want is your code spending 5 minutes retrying on a problem you’d just like reported immediately.
 
Keep in mind that the library we are using here is a general purpose retry library not designed just for use with ado.net.  The team has provided us with examples using LINQ2SQL, EntityFramework and other technologies.  I spent  a little time reading about using those other technologies but did not get far enough to blog about it.  My current SqlAzure implementation uses 100% ado.net because performance is critical to me and neither EF or LINQ2SQL are quite up to my task yet.
 

Conclusions

 
So basically, that’s it!  You know have the tools to implement very nice retry logic in your ado.net code so Azure will not fail on “normal” connection failures.  I suggest that if you are using SqlAzure, you implement this as soon as possible so your code will be solid going forward.
 
Thanks for reading!
About Peter Kellner

Peter is a software professional specializing in mobile and web technologies. He has also been a Microsoft MVP for the past 7 years. To read more about Peter Kellner and his experience click here. For information about how Peter Kellner might be able to help you with your project click here.

Follow me:


Trackbacks

  1. [...] As part of the meetup, Robin (@robindotnet) and I both did short presentations on how to  work SqlAzure, basically based around the problem of Azure (by design) dropping connections.  Robin talked about her real world experiences, and I talked about a method for how to deal with the problem elegantly in ado.net.  I did a blog post explaining what I presented here:  http://peterkellner.net/2011/01/21/sqlazure-connection-retry-problem-using-best-practices/ [...]

  2. [...] SqlAzure and a Best Practices way to deal with the Required Retries on Connections [...]

  3. [...] of connection failures.  The advice from the Azure team is add retry logic to all your… [full post] Peter Kellner PeterKellner.net asp.net 4.0azurebest practicesc# 0 0 [...]

Your Comments

*

Protected with IP Blacklist CloudIP Blacklist Cloud

Follow

Get every new post delivered to your Inbox

Join other followers: