Introduction
If you’ve started using SqlAzure for your SqlServer with your Azure application, you’ve probably discovered that you get a reasonable number of connection failures. The advice from the Azure team is add retry logic to all your connections to SqlAzure. There is a long discussion posted by the Azure team here.
The key paragraph states the problem as follows:
The Problem
One of the things that SQL Azure does to deliver high availability is it sometimes closes connections. SQL Azure does some pretty cool stuff under the covers to minimize the impact, but this is a key difference in SQL Azure development vs. SQL Server development.
Basically, what this means is that you must be able to deal with connections failing when you call SqlAzure. Something that all of probably should have been doing forever, but because most of the time SqlServer is running on your local LAN and the likelihood if a connection failing was next to zero unless something else was going terribly wrong. Certainly not something we had to do on regular basis. To emphasize that even more, most of the controls built into asp.net that open connections to sqlserver don’t even do this and that’s from Microsoft itself.
The solution proposed in the thread mentioned above basically has you add tons of code to everyplace you access a connection object. Personally, I don’t like that because I have hundreds if not thousands of places I open connections and inserting tens of thousands of lines of extra new untested code is a little scary.
So, what to do?
Fortunately, another team at Microsoft, known as the Windows Server AppFabric Customer Advisory Team published a general purpose solution using Extension Methods and some darn clever coding wrote a great article and published code including azure examples that solves this problem very elegantly without requiring a lot of changes to your existing code base.
In this article I plan on giving an example and publishing a sample project that uses this code with SqlAzure to solve the connection retry problem. My goal here is not to simply restate what they published but to simply have a very simple concrete example of using their library.
Design Goal
We have two goals.
-
Change as little code as possible
-
Log Connection Errors when they happen With Locations
-
Make sure not to trap errors that are NOT connection related such as bad column names
Incorrect Code
So, this is what the original code looks like that will fail because it does not have connection retry logic:
public static int UsersIdFromUserNameNoConnectionRetry(string userName)
{
var retUsersId = 0;
const string sql =
@"SELECT Id FROM Users
WHERE Username = @Username";
using (var sqlConnection = new SqlConnection(
ConfigurationManager.ConnectionStrings["CRStorageWebConnectionString"].
ConnectionString))
{
sqlConnection.Open();
using (var sqlCommand = new SqlCommand(sql, sqlConnection))
{
sqlCommand.Parameters.Add("@Username", SqlDbType.NVarChar).Value = userName;
using (var reader = sqlCommand.ExecuteReader())
{
while (reader.Read())
{
retUsersId = reader.IsDBNull(0) ? 0 : reader.GetInt32(0);
}
}
}
}
return retUsersId;
}
So, if there is a connection, an exception will get thrown and will need to be caught, but then the method will not have done it’s job.
Correct Code With Retries
So, now take a look at the revised code after the library is setup and used. The setup is non-trivial, but you just have to do that once and then you can simply fix all your other code with very few changes. Below is the new code with connection retry logic built in.
public static int UsersIdFromUserName(string userName)
{
var retUsersId = 0;
const string sql =
@"SELECT Id FROM Users
WHERE Username = @Username";
using (var sqlConnection =
new ReliableSqlConnection(
ConfigurationManager.ConnectionStrings["CRStorageWebConnectionString"].
ConnectionString,
new RetryUtils("", "UsersIdFromUserName").GetRetryPolicy()))
{
sqlConnection.Open();
using (var sqlCommand = new SqlCommand(sql, sqlConnection.Current))
{
sqlCommand.Parameters.Add("@Username", SqlDbType.NVarChar).Value = userName;
using (var reader = sqlCommand.ExecuteReader())
{
while (reader.Read())
{
retUsersId = reader.IsDBNull(0) ? 0 : reader.GetInt32(0);
}
}
}
}
return retUsersId;
}
- The first is instead of create an SqlConnection(…) we are creating a ReliableSqlConnection(…). The ReliableSqlConnection takes an extra parameter which basically wraps the retry logic used, as well as labelling this connection so when it fails, it gets logged. In the log, there will be the comment “UsersIdFromUserName” so we know what method threw the retry.
- The second is slight change we when we create the SqlCommand, we have to add the property .Current to it so we know we are talking about the currently executing connection. There may be a default way to handle this but I could not figure it out.
The Setup Pieces
The project has all kinds of stuff in it that build quite nicely, run and test under vs2010. All I’m interested in is the ado.net piece and the retry logic around that. I actually used the 1.2 version, however I now see there is a 1.3 version with some improvements. I would post my project but I don’t want to post it with old code so I’ll just tell you the steps I went through so you can do the same.
Build the class library
Compile the project and make sure you have the dll from the library lincluded in your actual visual studio project. The library you want is Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.
Update Your Web.config
Add the config section below to your web.config file.
<configSections>
<section name="RetryPolicyConfiguration"
type="Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration.RetryPolicyConfigurationSettings,
Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling" />
</configSections>
<RetryPolicyConfiguration defaultPolicy="FixedIntervalDefault" defaultSqlConnectionPolicy="FixedIntervalDefault"
defaultSqlCommandPolicy="FixedIntervalDefault" defaultStoragePolicy="IncrementalIntervalDefault"
defaultCommunicationPolicy="IncrementalIntervalDefault">
<add name="FixedIntervalDefault" maxRetryCount="10" retryInterval="100" />
<add name="IncrementalIntervalDefault" maxRetryCount="10" retryInterval="100" retryIncrement="50" />
<add name="ExponentialIntervalDefault" maxRetryCount="10" minBackoff="100" maxBackoff="1000" deltaBackoff="100" />
</RetryPolicyConfiguration>
This defines a new section in your web.config, then implements it with several different retry policies include a default one which seems quite reasonable to me.
If you remember, in the implementation section above, we used a public class called RetryUtils. This class is actually one I invented as a convenience class to minimize the code I have to update on each use of the ReliableConnection Object. Just to refresh your memory, the implementation is this:
using (var sqlConnection =
new ReliableSqlConnection(
ConfigurationManager.ConnectionStrings["CRStorageWebConnectionString"].
ConnectionString,
new RetryUtils("", "UsersIdFromUserName").GetRetryPolicy()))
The actual code fo rthe class RetryUtils is below here. You’ll have to stick this someplace in your project.
namespace Utils
{
public class RetryUtils
{
public string Username { get; set; }
public string CallersName { get; set; }
public RetryUtils(string username, string callersName)
{
Username = username;
CallersName = callersName;
}
public RetryUtils()
{
Username = string.Empty;
CallersName = string.Empty;
}
public RetryPolicy<SqlAzureTransientErrorDetectionStrategy> GetRetryPolicy()
{
var retryPolicy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>
(RetryPolicy.DefaultClientRetryCount, RetryPolicy.DefaultMinBackoff,
RetryPolicy.DefaultMaxBackoff,
RetryPolicy.DefaultClientBackoff);
retryPolicy.RetryOccurred += RetryPolicyRetryOccurred;
return retryPolicy;
}
void RetryPolicyRetryOccurred(int currentRetryCount, Exception lastException, TimeSpan delay)
{
GeneralUtils.GetLog4NetAllDataContext().AddLog4NetAll(new Log4NetAll
{
Date = DateTime.UtcNow,
EllapsedTime = 0,
ExceptionStackTrace = lastException.StackTrace,
Message = "RetryCount: " + currentRetryCount + " delay: " + delay.Seconds,
ExceptionMessage = lastException.Message,
Logger = "",
Level = "Error",
UserName = Username,
PartitionKey = Username,
RowKey = Guid.NewGuid().ToString()
});
}
}
}