Enabling Azure SQL Database Auto-Tuning at Scale for Microsoft IT

Enabling Azure SQL Database Auto-Tuning at Scale for Microsoft IT


All right, we’re back with
another Azure in the Enterprise, I’m Lyle Dodge. Today I have with me
Rajesh Vasireddy. Rajesh, nice to see you again.>>Nice to meet you.>>So today we’re gonna talk
about one specific thing in the SQL PaaS platform. But you’re from Microsoft IT. Can you give us just kind of
an idea of what we run in Microsoft IT for
the data platform?>>Yeah, like any enterprise,
we do have our supply chain, marketing, sales. And have all these applications
relying on the native SQL instances over
a period of time. For the cloud disruption, well,
SQL PaaS is our obvious choice. So we are a huge
Azure SQL DB PaaS database shop.>>Mm-hm. So we’ve got couple SQL Servers
on-prem, a lot of SQL on IaaS, and we’re heavily
using SQL PaaS, so that’s the Azure SQL Database.>>We’re using that left and
right.>>That is correct.>>And so
what I wanted you to talk to us. And especially our
enterprise customers today that are looking at SQL PaaS,
is one specific feature on that. And that is the auto tuning feature-
>>Yep.>>Of SQL PaaS. So can you just tell us a bit,
quickly, what that is? And then we’re gonna actually
show some results of something that you did.>>Absolutely, so Azure SQL DB being a platform
as a service offering for the product, had a lot of
intelligence built into it. And one of the ultimate
is the advisory. Advisory means platform,
look at your usage. Some metadata is good enough
to produce actionable recommendations. And by default, it will provide you an advisor
when you go to the Azure Portal. And there is an optional setting
that you can trust it to go ahead and implement it
without you being reactive to the advisory, and
pushing a button to implement. You can turn on a setting called
auto tuning that exists for every Azure SQL DB as well
as a logical server setting. So that it go ahead and
implements the recommendations.>>And so we’re looking
at this graph here. So today we just have
create index, drop index.>>Mm-hm.
>>Those are the first two that they’ve released.>>Yep.
>>So can you walk me through what we’re looking at here?>>Yeah, so two weeks ago, we
executed a big batch of one of our business units,
a critical business unit, where we collect all the
subscriptions that there is all the applications for
the entire business unit. And we turn on auto tuning
by default across all subscriptions. And in total, we covered approximately 348
servers with 829 databases. And turn on auto tuning. So the good thing is when you
turn on auto tuning, we can see the effectiveness of each of
the implemented recommendation. And the service plane is capable
of generating recommendations, implementing the
recommendations, and capable of detecting regression. And can reboot on its own if
an implemented recommendation started exhibiting regression
that outweighs the benefit of the recommendation.>>So we look here,
over the past two weeks, it’s created 2,200 indexes and
it rolled back 300 of those.>>Yep.>>But it still did a bunch and
then over that same two week period, it dropped 5,700
indexes across those 800->>Yeah, isn’t that amazing? Just imagine a world where every
index the functionality by putting some index,
mode application to production, and then you have a new
workload patterns that the existing indexes may or
may not serve the purpose. And then actually most of
the time, they tend to be.>>Mm-hm.
>>So we often do this index managing being reactive, and
performance drops or something.>>And you have to fight for
almost 15 days by with all the things involved to it either
add an index or drop an index. But here by just turning an auto
tuning in the last two weeks, as it said, it dropped
5,700 on user indexes and there is a definition on. Why this index can be trapped? And it is capable of, as I said,
detecting regression, and see how many reverted out
of the 5,700 drop indexes. It only reverted 97, which means
it recreated those indexes because it felt that there’s
some workload after dropping that index, missing that index
so it recreated it back. So it is completely safe.>>So then you can see here, one
of the other things you have in here is-
>>Mm-hm.>>It improved over
roughly 4,000 queries. And so especially for enterprises that are migrating
a database over to SQL PaaS, the workloads have
changed over time. So this is improving
everybody’s experience so->>It is. This advisory is all
about giving meaningful recommendations with a view to
enhance business experience by pushing the performance and reducing the resource
consumption. As you see here, almost 4,000
plus queries got a positive enhancement on its performance,
and consuming lesser resources. And there’s a slight number of
queries that regression and that’s why you see a correlated
number of approximately 300 indexes created. It has been reverted. So just imagine 4,000 dynamic
SQL queries and how many man hours it will take for you to
see out of this dynamic SQL coming in, which one index is
gonna give you the most benefit. At the same time, you cannot
create all the indexes for all those 4,000 or
5,000 queries, right? So->>So you can see here, a couple of these queries really
got some massive benefits. In addition, you did some
measuring to see kind of roughly ballpark what the resource drop
is or the two week period. So can you walk me
through this tray here?>>Yeah, often we see that
an Azure SQL DB when you buy it with the comes with the and the
is the common CPU memory I/O. CPU is the obvious factor that
most of the times we see that pushing the radio, and often,
[INAUDIBLE] this recommendation, we see on an average 257
CP hours has been reduced. You might not realize that
as a huge number, but when it’s spread across
tons of databases, it’s good enough to push
the database tier from a higher level to lower tier
in most of the cases.>>That saves a lot of money.>>Yeah,
that saves a lot of money. At the same time-
>>And not only are you saving
money but the people that are running the workloads are
experiencing better performance.>>Absolutely, because you no longer have the
noisy [INAUDIBLE] contention. You have less resource
consumption so that you’re enabling your peers
to use those resources in spare and for results in a much
better enhancer performance.>>And so
what is this CreateIndex, DropIndex over here in
the top left corner?>>So the main concern that
anybody will have that when I let somebody create an index. First of all, is the right index and how much
space it’s gonna consume because space is also a factor baked
into your SQL elements. So this graph shows that of
all the 2,200 indexes added, the total capacity that
additional took is 368 GB across->>830 databases->>800 databases, so it’s not that much because at the same
time when you drop this->>Ballpark a third of a gigabyte of database,
something like that.>>Yeah, yeah, if you take
a probably that’s one-third of the size, but that’s for
approximately 900 databases.>>And you save some space
on your drop index as well.>>Yeah, those are just indexes
sitting there doing nothing. They are not
contributing at all. They are never touched in
that’s long, long, long time. We know the definition
internally because we took part in engineering
in this function. So we know the definition. So they’re absolutely safe
to go ahead and drop it.>>Yep. And so what’s this daily
logical reads down here?>>Yeah. After the CPU, the other
significant factor is your reads, because when you
have a query running, most of the times we see
I/O being the bottleneck.>>Mm-hm.
>>And because you have a query that is doing a huge scan and
doing a lot of reads. By putting the right indexes
there, you’re reducing. You possibly can reduce to
the same operation with the less read operation. So if you look at it, this is a chart drilled down
on implemented recommendation. The first one if you see it,
may probably. Yeah, it reduced those many
logical reads by just putting that one right index there.>>78 billion.>>Yes, and I cannot count this
number if we sum up all that so that the graph on the right
shows that all the logical read that are reduced after we
implement this recommendation. Count the number.>>Yeah, so you’re looking
at 263 billion logical reads a day just by
loading the platform.>>Yeah.
So which means you’re getting the same result set by doing
these many lesser reads. So that automatically translates
to your lesser resource consumption. Significantly improved
end user experience.>>Yeah, and this doesn’t mean
that the DBAs are terrible. This doesn’t mean any of that.>>No.
>>This just means that hey, we have databases.>>Yeah.>>We can’t predict how the
workloads gonna be day over day, month over month,
year over year. But let’s just trust that for
them to go ahead and do it.>>Yeah, there’s no way that
even ten colored code DBAs might just sit for a month, and
still cannot find out the two indexes that can cover the most
that will benefit, right? Because it is impossible. You have diverse word patterns
of workload coming in, and there’s no one index that’s
a magical bullet that works for all. So here we’re trusting
the platform because it know what is that one index
that got the most benefit. It will show you,
after a bunch of validations, it show there’s a high
impact recommendation. Go ahead and
trust it and turn it on.>>Okay, so if when our
enterprise customers that are watching this, they can
go to their Azure Portal, they can go either at
the server, turn on, tuning at the server, or they
can turn on to database log.>>Yep.>>They can also turn it
on to database level, and say don’t inherit. So, if for some reason,
somebody doesn’t trust it.>>Yep.
>>But it’s like, this is pretty.>>Yeah,
this is very easy to configure.>>Which business unit is this?>>It’s for core finance.>>Okay.
>>This is our fifth business unit. We started turning it on
probably in the last six months.>>Mm-hm.>>We’re working on this feature
from last one and a half year, when it is in pilot,
doing work on recommendation. So we saw this recent wall
order, paid it off time. It has matured enough that
we believed we just go ahead and turn it on.>>So you just turned it on
across 348 servers and finance, 830 databases, which I don’t see anything that
says which is production or dev. So is this just everything?>>No, we don’t even measure
how many of this production. Probably one and a half feet
ago when we were piloting, probably [INAUDIBLE]
started you add first and see, now these reads, so
reads reached to a sophisticated level that we just
turned it on by default.>>Yeah.
>>The only parameter we supply
is subscription IDs. We don’t even know
which our production->>So a great thing for an enterprise would be maybe set
an Azure policy that if you’re creating a SQL
patterns database->>Yep.>>Just turn it on by default.>>Absolutely, it had all
the safeguards built into it. It is capable of
detecting regression. It cannot overrun
your resources. It have intelligence
baked into it. When is the right time to
implement those recommendations?>>So
it’s not gonna go [CROSSTALK] or anything like that.>>Yeah, so that makes it. Why not?>>Yeah.
>>I don’t see a reason why any enterprise cannot turn
this on by default. And to be honest, I know
a matter of fact that very soon, we might see this as
a default setting. That the [INAUDIBLE]
might offer.>>Yeah, [INAUDIBLE] going on. So->>Yeah.>>I know after this,
you’re gonna help turn it on for the rest of IT, and I know you’re in talks with
the SQL product group.>>Yep.
>>As might do more pilots across all
the divisions of Microsoft. So->>Yeah, there’s a lot more features
other than just index and create and dropping indexes. There’s a lot more
coming in the way. But these two
are production ready. We’re running it for
a long time. And probably so far, for all
the five batches, we make our movement at some 15,000
recommendations, and we have yet to receive a single incident
from any user complaining that, hey, something happened. And we’re often
pleasantly surprised. Hey, I’m struggling to
identify why my job all of a sudden started running
fast and I don’t know why.>>Stop complaining,
let it run faster.>>Yeah, so that’s
a pleasant surprise, right? So we trust this level, and we want it implemented across
Microsoft internal consumption.>>Okay, well Rajesh,
thanks for your time today. I’m sure we’ll see you again. I’d like to do this again in
a few months here once we’ve rolled this out and see what
would happen across all of IT.>>Sure, thank you very much for allowing us to tell
this great story.>>And so here’s an action item. Go to the Azure Portal on
your SQL Azure databases, production dev test,
doesn’t matter.>>Yep.>>Turn it on. We’ll come back in a few months,
and we’ll look at what this
means for all of MS IT.>>Yeah.>>We’ll see you on the next
Azure in the Enterprise.

Danny Hutson

Leave a Reply

Your email address will not be published. Required fields are marked *