sql server 主键_SQL Server中人口过多的主键和CE模型的变化-白红宇

sql server 主键_SQL Server中人口过多的主键和CE模型的变化

阅读量：2514 次

发布时间：2019-05-11

本文共 5534 字，大约阅读时间需要 18 分钟。

sql server 主键

In this blog post, we are going to talk about another cardinality estimation model enhancement in SQL Server 2014 – Overpopulated Primary Key (OPK).

在此博客文章中，我们将讨论SQL Server 2014中的另一个基数估计模型增强功能–人口过多的主键（OPK）。

Consider a fact table that contains information about some sales, for example, and a date dimension table. Usually, a fact table contains the data about the current year and past years, but a dimension table usually contains the data for the next few years also.

考虑一个事实表，其中包含有关某些销售的信息（例如）和日期维度表。通常，事实表包含有关当年和过去几年的数据，但是维度表通常也包含未来几年的数据。

If we are joining two tables fact and dimension and filtering on dimension table we usually demand the rows that do exist in a fact table, for example, we ask for the last month sales, but not for the next year sales. We also remember that a filter influences a join estimation, the join column statistics information should be modified according to the filter selectivity and cardinality. I don’t know the precise formula that is used for that in SQL Server, however, the algorithm is called Selection Without Replacement and is taken from the probability theory (the exact method doing maths in SQL Server is sqllang!CCardUtilSQL12::ProbSampleWithoutReplacement). You may read some details about the similar algorithm in Oracle further in useful links.

如果我们要连接两个表事实和维度并在维度表上进行过滤，则通常需要事实表中确实存在的行，例如，我们要求上个月的销售额，而不要求下一年的销售额。我们还记得过滤器会影响联接估计，联接列统计信息应根据过滤器的选择性和基数进行修改。我不知道用于SQL Server的精确公式，但是该算法称为“无替换选择”，它取自概率论（SQL Server中进行数学运算的确切方法是sqllang！CCardUtilSQL12 :: ProbSampleWithoutReplacement）。您可以在有用的链接中进一步阅读有关Oracle中类似算法的一些详细信息。

What is important for that post, that the number of distinct values in the join columns is used to calculate the join selectivity after filtering by another column. As we said before, a dimension table usually contains more distinct values than a fact table, because it is filled with the data for the next few years also, but we usually select only the relevant data, that is present in a fact table, i.e. those overpopulated distinct values should be somehow compensated and not concerned when we estimating the join over filtered dimension. That is what Overpopulated Primary Key model assumption about.

对于该帖子很重要的是，连接列中不同值的数量用于在被另一列过滤后计算连接选择性。如前所述，维表通常比事实表包含更多不同的值，因为维表也填充了未来几年的数据，但是我们通常只选择事实表中存在的相关数据，即当我们估计过滤后的维度的联接时，应该以某种方式补偿那些过度填充的不同值，而不用担心。这就是人口过多的主键模型假设。

Let’s see this in action. We will use AdventureWorksDW2012 database and issue two identical queries, the first one – uses the new CE framework (I also add TF 2363, to view some diagnostic output), the second query uses the old framework (because we run it with TF 9481 – that forces the old CE behavior).

让我们看看实际情况。我们将使用AdventureWorksDW2012数据库并发出两个相同的查询，第一个查询–使用新的CE框架（我还添加了TF 2363，以查看某些诊断输出），第二个查询使用旧的框架（因为我们使用TF 9481运行它–强制执行旧的CE行为）。

alter database AdventureWorksDW2012 set compatibility_level = 120;goset statistics xml onselect	sum(s.SalesAmount)from	dbo.FactInternetSales s	join dbo.DimDate d on d.DateKey = s.DueDateKeywhere	d.CalendarYear = 2008option(querytraceon 3604, querytraceon 2363)goselect	sum(s.SalesAmount)from	dbo.FactInternetSales s	join dbo.DimDate d on d.DateKey = s.DueDateKeywhere	d.CalendarYear = 2008option(querytraceon 9481)set statistics xml offgo

The actual number of rows for both of the queries equals 34 229, let’s look at the estimates.

这两个查询的实际行数等于34 229 ，让我们看一下估计值。

The estimation with the new CE framework is much closer to the reality (26 154 new vs. 10 047 old vs. 34 229 actual). This is because the join selectivity was scaled to compensate the key overpopulation. If you look at the diagnostic output, you’ll see the message that tells you, that this action was taken.

与新的CE架构的估计更接近于现实（26 154新主场迎战10岁047比34 229实际）。这是因为按比例选择了连接选择性以补偿关键的人口过剩。如果查看诊断输出，将看到一条消息，告诉您已采取了该操作。

模型变化 (The Model Variation)

As you may suppose in this case the model variation would be not to use the OPK assumption and do not adjust the selectivity. To do it, you should run the query with TF 9482.

您可能会在这种情况下假设模型变化是不使用OPK假设并且不调整选择性。为此，您应该使用TF 9482运行查询。

set statistics xml on--New - OPK disabledselect	sum(s.SalesAmount)from	dbo.FactInternetSales s	join dbo.DimDate d on d.DateKey = s.DueDateKeywhere	d.CalendarYear = 2008option(querytraceon 9482)go-- Oldselect	sum(s.SalesAmount)from	dbo.FactInternetSales s	join dbo.DimDate d on d.DateKey = s.DueDateKeywhere	d.CalendarYear = 2008option(querytraceon 9481)set statistics xml offgo

If we now examine the estimates we will see that they are very close, you also won’t see the “Scaling join selectivity…” output in the diagnostic console messages.

如果现在检查估算值，我们会发现它们非常接近，您也将不会在诊断控制台消息中看到“伸缩连接选择性…”输出。

Again, I should warn everybody. This is undocumented and should not be used in production.

同样，我应该警告大家。这是无证的，不应在生产中使用。

The final thing to mention, that there is nothing special about a fact and a dimension table for the OPK to be used. The decision to adjust the selectivity is made based on the difference between the distinct value counts. That means that you may observe this behavior, not only in DW databases, but in OLTP also, depending on your data distribution.

最后要提到的是，要使用的OPK的事实和维度表没有什么特别的。基于不同值计数之间的差异来做出调整选择性的决定。这意味着，不仅在DW数据库中，而且在OLTP中，您都可能会观察到此行为，具体取决于您的数据分布。

That’s all for this post. Next, we will talk about one of my favorite and really helpful changes in the new CE – the Ascending Key situation.

这就是这篇文章的全部内容。接下来，我们将讨论我在新的CE中最喜欢的且真正有用的更改之一-升序关键情况。

目录 (Table of contents)

Overpopulated Primary Key and CE Model Variation in SQL Server

SQL Server中人口过多的主键和CE模型的变化

参考资料 (References)

翻译自:

sql server 主键

转载地址：http://fyiwd.baihongyu.com/

你可能感兴趣的文章