<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:copyright="http://blogs.law.harvard.edu/tech/rss" xmlns:image="http://purl.org/rss/1.0/modules/image/">
    <channel>
        <title>SQL</title>
        <link>http://geekswithblogs.net/mucman/category/6405.aspx</link>
        <description>Sql queries and practices</description>
        <language>en-CA</language>
        <copyright>Scott Muc</copyright>
        <managingEditor>scottmuc@gmail.com</managingEditor>
        <generator>Subtext Version 0.0.0.0</generator>
        <item>
            <title>High Performance OLTP Queries Containing Aggregate Data</title>
            <link>http://geekswithblogs.net/mucman/archive/2007/06/05/113015.aspx</link>
            <description>&lt;p&gt; There are many situations where you'll have &lt;a href="http://www.w3schools.com/sql/sql_functions.asp"&gt;aggregate functions&lt;/a&gt; in your queries. Usually you want to avoid these aggregations in an &lt;acronym title="Online Transaction Processing"&gt;OLTP&lt;/acronym&gt; application in order to decrease resource consumption. As an example I'll be using the following simplified schema to show what the problem is and a solution that I often use to deal with this issue. &lt;/p&gt;
&lt;p&gt; What we have is a table containing articles and another table to track each time an article is read. The schema has been simplified to focus on the problem, but imagine a table like &lt;span style="font-weight: bold;"&gt;NewsArticles&lt;/span&gt; having many more columns and relations with other tables. &lt;/p&gt;
&lt;!-- code formatted by http://manoli.net/csharpformat/ --&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TABLE&lt;/span&gt; NewsArticles (&lt;/pre&gt;
&lt;pre&gt;    ArticleId &lt;span class="kwrd"&gt;INT&lt;/span&gt; &lt;span class="kwrd"&gt;IDENTITY&lt;/span&gt;(1, 1) &lt;span class="kwrd"&gt;NOT&lt;/span&gt; &lt;span class="kwrd"&gt;NULL&lt;/span&gt;,&lt;/pre&gt;
&lt;pre class="alt"&gt;    ArticleText NTEXT,&lt;/pre&gt;
&lt;pre&gt;    &lt;span class="kwrd"&gt;PRIMARY&lt;/span&gt; &lt;span class="kwrd"&gt;KEY&lt;/span&gt; (ArticleId)&lt;/pre&gt;
&lt;pre class="alt"&gt;)&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;GO&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TABLE&lt;/span&gt; NewsArticleHits (&lt;/pre&gt;
&lt;pre&gt;    ReadDate SMALLDATETIME &lt;span class="kwrd"&gt;DEFAULT&lt;/span&gt; GETDATE(),&lt;/pre&gt;
&lt;pre class="alt"&gt;    ArticleId &lt;span class="kwrd"&gt;INT&lt;/span&gt; &lt;span class="kwrd"&gt;NOT&lt;/span&gt; &lt;span class="kwrd"&gt;NULL&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;)&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;GO&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;CLUSTERED&lt;/span&gt; &lt;span class="kwrd"&gt;INDEX&lt;/span&gt; IDX_NewsArticleHits_ArticleId&lt;/pre&gt;
&lt;pre class="alt"&gt;    &lt;span class="kwrd"&gt;ON&lt;/span&gt; NewsArticleHits(ArticleId)&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt; So what is the problem? Let's say we have a web application that inserts a row into the &lt;span style="font-weight: bold;"&gt;NewsArticleHits&lt;/span&gt; table each time an article is read. For some reason we want to display on the front end how many times the article has been read. To do this we would have a query like the following: &lt;/p&gt;
&lt;!-- code formatted by http://manoli.net/csharpformat/ --&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; A.ArticleText, ISNULL(H.HitCount, 0) &lt;span class="kwrd"&gt;AS&lt;/span&gt; HitCount&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;FROM&lt;/span&gt; NewsArticles A &lt;span class="kwrd"&gt;LEFT&lt;/span&gt; &lt;span class="kwrd"&gt;OUTER&lt;/span&gt; &lt;span class="kwrd"&gt;JOIN&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;        (&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleId, &lt;span class="kwrd"&gt;COUNT&lt;/span&gt;(*) &lt;span class="kwrd"&gt;AS&lt;/span&gt; HitCount&lt;/pre&gt;
&lt;pre&gt;         &lt;span class="kwrd"&gt;FROM&lt;/span&gt; NewsArticleHits &lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; ArticleId) H&lt;/pre&gt;
&lt;pre class="alt"&gt;        &lt;span class="kwrd"&gt;ON&lt;/span&gt; A.ArticleId = H.ArticleId&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; A.ArticleId = @ArticleId &lt;span class="rem"&gt;-- Of course we use parameterized queries&lt;/span&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt; This query should reveal my decision for the &lt;a href="http://en.wikipedia.org/wiki/Index_(database)"&gt;CLUSTERED INDEX&lt;/a&gt;: the hits for one article sit together, so they can be counted by reading one contiguous range. Note that the hits are counted inside the derived table rather than with COUNT(*) over the outer join. The latter has an off by one bug, because it returns a count of 1 even if no hits have occurred. This setup will work fine for a very small system. Just guessing here, but this query should be fast enough for all but the most popular articles. Obviously &lt;a href="http://slashdot.org/"&gt;Slashdot&lt;/a&gt; wouldn't be able to use a setup like this to see a post's stats. &lt;/p&gt;
&lt;p&gt; There's another issue that's caused by the choice of the CLUSTERED INDEX. If you want to perform date range queries, nothing is ordered or indexed by &lt;span style="font-weight: bold;"&gt;ReadDate&lt;/span&gt;, so you're going to see full scans of the table. &lt;/p&gt;
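&lt;p&gt; To make that concrete, here is a sketch of the kind of date range query in question. The @From and @To parameters are placeholders assumed for illustration, not part of the original schema. With the table clustered on ArticleId, every row has to be checked against the range: &lt;/p&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleId, &lt;span class="kwrd"&gt;COUNT&lt;/span&gt;(*) &lt;span class="kwrd"&gt;AS&lt;/span&gt; HitCount&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;FROM&lt;/span&gt; NewsArticleHits&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; ReadDate &lt;span class="kwrd"&gt;BETWEEN&lt;/span&gt; @From &lt;span class="kwrd"&gt;AND&lt;/span&gt; @To &lt;span class="rem"&gt;-- hypothetical parameters, inclusive on both ends&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; ArticleId&lt;/pre&gt;
&lt;/div&gt;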
&lt;p&gt; What I've chosen to do is to replace the schema with the following: &lt;/p&gt;
&lt;!-- code formatted by http://manoli.net/csharpformat/ --&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TABLE&lt;/span&gt; NewsArticles (&lt;/pre&gt;
&lt;pre&gt;    ArticleId &lt;span class="kwrd"&gt;INT&lt;/span&gt; &lt;span class="kwrd"&gt;IDENTITY&lt;/span&gt;(1, 1) &lt;span class="kwrd"&gt;NOT&lt;/span&gt; &lt;span class="kwrd"&gt;NULL&lt;/span&gt;,&lt;/pre&gt;
&lt;pre class="alt"&gt;    ArticleText NTEXT,&lt;/pre&gt;
&lt;pre&gt;    HitCount &lt;span class="kwrd"&gt;INT&lt;/span&gt; &lt;span class="kwrd"&gt;NOT&lt;/span&gt; &lt;span class="kwrd"&gt;NULL&lt;/span&gt; &lt;span class="kwrd"&gt;DEFAULT&lt;/span&gt; 0,&lt;/pre&gt;
&lt;pre class="alt"&gt;    &lt;span class="kwrd"&gt;PRIMARY&lt;/span&gt; &lt;span class="kwrd"&gt;KEY&lt;/span&gt; (ArticleId)&lt;/pre&gt;
&lt;pre&gt;)&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;GO&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TABLE&lt;/span&gt; NewsArticleHits (&lt;/pre&gt;
&lt;pre class="alt"&gt;    ReadDate SMALLDATETIME &lt;span class="kwrd"&gt;DEFAULT&lt;/span&gt; GETDATE(),&lt;/pre&gt;
&lt;pre&gt;    ArticleId &lt;span class="kwrd"&gt;INT&lt;/span&gt; &lt;span class="kwrd"&gt;NOT&lt;/span&gt; &lt;span class="kwrd"&gt;NULL&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;)&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;GO&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;CLUSTERED&lt;/span&gt; &lt;span class="kwrd"&gt;INDEX&lt;/span&gt; IDX_NewsArticleHits_ReadDate&lt;/pre&gt;
&lt;pre&gt;    &lt;span class="kwrd"&gt;ON&lt;/span&gt; NewsArticleHits(ReadDate)&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt; I've added a column to &lt;span style="font-weight: bold;"&gt;NewsArticles&lt;/span&gt; to hold the hit counter. This column will have to be computed and updated. I use &lt;a href="http://en.wikipedia.org/wiki/Microsoft_SQL_Server"&gt;SQL Server&lt;/a&gt; and don't plan on leaving that platform anytime soon so I decided to use &lt;a href="http://en.wikipedia.org/wiki/Database_trigger"&gt;TRIGGERs&lt;/a&gt; to keep that column up to date. I've also changed the CLUSTERED INDEX on the &lt;span style="font-weight: bold;"&gt;NewsArticleHits&lt;/span&gt; table to allow for date range queries. With the computed hit column there's no need to JOIN the &lt;span style="font-weight: bold;"&gt;NewsArticleHits&lt;/span&gt; table so I won't bother to put an INDEX on the &lt;span style="font-weight: bold;"&gt;ArticleId&lt;/span&gt; column. &lt;/p&gt;
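&lt;p&gt; With the counter stored on the article row, the front end read from earlier reduces to a single-table lookup. A minimal sketch, assuming the same @ArticleId parameter as before: &lt;/p&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleText, HitCount&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;FROM&lt;/span&gt; NewsArticles&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; ArticleId = @ArticleId &lt;span class="rem"&gt;-- primary key lookup, no aggregate and no JOIN&lt;/span&gt;&lt;/pre&gt;
&lt;/div&gt;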
&lt;p&gt; If you search for TRIGGER implementations of this kind of counter you may find some rather naive ones: &lt;/p&gt;
&lt;!-- code formatted by http://manoli.net/csharpformat/ --&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TRIGGER&lt;/span&gt; trig_Increment_NewsArticles_HitCount&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;ON&lt;/span&gt; NewsArticleHits &lt;span class="kwrd"&gt;AFTER&lt;/span&gt; &lt;span class="kwrd"&gt;INSERT&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;AS&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;UPDATE&lt;/span&gt;     NewsArticles&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SET&lt;/span&gt; HitCount = HitCount + 1&lt;/pre&gt;
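&lt;pre&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; ArticleId &lt;span class="kwrd"&gt;IN&lt;/span&gt; (&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleId &lt;span class="kwrd"&gt;FROM&lt;/span&gt; inserted)&lt;/pre&gt;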
&lt;/div&gt;
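&lt;p&gt; Unfortunately that TRIGGER won't be accurate if you happen to INSERT multiple rows at once. SQL Server fires the TRIGGER once per statement, not once per row, so adding 1 undercounts whenever a single statement inserts several hits for the same article. This could easily be the case if the hit updates are performed in batches. Make sure you don't use the above TRIGGER and use something more like the following: &lt;/p&gt;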
&lt;!-- code formatted by http://manoli.net/csharpformat/ --&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TRIGGER&lt;/span&gt; trig_Increment_NewsArticles_HitCount&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;ON&lt;/span&gt; NewsArticleHits &lt;span class="kwrd"&gt;AFTER&lt;/span&gt; &lt;span class="kwrd"&gt;INSERT&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;AS&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;UPDATE&lt;/span&gt;     NewsArticles&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SET&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    NewsArticles.HitCount = NewsArticles.HitCount + C.HitCount&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;FROM&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    (&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleId, &lt;span class="kwrd"&gt;COUNT&lt;/span&gt;(*) &lt;span class="kwrd"&gt;AS&lt;/span&gt; HitCount&lt;/pre&gt;
&lt;pre class="alt"&gt;    &lt;span class="kwrd"&gt;FROM&lt;/span&gt; inserted&lt;/pre&gt;
&lt;pre&gt;    &lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; ArticleId) &lt;span class="kwrd"&gt;AS&lt;/span&gt; C&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    NewsArticles.ArticleId = C.ArticleId&lt;/pre&gt;
&lt;/div&gt;
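&lt;p&gt; A quick way to check the batch behaviour, sketched under the assumption that articles 1 and 2 already exist: insert three hits in one statement and read the counters back. The TRIGGER fires once for the statement, and the grouped rows from inserted should add 2 to article 1 and 1 to article 2. &lt;/p&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;INSERT&lt;/span&gt; &lt;span class="kwrd"&gt;INTO&lt;/span&gt; NewsArticleHits (ArticleId) &lt;span class="rem"&gt;-- ReadDate falls back to its GETDATE() default&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; 1 &lt;span class="kwrd"&gt;UNION&lt;/span&gt; &lt;span class="kwrd"&gt;ALL&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; 1 &lt;span class="kwrd"&gt;UNION&lt;/span&gt; &lt;span class="kwrd"&gt;ALL&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; 2&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;GO&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleId, HitCount&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;FROM&lt;/span&gt; NewsArticles&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; ArticleId &lt;span class="kwrd"&gt;IN&lt;/span&gt; (1, 2)&lt;/pre&gt;
&lt;/div&gt;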
&lt;p&gt; And there you go! So far I have this kind of implementation in a production database where the hit table has over 17 million rows. I'll leave the DELETE TRIGGER as an exercise for the reader. Well... mainly because our system won't be deleting rows from this database. You may have noticed that I did not make the &lt;span style="font-weight: bold;"&gt;ArticleId&lt;/span&gt; a &lt;a href="http://en.wikipedia.org/wiki/Foreign_key"&gt;FOREIGN KEY&lt;/a&gt; to the &lt;span style="font-weight: bold;"&gt;NewsArticles&lt;/span&gt; table. This is for a couple of reasons: if the article gets deleted we don't necessarily want to lose the stats, and CASCADING DELETES on these kinds of tables can be a killer performance wise. &lt;/p&gt;
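&lt;p&gt; For anyone who does need the DELETE side, one possible shape mirrors the INSERT TRIGGER and subtracts the grouped counts taken from the deleted pseudo-table. This is an untested sketch, and the trigger name is made up here for illustration: &lt;/p&gt;
&lt;div class="csharpcode"&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;CREATE&lt;/span&gt; &lt;span class="kwrd"&gt;TRIGGER&lt;/span&gt; trig_Decrement_NewsArticles_HitCount &lt;span class="rem"&gt;-- hypothetical name&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;ON&lt;/span&gt; NewsArticleHits &lt;span class="kwrd"&gt;AFTER&lt;/span&gt; &lt;span class="kwrd"&gt;DELETE&lt;/span&gt;&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;AS&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;span class="kwrd"&gt;UPDATE&lt;/span&gt;     NewsArticles&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;SET&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    NewsArticles.HitCount = NewsArticles.HitCount - C.HitCount&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;FROM&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    (&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ArticleId, &lt;span class="kwrd"&gt;COUNT&lt;/span&gt;(*) &lt;span class="kwrd"&gt;AS&lt;/span&gt; HitCount&lt;/pre&gt;
&lt;pre class="alt"&gt;    &lt;span class="kwrd"&gt;FROM&lt;/span&gt; deleted &lt;span class="rem"&gt;-- rows removed by the triggering statement&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    &lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; ArticleId) &lt;span class="kwrd"&gt;AS&lt;/span&gt; C&lt;/pre&gt;
&lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;&lt;/pre&gt;
&lt;pre&gt;    NewsArticles.ArticleId = C.ArticleId&lt;/pre&gt;
&lt;/div&gt;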
&lt;p&gt; I would love to hear other ways people have tackled this problem. Would you prefer to do this on the application level? &lt;/p&gt;</description>
            <dc:creator>Scott Muc</dc:creator>
            <guid>http://geekswithblogs.net/mucman/archive/2007/06/05/113015.aspx</guid>
            <pubDate>Wed, 06 Jun 2007 01:02:22 GMT</pubDate>
            <wfw:comment>http://geekswithblogs.net/mucman/comments/113015.aspx</wfw:comment>
            <comments>http://geekswithblogs.net/mucman/archive/2007/06/05/113015.aspx#feedback</comments>
            <wfw:commentRss>http://geekswithblogs.net/mucman/comments/commentRss/113015.aspx</wfw:commentRss>
            <trackback:ping>http://geekswithblogs.net/mucman/services/trackbacks/113015.aspx</trackback:ping>
        </item>
    </channel>
</rss>