Welcome to dbFreaks.com!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

Non-clustered index design rules...

 
   Database Help (Home) -> Programming RSS
Next:  problems w/ external files  
Author Message
fromeo

External


Since: Aug 17, 2010
Posts: 2



(Msg. 1) Posted: Tue Aug 17, 2010 1:38 am
Post subject: Non-clustered index design rules...
Archived from groups: microsoft>public>sqlserver>programming (more info?)

Hi,

Are there any general rules of thumb when it comes to designing non-
clustered indexes?
For instance, given an example table like the following:

CREATE TABLE dbo.table1
(
table1_id INT IDENTITY(1,1) NOT NULL,
customer_id INT NOT NULL,
file_date DATETIME NOT NULL,
customer_name VARCHAR(30) NOT NULL,
industry_code CHAR(3) NOT NULL,
country VARCHAR(30) NOT NULL
-- Other fields etc...
);

ALTER TABLE dbo.table1
ADD CONSTRAINT pk_table1 PRIMARY KEY CLUSTERED (table1_id);

And the following query between the table above and another table:

SELECT t1.customer_name, t1.industry_code, t1.country,
t2.account_code, t2.product
FROM table1 AS t1
INNER
JOIN table2 AS t2
ON t1.customer_id = t2.customer_id
AND t1.file_date = t2.file_date
WHERE industry_code IN ('MAN', 'TEL', 'BAN');

What is the best way to create a non-clustered index on table1?
Is it best to include the JOIN conditions in the index definition and
the WHERE and SELECTed elements in the INCLUDE statement like the
following:

CREATE INDEX NONCLUSTERED nci_table1
ON dbo.table1 (customer_id, file_date) INCLUDE (customer_name,
industry_code, country);

Or better to create the index like this:

CREATE INDEX NONCLUSTERED nci_table1
ON dbo.table1 (customer_id, file_date, customer_name, industry_code,
country);

Basically, what are the design factors to take into consideration when
creating a non-clustered index across multiple columns when those
columns are used in SELECTs, WHEREs and JOINs?

Many thanks,
F.

 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
Mohit K. Gupta

External


Since: Jun 13, 2008
Posts: 41



(Msg. 2) Posted: Tue Aug 17, 2010 9:23 am
Post subject: RE: Non-clustered index design rules... [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Your first index is better choice then 2nd for few reasons.

1) First index will be smaller in size and quicker to search.
2) When indexes you want to use columns that show up in where clause or join
predicate not select statements. In SQL Server 2000 you had to but with SQL
2005 and include predicate it is no longer needed.

What is the difference between first Index and second Index, is how the data
is stored with in SQL Server.

With first index you get index size of 16 bytes/entry and at the leaf node
of the index the final information is stored. Where as your second index is
going to be 75 bytes/entry; with no additional storage on leaf.

If you create both these indexes check them out in the
sys.dm_db_index_physical_stats for the difference in size.

SELECT DB_NAME(database_id) AS DatabaseName,
OBJECT_ID,
index_id,
avg_fragmentation_in_percent,
page_count
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,NULL)
WHERE index_id > 0
AND OBJECT_ID = Object_id('table_1')
GO

For small indexes you probably wouldn't notice difference but for large
indexes you should notice a considerable size difference.

--
Mohit K. Gupta
B.Sc. CS, Minor Japanese
MCITP: Database Administrator
MCTS: SQL Server 2005
http://sqllearnings.blogspot.com/


"fromeo" wrote:

> Hi,
>
> Are there any general rules of thumb when it comes to designing non-
> clustered indexes?
> For instance, given an example table like the following:
>
> CREATE TABLE dbo.table1
> (
> table1_id INT IDENTITY(1,1) NOT NULL,
> customer_id INT NOT NULL,
> file_date DATETIME NOT NULL,
> customer_name VARCHAR(30) NOT NULL,
> industry_code CHAR(3) NOT NULL,
> country VARCHAR(30) NOT NULL
> -- Other fields etc...
> );
>
> ALTER TABLE dbo.table1
> ADD CONSTRAINT pk_table1 PRIMARY KEY CLUSTERED (table1_id);
>
> And the following query between the table above and another table:
>
> SELECT t1.customer_name, t1.industry_code, t1.country,
> t2.account_code, t2.product
> FROM table1 AS t1
> INNER
> JOIN table2 AS t2
> ON t1.customer_id = t2.customer_id
> AND t1.file_date = t2.file_date
> WHERE industry_code IN ('MAN', 'TEL', 'BAN');
>
> What is the best way to create a non-clustered index on table1?
> Is it best to include the JOIN conditions in the index definition and
> the WHERE and SELECTed elements in the INCLUDE statement like the
> following:
>
> CREATE INDEX NONCLUSTERED nci_table1
> ON dbo.table1 (customer_id, file_date) INCLUDE (customer_name,
> industry_code, country);
>
> Or better to create the index like this:
>
> CREATE INDEX NONCLUSTERED nci_table1
> ON dbo.table1 (customer_id, file_date, customer_name, industry_code,
> country);
>
> Basically, what are the design factors to take into consideration when
> creating a non-clustered index across multiple columns when those
> columns are used in SELECTs, WHEREs and JOINs?
>
> Many thanks,
> F.
> .
>

 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
John Bell

External


Since: Jan 11, 2008
Posts: 157



(Msg. 3) Posted: Tue Aug 17, 2010 2:25 pm
Post subject: Re: Non-clustered index design rules... [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, 17 Aug 2010 01:38:13 -0700 (PDT), fromeo
wrote:

>Hi,
>
>Are there any general rules of thumb when it comes to designing non-
>clustered indexes?
>For instance, given an example table like the following:
>
>CREATE TABLE dbo.table1
>(
> table1_id INT IDENTITY(1,1) NOT NULL,
> customer_id INT NOT NULL,
> file_date DATETIME NOT NULL,
> customer_name VARCHAR(30) NOT NULL,
> industry_code CHAR(3) NOT NULL,
> country VARCHAR(30) NOT NULL
> -- Other fields etc...
>);
>
>ALTER TABLE dbo.table1
>ADD CONSTRAINT pk_table1 PRIMARY KEY CLUSTERED (table1_id);
>
>And the following query between the table above and another table:
>
>SELECT t1.customer_name, t1.industry_code, t1.country,
>t2.account_code, t2.product
>FROM table1 AS t1
>INNER
>JOIN table2 AS t2
>ON t1.customer_id = t2.customer_id
>AND t1.file_date = t2.file_date
>WHERE industry_code IN ('MAN', 'TEL', 'BAN');
>
>What is the best way to create a non-clustered index on table1?
>Is it best to include the JOIN conditions in the index definition and
>the WHERE and SELECTed elements in the INCLUDE statement like the
>following:
>
>CREATE INDEX NONCLUSTERED nci_table1
>ON dbo.table1 (customer_id, file_date) INCLUDE (customer_name,
>industry_code, country);
>
>Or better to create the index like this:
>
>CREATE INDEX NONCLUSTERED nci_table1
>ON dbo.table1 (customer_id, file_date, customer_name, industry_code,
>country);
>
>Basically, what are the design factors to take into consideration when
>creating a non-clustered index across multiple columns when those
>columns are used in SELECTs, WHEREs and JOINs?
>
>Many thanks,
>F.

You are assuming that you need an index in the first place. You may
want to read http://msdn.microsoft.com/en-us/library/ms191195.aspx
Tuning for a specific query is not always a good thing because index
maintenance could affect performance elsewhere, also you may have a
situation where a second query could use the index if it was slightly
different. So waiting to see what the index tuning wizard or missing
index dmvs after the table has been thoroughly excerised is a valid
emperical option.

John
 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
Erland Sommarskog2

External


Since: May 30, 2004
Posts: 2061



(Msg. 4) Posted: Tue Aug 17, 2010 4:25 pm
Post subject: Re: Non-clustered index design rules... [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

fromeo ( ) writes:
> And the following query between the table above and another table:
>
> SELECT t1.customer_name, t1.industry_code, t1.country,
> t2.account_code, t2.product
> FROM table1 AS t1
> INNER
> JOIN table2 AS t2
> ON t1.customer_id = t2.customer_id
> AND t1.file_date = t2.file_date
> WHERE industry_code IN ('MAN', 'TEL', 'BAN');
>
> What is the best way to create a non-clustered index on table1?
> Is it best to include the JOIN conditions in the index definition and
> the WHERE and SELECTed elements in the INCLUDE statement like the
> following:
>
> CREATE INDEX NONCLUSTERED nci_table1
> ON dbo.table1 (customer_id, file_date) INCLUDE (customer_name,
> industry_code, country);
>
> Or better to create the index like this:
>
> CREATE INDEX NONCLUSTERED nci_table1
> ON dbo.table1 (customer_id, file_date, customer_name, industry_code,
> country);

As always when it comes to performance the answer is: it depends.

Of these two indexes, the second is somewhat smaller, since not all
columns are in the upper nodes of the index.

I guess the idea is to make a covering index, but if someone changes
the query, the index is no longer covering. And may be it's OK anyway.

And if table2 is a big big table, and table1 is only a big table,
an index on industry_code may be what you need.

It's difficult to give any fixed rules for indexing, because any such
rules are likely to be misunderstood. And moreover, rules are bad if
they prevent you from thinking and analysing and that is what you
always need when it comes to indexing. That, and a good deal of
gut feeling.

--
Erland Sommarskog, SQL Server MVP, esquel DeleteThis @sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
fromeo

External


Since: Aug 17, 2010
Posts: 2



(Msg. 5) Posted: Wed Aug 18, 2010 12:34 am
Post subject: Re: Non-clustered index design rules... [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Hi,

Thanks everybody for their comments. Very helpful.

Cheers,
F.
 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
Cleary

External


Since: Aug 19, 2010
Posts: 1



(Msg. 6) Posted: Thu Aug 19, 2010 4:42 pm
Post subject: Re: Non-clustered index design rules... [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

If I were building indexes to optimize the above query in a reasonable
warehouse, I would've put a compound index on the natural join
condition
CREATE INDEX NONCLUSTERED nci_table1ON dbo.table1 (customer_id,
file_date)

and a separate index on the fields queried in the WHERE clause:
CREATE INDEX NONCLUSTERED nci_table2 ON dbo.table1 (industry_code)

obviously neither of these is a spanning index, but unless I was
trying to really optimize for a particular query, I would use these as
general purpose indexes.


However, I have a question of my own. Suppose you have a table that
is essentially a growing LOG table
CREATE TABLE log_table
(
category varchar(100) not null,
log_message varchar(1000) not null,
log_date datetime not null default getdate()
)


The category field is a small fixed list of maybe 100 categories. The
log table is growing by hundreds or thousands of messages per day.
Occasionally we may need to scan the table, but 99% of the time, we
want to select the most recent record for each category. The code I
have seen in production does something like:

SELECT * from log_table
INNER JOIN (
select category, max(log_date) as max_date from log_table group
by category
) max_date_inner_query
ON log_table.category = max_date_inner_query.category
AND log_table.log_date = max_date_inner_query.max_date

While I am not a fan of this style of query, it does work correctly.

My question is, what is the ideal index to put on this table to fully
optimize the 'gimme the most recent log_message per category' query?

In particular, would you put a compound index on
CREATE INDEX NONCLUSTERED idx1_logtable ON log_table (category_id,
log_date) INCLUDE (log_message)

OR

CREATE INDEX NONCLUSTERED idx1_logtable ON log_table (log_date,
category_id) INCLUDE (log_message)

The reason I ask is, I had always read you should put the most
selective value first in a compound index. Log_date is more unique
than category, but it just seems counter intuitive to me.
 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
Gert-Jan Strik

External


Since: Feb 19, 2010
Posts: 6



(Msg. 7) Posted: Fri Aug 20, 2010 6:25 am
Post subject: Re: Non-clustered index design rules... [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Cleary wrote:
> However, I have a question of my own. Suppose you have a table that
> is essentially a growing LOG table
> CREATE TABLE log_table
> (
> category varchar(100) not null,
> log_message varchar(1000) not null,
> log_date datetime not null default getdate()
> )
>
> The category field is a small fixed list of maybe 100 categories. The
> log table is growing by hundreds or thousands of messages per day.
> Occasionally we may need to scan the table, but 99% of the time, we
> want to select the most recent record for each category.

Cleary, it is usually best to start a new thread when you want to ask a
different question.

This is a case where the number of groups is very low compared to the
total amount of rows. I am going to write an article on this, but here
it goes for you.

You ask about the best possible index for this, and I give you a table
(in combination with an index). In situations like this, when (on
average) each group has several pages of data, what you want is to have
a separate table that list all these groups. For example

CREATE TABLE log_table_categories (category varchar(100) not null)
-- Add primary key constraint if you want to enforce data integrity:
-- ALTER TABLE log_table_categories ADD PRIMARY KEY CLUSTERED
(category)

This allows the optimizer to scan this table, and seek the "MAX" row for
each category. For that to be effective, you need the following index:

CREATE INDEX IX_logtable_category_logdate ON log_table
(category,log_date)

You could then rewrite your SELECT statement to this:

SELECT category, max_date, log_message
FROM (
SELECT LT.category, log_date as max_date, log_message
, ROW_NUMBER() OVER (PARTITION BY LT.category ORDER BY log_date
DESC) AS rn
FROM log_table_categories C
JOIN log_table LT
ON LT.category = C.category
) DT
WHERE rn = 1

Or (if you are on an older version of SQL Server) to this:

SELECT LT1.category, max_date, log_message
FROM log_table LT1
INNER JOIN (
SELECT C.category, MAX(log_date) as max_date
FROM log_table_categories C
JOIN log_table LT2
ON LT2.category = C.category
GROUP BY C.category
) M
ON LT1.category = M.category
AND LT1.log_date = M.max_date

Now your first proposed index (that includes the log_message) would be
even better for these queries, because they cover the query. But it is a
tradeoff, because then that index would be just as big as the table
itself. In other words, it would cost a lot of disk space.

> The reason I ask is, I had always read you should put the most
> selective value first in a compound index. Log_date is more unique
> than category, but it just seems counter intuitive to me.

Selectivity is the second most important consideration. The first is
whether you do exact matches or range scans. You should have the column
on which exact matches are done as the leading column(s), then follow
with the column(s) on which you do range scans. For them, mention the
most selective column first, because for range scans, only the first of
these range columns can be used efficiently.
--
Gert-Jan
 >> Stay informed about: Non-clustered index design rules... 
Back to top
Login to vote
Display posts from previous:   
Related Topics:
The Difference between a Clustered and Non-Clustered Index - Hey, everyone. I have had this question 3 times in the last 4 interviews. I am not a DBA and try not to do a whole lot of Database Administration/ Architecture. I do feel, however, that this question may be one of the ones costing me job offers. Can....

How to remove clustered Index - We have a table(Employees) with a unique clustered index on EmpID column. We also have many other tables which are refering(foreign key) to EmpID column on Employees table. Now we want to remove the clustered index from the Employees table EmpID colum...

What are the advantages of clustered Index - When we do select * from table it always return rows sorted by the column which has clustered Index. What are the other advantages of having clustered Index. Does it increase performance. Do we need to have one or not. Please advise. Thanks, Rob

add a clustered index or not? - I have a highly volatile table that gets data inserted into it constantly. it currently does not have a clustered index on it because of this. a colleague suggested adding a clustered index on the primary key to improve performance. i'm not sure ho...

why should there be only one clustered index in a table? - why should there be only one clustered index in a table?
   Database Help (Home) -> Programming All times are: Pacific Time (US & Canada)
Page 1 of 1

 
You can post new topics in this forum
You can reply to topics in this forum
You can edit your posts in this forum
You can delete your posts in this forum
You can vote in polls in this forum



[ Contact us | Terms of Service/Privacy Policy ]