sql server - Using a meaningless ID as my clustered index rather than my primary key -

- March 15, 2014

i'm working in sql server 2008 r2

as part of complete schema rebuild, creating table used store advertising campaign performance zipcode day. table setup i'm thinking of this:

create table [dbo].[zip_perf_by_day] ( [campaignid] int not null, [zipcode] int not null, [reportdate] date not null, [performancemetric1] int not null, [performancemetric2] int not null, [performancemetric3] int not null, , on... )

now combination of campaignid, zipcode, , reportdate perfect natural key, uniquely identify single entity, , there shouldn't 2 records same combination of values. also, of queries table going filtered on 1 or more of these 3 columns. however, when thinking clustered index table, run problem. these 3 columns not increment on time. reportdate ok, campaignid , zipcode going on place while inserting rows. can't order them ahead of time because results come in different sources during day, data campaignid 50000 might inserted @ 10am, , campaignid 30000 might come in @ 2pm. if use pk clustered index, i'm going run fragmentation problems.

so thinking need identity id column, let's call performanceid. can see no case ever use performanceid in either select list or clause of query. should use performanceid pk , clustered index, , set unique constraint , non-clustered indexes on campaignid, zipcode, , reportdate? should keep 3 columns pk , have clustered index on performanceid? (<- option i'm leaning towards right now) ok have fragmented table? there option haven't considered? looking give me best read performance, while not destroying write performance.

some actual usage information. table written in batches. feeds come in @ various times during day, processed, , table gets written to. it's going heavily read, by-day performance important around here. when fill table, should have 5 million rows, , grow @ pace of 8,000 - 10,000 rows per day.

in experience, want use int identity field clustered index key. add unique constraint 1 (it helps execution plans).

a big part of reason space - if use 3 field key clustered index, have 3 fields in every row of every non-clustered index on table (as clustered index row identifier). if plan have couple of indexes isn't big deal, if have lot of them can make big difference. more data per row, more pages needed , more io have.

fragmentation real issue can cause major performance problems, table grows.

having additional cluster key mean writes faster inserts. new rows go end of table, means existing rows won't touched or rearranged.

if want use 3 fields fk in other tables, means have them pk.

for part doesn't matter if ever directly reference clustered index key. long narrow, increasing, , unique should in shape.

edit:

as damien points out in comments, if filtering on single fields of pk, need have index on each 1 (or use first field in covering index).

Search This Blog

Score

sql server - Using a meaningless ID as my clustered index rather than my primary key -

Comments

Post a Comment

Popular posts from this blog

php - What is the difference between $_SERVER['PATH_INFO'] and $_SERVER['ORIG_PATH_INFO']? -

fortran - Function return type mismatch -

queue - mq_receive: message too long -