algorithm - Good books/articles about spatial indexes -

- August 15, 2015

i interested in literature spatial indexes. 1 in use, comparisons between them in speed, space requirements, spatial queries performance when using them etc.

i used use kind of home-grown quadtree spatial indexing (well before learned word "quadtree"). ordinary kinds of spatial data (i deal street map data), fast create , fast query, scan many leaf nodes during queries. specifically, reasonable node sizes (50-100), quadtree tended produce around 300 results point query, i.e. 3-6 leaf nodes apply (very rough ballpark; results highly variable.)

nowadays, preferred data structure the r*tree. wrote , tested implementation myself obtained results. code building r*tree slow compared quadtree code, bounding boxes on leaf nodes end organized; @ least half of query space answered 1 leaf node (i.e. if random point query, there chance single leaf node returned), , 90% of space covered 2 nodes or less. node size of 80 elements, i'd typically 80 or 160 results point query, average closer 160 (since few queries return 3-5 nodes). holds true in dense urban areas of map.

i know because wrote visualizer r* tree , graphical objects inside it, , tested on large dataset (600,000 road segments). performs better on point data (and other data in bounding boxes overlap). if implement r* tree urge visualize results, because when wrote mine had multiple bugs lowered efficiency of tree (without affecting correctness), , able tweak of decision-making better results. sure test on large dataset, reveal problems small dataset not. may decrease fan-out (node size) of tree testing, see how tree works when several levels deep.

i'd happy give source code except need employer's permission. know how is. in implementation support forced reinsertion, picksplit , insertion penalty have been tweaked.

the original paper, the r* tree: efficient , robust access method points , rectangles, missing dots reason (no periods , no dots on "i"s). also, terminology bit weird, e.g. when "margin", mean "perimeter".

the r* tree choice if need data structure can modified. if don't need modify tree after first create it, consider bulk loading algorithms. if need modify tree small amount after bulk loading, ordinary r-tree algorithms enough. note r*-tree , r-tree data structurally identical; algorithms insertion (and maybe deletion? forget) different. r-tree original data structure 1984; here's link r-tree paper.

the kd-tree looks efficient , not difficult implement, can used point data.

by way, reason focus on leaf nodes that

i need deal disk-based spatial indexes. can cache inner nodes in memory because tiny fraction of index size; therefore time takes scan them tiny compared time required leaf node not cached.
i save lot of space not storing bounding boxes elements in spatial index, means have test original geometry of each element answer query. it's more important minimize number of leaf nodes touched.

Search This Blog

Score

algorithm - Good books/articles about spatial indexes -

Comments

Post a Comment

Popular posts from this blog

how to build hyperlink for query string in php -

php - What is the difference between $_SERVER['PATH_INFO'] and $_SERVER['ORIG_PATH_INFO']? -

queue - mq_receive: message too long -