Index
index
In order to efficiently find the value for a particular key in the database, we need a different data structure: an index
A table index is a replica of a subset of a table's columns that are organized and/or sorted for efficient access using a subset of those column
B+ tree
- b+ tree:
- self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in O(log n).
- m-way search tree: every leaf node is at the same depth
- every node with m/2+1 <= k <= m keys, k keys, k+1 child
- |link(< k )|key1|link( < k2 )|key2|link3( >= k2)|...
- node is an array of key/value pairs, key is column, value differ based on whether the node is classified as inner nodes or leaf nodes.
- only stores values in leaf nodes. Inner nodes only guide the search process.
- B+TREE DESIGN CHOICES
- node size
- Merge Threshold : maybe delay merge operation
- Variable Length Keys :
- Non-Unique Indexes :
- Intra-Node Search: binary search
- optimization:
- prefix compression: store prefix and only store defferent substring, leaf node
- suffix truncation: key very different and you only need suffix to see which key to go left or right, inner node
- bulk insert
- POINTER SWIZZLING
other index
- skip list
- dynamic order-preserving index use a sorted linked list
- Multiple levels of linked lists with extra pointers that skip over intermediate nodes.
- To insert a new key, flip a coin to decide how many levels to add the new key into. Provides approximate O(log n) search times.
- First logically remove a key from the index by setting a flag to tell threads to ignore.
- Then physically remove the key once we know that no other thread is holding the reference.
- pros
- less space
- no rebalance
- cons
- not disk friendly, reverse search not good
- radix tree
- Represent keys as individual digits. This allows threads to examine prefixes one-by-one instead of comparing entire key.
- 可能的问题就是一些不好表示,比如signed int,float
- inverted index
- full text search index