Index

index

In order to efficiently find the value for a particular key in the database, we need a different data structure: an index

A table index is a replica of a subset of a table's columns that are organized and/or sorted for efficient access using a subset of those column

B+ tree

  • b+ tree:
    • self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in O(log n).
    • m-way search tree: every leaf node is at the same depth
    • every node with m/2+1 <= k <= m keys, k keys, k+1 child
    • |link(< k )|key1|link( < k2 )|key2|link3( >= k2)|...
    • node is an array of key/value pairs, key is column, value differ based on whether the node is classified as inner nodes or leaf nodes.
    • only stores values in leaf nodes. Inner nodes only guide the search process.
  • B+TREE DESIGN CHOICES
    • node size
    • Merge Threshold : maybe delay merge operation
    • Variable Length Keys :
    • Non-Unique Indexes :
    • Intra-Node Search: binary search
  • optimization:
    • prefix compression: store prefix and only store defferent substring, leaf node
    • suffix truncation: key very different and you only need suffix to see which key to go left or right, inner node
    • bulk insert
    • POINTER SWIZZLING

other index

  • skip list
    • dynamic order-preserving index use a sorted linked list
    • Multiple levels of linked lists with extra pointers that skip over intermediate nodes.
    • To insert a new key, flip a coin to decide how many levels to add the new key into. Provides approximate O(log n) search times.
    • First logically remove a key from the index by setting a flag to tell threads to ignore.
    • Then physically remove the key once we know that no other thread is holding the reference.
    • pros
      • less space
      • no rebalance
    • cons
      • not disk friendly, reverse search not good
  • radix tree
    • Represent keys as individual digits. This allows threads to examine prefixes one-by-one instead of comparing entire key.
    • 可能的问题就是一些不好表示,比如signed int,float
  • inverted index
    • full text search index