LSM
LSM
http://www.benstopford.com/2015/02/14/log-structured-merge-trees/
- idea: collect and batch updates in memory. 放满buffer再写入。删除的时候先放到tombstone里,再删除。
- merge 的想法:使用bloom filter在query的时候可以知道这个文件会不会在里面。在merge update可以prune tree。在merge的时候删tombstone。
- LSMT,memory里的数据结构是C0,disk里的是C1
- fill memory
- spill to disk. use SSTables(连续key value pair) and SSIndexes(key index pair) support random access
- Index and bloom filter in memory
- merge perform comaction
- write-ahead logs to aid recovery
- C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk.
http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html)
Bloom Filter
- what: one bit per hash function consumed per key. ignore collision. use bit to reprensent exist or not.
- why: secondary storage is slow to search and access
- whow: use k hash function for the item and get k array, set these all in 1 and when we query this item, we will check all k array result