Logging

  • what:recording events as they occur.
  • why: need to use log to restore
  • how: combine with checkpoint, periodically checkpoint to save complete state. increment log to maintain updates from that state.
    • synchronous logging: spend so many time to recovery
    • asynchronous: log buffered in memory, reduce overhead, but leave log entries vulnerable to loss.
      • GDV, global dependency vectors
      • interval is how many messages I got since beginning
      • if someone vector is bigger than the original one, it means it is inconsistence
  • problem: we can use prune log to reduce playback time
    • it may resend messages that having negative effects on system. Use incarnation numbers to enable recipients to do the same.
    • expensive in times

checkpoint

  • what: used to restore state.
  • why:for global consistent copy.
  • how: set point, freezing the system,no write to maintain consistency. inbound messages will not frozen but write into buffer.
    • uncoordinated checkpoint:
      • periodically recorded its state
      • need recovery line to recover
    • coordinated checkpoints:
      • record message sequences: know who sent us message since last checkpoint
      • synchronized clocks, add timestamp in all messages and act as a sequence number;:
  • problem: freeze
  • incarnation number: restarted will +1, use this to distinguish whether I should accept this message.

Children
  1. LSM