How does Flink manage and recover state?

Flink offers two mechanisms for persisting and restoring state: Checkpoints and Savepoints.

  1. A Checkpoint is Flink's automatic mechanism for persisting a job's state. At a configured interval, Flink snapshots the job's state and writes it to persistent storage; if the job fails, it is restarted from the most recent completed Checkpoint. The Checkpoint interval and the way the snapshots are persisted can be configured to trade recovery guarantees against runtime overhead.
  2. A Savepoint is a manually triggered snapshot of a job's state that is saved to persistent storage. It can be created while the job is running or as part of stopping it, which allows for more flexible state management and job version control, for example restoring the state into an upgraded version of the job.

Generally speaking, a Checkpoint is an automated state management mechanism intended for failure recovery, while a Savepoint is a manually triggered state snapshot. Using both together allows for comprehensive management and recovery of job state.
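
The snapshot-and-restore cycle described above can be illustrated with a small toy model in plain Java. This is only a sketch of the idea, not Flink's API or implementation: the class `CheckpointSketch`, its methods, and the `interval` and `failAt` parameters are all hypothetical names chosen for the example, and an in-memory map stands in for persistent storage.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of periodic state snapshots and recovery (NOT the Flink API).
class CheckpointSketch {
    // Working state of the "job": a running count per key.
    private Map<String, Long> counts = new HashMap<>();
    // Last completed snapshot; stands in for persistent storage.
    private Map<String, Long> lastSnapshot = new HashMap<>();
    // Input position that the last snapshot corresponds to.
    private int snapshotOffset = 0;

    void process(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Persist a copy of the state together with the input position.
    void snapshot(int offset) {
        lastSnapshot = new HashMap<>(counts);
        snapshotOffset = offset;
    }

    // Reload the last snapshot and return the position to resume from.
    int restore() {
        counts = new HashMap<>(lastSnapshot);
        return snapshotOffset;
    }

    // Counts the keys in input, snapshotting every `interval` records.
    // A single failure is simulated just before index `failAt`
    // (pass -1 for no failure).
    public static Map<String, Long> run(String[] input, int interval, int failAt) {
        CheckpointSketch job = new CheckpointSketch();
        boolean failed = false;
        int i = 0;
        while (i < input.length) {
            if (!failed && i == failAt) {
                failed = true;
                job.counts.clear();   // working state is lost
                i = job.restore();    // rewind to the snapshot position
                continue;
            }
            job.process(input[i]);
            i++;
            if (i % interval == 0) {
                job.snapshot(i);
            }
        }
        return job.counts;
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "a", "c", "a", "b"};
        System.out.println(run(input, 2, 3));
    }
}
```

Because the snapshot stores the input position along with the state, the records processed after the last snapshot are replayed after recovery, and the final counts match those of a run without a failure.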
