There is one HLog per RegionServer. All edits for all Regions carried by a particular RegionServer are entered first in the HLog.
Each HRegion is identified by a unique long integer. HRegions do not need to declare themselves before using the HLog; they simply include their HRegion-id in the append or completeCacheFlush calls.
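The append path can be pictured with a minimal in-memory sketch. `SharedLogSketch`, `LogEntry`, and their methods are hypothetical names, not the HBase API. The sketch also assumes that a single sequence counter is shared by every region on the server. It shows how regions write into one shared log with no registration step, because each entry carries the region id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one RegionServer's shared log.
// Not the HBase API: class, record, and method names are illustrative only.
final class SharedLogSketch {

    // One log record: the region that wrote it, its log-sequence-id, and the edit.
    record LogEntry(long regionId, long sequenceId, String edit) {}

    private final List<LogEntry> entries = new ArrayList<>();
    // Assumption: one sequence counter is shared by every region on this server.
    private long nextSequenceId = 1;

    // No registration step: a region simply passes its id with each append.
    synchronized long append(long regionId, String edit) {
        long seq = nextSequenceId++;
        entries.add(new LogEntry(regionId, seq, edit));
        return seq;
    }

    // Returns the entries written by one region, in log order.
    synchronized List<LogEntry> entriesFor(long regionId) {
        List<LogEntry> result = new ArrayList<>();
        for (LogEntry e : entries) {
            if (e.regionId() == regionId) {
                result.add(e);
            }
        }
        return result;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Because every edit carries its region id, entries from different regions interleave freely in one log. Recovery code can later pick out one region's edits by filtering on that id.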
An HLog consists of multiple on-disk files kept in chronological order. Once data has been flushed to other (better) on-disk structures, the log entries for that data become obsolete. We can therefore destroy all the log messages for a given HRegion-id up to the most-recent CACHEFLUSH message from that HRegion.
It's only practical to delete entire files. Thus, we delete an entire on-disk file F when all of the messages in F have a log-sequence-id that's older (smaller) than the most-recent CACHEFLUSH message for every HRegion that has a message in F.
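The deletion rule can be expressed as a predicate over a single log file. In this minimal sketch, the `LogFileSketch` class and its methods are hypothetical. F remembers the highest log-sequence-id each region wrote into it, and the caller supplies each region's most recent CACHEFLUSH sequence id. The check is done per region: F is obsolete only when every region with a message in F has flushed past all of that region's messages in F.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one on-disk log file F (not HBase's code).
final class LogFileSketch {
    // For each region with a message in F: the highest log-sequence-id it wrote to F.
    private final Map<Long, Long> maxSeqByRegion = new HashMap<>();

    void record(long regionId, long sequenceId) {
        maxSeqByRegion.merge(regionId, sequenceId, Long::max);
    }

    // lastFlushSeq maps region id -> sequence id of that region's most recent CACHEFLUSH.
    // F may be deleted only if, for every region in F, all of its messages are
    // older (smaller) than that flush. A region that has not flushed keeps F alive.
    boolean isObsolete(Map<Long, Long> lastFlushSeq) {
        for (Map.Entry<Long, Long> e : maxSeqByRegion.entrySet()) {
            Long flushed = lastFlushSeq.get(e.getKey());
            if (flushed == null || e.getValue() >= flushed) {
                return false;
            }
        }
        return true;
    }
}
```

Tracking only the maximum sequence id per region is enough, because the rule needs just the newest message each region left in F.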
Synchronized methods can never execute in parallel. However, between the start of a cache flush and its completion, appends are allowed but log rolling is not. To prevent log rolling from taking place during this period, a separate reentrant lock is used.
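The locking scheme can be sketched with `java.util.concurrent.locks.ReentrantLock`. `FlushLockSketch` is hypothetical. `startCacheFlush` and `rollWriter` are assumed method names, paired with the append and completeCacheFlush calls named earlier. Appends synchronize only on the log object, so they proceed during a flush. Rolling must acquire the flush lock, so it waits until the flush completes.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the flush/roll locking described above; not HBase's code.
final class FlushLockSketch {
    // Held from the start of a cache flush until its completion.
    private final ReentrantLock cacheFlushLock = new ReentrantLock();
    private long nextSequenceId = 1;
    private long lastFlushSequenceId = -1;
    private int rollCount = 0;

    // Appends synchronize on the log only, never on cacheFlushLock,
    // so they keep working while a flush is in progress.
    synchronized long append(long regionId) {
        return nextSequenceId++;
    }

    // Starts a flush: takes the flush lock, then allocates the flush's sequence id.
    // The same thread must call completeCacheFlush afterwards to release the lock.
    long startCacheFlush() {
        cacheFlushLock.lock();
        synchronized (this) {
            return nextSequenceId++;
        }
    }

    // Completes a flush; a real log would write a CACHEFLUSH marker for regionId here.
    void completeCacheFlush(long regionId, long flushSequenceId) {
        try {
            synchronized (this) {
                lastFlushSequenceId = flushSequenceId;
            }
        } finally {
            cacheFlushLock.unlock();
        }
    }

    // Rolling must hold the flush lock, so it blocks while another thread is mid-flush.
    void rollWriter() {
        cacheFlushLock.lock();
        try {
            synchronized (this) {
                rollCount++;  // a real log would close the current file and open a new one
            }
        } finally {
            cacheFlushLock.unlock();
        }
    }

    boolean isFlushInProgress() {
        return cacheFlushLock.isLocked();
    }

    synchronized long lastFlushSequenceId() {
        return lastFlushSequenceId;
    }

    synchronized int rollCount() {
        return rollCount;
    }
}
```

Every method that needs both locks takes the flush lock before the object monitor. This consistent order avoids deadlock. The lock is reentrant, so the sketch assumes rolling is triggered from a thread other than the one performing the flush.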
To read an HLog, call {@link #getReader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration)}.