FileLink is a sort of hardlink that provides access to a file through a set of alternative locations.
The Problem:
- HDFS does not support hardlinks, which makes it impossible to reference the same data blocks under different names.
- HBase stores files in one location (e.g. table/region/family/) and, when a file is no longer needed (e.g. after a compaction or a region deletion), moves it to an archive directory.
To create a reference to a file, we therefore have to account for the fact that the file can be either in its original location or in the archive folder. The FileLink class abstracts this: given a set of locations, it switches between them, making the move transparent to the user. {@link HFileLink} and {@link HLogLink} are more specific implementations of FileLink.
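The location-switching idea can be sketched on the local file system with `java.nio.file`. This is a minimal illustration, not the HBase API: `SimpleFileLink` is a hypothetical class, and it assumes that a missing file surfaces as `NoSuchFileException`. It only resolves the link at open time; a file that moves after it has been opened is not handled here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical, simplified stand-in for FileLink (local file system only). */
class SimpleFileLink {
  // Candidate locations, tried in order (e.g. original path, then archive path).
  private final List<Path> locations;

  SimpleFileLink(List<Path> locations) {
    this.locations = List.copyOf(locations);
  }

  /** Opens the first location that currently exists. */
  InputStream open() throws IOException {
    for (Path location : locations) {
      try {
        return Files.newInputStream(location);
      } catch (NoSuchFileException e) {
        // Not here (e.g. the file was archived); fall through to the next location.
      }
    }
    throw new FileNotFoundException("No existing location for link: " + locations);
  }
}
```

Listing the original path before the archive path means the common case, a file that has not been archived yet, is found on the first attempt.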
Back-references: to help the {@link CleanerChore} keep track of the links to a particular file, a new file is placed inside a back-reference directory when a FileLink is created. There is one back-reference directory for each file that has links, and that directory contains one file per link.
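The back-reference bookkeeping can be sketched as follows. `BackReferences` and its methods are hypothetical helpers, not the `CleanerChore` API; the sketch assumes the caller passes the file's archive location and places a `.links-<file>` directory next to it, mirroring the `.links-file-k` naming in the example. A cleaner following this scheme would only delete a file whose back-reference directory is missing or empty.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helpers illustrating back-references on a local file system. */
final class BackReferences {
  private BackReferences() {}

  /** One back-reference directory per linked file: ".links-<file>" next to it. */
  static Path backRefDir(Path file) {
    return file.resolveSibling(".links-" + file.getFileName());
  }

  /** Records a new link by creating one empty file per link. */
  static Path addBackReference(Path file, String linkId) throws IOException {
    Path dir = Files.createDirectories(backRefDir(file));
    return Files.createFile(dir.resolve(linkId));
  }

  /** The file may be deleted only when no back-references remain. */
  static boolean isDeletable(Path file) throws IOException {
    Path dir = backRefDir(file);
    if (!Files.isDirectory(dir)) {
      return true; // nothing has ever linked to this file
    }
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      return !entries.iterator().hasNext();
    }
  }
}
```

Removing a link then amounts to deleting its back-reference file, after which the cleaner can reclaim the original once the directory is empty.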
HFileLink Example
- /hbase/table/region-x/cf/file-k (Original File)
- /hbase/table-cloned/region-y/cf/file-k.region-x.table (HFileLink to the original file)
- /hbase/table-2nd-cloned/region-z/cf/file-k.region-x.table (HFileLink to the original file)
- /hbase/.archive/table/region-x/.links-file-k/region-y.table-cloned (Back-reference to the link in table-cloned)
- /hbase/.archive/table/region-x/.links-file-k/region-z.table-2nd-cloned (Back-reference to the link in table-2nd-cloned)
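The naming pattern in the example can be captured in a few string helpers. `LinkNames` and its methods are hypothetical and only reproduce the pattern shown above (`<file>.<region>.<table>` for the link, `.links-<file>/<region>.<table>` for the back-reference); they are not the HFileLink API. `candidateLocations` additionally assumes that the archive mirrors the original layout under `.archive`, which is an illustration rather than a documented guarantee.

```java
import java.util.List;

/** Hypothetical helpers reproducing the HFileLink naming pattern from the example. */
final class LinkNames {
  private LinkNames() {}

  /** Name of the link file placed in the cloned region: file.region.table */
  static String linkName(String table, String region, String file) {
    return file + "." + region + "." + table;
  }

  /** Back-reference path, relative to the original region's archive directory. */
  static String backReference(String linkTable, String linkRegion, String file) {
    return ".links-" + file + "/" + linkRegion + "." + linkTable;
  }

  /** Assumed candidate locations of the original file: live path first, then archive. */
  static List<String> candidateLocations(
      String root, String table, String region, String family, String file) {
    return List.of(
        String.join("/", root, table, region, family, file),
        String.join("/", root, ".archive", table, region, family, file));
  }
}
```

For instance, `linkName("table", "region-x", "file-k")` yields `file-k.region-x.table`, the name used by both links in the example.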