Term Vector support is optional, on a field-by-field basis. It consists of three files.
The Document Index or .tvx file.
For each document, this stores the offset into the document data (.tvd) and field data (.tvf) files.
DocumentIndex (.tvx) --> Header, <DocumentPosition, FieldPosition>^NumDocs
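The random-access lookup that the .tvx file enables can be sketched as follows. This is a minimal illustration under stated assumptions, not Lucene's implementation: the class name TvxIndexSketch, the 16-byte header length, and the choice to store both DocumentPosition and FieldPosition as fixed-width 8-byte values are all hypothetical. With fixed-width entries, the entry for document n begins at a computable offset, so a reader can jump to it without scanning earlier documents.

```java
import java.nio.ByteBuffer;

// Minimal in-memory sketch of a .tvx-style document index.
// ASSUMPTIONS (hypothetical, not taken from the format description):
// the header has a fixed length, and each DocumentPosition / FieldPosition
// is a fixed-width 8-byte long, so entry n starts at HEADER_LENGTH + n * ENTRY_SIZE.
final class TvxIndexSketch {
    static final int HEADER_LENGTH = 16;              // hypothetical header size in bytes
    static final int ENTRY_SIZE = 2 * Long.BYTES;     // DocumentPosition + FieldPosition

    // Writes a zero-filled placeholder header followed by one
    // <DocumentPosition, FieldPosition> pair per document.
    static ByteBuffer write(long[] tvdOffsets, long[] tvfOffsets) {
        if (tvdOffsets.length != tvfOffsets.length) {
            throw new IllegalArgumentException("one pair of offsets per document");
        }
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH + tvdOffsets.length * ENTRY_SIZE);
        buf.position(HEADER_LENGTH); // leave the (zero-filled) header bytes untouched
        for (int doc = 0; doc < tvdOffsets.length; doc++) {
            buf.putLong(tvdOffsets[doc]); // DocumentPosition: offset into .tvd
            buf.putLong(tvfOffsets[doc]); // FieldPosition: offset into .tvf
        }
        buf.flip(); // limit = bytes written, position = 0
        return buf;
    }

    // Absolute reads: fixed-width entries allow direct access by document number.
    static long tvdOffset(ByteBuffer tvx, int doc) {
        return tvx.getLong(HEADER_LENGTH + doc * ENTRY_SIZE);
    }

    static long tvfOffset(ByteBuffer tvx, int doc) {
        return tvx.getLong(HEADER_LENGTH + doc * ENTRY_SIZE + Long.BYTES);
    }
}
```

For example, `TvxIndexSketch.tvdOffset(TvxIndexSketch.write(new long[]{0, 40}, new long[]{0, 120}), 1)` returns `40`. Fixed-width entries trade a little space for O(1) lookup by document number.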
The Document or .tvd file.
This contains, for each document, the number of fields, a list of the fields with term vector information, and finally a list of pointers to the field information in the .tvf (Term Vector Fields) file.
The .tvd file is used to map out the fields that have term vectors stored and where the field information is in the .tvf file.
Document (.tvd) --> Header, <NumFields, FieldNums, FieldPositions>^NumDocs
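The mapping role of a .tvd entry (which fields have term vectors, and where each field's data starts in .tvf) can be modeled in memory as below. This is a hypothetical sketch: the record name TvdDocEntry is invented for illustration, the on-disk encoding is not modeled, and it assumes the field numbers are kept sorted with one absolute .tvf pointer per field.

```java
import java.util.Arrays;

// Minimal sketch of one document's .tvd-style entry.
// ASSUMPTIONS (hypothetical): fieldNums is sorted ascending, and
// fieldPositions[i] is an absolute .tvf pointer for fieldNums[i].
record TvdDocEntry(int[] fieldNums, long[] fieldPositions) {
    TvdDocEntry {
        if (fieldNums.length != fieldPositions.length) {
            throw new IllegalArgumentException("one .tvf pointer per field");
        }
    }

    // NumFields is implied by the length of the field list.
    int numFields() {
        return fieldNums.length;
    }

    // Returns the .tvf pointer for a field, or -1 if this document
    // stored no term vector for that field.
    long tvfPointerFor(int fieldNum) {
        int i = Arrays.binarySearch(fieldNums, fieldNum);
        return i >= 0 ? fieldPositions[i] : -1L;
    }
}
```

A reader would first use the document's .tvx entry to locate this record, then call something like `tvfPointerFor` to find where a given field's term data begins.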
The Field or .tvf file.
This file contains, for each field that has a term vector stored, a list of the terms, their frequencies and, optionally, position, offset, and payload information.
Field (.tvf) --> Header, <NumTerms, Flags, TermFreqs>^NumFields
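Because positions, offsets, and payloads are optional per field, a reader needs to know which of them follow each term. Assuming the Flags byte records exactly that, it can be packed and tested as a small bit set, sketched below. The class name TvfFlagsSketch and the specific bit values are hypothetical; the description above does not define them.

```java
// Minimal sketch of encoding/decoding a per-field Flags byte.
// ASSUMPTION: the bit assignments below are hypothetical placeholders.
final class TvfFlagsSketch {
    static final byte STORE_POSITIONS = 0x1; // hypothetical bit
    static final byte STORE_OFFSETS = 0x2;   // hypothetical bit
    static final byte STORE_PAYLOADS = 0x4;  // hypothetical bit

    // Packs the three optional-data switches into a single byte.
    static byte encode(boolean positions, boolean offsets, boolean payloads) {
        int flags = 0;
        if (positions) flags |= STORE_POSITIONS;
        if (offsets) flags |= STORE_OFFSETS;
        if (payloads) flags |= STORE_PAYLOADS;
        return (byte) flags;
    }

    // Each test tells a reader whether that optional part follows each term.
    static boolean hasPositions(byte flags) { return (flags & STORE_POSITIONS) != 0; }
    static boolean hasOffsets(byte flags)   { return (flags & STORE_OFFSETS) != 0; }
    static boolean hasPayloads(byte flags)  { return (flags & STORE_PAYLOADS) != 0; }
}
```

For instance, `TvfFlagsSketch.encode(true, false, true)` yields `5` (positions and payloads set, offsets not set). Storing the flags once per field, rather than per term, keeps the per-term records free of repeated metadata.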
Notes: