This class performs a batch mode retrieval from a set of TREC queries.
Configuring
In the following, we list the main ways of configuring TRECQuerying, before exhaustively listing the properties that can affect TRECQuerying.
Topics
Files containing topics (queries to be evaluated) should be set using the
trec.topics property. Multiple topic files can be used together by separating their filenames using commas. By default TRECQuerying assumes TREC tagged topic files, e.g.:
<top>
<num> Number 1 </num>
<title> Query terms </title>
<desc> Description : A sentence about the information need </desc>
<narr> Narrative: More sentences about what is relevant or not </narr>
</top>
If you have topic files in a different format, you can use a different QuerySource by setting the property
trec.topics.parser. For instance
trec.topics.parser=SingleLineTRECQuery should be used for topics where one line is one query. See {@link org.terrier.structures.TRECQuery} and {@link org.terrier.structures.SingleLineTRECQuery} for more information.
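As a rough illustration of the comma-separated convention described above, the following sketch uses only java.util.Properties. It is not Terrier's own parsing code: the class name TopicFilesSketch, the method topicFiles, the file paths, and the trimming of whitespace around each filename are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TopicFilesSketch {
    /** Splits the trec.topics value on commas, trimming whitespace and skipping empty entries. */
    static List<String> topicFiles(Properties props) {
        List<String> files = new ArrayList<>();
        String value = props.getProperty("trec.topics", "");
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                files.add(trimmed);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical paths: two topic files evaluated together.
        props.setProperty("trec.topics", "/path/to/topics.1-50,/path/to/topics.51-100");
        // One query per line, as described above.
        props.setProperty("trec.topics.parser", "SingleLineTRECQuery");
        System.out.println(topicFiles(props));
    }
}
```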
Models
By default, Terrier uses the {@link InL2} weighting model for all runs. To use a different weighting model for all runs, set the property
trec.model. E.g., to use {@link org.terrier.matching.models.PL2}, set
trec.model=PL2. Similarly, when query expansion is enabled, the default query expansion model is {@link Bo1}, controlled by the property
trec.qe.model.
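The defaults described above can be pictured as simple property lookups with fallbacks. This is a minimal sketch using java.util.Properties, not the way TRECQuerying itself reads its configuration; the class and method names are hypothetical.

```java
import java.util.Properties;

public class ModelDefaultsSketch {
    /** Returns the configured weighting model, falling back to the documented default InL2. */
    static String weightingModel(Properties props) {
        return props.getProperty("trec.model", "InL2");
    }

    /** Returns the configured query expansion model, falling back to the documented default Bo1. */
    static String queryExpansionModel(Properties props) {
        return props.getProperty("trec.qe.model", "Bo1");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Override only the weighting model; the query expansion model keeps its default.
        props.setProperty("trec.model", "PL2");
        System.out.println(weightingModel(props));
        System.out.println(queryExpansionModel(props));
    }
}
```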
Result Files
The results from the system are output in a trec_eval compatible format. The filename of the results file takes the form WEIGHTINGMODELNAME_cCVALUE.RUNNO.res, in the var/results folder. RUNNO is (usually) a constantly increasing number, as specified by a file in the results folder. The location of the results folder can be altered by the
trec.results property. If the property
trec.querycounter.type is not set to sequential, the RUNNO will be a string including the time and a randomly generated number. This is the better choice when many instances of Terrier are writing to the same results folder, as the incrementing RUNNO method is not multi-process safe (e.g. one Terrier instance could delete the counter file while another is reading it).
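To make the naming pattern concrete, here is a small sketch that assembles a filename of the form WEIGHTINGMODELNAME_cCVALUE.RUNNO.res. It only mirrors the pattern stated above and is not Terrier's code: the class and method names are hypothetical, the c value is passed as a string to avoid guessing how it is formatted, and the layout of the time-plus-random token is an assumption, since the text does not specify it.

```java
import java.util.concurrent.ThreadLocalRandom;

public class ResultFilenameSketch {
    /** Builds WEIGHTINGMODELNAME_cCVALUE.RUNNO.res from its three parts. */
    static String name(String weightingModel, String cValue, String runNo) {
        return weightingModel + "_c" + cValue + "." + runNo + ".res";
    }

    /**
     * Hypothetical non-sequential RUNNO: current time plus a random number.
     * The real format is not given in the text above.
     */
    static String uniqueRunToken() {
        return System.currentTimeMillis() + "-" + ThreadLocalRandom.current().nextInt(1_000_000);
    }

    public static void main(String[] args) {
        // Sequential counter value 3 gives PL2_c1.0.3.res
        System.out.println(name("PL2", "1.0", "3"));
        // A token that does not depend on a shared counter file.
        System.out.println(name("PL2", "1.0", uniqueRunToken()));
    }
}
```

Because the second form never reads or rewrites a shared counter file, concurrent Terrier processes do not race on it.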
Properties
- trec.topics.parser - the query parser that parses the topic file(s). TRECQuery by default. Subclass the TRECQuery class and alter this property if your topics come in a very different format to those of TREC.
- trec.topics - the name of the topic file. Multiple topic files can be used if separated by commas.
- trec.model - the name of the weighting model to be used during retrieval. Defaults to InL2.
- trec.qe.model - the name of the query expansion model to be used during query expansion. Defaults to Bo1.
- c - the term frequency normalisation parameter value. A value specified at runtime as an API parameter (e.g. TrecTerrier -c) overrides this property.
- trec.matching - the name of the matching model that is used for retrieval. Defaults to org.terrier.matching.taat.Full.
- trec.manager - the name of the Manager that is used for retrieval. Defaults to Manager.
- trec.results - the location of the results folder. Defaults to TERRIER_VAR/results/
- trec.results.file - the exact result filename to be output. Defaults to an automatically generated filename - see trec.querycounter.type.
- trec.querycounter.type - how the number (RUNNO) at the end of a run file should be generated. Defaults to sequential, in which case RUNNO is a constantly increasing number. Otherwise it is a string including the time and a randomly generated number.
- trec.output.format.length - the maximum number of results output per query into the results file. Defaults to 1000. 0 means no limit.
- trec.iteration - the contents of the Iteration column in the trec_eval compatible results. Defaults to 0.
- trec.querying.dump.settings - controls whether the settings used to generate a results file should be dumped to a .settings file in conjunction with the .res file. Defaults to true.
- trec.querying.outputformat - controls the class used to write the results file. Defaults to TRECQuerying$TRECDocnoOutputFormat. Alternatives: TRECDocnoOutputFormat, TRECDocidOutputFormat, NullOutputFormat.
- trec.querying.outputformat.docno.meta.key - for TRECDocnoOutputFormat, defines the MetaIndex key to use as the docno. Defaults to "docno".
- trec.querying.resultscache - controls the cache used for query result caching. Defaults to TRECQuerying$NullQueryResultCache.
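The following sketch shows how a handful of the properties above might be written in properties-file syntax and how the "0 means no limit" rule for trec.output.format.length could be applied. It relies only on java.util.Properties; the class name, the method resultsToWrite, and the chosen property values are hypothetical, and this is not how TRECQuerying itself is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class OutputLimitSketch {
    /** Returns how many of the available results would be written for one query. */
    static int resultsToWrite(Properties props, int available) {
        int limit = Integer.parseInt(props.getProperty("trec.output.format.length", "1000"));
        // 0 means no limit, per the property description above.
        return limit == 0 ? available : Math.min(limit, available);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Example settings in properties-file syntax (values chosen for illustration).
        props.load(new StringReader(
            "trec.model=PL2\n"
            + "trec.output.format.length=0\n"
            + "trec.iteration=0\n"));
        // With the limit disabled, all 2500 retrieved results would be written.
        System.out.println(resultsToWrite(props, 2500));
    }
}
```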
@author Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Nut Limsopatham