loadOntology
are made available in the ontologies
field of OBOES
. After loading one or more ontologies, invoke the loadAnnotation
method, which reads in annotation data from a specified file. From these annotation data, accession-to-term mappings are constructed; these mappings are used when the methods getSimpleEnrichment
, getUnderRepresentedTerms
, and getCompoundEnrichment
are called. See {@link Enrichment}, {@link SimpleEnrichment}, and {@link CompoundEnrichment} for information on working with the output from those methods.
OBOES
maintains a "minimum count" and a "maximum p-value" as cut-offs for deciding whether a Term or set of Terms is significantly enriched. For a Term or set of Terms to be significantly enriched, the number of accessions annotated with that Term, or with the entire set of Terms, must be at least the "minimum count," and the statistically computed p-value must be at most the "maximum p-value." The "minimum count" defaults to 2; the "maximum p-value" defaults to 0.01, a conventional threshold that trades off Type I errors (false positives) against Type II errors (false negatives). Both cut-offs can be set and retrieved using OBOES
methods. Any invocation of getSimpleEnrichment
or getCompoundEnrichment
will only consider an enrichment to be significant if it satisfies the "maximum p-value" and "minimum count" values that are currently set in the OBOES
object.
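The way the two cut-offs combine can be sketched in a few lines of plain MATLAB. This is only an illustration of the rule described above, not OBOES code: the counts and p-values are made-up numbers, and min_count and max_p_value are local variables chosen for this sketch rather than names taken from the OBOES interface.

```matlab
% Hypothetical per-term results (illustrative values, not OBOES output).
counts   = [1      2    5     12  ];   % sample accessions annotated with each term
p_values = [0.0005 0.02 0.009 0.01];   % p-value computed for each term

% Local stand-ins for the two cut-offs, set to the defaults described above.
min_count   = 2;
max_p_value = 0.01;

% A term counts as significantly enriched only if BOTH conditions hold:
% enough annotated accessions AND a small enough p-value.
is_significant = (counts >= min_count) & (p_values <= max_p_value);

disp(is_significant);
```

Note that the first term is rejected despite its very small p-value, because only one accession carries it, and that a p-value exactly equal to the cut-off still passes.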
OBOES
provides additional methods that influence the statistics involved in searching for enrichment. For example, OBOES
defaults the total sample space size to the total number of unique accessions loaded through loadAnnotation
(or loadBackgroundAnnotation
, if invoked; see {@link loadBackgroundAnnotation}). The total sample space is the entire set of possible accessions from which the accessions in a given input sample are drawn. For example, the total sample space for a microarray experiment would be the whole genome of the organism, so the sample space size equals the number of genes in that organism. OBOES
also provides a variety of multiple hypothesis correction schemes, which keep the overall probability of encountering a false positive low when many hypotheses are tested. Multiple hypothesis correction is automatically employed by getSimpleEnrichment
, since it tests all possible enriched Terms in a single invocation. Correction is not employed by getCompoundEnrichment
, since it tests only one hypothesis in an invocation. For this reason, OBOES
features an interface for its multiple hypothesis correction schemes (see the applyMultipleHypothesisCorrection
method) that allows you to apply various correction schemes to the results you obtain at your own discretion.
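To make the idea of multiple hypothesis correction concrete, the following plain-MATLAB sketch adjusts a vector of raw p-values with two widely used schemes, Bonferroni and Benjamini-Hochberg. It is an assumption that these resemble the schemes OBOES offers; this text does not list them, and the sketch does not call applyMultipleHypothesisCorrection. The p-values are made-up numbers.

```matlab
% Hypothetical raw p-values, one per tested term (illustrative values).
p = [0.001 0.008 0.012 0.04 0.30];
m = numel(p);                          % number of hypotheses tested

% Bonferroni: multiply every p-value by the number of tests, capped at 1.
p_bonferroni = min(p * m, 1);

% Benjamini-Hochberg: scale the i-th smallest p-value by m/i ...
[p_sorted, order] = sort(p);
scaled = p_sorted .* m ./ (1:m);

% ... then make the adjusted values non-decreasing in rank by taking a
% running minimum from the largest rank downward.
for i = m-1:-1:1
    scaled(i) = min(scaled(i), scaled(i+1));
end

% Put the adjusted values back in the original order, capped at 1.
p_bh = zeros(1, m);
p_bh(order) = min(scaled, 1);

disp(p_bonferroni);
disp(p_bh);
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected fraction of false positives among the reported terms, so it usually retains more of them.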
An example of basic usage of OBOES
in MATLAB, loading the Gene Ontology (GO), might look as follows:
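myOBOES = OBOES();
% Load the Gene Ontology from its OBO file.
myOBOES.loadOntology('C:\data\gene_ontology.obo', 'gene ontology');
% Load the annotation data (gene accessions mapped to GO terms).
myOBOES.loadAnnotation('C:\data\organismal_gene_annotation.txt', 'gene ontology');
% Test the input sample exp12_genes for enriched terms.
enrichment = myOBOES.getSimpleEnrichment(exp12_genes, 'gene ontology');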
To analyze the results, you could then write statements such as:
enrichment(1).term.name
enrichment(1).p_value
and so forth. The code above assumes that the file "organismal_gene_annotation.txt" contains a mapping of gene accessions to GO terms, and that the variable exp12_genes
is an array of the accessions of the input sample (presumably, judging by the variable name, from the output of an experiment). The OBO file for GO can be found online at http://www.geneontology.org.
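Beyond inspecting a single element, the whole result can be listed in order of significance. The sketch below assumes only the two fields used above (term.name and p_value) and that the result can be indexed as an array. So that it runs without OBOES, it builds a stand-in struct array with made-up term names and p-values; with a real result, the first statement would be dropped.

```matlab
% Stand-in for the output of getSimpleEnrichment (hypothetical values).
% Only the fields term.name and p_value, as used above, are assumed.
enrichment = struct( ...
    'term',    {struct('name', 'hypothetical term A'), ...
                struct('name', 'hypothetical term B')}, ...
    'p_value', {0.004, 0.0002});

% Collect the p-values into a row vector and sort ascending,
% so that the most significant term comes first.
[~, order] = sort([enrichment.p_value]);

% Print one line per term: name, then p-value.
for k = order
    fprintf('%-22s p = %.4g\n', enrichment(k).term.name, enrichment(k).p_value);
end
```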
@see Enrichment
@see SimpleEnrichment
@see CompoundEnrichment
@see OBO_Object