GeneMerge Documentation

Publications & Manuals

The following documents describe GeneMerge.


Online Documentation

This is an abbreviated version of the GeneMerge Manual for quick reference. Please see the full manual for detailed installation, usage and other information.

Run GeneMerge
Understanding the output

Make your own gene-association file


Input

GeneMerge uses 4 input files:
1. Study set gene file
2. Population set gene file
3. Gene-association file
4. Description file
The study set consists of the genes currently under investigation. The population set consists of the genes from which the study set was drawn, often all genes detected in a given experiment. The gene-association file links each gene name to a particular piece of information using a shorthand identifier (ID). Finally, the description file contains human-readable descriptions of the gene-association IDs.
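
Because the study set is drawn from the population set, every study gene should also appear in the population file. The sketch below is one possible way to check this before running GeneMerge. It assumes each gene file holds one gene name per line, which this quick reference does not state, and the function names and gene names are hypothetical.

```python
def read_gene_list(lines):
    """Return the set of non-empty gene names, one per line (assumed layout)."""
    return {line.strip() for line in lines if line.strip()}


def genes_missing_from_population(study_lines, population_lines):
    """List study genes that do not occur in the population set."""
    study = read_gene_list(study_lines)
    population = read_gene_list(population_lines)
    return sorted(study - population)


# Illustrative gene names only; with real data, pass open file handles instead.
missing = genes_missing_from_population(
    ["geneA\n", "geneB\n"],
    ["geneA\n", "geneB\n", "geneC\n"],
)
print(missing)  # an empty list means every study gene is in the population
```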

GeneMerge Execution

GeneMerge is called on the command line as follows:
./GeneMerge1.4.pl  gene-association  description  population  study  output_name

Options

To specify a custom False Discovery Rate (FDR), append it as a final argument:

./GeneMerge1.4.pl  gene-association  description  population  study  output_name  FDR%
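
When GeneMerge is run from a script, the argument order shown above can be assembled programmatically. The following is a minimal sketch, not part of GeneMerge itself: the helper name and the file names are hypothetical, and the optional FDR value is passed through unchanged because this quick reference does not specify its exact format.

```python
def build_genemerge_command(gene_association, description, population, study,
                            output_name, fdr=None, script="./GeneMerge1.4.pl"):
    """Return the argument list in the order documented above."""
    command = [script, gene_association, description, population, study,
               output_name]
    if fdr is not None:
        # Custom FDR goes last; its format is left to the caller (assumption).
        command.append(str(fdr))
    return command


# Hypothetical file names, for illustration only.
command = build_genemerge_command(
    "association.txt", "description.txt", "population.txt", "study.txt",
    "results",
)
print(" ".join(command))
# To run it for real: import subprocess; subprocess.run(command, check=True)
```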



Understanding the output

Output consists of HTML and a tab-delimited text file. The text file can be opened in a spreadsheet program such as Excel, either by copying and pasting from a text editor or by importing it as tab-delimited. The output file lists each gene-association term found in the study set along with its English description, its frequency in the population set, its frequency in the study set, and its statistical enrichment score, both uncorrected and corrected. Below is a breakdown of each column header.

GMRG_Term GeneMerge term, for example a GO identifier "GO:0001234"
Pop_freq frequency of genes in the population with this term
Pop_frac fraction of genes in the population with this term (whole numbers)
Study_frac fraction of genes in the study set with this term (whole numbers)
P P-value for over-representation of this term in the study set
Bonf_Cor_P Bonferroni corrected P-value
FDR_10 if the P-value is accepted at a False Discovery Rate = 10% (True / False)
FDR_5 if the P-value is accepted at a False Discovery Rate = 5% (True / False)
FDR_1 if the P-value is accepted at a False Discovery Rate = 1% (True / False)
FDR_X_perc if the P-value is accepted at a False Discovery Rate = custom FDR% (True / False)
Description GeneMerge term's English description
Contributing_genes All the genes that are associated with this term in the study set


Here's an example of a GeneMerge output file:

[Screenshots: GeneMerge output file]


The output file also lists the total number of population and study genes, the total number of GeneMerge terms examined, and the number of genes that have terms associated with them. Genes with no gene-association data are listed as well. The threshold P-value for each FDR level is also reported. Finally, the number of population non-singletons, i.e., the number of terms that contribute to the Bonferroni correction, is also given.
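
To make the relationship between the P and Bonf_Cor_P columns concrete, here is a rough sketch of how such values can be computed. It assumes the over-representation P-value is a hypergeometric upper tail, a common choice for this kind of test that this quick reference does not itself confirm, and it applies the standard Bonferroni correction using the number of population non-singletons as the number of tests. The function names are hypothetical; see the full manual for what GeneMerge actually does.

```python
from math import comb


def hypergeom_upper_tail(pop_size, pop_with_term, study_size, study_with_term):
    """P(X >= study_with_term) when study_size genes are drawn without
    replacement from a population in which pop_with_term genes carry the term."""
    total = comb(pop_size, study_size)
    upper = min(pop_with_term, study_size)
    favourable = sum(
        comb(pop_with_term, k) * comb(pop_size - pop_with_term, study_size - k)
        for k in range(study_with_term, upper + 1)
    )
    return favourable / total


def bonferroni_corrected_p(p_value, n_tests):
    """Standard Bonferroni correction, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy numbers: 10 population genes, 3 with the term; 4 study genes, 2 with it.
p = hypergeom_upper_tail(10, 3, 4, 2)
print(p, bonferroni_corrected_p(p, 5))
```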


Further details on creating gene-association files, installing the program, and troubleshooting can be found in the GeneMerge Manual.



How to make your own Gene Association Files

Structured text files for use with GeneMerge are available for many species and can be found here. However, it's easy to make your own gene association files for use with GeneMerge. Just use any text editor to make two text files with the following formats:

Gene Association file
genename tab functionID;
genename tab functionID;
genename tab functionID;functionID;

Description file
functionID tab description_of_function
functionID tab description_of_function
functionID tab description_of_function

Here's an example of a Gene Association file for Drosophila melanogaster:

[Screenshot: example Gene Association file]

The FBgn numbers are FlyBase gene names and the GO:XXXXXXX terms are Gene Ontology IDs for specific functions. The white space is a single tab. Each ID is followed by a semi-colon, so when more than one ID is associated with a gene, the IDs are separated by semi-colons.
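
The line format described above (gene name, a single tab, then IDs each followed by a semi-colon) can be read with a few lines of code. This is a minimal sketch for checking your own files, not GeneMerge's parser; the function name is hypothetical and the gene name and IDs in the example are made up.

```python
def parse_association_line(line):
    """Split 'genename<TAB>ID;ID;' into (genename, [ID, ID])."""
    gene, _, ids_field = line.rstrip("\n").partition("\t")
    # Each ID is followed by a semi-colon, so drop the empty trailing piece.
    ids = [term.strip() for term in ids_field.split(";") if term.strip()]
    return gene.strip(), ids


print(parse_association_line("FBgn0000001\tGO:0001234;GO:0005678;\n"))
# prints ('FBgn0000001', ['GO:0001234', 'GO:0005678'])
```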


Here's an example of a Description file:

[Screenshot: example Description file]

The ID terms here are Gene Ontology IDs for specific functions. The human-readable functional descriptions follow after a single tab. Note these lines do not have to end in semi-colons.

You can use a text editor and spreadsheet program to make these files. The following are typical steps you can follow to create gene-association and description files using Word and Excel on a Mac:

Description files can be made along the same lines; just skip step 5. If there are no IDs for your genomic data, just make them up in Excel. A list of numbers works fine, as long as each function/category gets a unique ID.
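
As an alternative to a spreadsheet, both files can be generated with a short script. The sketch below assigns made-up numeric IDs, one unique number per category, and produces lines in the two formats shown earlier. The function name, gene names, and categories are hypothetical.

```python
def build_genemerge_files(gene_to_categories):
    """Return (association_lines, description_lines) with numeric IDs."""
    ids = {}
    for categories in gene_to_categories.values():
        for category in categories:
            if category not in ids:
                # Each new category gets the next unused number as its ID.
                ids[category] = str(len(ids) + 1)
    association_lines = [
        gene + "\t" + "".join(ids[category] + ";" for category in categories)
        for gene, categories in gene_to_categories.items()
    ]
    description_lines = [
        id_ + "\t" + category for category, id_ in ids.items()
    ]
    return association_lines, description_lines


association, description = build_genemerge_files(
    {"geneA": ["kinase"], "geneB": ["kinase", "transport"]}
)
print("\n".join(association))
print("\n".join(description))
```

Writing each list to its own text file, one line per entry, gives a gene-association file and a matching description file.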


For detailed installation and usage information please see the full GeneMerge Manual.

