C4. Nomenclature in the 21st Century: Sequence and the 'Gene'

Lois J. Maltais, Judith A. Blake, Muriel T. Davisson, Janan T. Eppig, and the Mouse Genome Informatics Staff
The Jackson Laboratory, Bar Harbor, ME USA

The flood of sequence information has caused us to rethink how we define 'gene' and to reassess how we name genes. Assigning names and symbols to genes has been based primarily on functional characterization or phenotypic expression. Genes with similar function and phenotype are assigned to 'families' (in contrast to evolutionarily defined gene families) and given a family root symbol and name. Increasingly, genes are found to have multiple functions and, therefore, could be classified as members of different families. We rely on community involvement to revise and classify these genes. Today, with the rapid identification of uncharacterized sequences, genes are grouped into families according to their sequence similarity or structure, such as domains and motifs, and given a reasonable name that provides some value to the sequence.

As rapid sequencing efforts continue in this era of genomics, nomenclature must keep pace with how sequences are identified and must name them accordingly. Current technologies show that a single stretch of DNA can be utilized in several different ways. For example, alternative splicing can produce two or more different products from the same sequence, and alternate promoter usage adds further variation. These examples raise the question of whether such products should be considered different genes, and whether the concept of a 'gene' itself needs to be reconsidered. As a result, problems arise in assigning unique identifiers (symbols and names) to loci, genes, transcripts, and proteins, as well as to phenotypes, mutations, and alleles.

Organized nomenclature has become a vital tool for information retrieval and for consistency between databases, and it must continue with community support and interest. The Mouse Genome Informatics Nomenclature Committee works cooperatively with 1) other nomenclature committees (Human, Rat, and Zebrafish), 2) sequence databases (SwissProt, NCBI, and Unigene), 3) advisors for gene families (CD/LY, CYP450, and FOX), 4) journal editors (Nature Genetics, Genomics, and Mammalian Genome), and 5) members of the scientific community in developing standardized nomenclature. In addition, the Mouse Genome Database (MGD) prominently displays sequence information for each gene, with links to the respective sequence databases.

