The 13th International Mouse Genome Conference
October 31-November 3, 1999

B16 Integrating Genome Sequence with Biology

C. J. Bult, J. A. Blake, J. T. Eppig and the Mouse Genome Informatics Group. The Jackson Laboratory, Bar Harbor, ME

Sequence similarity provides a powerful mechanism for predicting orthologous relationships between mouse and human genes. However, it is the extension of sequence-level correspondence to detailed knowledge about the genes and their relationship to phenotype that makes comparative genomics such a powerful approach for predicting and understanding biological processes. As the ability to collect large amounts of complex biological information grows, integrating data about the same genomic feature from diverse sources will be key to developing new insights into human biology using the mouse as a model organism.

We are developing a data integration pipeline and related databases to connect features identified in the emerging mouse reference genome sequence to the function, expression, homology, mapping and phenotypic data that are available for genes and markers in the Mouse Genome Database (MGD) and Gene Expression Database (GXD). The sequence data integration process we are implementing comprises three steps: 1) establishing equivalency with genetic markers and loci already represented in MGD and GXD, 2) nomenclature validation or assignment, and 3) mouse-human gene homology assessment. While many automated (or semi-automated) sequence annotation pipelines and data curation interfaces have been built in recent years, comparable systems for data integration are not well developed. In this poster we will outline our data integration process and the design of a workflow management system for supporting large-scale integration of mouse genomic sequence data.
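
The three integration steps can be pictured as a small staged workflow. The Python sketch below is illustrative only and is not the MGD/GXD implementation: the class `SequenceFeature`, the function names, and the dictionary lookups standing in for database queries are all hypothetical, and the identifiers are placeholders. It assumes each step either fills in one field or flags the feature for curator review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SequenceFeature:
    """Hypothetical record for one feature found in the genome sequence."""
    feature_id: str
    marker_id: Optional[str] = None       # filled in by step 1
    symbol: Optional[str] = None          # filled in by step 2
    human_homolog: Optional[str] = None   # filled in by step 3
    needs_review: List[str] = field(default_factory=list)


def establish_equivalency(f: SequenceFeature, markers: Dict[str, str]) -> None:
    # Step 1: match the feature to a marker or locus already in the database.
    f.marker_id = markers.get(f.feature_id)
    if f.marker_id is None:
        f.needs_review.append("equivalency")


def validate_nomenclature(f: SequenceFeature, symbols: Dict[str, str]) -> None:
    # Step 2: confirm an existing symbol; otherwise flag for assignment.
    if f.marker_id is not None:
        f.symbol = symbols.get(f.marker_id)
    if f.symbol is None:
        f.needs_review.append("nomenclature")


def assess_homology(f: SequenceFeature, homologs: Dict[str, str]) -> None:
    # Step 3: look up a candidate human homolog for the validated symbol.
    if f.symbol is not None:
        f.human_homolog = homologs.get(f.symbol)
    if f.human_homolog is None:
        f.needs_review.append("homology")


def integrate(f: SequenceFeature,
              markers: Dict[str, str],
              symbols: Dict[str, str],
              homologs: Dict[str, str]) -> SequenceFeature:
    """Run the three steps in order; unresolved steps are queued for review."""
    establish_equivalency(f, markers)
    validate_nomenclature(f, symbols)
    assess_homology(f, homologs)
    return f


if __name__ == "__main__":
    # Placeholder lookup tables, not real database content.
    markers = {"feat1": "marker1"}
    symbols = {"marker1": "GeneA"}
    homologs = {"GeneA": "GENEA"}
    print(integrate(SequenceFeature("feat1"), markers, symbols, homologs))
    print(integrate(SequenceFeature("feat2"), markers, symbols, homologs))
```

In this sketch a feature that fails an early step is still passed through the later ones, so the review list records every stage a curator would need to revisit. A workflow management system could use such a list to route features to the appropriate curation queue.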

Supported by DOE grant DE-FG02-96ER62327 and NIH grant HG01559.


