17th International Mouse Genome Conference

9-12 November 2003, Braunschweig, Germany


Church DM
National Center for Biotechnology Information

Co-Authors: 2) Bailey J A, 1) Agarwala R, 2) Eichler E E
Institutions: 1) National Center for Biotechnology Information, 2) Case Western Reserve University School of Medicine

The mouse genome offers a unique opportunity to assess whole-genome shotgun (WGS) and hierarchical (clone-based) assemblies. One difficulty in assembling mammalian genomes is the presence of segmental duplications. A WGS assembly of the mouse genome (MGSCv3) has been available for over a year. More than 95% of the mouse genome is also available as HTGS (BAC-based) sequence.

Based on data from January 27, 2003, we attempted to integrate 0.736 Gb of finished sequence into the MGSCv3 (NCBI Build 30). Of the 4740 finished BACs used in Build 30, 58 could not be integrated: 20 were unplaced because of a chromosome assignment conflict and 38 because of alignment conflicts.
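The counts above can be turned into proportions with a few lines of arithmetic. The sketch below uses only the figures stated in this abstract; the variable names are illustrative and not part of any published pipeline.

```python
# Counts taken directly from the abstract text.
total_bacs = 4740
unplaced_by_reason = {
    "chromosome assignment conflict": 20,
    "alignment conflict": 38,
}

# Totals and proportions derived from those counts.
unplaced = sum(unplaced_by_reason.values())
integrated = total_bacs - unplaced
unplaced_pct = 100 * unplaced / total_bacs

print(f"unplaced: {unplaced} of {total_bacs} ({unplaced_pct:.2f}%)")
for reason, n in unplaced_by_reason.items():
    print(f"  {reason}: {n} ({100 * n / unplaced:.1f}% of unplaced)")
```

In other words, roughly 1.2% of the finished BACs could not be placed, and about two thirds of those failures were due to alignment conflicts.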

Using published methods (Bailey et al. 2001; Bailey et al. 2002), we assessed the level of duplication in the MGSCv3 and in the finished sequence from Build 29. Only duplications >10 kb at >90% identity were considered. Strikingly, the level of duplication in the MGSCv3 was well below the level reported for human (human, 4.52%; MGSCv3, 0.70%). The majority of the duplication seen in MGSCv3 was found in the unplaced contigs. However, 2.71% of the finished sequence was found to be involved in duplication. Notably, the duplicated regions contain protein-coding genes, including transcription factors and 7TM protein family members. Many of the BACs found to be involved in duplications could not be integrated into the MGSCv3. These data suggest that regions of segmental duplication are under-represented in the MGSCv3.
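The thresholding step described above (>10 kb, >90% identity) and the resulting "percent duplicated" figure can be illustrated with a minimal sketch. This is not the detection method of Bailey et al.; it only shows how pre-computed alignments might be filtered and summarized. The `Alignment` record, the function names, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    """Hypothetical record for one pairwise self-alignment hit."""
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    identity: float   # fraction between 0 and 1

MIN_LENGTH = 10_000   # >10 kb, as stated in the abstract
MIN_IDENTITY = 0.90   # >90% identity, as stated in the abstract

def is_duplication(a: Alignment) -> bool:
    """Apply the length and identity thresholds."""
    return (a.end - a.start) > MIN_LENGTH and a.identity > MIN_IDENTITY

def duplicated_bases(alignments) -> int:
    """Count bases covered by qualifying alignments, merging overlaps."""
    by_chrom = {}
    for a in alignments:
        if is_duplication(a):
            by_chrom.setdefault(a.chrom, []).append((a.start, a.end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for s, e in intervals[1:]:
            if s <= cur_end:              # overlapping or adjacent: extend
                cur_end = max(cur_end, e)
            else:                         # gap: close the current interval
                total += cur_end - cur_start
                cur_start, cur_end = s, e
        total += cur_end - cur_start
    return total

def percent_duplicated(alignments, sequence_length: int) -> float:
    return 100 * duplicated_bases(alignments) / sequence_length

# Toy data, invented for illustration only.
toy = [
    Alignment("chr1", 0, 15_000, 0.95),       # qualifies
    Alignment("chr1", 12_000, 30_000, 0.92),  # qualifies, overlaps the first
    Alignment("chr1", 50_000, 58_000, 0.99),  # too short
    Alignment("chr2", 0, 20_000, 0.85),       # identity too low
]
print(percent_duplicated(toy, 1_000_000))
```

Merging overlapping intervals before counting matters because a base covered by several qualifying alignments should contribute only once to the duplicated fraction.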

