International Mammalian Genome Society

The 16th International Mouse Genome Conference (2002)

Oral Presentation

Sunday 17 November

14:30 - 14:45 HRS


DM Church

Co-Authors: 1) Agarwala R, 2) Frankel WN, 1) Schriml L, 1) Maglott D, 1) Schuler G and the NCBI Annotation Team
Institutions: 1) National Center for Biotechnology Information, Bethesda, MD; 2) The Jackson Laboratory, Bar Harbor, ME

The Mouse Genome Sequencing Consortium (MGSC) has released a Whole Genome Shotgun (WGS) based assembly of the mouse genome (MGSCv3). In an effort to assess this resource, we have been comparing the assembly to non-sequence-based maps, as well as to clone-based sequences. Preliminary results indicate that this assembly covers >90% of the mouse genome and shows good agreement with the WIBR genetic map. However, this coverage is not completely uniform. In a comparison to finished BAC sequence (NT contigs), greater than 95% of the bases in the NT contigs were covered overall, but coverage of individual contigs varied from 56% to 99%. In addition, only 35 of the expected 56 loci of the non-ecotropic MLV family could be identified in this assembly. Searching the trace archive for these loci yielded the expected number of hits, suggesting that moderately repetitive sequences could be difficult to assemble using WGS methodology. We are moving toward an integrated assembly of BAC sequence combined with WGS contigs. Preliminary results in this effort suggest that, with BAC sequence covering only approximately 25% of the genome, sequence for 35% of the gaps in MGSCv3 that have map information can be found in BAC sequence. The integration of these two sequence resources will likely improve the overall representation of the genome. In addition, we are annotating the MGSCv3 assembly. The first round of annotation placed 23,419 STSs, 27,354 SNPs, and 101,665 clones onto the genome. In addition, 46,370 gene models were produced (compared to 46,038 gene models for Human build 29). Of 8,454 RefSeqs, 8,198 could be used as evidence for producing a gene model on the MGSCv3 assembly. Eight of the 256 RefSeqs that failed to produce a model on the MGSCv3 assembly produced a model on the finished BAC clones, providing further evidence of the utility of combining these resources.
Genome-wide comparison of the human and mouse genomes, at both the mRNA level and the genomic level, is ongoing. These comparisons allow us to produce a conserved synteny map and integrate the human and mouse genomes through NCBI's LocusLink and MapViewer resources.

