International Mammalian Genome Society

The 14th International Mouse Genome Conference (2000)


D5. Mouse BAC Ends Quality Assessment and Sequence Analyses

Shaying Zhao, Joel Malek, Lily Fu, Bola Akinretoye, Sofiya Shatsman, Maureen Levins, Stephany McGann, Keita Geer, Getahun Tsegaye, Margaret Krol, Tamara Feldblyum, Mark D. Adams, William Nierman and Claire Fraser
The Institute for Genomic Research, Rockville, MD 20850

The mouse genome sequence will facilitate accurate annotation of the human genome and improve the understanding of human gene structure, gene regulation, and disease. However, significantly fewer large-scale mapping efforts have been conducted for the mouse than for the human, and much less data are available to the community. BAC end sequences (BESs) are specific genome-wide markers and are essential to any strategy chosen to sequence the mouse. The BAC end sequencing effort is therefore of great significance to the mouse genome project.

Large-scale BAC end sequencing at TIGR generates the most extensive mapping data rapidly and inexpensively. To date, we have generated 238,143 mouse BAC end sequences (mBESs) from 134,404 RPCI-23 clones with an average read length of 462 bp, representing a clone coverage of 8.8X and a sequence coverage of 3%, and providing a sequence marker every 130 kb across the genome. A total of 103,379 BACs have sequences from both ends, indicating a coverage of 6.8X by the paired-end clones. The average Q20 length is above 400 bp, and mBESs match finished mouse genomic sequences with 99% identity over a match length of 430 bp. This high sequence quality is important for identifying matches in the repeat-rich genome. The use of ABI-3700 sequencers and the sample tracking system ensures that over 98% of clones are associated with the correct mBESs at both ends throughout the entire process (clone tracking accuracy), giving mBES users high confidence in retrieving the right clones based on sequence matches and in building genome assembly scaffolds from paired-end clones.
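
The arithmetic behind some of these figures can be checked with a short sketch. The Python below is illustrative only and is not part of the TIGR pipeline; the function names are hypothetical, and the paired-end estimate assumes that clones sequenced from both ends have the same average insert size as the library as a whole.

```python
# Figures quoted in the abstract.
TOTAL_MBES = 238_143      # mouse BAC end sequences
TOTAL_CLONES = 134_404    # RPCI-23 clones with at least one end sequence
PAIRED_CLONES = 103_379   # clones with sequence from both ends
MEAN_READ_BP = 462        # average read length in bp
CLONE_COVERAGE = 8.8      # clone coverage of the genome (X)


def total_bases(n_reads: int, mean_read_bp: int) -> int:
    """Total number of bases contributed by all end sequences."""
    return n_reads * mean_read_bp


def paired_coverage(clone_coverage: float, paired: int, total: int) -> float:
    """Scale clone coverage by the fraction of clones with both ends.

    Assumption (not stated in the abstract): paired clones have the same
    average insert size as the full clone set.
    """
    return clone_coverage * paired / total


if __name__ == "__main__":
    bases = total_bases(TOTAL_MBES, MEAN_READ_BP)
    cov = paired_coverage(CLONE_COVERAGE, PAIRED_CLONES, TOTAL_CLONES)
    print(f"total bases sequenced: {bases / 1e6:.0f} Mb")
    print(f"paired-end clone coverage: {cov:.1f}X")
```

Under that assumption, scaling 8.8X by the paired fraction (about 77% of clones) gives roughly 6.8X, in line with the paired-end coverage reported above.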

The mBES set represents a random sampling of the genome, and analyses of this dataset provide useful information about the mouse genome. Our results indicate that 63% of mBESs contain repeats and 36% of bases are repeats. Compared with finished mouse genomic sequences (13% of bases are L1) and human BAC end sequences (6.7% of BESs and 3.8% of bases are L1), a significantly higher fraction of the repeats is L1 (26% of mBESs and 19% of bases). This is because mouse L1 contains paired EcoRI sites and the RPCI-23 library was made with EcoRI. About 73% of clones have at least one end with 100 bp of contiguous unique sequence, and 55% of clones have both ends with 100 bp of contiguous unique sequence. These clones will be very useful in genome assembly. About 2.5% of mBESs match mouse ESTs, 0.2% match human ESTs, and 0.3% match rat ESTs. Over 60% of the matches are conserved across the species. Around 0.1% of mBESs match STS markers. We compared mBESs to human chromosome 21 and 22 sequences at the protein level and found that 0.2% of mBESs match the human sequences with an identity above 90% and a length of 50-400 bp. There is one match per 1 Mb of human sequence. Over 50% of these mBESs match ESTs, indicating that the conserved regions are mostly coding or regulatory sequences. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource in many areas of genome research.
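
The 100 bp contiguous unique sequence criterion can be illustrated with a minimal sketch. It is not the authors' method: it assumes each end sequence has already been hard-masked for repeats, so that anything other than an uppercase A, C, G, or T counts as masked or ambiguous, and the function names are hypothetical.

```python
import re

MIN_UNIQUE_RUN = 100  # bp threshold quoted in the abstract


def longest_unique_run(masked_seq: str) -> int:
    """Length of the longest stretch of unmasked bases.

    Assumption: repeats are hard-masked (for example with N or X), so only
    runs of uppercase A/C/G/T count as unique sequence.
    """
    runs = re.findall(r"[ACGT]+", masked_seq)
    return max((len(run) for run in runs), default=0)


def classify_clone(end_a: str, end_b: str) -> str:
    """Report how many ends of a paired-end clone pass the threshold."""
    passing = sum(
        longest_unique_run(end) >= MIN_UNIQUE_RUN for end in (end_a, end_b)
    )
    return ("neither", "one_end", "both_ends")[passing]


if __name__ == "__main__":
    clean_end = "A" * 120                            # 120 bp unmasked
    masked_end = "ACGT" * 10 + "N" * 50 + "ACGT" * 10  # longest run is 40 bp
    print(classify_clone(clean_end, masked_end))
```

Tallying the `one_end` and `both_ends` categories over all paired-end clones would give fractions comparable to the 73% and 55% reported above, provided the masking step matches the one used for the original analysis.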


Last modified: Saturday, November 3, 2012