The 13th International Mouse Genome Conference
October 31-November 3, 1999

B15 Genome Project Data in the EMBL Nucleotide Sequence Database

Alexandra E. van den Broek, Peter Sterk, Guenter Stoesser. EMBL Outstation-The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

The European Molecular Biology Laboratory (EMBL) nucleotide sequence database is a collection of primary nucleotide sequences. It is maintained by the European Bioinformatics Institute (EBI) in collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank (USA). Data are received from genome projects, sequencing centres, individual scientists and patent offices. New data are released daily into the EMBLNEW database, and the contents of EMBLNEW are incorporated each quarter into the EMBL database for release to the scientific community. The EBI's World Wide Web interfaces and network servers offer access to the most up-to-date data collection and provide database searching (SRS) and sequence similarity search facilities (e.g. Blast and FastA).

At least in sheer quantity, large-scale sequencing projects have become the major sources of new sequence data. For groups producing large volumes of nucleotide sequence data over an extended period, submission accounts can be established with the EBI. A submission protocol is agreed upon and database entries produced at their research site can be deposited and updated directly by the originating group using FTP or email. Each submission account is curated by EBI biologists, who ensure that new entries follow EMBL database annotation conventions and serve as an informed liaison between the sequencing group and the EMBL database. Groups that wish to make use of this submission procedure should contact the database at:

Sequence data produced at sequencing centres are incorporated into the database as soon as they become available from the individual sequencing groups, and are immediately available for homology searches via network services. High-throughput sequence records are placed in the HTG division and carry keywords that indicate the status of the sequencing (e.g., HTGS_PHASE1).
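
As an illustration of how the HTG status keywords mentioned above might be used, the following minimal Python sketch reads the keyword (KW) lines of a flat-file entry and reports the sequencing phase. The function name htg_phase, the regular expression and the sample fragment are assumptions made for this example and are not taken from any real database entry or EBI tool.

```python
import re

# Pattern for HTG status keywords such as HTGS_PHASE1 (assumed form: prefix plus one digit).
HTGS_PHASE = re.compile(r"HTGS_PHASE(\d)")


def htg_phase(entry_text):
    """Return the HTG phase number found in the KW lines of one entry, or None."""
    keywords = []
    for line in entry_text.splitlines():
        # Flat-file lines begin with a two-letter code; KW lines hold the keywords.
        if line.startswith("KW"):
            keywords.append(line[2:].strip())
    match = HTGS_PHASE.search(" ".join(keywords))
    return int(match.group(1)) if match else None


# Hypothetical fragment for illustration only, not a real database entry.
sample = """DE   Illustrative unfinished genomic clone
KW   HTG; HTGS_PHASE1.
"""

print(htg_phase(sample))  # prints 1
```

Only the KW lines are inspected, so a phase keyword that happens to appear in a description or comment line would not be counted.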

The progress of a number of large genome sequencing projects is monitored in the genome monitoring table (Genome MOT). This table shows the total amount of finished and unfinished genomic DNA sequence deposited per year in the DDBJ/EMBL/GenBank databases for a number of model organisms. It is updated weekly and can be found at URL:

The Genome MOT reports over 23 Mb of finished genomic mouse DNA and nearly 7 Mb of unfinished mouse DNA (data as of July 1999).


