The 14th International Mouse Genome Conference (2000)

D14. Mouse cDNA Encyclopedia Project; Progress of the Sequencing of the Mouse Full-Length cDNAs

Jun Kawai, Akira Shinagawa, Piero Carninci, Yasushi Okazaki, Masayoshi Itoh, Kazuhiro Shibata, Masayasu Yoshino, Katsunori Aizawa, Yoshifumi Fukunishi, Hideaki Konno, Jun Adachi, Rintaro Saito, Yuko Shibata, Tomoko Hirozane, Toshiyuki Shiraki, Kenjiro Sato, Norihito Hayatsu, Ayako Hara, Takahiro Arakawa, Yoshiyuki Ishii, Noriko Kikuchi, Masami Muramatsu, Yoshihide Hayashizaki
Genome Exploration Research Group, RIKEN, Genomic Sciences Center (GSC) and Genome Science Laboratory, RIKEN, Tsukuba Institute, Core Research of Evolutional Science and Technology (CREST), Japan Science and Technology Corporation (JST), Tsukuba-shi, Ibaraki, 305-0074, Japan

RIKEN is carrying out the RIKEN mouse encyclopedia project, which consists of three phases: (1) collection of full-length cDNAs, (2) sequencing of them, and (3) mapping them onto the chromosomes. In the first phase, we have been arraying mouse full-length cDNA clones from various tissues and developmental stages, classifying them based on their 3'-end sequence tags, and constructing the RIKEN non-redundant cDNA clone set. From this set, we give high priority to novel cDNAs and subject them to the full-sequencing project. To optimize sequencing efficiency, three strategies are applied: one-pass sequencing for short clones (less than 0.7 kb), primer walking for medium-sized clones (0.7-2.5 kb), and shotgun sequencing for long clones (more than 2.5 kb). Current sequence accuracy is about 99%, and comparison of these sequences with known gene and protein sequences allows their functions to be predicted. Furthermore, we are testing computational correction of sequencing errors with the program "DECODER", which evaluates codon preference and the presence of the Kozak consensus sequence in ORFs, and inserts or deletes a nucleotide at low-accuracy bases to correct frameshift errors. So far, over 20,000 cDNA clones have been sequenced, with an average size of around 1.2 kb. The average ORF length is 660 bp. These mouse full-length cDNA sequences are very valuable for in silico mapping, for predicting gene function, and for further investigation in genetic studies.
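
The size-based choice among the three sequencing strategies can be illustrated with a minimal sketch. The function name `choose_strategy` and the treatment of the exact boundary values (0.7 kb and 2.5 kb both assigned to primer walking) are assumptions made for illustration; the abstract states only the three size ranges and does not describe how the assignment was implemented.

```python
def choose_strategy(clone_size_kb: float) -> str:
    """Pick a sequencing strategy from the clone size in kilobases.

    Thresholds follow the ranges given in the abstract. How clones of
    exactly 0.7 kb or 2.5 kb were handled is not stated, so assigning
    them to primer walking is an assumption of this sketch.
    """
    if clone_size_kb <= 0:
        raise ValueError("clone size must be positive")
    if clone_size_kb < 0.7:
        return "one-pass"
    if clone_size_kb <= 2.5:
        return "primer walking"
    return "shotgun"


if __name__ == "__main__":
    # Hypothetical clone sizes, chosen only to exercise each branch.
    for size in (0.5, 1.2, 3.0):
        print(f"{size} kb -> {choose_strategy(size)}")
```

Under these assumptions, a clone of the reported average size (about 1.2 kb) would fall in the primer-walking range.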

