International Mammalian Genome Society

18th International Mouse Genome Conference

17-22 October 2004, Seattle, USA


POSTER 104 - SNPLAD: A SNP DISCOVERY PIPELINE FOR IDENTIFYING CANDIDATE SNP IN PUBLIC EST TRACE FILES

Noyes HA, Amigo Lechuga J, Broadhead AM, Hughes M, Morton IG, Rennie K, Kemp SJ

University of Liverpool, Liverpool, United Kingdom

Single Nucleotide Polymorphisms (SNP) are assumed to underlie many of the differences between inbred mouse strains. It is important to identify as many SNP as possible to facilitate functional and mapping studies. Large-scale SNP discovery projects are being undertaken by the Whitehead Institute and others by resequencing shotgun genomic clones. The SNP that emerge from these random sequencing programmes are principally in intergenic regions and will be important for mapping and haplotype identification. However, relatively few of these SNP are expected to be functional. We have screened EST reads from public databases in order to discover a set of SNP with a higher proportion of functional variants. EST reads have been neglected as a source of candidate polymorphisms because each sequence is supported by only a single read and hence data quality is uncertain. We have developed a SNP discovery pipeline that uses PolyBayes (Marth et al., Nature Genetics 1999, 452-456) to screen EST trace files for high-quality variant base calls. SNP that are assigned high probabilities by PolyBayes are passed to a script that retrieves metadata about the EST libraries in which each SNP was identified. Another script identifies the genomic position of each SNP in Ensembl and retrieves 200bp flanking genomic sequences that can be used as permanent identifiers of the SNP position. The data are loaded into a MySQL database. Heuristic queries have been developed to screen the PolyBayes output and the EST metadata for the SNP predictions that are most likely to be true positives. All publicly available mouse EST trace files will be scanned and the data submitted to public databases. Approximately 100 SNP predictions are currently being validated in vitro to evaluate the accuracy of the pipeline.
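The flanking-sequence and heuristic-screening steps described above can be illustrated with a minimal Python sketch. It is not the SNPLAD code: the record layout (`CandidateSNP`), the function names, the 0-based coordinate convention, the reading of "200bp flanking" as 200 bases on each side, and the thresholds (`min_p`, `min_libraries`) are all assumptions made for illustration. The abstract does not state which heuristics the pipeline applies; support from more than one EST library is simply one plausible criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSNP:
    """Hypothetical record for one candidate SNP (layout is an assumption)."""
    chrom: str
    pos: int              # 0-based genomic position (assumed convention)
    p_snp: float          # probability assigned by the SNP caller
    libraries: frozenset  # EST library identifiers supporting the variant


def flanking(sequence, pos, flank=200):
    """Return (left flank, SNP base, right flank) around `pos`.

    Flanks are truncated at the ends of the sequence rather than padded.
    """
    if not 0 <= pos < len(sequence):
        raise ValueError("position lies outside the sequence")
    start = max(0, pos - flank)
    end = min(len(sequence), pos + flank + 1)
    return sequence[start:pos], sequence[pos], sequence[pos + 1:end]


def likely_true_positives(candidates, min_p=0.9, min_libraries=2):
    """Keep candidates with a high probability and multi-library support.

    Both thresholds are illustrative defaults, not values from the abstract.
    """
    return [
        c for c in candidates
        if c.p_snp >= min_p and len(c.libraries) >= min_libraries
    ]
```

For example, `flanking("ACGTACGTAC", 4, flank=3)` splits the toy sequence into the three bases before position 4, the base at position 4, and the three bases after it. In a real run the sequence would come from the genome assembly and the candidates from the database, with the filter more naturally expressed as a SQL query.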
