International Mammalian Genome Society

18th International Mouse Genome Conference

17-22 October 2004, Seattle, USA


ORAL PRESENTATION

TUESDAY OCTOBER 19

2.45pm – 3.00pm

EXACTPLUS: A PROGRAM FOR DETECTING SMALL CONSERVED GENOMIC REGIONS BY MULTI-SPECIES SEQUENCE COMPARISONS

Antonellis A1, Prasad AB1, Wolfsberg TG1, NISC Comparative Sequencing Program2, Green ED1, Pavan WJ3

1 Genome Technology Branch, NHGRI, NIH, Bethesda, United States, 2 NIH Intramural Sequencing Center, NHGRI, NIH, Gaithersburg, United States, 3 Genetic Disease Research Branch, NHGRI, NIH, Bethesda, United States

Multi-species comparative sequence analysis is emerging as a powerful tool for identifying transcriptional regulatory elements. One obstacle in these analyses is determining the conservation thresholds that accurately identify the sequences most likely to be functional. With this in mind, we developed a software tool (ExactPlus) that identifies short, identical stretches of DNA sequence in multi-species alignments. Four considerations motivate this approach: (1) transcriptional regulatory elements are likely defined by transcription factor binding sites; (2) transcription factor binding sites may represent very short stretches of DNA; (3) sequence conservation within regulatory elements is likely highest at protein binding sites; and (4) algorithms that identify large, broadly conserved regions may not detect smaller, highly conserved functional elements. ExactPlus takes as input a MultiPipMaker alignment file, the minimum length (in base pairs) of matches to report, and the minimum number of species that must share a match. For example, it can identify fragments of 6 base pairs or longer that are identical in at least 5 of 7 species. ExactPlus reports a consensus sequence for each match and produces a UCSC Genome Browser custom track for positioning the results on available genomes.
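The kind of search described above can be illustrated with a minimal Python sketch. This is not the ExactPlus implementation: the function name `find_exact_matches`, the dictionary-based input (species name mapped to an aligned, gapped sequence), and the window-merging strategy are all assumptions made for illustration. It does not parse MultiPipMaker files or write a UCSC custom track.

```python
from collections import Counter


def find_exact_matches(alignment, min_len, min_species):
    """Hypothetical helper, not the ExactPlus code.

    alignment:   dict mapping species name -> aligned sequence
                 (all the same length, '-' marks a gap).
    min_len:     minimum match length in alignment columns.
    min_species: minimum number of species that must be identical.

    Returns a list of (start, end, consensus) tuples using 0-based,
    half-open alignment coordinates.
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    regions = []  # each entry is [start, end, consensus]
    for start in range(n_cols - min_len + 1):
        # Count gap-free words of length min_len at this position.
        windows = (s[start:start + min_len] for s in seqs)
        counts = Counter(w for w in windows if set(w) <= set("ACGT"))
        if not counts:
            continue
        word, n_identical = counts.most_common(1)[0]
        if n_identical < min_species:
            continue
        if regions and start < regions[-1][1]:
            # Window overlaps the previous hit: extend that region.
            # Assumption: overlapping hits agree on the shared columns,
            # which holds when min_species is a majority of the species.
            overlap = regions[-1][1] - start
            regions[-1][2] += word[overlap:]
            regions[-1][1] = start + min_len
        else:
            regions.append([start, start + min_len, word])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    toy_alignment = {  # made-up sequences, for illustration only
        "human": "ACGTTGCA--TT",
        "mouse": "ACGTTGCAGGTT",
        "rat":   "ACGTTGAAGGTT",
        "dog":   "TCGTTGCACCTA",
    }
    for start, end, consensus in find_exact_matches(toy_alignment, 4, 3):
        print(start, end, consensus)
```

In this sketch every window of `min_len` columns is tested independently, so the set of agreeing species may change along a merged region. A stricter variant would require the same species to be identical across the entire reported match.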

We used ExactPlus to analyze multi-species sequences of five loci involved in melanocyte development. Importantly, experimentally validated regulatory elements have been identified for a subset of these loci. We established appropriate thresholds by comparing ExactPlus results obtained from human/mouse/rat alignments with: (1) the known set of regulatory elements, and (2) the ExactPlus dataset generated using additional mammalian species. Our results are relevant for identifying regulatory elements using already available genome sequences.
