INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

A parallel algorithm for the extraction of structured motifs

10/17/2003 - 13:30
10/17/2003 - 14:30

We present a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm is based on an existing approach for extracting structured motifs. A structured motif consists of an ordered collection of p boxes, p substitution rates and p-1 distances between successive boxes. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure.

By partitioning the structured motif searching space we divide the most demanding part of the algorithm by a number of processors that can be loosely coupled. In this way we obtain, under conditions that are easily met, a speedup that is linear on the number of available processing units. This speedup is verified by both theoretical and experimental analysis.