INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Location Proteomics Using Machine Learning Techniques

12/18/2007 - 17:00
12/18/2007 - 18:00

Fluorescence microscopy is a method by which a labeled protein can be imaged inside a cell. Such images can be used to determine the subcellular location of the protein. I will show how machine learning techniques have been used to automate this task. I will present the basic methods as well as some more recent work.

Design of a web-based typing tool

12/06/2007 - 17:00
12/06/2007 - 18:00

ccrB typing, based on the DNA sequencing of an internal fragment of ccrB, was developed as a potential first-line SCCmec typing strategy for methicillin-resistant Staphylococcus aureus, since the ccrB sequence is part of the ccrAB locus, whose allotypes are used for the definition of SCCmec types. Clustering of ccrB sequences has been shown to properly discriminate between different SCCmec types (Oliveira et al., J. Antimicrob. Chemother. 58(1):23-30).

Speculations on a New Approach to Modeling Biological Systems

11/29/2007 - 17:00
11/29/2007 - 18:00

Computational systems biology complements experimental biology in unique ways that, it is hoped, will reveal insights and a depth of understanding not achievable without systems approaches. A major challenge of systems biology continues to be the determination of parameter values for mathematical models. While some models can be analyzed in symbolic form, these are few and far between, and the lack of parameter values is a true obstacle for most computational analyses of realistic biological phenomena.

An evaluation of the impact of side chain positioning on the accuracy of discrete models of protein structures

11/15/2007 - 17:00
11/15/2007 - 18:00

Discrete models are important for reducing the complexity of the protein folding problem. However, a compromise must be made between the complexity of the model and its accuracy. Previous work by Park and Levitt has shown that the protein backbone can be modeled with good accuracy by four-state discrete models. Nonetheless, for ab initio protein folding, the side chains are important to determine whether the structure is physically possible and well packed. We extend the work of Park and Levitt by taking into account the positioning of the side chain in the evaluation of the accuracy.
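The trade-off between model complexity and accuracy can be illustrated with a minimal sketch. It assumes that a discrete model maps each residue's backbone dihedral angles (phi, psi) to the nearest of a small set of states. The four state values below are hypothetical placeholders, not the states used by Park and Levitt, and the angular error is a simplified stand-in for a coordinate-based accuracy measure. Side chains are not modeled here.

```python
import math
from typing import List, Sequence, Tuple

Angles = Tuple[float, float]

# Hypothetical four-state alphabet of (phi, psi) angles in degrees.
# These values are illustrative assumptions only.
STATES: List[Angles] = [(-60.0, -45.0), (-120.0, 130.0), (-75.0, 150.0), (60.0, 45.0)]


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_state(phi: float, psi: float, states: Sequence[Angles] = STATES) -> int:
    """Index of the state closest to (phi, psi) in squared angular distance."""
    return min(
        range(len(states)),
        key=lambda i: angle_diff(phi, states[i][0]) ** 2
        + angle_diff(psi, states[i][1]) ** 2,
    )


def discretize(angles: Sequence[Angles], states: Sequence[Angles] = STATES) -> List[int]:
    """Encode a backbone as one state index per residue."""
    return [nearest_state(phi, psi, states) for phi, psi in angles]


def angular_rms_error(angles: Sequence[Angles], states: Sequence[Angles] = STATES) -> float:
    """Root-mean-square angular deviation introduced by the discretization."""
    total = 0.0
    for phi, psi in angles:
        s_phi, s_psi = states[nearest_state(phi, psi, states)]
        total += angle_diff(phi, s_phi) ** 2 + angle_diff(psi, s_psi) ** 2
    return math.sqrt(total / (2 * len(angles)))


def conformation_count(n_residues: int, n_states: int = len(STATES)) -> int:
    """Size of the search space: each residue independently takes one state."""
    return n_states ** n_residues
```

With more states the angular error shrinks but the number of conformations, `n_states ** n_residues`, grows quickly, which is the compromise the abstract refers to.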

Mining Queries

11/06/2007 - 16:00

User queries in search engines and Websites give valuable information on the interests of people. In addition, clicks after queries relate those interests to actual content. Even queries without clicks or answers point to important missing synonyms or content. In this talk we show several examples of how to use this information to improve the performance of search engines, to recommend better queries, to improve the information scent of the content of a Website and ultimately to capture knowledge, as Web queries are the largest wisdom of crowds on the Internet.

Efficient learning of Bayesian network classifiers: An extension to the TAN classifier

10/25/2007 - 17:00

We introduce a Bayesian network classifier less restrictive than the Naive Bayes (NB) and Tree Augmented Naive Bayes (TAN) classifiers. Since learning an unrestricted network is infeasible, the proposed classifier is constrained to be consistent with the breadth-first search order of an optimal TAN. We propose an efficient algorithm

Compressing Web Graphs as Texts

09/28/2007 - 16:30
09/28/2007 - 17:30

The need to run different kinds of algorithms over large Web graphs motivates research into compressed graph representations that can be accessed without decompression. At this point there exist a few such compression proposals, some of them very effective in practice. In this talk we introduce a novel approach to graph compression, based on regarding the graph as a text and using existing techniques for text compression/indexing. This permits accessing the graph efficiently without decompressing it, and in addition brings new functionalities over the compressed graph.
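The idea of regarding a graph as a text can be sketched as follows. This is only an illustration under stated assumptions, not the representation presented in the talk: adjacency lists are concatenated into one sequence of symbols, and a neighbor query becomes a slice of that sequence. A reverse-neighbor query becomes a search for all occurrences of a symbol, which hints at the kind of extra functionality a text index could offer. The separator symbol and all function names are hypothetical, and the sequence is stored uncompressed; a real system would replace the linear scan with a compressed text index.

```python
import bisect
from typing import Dict, List, Tuple

SEP = -1  # hypothetical separator symbol placed before each adjacency list


def graph_to_text(adj: Dict[int, List[int]], n: int) -> Tuple[List[int], List[int]]:
    """Concatenate the adjacency lists of nodes 0..n-1 into one sequence.

    Returns the sequence and the start position of each node's list,
    plus one final sentinel position equal to the sequence length.
    """
    text: List[int] = []
    starts: List[int] = []
    for v in range(n):
        starts.append(len(text))
        text.append(SEP)
        text.extend(sorted(adj.get(v, [])))
    starts.append(len(text))
    return text, starts


def out_neighbors(text: List[int], starts: List[int], v: int) -> List[int]:
    """Successors of v: the slice of the text that follows v's separator."""
    return text[starts[v] + 1 : starts[v + 1]]


def in_neighbors(text: List[int], starts: List[int], v: int) -> List[int]:
    """Predecessors of v: every occurrence of symbol v is an edge into v.

    A linear scan stands in for the pattern search a text index would do.
    """
    result: List[int] = []
    for pos, sym in enumerate(text):
        if sym == v:
            # The list containing pos belongs to the last node starting at or before it.
            result.append(bisect.bisect_right(starts, pos) - 1)
    return result
```

For example, for the graph with edges 0→1, 0→2, 1→2 and 2→0, the sequence is `[-1, 1, 2, -1, 2, -1, 0]`, and the predecessors of node 2 are found at the two positions where the symbol 2 occurs.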

Data Format Description Framework - A Descriptive Approach to Data Standardization

07/05/2007 - 16:00
07/05/2007 - 17:00

Data standardization is fundamentally prescriptive because no information system can solve the data integration issue without enforcing certain rules. The question is, therefore, where the rules should be prescribed. Most existing data standards prescribe the rules over the data itself. However, excessive use of such an approach can easily lead to inefficient data representation. An alternative approach enforces the conforming rules over the description of the data.

Biological Data Processing Using Grid Technologies

06/28/2007 - 16:00
06/28/2007 - 17:00

There is currently growing interest in the development of computational methods for discovering the mechanisms of gene regulation. In this context, the identification of motifs in genomes is particularly relevant. Likewise, computing under the grid paradigm (Grid) has emerged as the technology of choice for problems that require high-performance parallel computing. This report addresses the application of Grid technologies to the problem of identifying motifs in gene promoter regions.

Probabilistic Genetic Networks

06/14/2007 - 16:00
06/14/2007 - 17:00

The advent of genomics in malaria research is significantly accelerating the discovery of control strategies. Dynamic global gene expression measurements of the intraerythrocytic developmental cycle (IDC) of the parasite at 1-hour resolution were recently reported. Moreover, by using Discrete Fourier Transform based techniques, it was demonstrated that many genes are regulated in a single periodic manner, which allowed genes to be ordered according to their phase of expression. In this work we present a framework to construct genetic