
Computational prediction of TFBS sequences in the phytopathogen Pseudomonas syringae. (2011)

Undergraduate: Surojit Biswas


Faculty Advisor: Jeffery Dangl
Department: Biology


The HrpL alternative sigma factor is a transcription factor that activates the expression of multiple genes essential to the plant pathogenicity of P. syringae. Here, we present an algorithm to detect HrpL transcription factor binding site (TFBS) sequences de novo. The algorithm, named PHI*, first performs an ab initio open reading frame (ORF) search, in which candidate ORFs are selected according to a probability-based size criterion. Following an alignment of known TFBS sequences, the algorithm builds two position-sensitive weight matrices (PSWMs), which are then used in concert to probe the upstream regions of candidate ORFs for potential TFBSs. Finally, for each ORF containing a putative TFBS, an operon search is performed based on a probability distribution of gene overlap and/or proximity. PHI was run on two gold-standard genomes of P. syringae, B728a and 1448A. Across the two genomes we report an average sensitivity of 96.7%, specificity of 99.5%, and accuracy of 99.4%. In comparison to other leading bioinformatic methods, our algorithm is 0.9% more accurate in B728a and 2.0% more accurate in 1448A, which corresponds to a 39.6% reduction in false positive rate in B728a and a negligible reduction in false positive rate in 1448A. We conclude that our method performs as well as or better than previous screens. Furthermore, this algorithm offers a time- and cost-efficient way to engage in draft genome analysis and large-scale comparative studies.
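As a rough illustration of the weight-matrix step described above, the following Python sketch builds a log-odds matrix from aligned sites, slides it along an upstream region, and computes sensitivity, specificity, and accuracy from confusion counts. It is not the PHI implementation: it uses a single matrix rather than the two matrices the abstract describes, and the function names, pseudocount, uniform background, threshold, and toy sequences are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_pswm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must share one length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def score_window(matrix, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(col[base] for col, base in zip(matrix, window))

def scan(matrix, upstream, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(upstream) - width + 1):
        window = upstream[start:start + width]
        score = score_window(matrix, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits

def confusion_metrics(tp, fp, tn, fn):
    """Standard definitions of sensitivity, specificity, and accuracy."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Toy data, invented for illustration; these are not real HrpL binding sites.
toy_sites = ["GGAACC", "GGAACT", "GGAACC", "GGAATC"]
toy_upstream = "TTTGGAACCTTT"

pswm = build_pswm(toy_sites)
hits = scan(pswm, toy_upstream, threshold=5.0)
metrics = confusion_metrics(tp=9, fp=1, tn=89, fn=1)
```

In a sketch like this the threshold trades sensitivity against false positives; how PHI sets its cutoff and combines its two matrices is not stated in the abstract.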

 
