Cyanobacteria are ubiquitous photosynthetic prokaryotes that can produce toxic secondary metabolites and are often responsible for costly deterioration of human-managed water systems. In recent years, molecular characterization of cyanobacteria has gained importance because of the limitations of traditional morphological identification methods. This approach often requires DNA sequences to be accurately aligned in order to perform phylogenetic reconstructions and correctly identify environmental samples.
A case study on the effect of sequence alignment and tree construction algorithms on cyanobacteria identification is presented. Cyanobacterial isolates (n=19) from 11 Western Australian freshwater systems were sequenced at the highly polymorphic phycocyanin intergenic spacer (PC-IGS) locus and, for comparison, the small subunit ribosomal RNA gene (16S rDNA).
For each marker, ClustalW and MUSCLE alignments were obtained, and trees were constructed using maximum-likelihood, maximum-parsimony and neighbour-joining methods to determine phylogeny and isolate identity. Different gap weight settings (gap opening penalty, gap extension penalty) were used for ClustalW and MUSCLE, to determine their effect on the generated tree topologies.
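To illustrate why the gap opening and gap extension penalties can change which alignment is preferred, the following is a minimal sketch of an affine gap cost. It assumes one common convention, in which a run of n gap characters costs gap_open + n * gap_extend; the exact formulas used internally by ClustalW and MUSCLE differ in detail. The function name, the two aligned rows and the penalty values are hypothetical and are not taken from the study.

```python
import re


def affine_gap_penalty(aligned_seq, gap_open, gap_extend):
    """Sum the gap cost over every run of '-' in one aligned row.

    Assumed convention (not necessarily that of ClustalW or MUSCLE):
    a run of n gap characters costs gap_open + n * gap_extend.
    """
    runs = re.findall(r"-+", aligned_seq)
    return sum(gap_open + len(run) * gap_extend for run in runs)


# Two hypothetical aligned rows: one long gap versus three short gaps.
one_long_gap = "ACGT--------ACGT"
three_short_gaps = "AC-GT-AC-GT"

# Hypothetical (gap_open, gap_extend) settings.
for gap_open, gap_extend in [(10.0, 0.5), (1.0, 2.0)]:
    long_cost = affine_gap_penalty(one_long_gap, gap_open, gap_extend)
    short_cost = affine_gap_penalty(three_short_gaps, gap_open, gap_extend)
    print(gap_open, gap_extend, long_cost, short_cost)
```

With a high opening and low extension penalty the single long gap is cheaper, whereas with a low opening and high extension penalty the three short gaps are cheaper. This considers only the gap term of an alignment score; a real aligner also weighs matches and mismatches.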
The basic structure of the trees and the clustering patterns of the individual orders were relatively consistent for both loci regardless of tree construction method, representing the underlying phylogenetic signal. However, the placement of these orders was affected by the chosen alignment parameters and gap treatment options (pairwise/use all sites or complete deletion), especially in the case of the PC-IGS locus. These two factors appeared critical, as they produced trees that were on average more dissimilar to each other than those produced by different tree-building methods. This study demonstrates the potential impact of inaccurate phylogenetic reconstructions on cyanobacterial identification, and suggests that at least some of the disagreements between microscopy and DNA-based methods may be due to poor alignment strategies rather than to differences in data or tree-building methods.
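One common way to quantify how dissimilar two tree topologies are is a Robinson-Foulds-style distance, which counts the groupings present in one tree but not the other. The sketch below is a simplified rooted-tree variant and is only an illustration: the abstract does not state which dissimilarity measure was used, and the nested-tuple tree encoding, function names and leaf labels A to E are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaves.

    The tree is a nested tuple whose leaves are strings (an assumed encoding).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def rf_distance(tree_a, tree_b):
    """Count clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical topologies, e.g. from two different alignment settings.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))

print(rf_distance(tree_a, tree_b))
```

Here the two trees differ only in whether A groups with B or with C, so each contributes one clade that the other lacks. Averaging such pairwise distances over sets of trees is one way to compare the effect of alignment settings with the effect of tree-building methods.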