Principal Coordinates Analysis
Either way, the dataset is read in using read.alignment
with format="clustal" and stored in an object of class alignment called
aln. Temporary files are stored
here.
Once the alignment has been pasted or selected, you can start the PCO
computation. Below is the R code that runs the analysis on the
submitted data.
You can modify these parameters to run the PCO with different options for the graphics.
The alignment is stored in aln. Therefore, do
not rename it, or you will not be able to compute your analysis.
mat <- dist.alignment(aln, matrix = "similarity")
Starting from the alignment in aln, a distance matrix is built.
Two options are available for computing the distances: matrix =
"identity" or matrix = "similarity". The first option can
be used with either nucleotide or protein sequences. It simply counts the number
of differences between each pair of sequences in the alignment to compute the distances.
The second option can be used only with protein sequence alignments. It uses
the Fitch (1966) distance matrix between amino acids. This matrix is based on
the number of mutations required to change one amino acid into another.
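As a rough illustration of the "identity" idea, the sketch below computes the proportion of differing positions between aligned sequences in base R. The helper p_diff and the toy sequences are invented for this example and are not part of seqinr; dist.alignment may scale its values differently (its documentation describes a square-root transform), so the numbers are not expected to match its output.

```r
# Hypothetical helper (not part of seqinr): proportion of positions that
# differ between two aligned sequences of equal length.
p_diff <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  mean(x != y)
}

# Toy alignment, invented for illustration only.
toy <- c(s1 = "ACGTACGT", s2 = "ACGTACGA", s3 = "TCGTACGA")

# Fill a square matrix with all pairwise proportions of differences.
n <- length(toy)
d <- matrix(0, n, n, dimnames = list(names(toy), names(toy)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- p_diff(toy[i], toy[j])
  }
}

# Convert to a "dist" object, the form expected by distance-based methods.
toy_dist <- as.dist(d)
print(toy_dist)
```

Here s1 and s2 differ at one position out of eight, while s1 and s3 differ at two.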
The second line transforms the distance matrix computed from the alignment into a Euclidean matrix, since PCO can only be computed on Euclidean distances:
dst <- lingoes(mat)
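To give an idea of what such a transformation involves, here is a minimal base R sketch of a Lingoes-style correction: a constant derived from the most negative eigenvalue of the double-centred matrix is added to the squared off-diagonal distances. The helper gower_centre and the toy matrix are assumptions made for this example; the ade4 implementation may differ in details such as tolerances and the handling of zero distances.

```r
# Hypothetical helper: double-centre the squared distances.
gower_centre <- function(d) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  h <- diag(n) - matrix(1 / n, n, n)   # centring matrix
  -0.5 * h %*% d2 %*% h
}

# Toy distances that violate the triangle inequality (1 + 1 < 3),
# so no Euclidean configuration of points can reproduce them.
m <- matrix(c(0, 1, 3,
              1, 0, 1,
              3, 1, 0), nrow = 3, byrow = TRUE)

# A negative eigenvalue signals a non-Euclidean distance matrix.
lambda_min <- min(eigen(gower_centre(m), symmetric = TRUE,
                        only.values = TRUE)$values)

# Lingoes-style correction: shift the squared off-diagonal distances
# by twice the absolute value of the most negative eigenvalue.
const <- abs(min(lambda_min, 0))
m_fix <- sqrt(m^2 + 2 * const)
diag(m_fix) <- 0

# After the correction, no eigenvalue should be meaningfully negative.
lambda_fix <- eigen(gower_centre(m_fix), symmetric = TRUE,
                    only.values = TRUE)$values
print(lambda_min)
print(lambda_fix)
```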
Next, the categories for the organisms considered are defined. Note that this line is specific to the alignment used in this example. It must therefore be removed or edited when using your own data.
cat <- as.factor(c(1,1,1,1,2,3,3,3,3,3,3,3,4,1,1,1,1,1,1,1,1,3,3,3,3,5,5,5,3))
The categories defined here correspond to the taxonomic groups of the
species from which the sequences were obtained. 1: Proteobacteria,
2: Deinococcus/Thermus, 3: Gram-positive bacteria, 4: Cyanobacteria,
5: Yeast.
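When adapting this line to your own data, it can help to attach readable labels to the numeric codes and to check how many sequences fall in each group. The sketch below does this in base R with the codes of the example; the names codes, grp and counts are chosen for illustration, and the suggested comparison with aln$nb assumes that aln is a seqinr alignment object.

```r
# Group codes copied from the example alignment (one per sequence).
codes <- c(1,1,1,1,2,3,3,3,3,3,3,3,4,1,1,1,1,1,1,1,1,3,3,3,3,5,5,5,3)

# Attach the taxonomic names given in the text to the numeric codes.
grp <- factor(codes, levels = 1:5,
              labels = c("Proteobacteria", "Deinococcus/Thermus",
                         "Gram-positive bacteria", "Cyanobacteria", "Yeast"))

# Number of sequences per group.
counts <- table(grp)
print(counts)

# With your own data, the factor must have one entry per sequence.
# Assuming aln is a seqinr alignment, a check could be:
#   stopifnot(length(grp) == aln$nb)
```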
The next line computes the PCO itself. The options used disable the
interactive choice of the number of axes and keep only the first three
axes of the analysis (see the
dudi.pco
documentation page for more information):
pco <- dudi.pco(dst, scannf = FALSE, nf = 3)
s.label(pco$li, sub = "F1xF2 map")
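For readers curious about the computation behind a PCO, the following base R sketch performs a classical principal coordinates analysis by hand: the squared distances are double-centred, the resulting matrix is eigendecomposed, and the coordinates are the eigenvectors scaled by the square roots of the eigenvalues. The random points are invented for the example, and dudi.pco may weight or scale its coordinates differently, so this is an illustration of the principle rather than a reproduction of its output.

```r
# Invented data: 6 points in 3 dimensions and their Euclidean distances.
set.seed(1)
pts <- matrix(rnorm(6 * 3), nrow = 6)
d <- dist(pts)

# Double-centre the squared distances.
d2 <- as.matrix(d)^2
n <- nrow(d2)
h <- diag(n) - matrix(1 / n, n, n)
b <- -0.5 * h %*% d2 %*% h

# Eigendecomposition; eigenvalues are returned in decreasing order.
e <- eigen(b, symmetric = TRUE)

# Keep the first three axes, as in the example (nf = 3).
nf <- 3
coords <- e$vectors[, 1:nf] %*% diag(sqrt(e$values[1:nf]))

# Distances between the recovered coordinates should reproduce the
# original distances, because the points really live in 3 dimensions.
max_err <- max(abs(as.matrix(dist(coords)) - as.matrix(d)))
print(max_err)
```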
The last command line is also specific to the example and must be removed or
edited when submitting your own data. It draws the convex hulls enclosing the
species that belong to the same taxonomic group. The colors used correspond
to the five groups previously defined:
s.chull(pco$li, cat, optchull=1, add.plot=TRUE, col=c("red","black","green","purple","blue"))
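The idea of a per-group hull can be sketched in base R with chull from grDevices, which returns the indices of the points lying on the convex hull. The coordinates and the two groups below are invented and merely stand in for the first two PCO axes; s.chull handles the drawing itself, so this only illustrates the underlying geometry.

```r
# Invented 2-D coordinates standing in for the first two PCO axes.
xy <- data.frame(
  x = c(0, 1, 1, 0, 0.5, 3, 4, 3.5),
  y = c(0, 0, 1, 1, 0.5, 0, 0, 1)
)
grp <- factor(c("a", "a", "a", "a", "a", "b", "b", "b"))

# For each group, keep only the rows that are vertices of its convex hull.
hulls <- lapply(split(xy, grp), function(g) {
  g[grDevices::chull(g$x, g$y), ]
})

# Group "a" is a unit square with one interior point, which is dropped.
print(hulls$a)
print(hulls$b)
```

Each element of hulls could then be passed to polygon() to draw the outline of a group on an existing plot.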
Here, we can see that a group of four enterobacteria (Salmonella
enterica, Salmonella typhimurium, Escherichia coli and
Yersinia pestis) is separated from the other Proteobacteria. This
phenomenon could be explained either by a high evolutionary rate for this
gene in the clade of Enterobacteria or by a horizontal transfer in the
common ancestor of the four species considered.