Categorizing Host-Dependent RNA Viruses by Principal Component Analysis of Their Codon Usage Preferences

*MING-WEI SU,1 *HSIU-MAN LIN,1 HANNA S. YUAN,2 and WOEI-CHYN CHU1

1 Institutes of Biomedical Engineering, National Yang-Ming University, Taipei, Taiwan, Republic of China.
2 Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan, Republic of China.
*The first two authors made equal contribution to this work.

ABSTRACT
Viruses have to exploit host transcription and translation mechanisms to replicate in a hostile
host cellular environment, and therefore, it is likely that the infected host may impose pressure
on viral evolution. In this study, we investigated differences in codon usage preferences
among highly mutable single-stranded RNA viruses that infect either vertebrate or invertebrate
hosts. We combined principal component analysis (PCA) with k-means clustering
to group viruses according to the type of host they infect. The relative synonymous codon
usage (RSCU) indices of all genes in 32 RNA viruses were calculated, and the correlation of the
RSCU indices among different viruses was analyzed by PCA. Our results show a positive
correlation in codon usage preferences among viruses that target the same host category.
Results of the k-means clustering analysis further confirmed the statistical significance of this
result, demonstrating that viruses infecting vertebrate hosts have different codon usage
preferences from those of viruses infecting invertebrate hosts. Based on the analysis of the
effective number of codons (ENC) in relation to the GC-content at the synonymous third codon
position (GC3s), we further identified mutational pressure as the dominant evolutionary driving
force shaping these different codon usage preferences. This study suggests a new and effective
way to characterize host-dependent RNA viruses based on their codon usage patterns.

Key words: codon usage bias, k-means clustering, principal component analysis, RNA viruses,
RSCU.
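
As an illustration of the indices named in the abstract, the following minimal Python sketch computes RSCU and GC3s for a single coding sequence, together with the expected ENC at a given GC3s. It rests on several assumptions that the abstract does not state: the standard genetic code, an in-frame input sequence, the usual RSCU definition (observed codon count divided by the mean count over the synonymous codons of the same amino acid), and Wright's reference curve for ENC under mutational pressure alone. The function names `rscu`, `gc3s`, and `expected_enc` are hypothetical and are not taken from the study.

```python
from collections import Counter

# Standard genetic code (assumed), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by amino acid; keep only amino acids with synonymous
# alternatives (drops Met, Trp, and stop codons), leaving 59 codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)
SYNONYMS = {aa: cs for aa, cs in SYNONYMS.items() if aa != "*" and len(cs) > 1}


def split_codons(cds):
    """Split an in-frame RNA or DNA coding sequence into DNA-alphabet codons."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the sequence.

    Amino acids that never occur are omitted (an assumption of this sketch).
    """
    counts = Counter(split_codons(cds))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        mean = total / len(codons)  # expected count under uniform usage
        for c in codons:
            values[c] = counts[c] / mean
    return values


def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons."""
    synonymous = {c for cs in SYNONYMS.values() for c in cs}
    used = [c for c in split_codons(cds) if c in synonymous]
    if not used:
        raise ValueError("sequence contains no synonymous codons")
    return sum(c[2] in "GC" for c in used) / len(used)


def expected_enc(s):
    """Wright's expected ENC when codon bias is due to GC3s alone."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

For example, `rscu("GCUGCUGCUGCC")` assigns 3.0 to GCT and 1.0 to GCC, so the four alanine codons sum to 4. In an ENC versus GC3s plot, genes whose observed ENC falls close to `expected_enc(gc3s(cds))` are usually read as being shaped mainly by mutational pressure. The RSCU vectors of several viruses could then be passed to any PCA and k-means implementation.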
