Wednesday, December 21, 2011

Ka/Ks calculation

Calculation of the numbers of synonymous and non-synonymous substitutions per site using the method of Nei & Gojobori (1986).

Show that syn and non-syn sites evolve at different rates.

Need to calculate:
    S  = no. of syn sites
    N  = no. of non-syn sites
    Sd = no. of syn differences
    Nd = no. of non-syn differences

Now define:
    DS = Sd/S  (fraction of syn sites that differ)
    DN = Nd/N  (fraction of non-syn sites that differ)

These are equivalent to D in the Jukes-Cantor model.
We can use the JC distance formula to calculate two evolutionary distances.

dS = -(3/4) ln(1 - 4DS/3)    (no. of syn subs per syn site)
dN = -(3/4) ln(1 - 4DN/3)    (no. of non-syn subs per non-syn site)
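A minimal Python sketch of the Jukes-Cantor correction, which is applied the same way to DS and to DN. The function name `jc_distance` is my own. It rejects proportions of 3/4 or more, where the logarithm is undefined.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from p, the proportion of sites that differ.

    Pass DS to get dS, or DN to get dN.
    """
    if not 0.0 <= p < 0.75:
        # ln(1 - 4p/3) is undefined (or infinite) once p reaches 3/4
        raise ValueError("Jukes-Cantor distance needs 0 <= p < 3/4")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jc_distance(0.625), 2))  # → 1.34
```

The corrected distance is always at least as large as the raw proportion, because it also counts multiple hits at the same site.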

These are equivalent to the usual Jukes-Cantor d, which is the number of substitutions per site if all sites are equivalent.

For most pairs of homologous protein-coding sequences, we expect dS > dN, because selection against amino acid changes slows down the rate of non-syn subs.

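If we know the time t since the two species diverged, we can calculate the rates of syn and non-syn subs:
       dS/2t   and   dN/2t.
The factor 2 appears because substitutions have accumulated along both lineages since the split. If t is measured in millions of years, these rates are numbers of subs per site per million years.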
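The rate formula is simple, but the factor of 2 is easy to forget. This small Python sketch (the function name is hypothetical) divides each distance by 2t. The usage line plugs in the dS and dN values from the worked example later in this post, together with a made-up divergence time of 50 million years used only for illustration.

```python
def substitution_rates(d_s: float, d_n: float, t: float) -> tuple[float, float]:
    """Return (syn rate, non-syn rate) as dS/2t and dN/2t.

    2t is the total time separating the two sequences: each lineage has
    had time t to accumulate substitutions since the split.
    """
    if t <= 0:
        raise ValueError("divergence time t must be positive")
    return d_s / (2 * t), d_n / (2 * t)


# Hypothetical divergence time of 50 million years, for illustration only
syn_rate, nonsyn_rate = substitution_rates(1.34, 0.271, 50.0)
print(syn_rate, nonsyn_rate)  # subs per site per million years
```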

If we don’t know t, we can still compare the two distances. The ratio dN/dS tells us how much slower the non-syn subs are.

Notation:    
d is sometimes called K
dS is sometimes called Ks
dN is sometimes called Ka (where the a means amino acid subs)

dN/dS is the same thing as Ka/Ks



          1   2   3   4   5
         Pro Phe Gly Leu Phe
Seq 1    CCC UUU GGG UUA UUU
Seq 2    CCC UUC GCG CUA GUA
         Pro Phe Ala Leu Val

Calculate S for each codon.
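Check the genetic code. At each codon position there are three possible single-base changes, and that position contributes to S the fraction of those changes that are synonymous:
A fourfold degenerate site counts as S = 1 (N = 0)
A non-degenerate site counts as S = 0 (N = 1)
A twofold degenerate site counts as S = 1/3 (N = 2/3)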

1. S = 0 + 0 + 1 = 1
2. S = 0 + 0 + 1/3 = 1/3
3. S = 0 + 0 + 1 = 1 (whether we look at Gly or Ala codons)
4. For UUA, S = 1/3 + 0 + 1/3 = 2/3
    For CUA, S = 1/3 + 0 + 1 = 4/3
    Take the average of these: S = 1 for codon 4.
5. For UUU, S = 1/3
    For GUA, S = 1
    Take the average: S = 2/3

For whole sequence, S = 1 + 1/3 + 1 + 1 + 2/3 = 4

N = total number of sites - S = 15 - 4 = 11
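Below is a rough Python sketch of this site counting. The function names are my own. It assumes gap-free aligned sequences whose length is a multiple of three, the standard genetic code, and that a change to a stop codon counts as non-synonymous. It runs on the example sequences with codon 3 of Seq 2 taken as GCG, the Ala codon that the S = 1 value for codon 3 assumes. Fractions keep the thirds exact.

```python
from fractions import Fraction

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon: str) -> Fraction:
    """Synonymous sites of one codon: at each position, the fraction of the
    3 possible single-base changes that keep the same amino acid.
    Changes to a stop codon are counted as non-synonymous (an assumption)."""
    s = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                s += Fraction(1, 3)
    return s


def count_sites(seq1: str, seq2: str) -> tuple[Fraction, Fraction]:
    """Total (S, N) for two aligned, gap-free coding sequences of equal length."""
    seq1, seq2 = seq1.upper().replace("T", "U"), seq2.upper().replace("T", "U")
    S = Fraction(0)
    for i in range(0, len(seq1), 3):
        # average the two sequences' codons, as done for codons 4 and 5 above
        S += (syn_sites(seq1[i:i + 3]) + syn_sites(seq2[i:i + 3])) / 2
    return S, len(seq1) - S


S, N = count_sites("CCCUUUGGGUUAUUU", "CCCUUCGCGCUAGUA")
print(S, N)  # → 4 11
```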


Calculate Sd and Nd for each codon.
1. Sd = 0,      Nd = 0
2. Sd = 1,      Nd = 0
3. Sd = 0,      Nd = 1
4. Sd = 1,      Nd = 0
5. This could happen two ways:
       route 1: UUU --> GUU --> GUA    steps: Nd = 1, Sd = 1    total: Sd = 1, Nd = 1
       route 2: UUU --> UUA --> GUA    steps: Nd = 1, Nd = 1    total: Sd = 0, Nd = 2
   Take the average of these two routes:
   Sd = 0.5,    Nd = 1.5

(note that if all three positions were different there would be 6 routes to average)
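The route averaging can be sketched in Python by trying every order in which the differing positions could change (1, 2 or 6 routes). The function names are hypothetical, and the standard genetic code is assumed. As is commonly done in Nei-Gojobori implementations, the sketch drops routes that pass through an intermediate stop codon whenever at least one route avoids one. This does not affect the example here.

```python
from fractions import Fraction
from itertools import permutations

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_differences(c1: str, c2: str) -> tuple[Fraction, Fraction]:
    """Average (Sd, Nd) over all single-step routes from codon c1 to codon c2."""
    diff_positions = [i for i in range(3) if c1[i] != c2[i]]
    routes = []
    for order in permutations(diff_positions):  # one route per ordering
        steps, current = [], c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            steps.append((current, nxt))
            current = nxt
        routes.append(steps)
    # Drop routes through an intermediate stop codon, if any route avoids one
    clean = [r for r in routes if all(CODE[b] != "*" for _, b in r[:-1])]
    routes = clean or routes
    sd = nd = Fraction(0)
    for steps in routes:
        for before, after in steps:
            if CODE[before] == CODE[after]:
                sd += 1
            else:
                nd += 1
    return sd / len(routes), nd / len(routes)


def count_differences(seq1: str, seq2: str) -> tuple[Fraction, Fraction]:
    """Total (Sd, Nd) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper().replace("T", "U"), seq2.upper().replace("T", "U")
    Sd = Nd = Fraction(0)
    for i in range(0, len(seq1), 3):
        sd, nd = codon_differences(seq1[i:i + 3], seq2[i:i + 3])
        Sd += sd
        Nd += nd
    return Sd, Nd


Sd, Nd = count_differences("CCCUUUGGGUUAUUU", "CCCUUCGCGCUAGUA")
print(float(Sd), float(Nd))  # → 2.5 2.5
```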

Total Sd = 2.5               Total Nd = 2.5

DS = 2.5/4 = 0.625           DN = 2.5/11 = 0.227
dS = 1.34                    dN = 0.271

The non-syn rate is much slower than the syn rate in this example: dN/dS = 0.271/1.34 ≈ 0.20.
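To check the arithmetic, this short Python sketch takes the example totals (S = 4, N = 11, Sd = Nd = 2.5), forms DS and DN, applies the Jukes-Cantor correction and prints the ratio. The helper name is my own.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites (p < 3/4)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Totals from the worked example
S, N = 4, 11        # syn and non-syn sites
Sd, Nd = 2.5, 2.5   # syn and non-syn differences

DS, DN = Sd / S, Nd / N                 # proportions of sites that differ
dS, dN = jc_distance(DS), jc_distance(DN)
print(f"DS={DS:.3f} DN={DN:.3f} dS={dS:.2f} dN={dN:.3f} dN/dS={dN / dS:.2f}")
# → DS=0.625 DN=0.227 dS=1.34 dN=0.271 dN/dS=0.20
```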

Ka/Ks Tutorial for DataMonkey

DataMonkey is a good web tool for calculating Ka/Ks; it runs its analyses with the HyPhy package. It offers several methods, such as SLAC and FEL, and common nucleotide substitution models such as HKY85 and F84. Because DataMonkey uses codon-based models, it can also detect positive selection at individual codons. The steps below show how to get Ka/Ks in DataMonkey.

1. Go to the DataMonkey home page (http://www.datamonkey.org/) and select "ANALYZE YOUR DATA".
2. Choose a file in a supported format and upload your data.
3. A basic summary of the uploaded data will appear. Then select "Proceed to the analysis menu".

4. In the "Method" drop-down list, choose "SLAC" and press "run".

5. After about 30 seconds the page updates and shows Ka/Ks.
6. Then select "INFORMATION: OTHER ANALYSIS" to get more information. From this page you can get a "Neighbor Joining Tree" and re-analyze your data with another method. All results are kept on this page.
7. For example, suppose you want to analyze your data with "FEL".

8. Repeat the same steps as for "SLAC".


9. On the "INFORMATION: OTHER ANALYSIS" page you can now see the results of both "SLAC" and "FEL".

10. Finally, you can combine both results into a final report.

Tuesday, December 20, 2011

Ka/Ks Tutorial for Mega5.0


In order to calculate Ka/Ks, you first have to align your sequences. Mega5.0 is multifunctional and can be used for many bioinformatics analyses, such as phylogenetic studies, so it includes a built-in alignment tool, ClustalW, for aligning multiple sequences.


In this tutorial we focus only on using Mega5.0 to study positive selection by calculating the Ka/Ks ratio of the target sequences.
Steps to calculate Ka/Ks:
1. Open Mega5.0 → Align → Edit/Build Alignment

2. Copy and paste your sequences, then choose Alignment → Align by ClustalW

3. Data → Export Alignment → MEGA format (save for future use)
4. Data → Phylogenetic Analysis

5. Go back to the main interface: Distance → Compute Pairwise Distances

6. A new window opens. Under "Substitution Model" you can select a model, and under "Substitutions to Include" you can choose ‘Nonsynonymous only’ (Ka) or ‘Synonymous only’ (Ks).


7. If you chose ‘Nonsynonymous only’, the resulting table gives the pairwise Ka values.

8. To get the overall value of Ka, choose Average → Overall


9. You can export this table to Excel by choosing File → Export and selecting XL as the output format.

10. Repeat step 6, but choose ‘Synonymous only’ to get Ks.

11. Divide Ka by Ks to get the Ka/Ks ratio.
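If you want Ka/Ks for every pair rather than only the overall value, the two exported tables can be divided cell by cell. The Python sketch below assumes you have already read the Ka and Ks tables into lower-triangular lists of lists (the reading step, the function names and the numbers are all hypothetical). It also shows the overall ratio as the average Ka divided by the average Ks. That matches step 11 only under the assumption that the overall values are plain means of the pairwise values, and it is not the same as averaging the pairwise ratios.

```python
def pairwise_ka_ks(ka, ks):
    """Ka/Ks for every pair in two lower-triangular pairwise tables.

    ka[i][j] and ks[i][j] (j < i) hold the distances between sequences i and j.
    Pairs with Ks == 0 get None, because the ratio is undefined there.
    """
    ratios = {}
    for i in range(len(ka)):
        for j in range(i):
            ratios[(i, j)] = ka[i][j] / ks[i][j] if ks[i][j] > 0 else None
    return ratios


def overall_ka_ks(ka, ks):
    """Average Ka divided by average Ks (assumes plain means of the pairs)."""
    ka_vals = [ka[i][j] for i in range(len(ka)) for j in range(i)]
    ks_vals = [ks[i][j] for i in range(len(ks)) for j in range(i)]
    return (sum(ka_vals) / len(ka_vals)) / (sum(ks_vals) / len(ks_vals))


# Made-up 3-sequence tables, purely to exercise the functions
ka = [[], [0.10], [0.20, 0.15]]
ks = [[], [0.50], [0.80, 0.0]]
print(pairwise_ka_ks(ka, ks))
print(overall_ka_ks(ka, ks))
```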