Team:HK CPU-WFN-WYY/Model

Model

Modeling

Phylogenetic tree

The phylogenetic tree was generated using the MEGA software.

Evolutionary distance(1)(2)

  • The number of amino acid substitutions per site between two homologous protein sequences
  • Various methods exist for estimating the number of amino acid substitutions
  • Substitution model

    • Poisson model: Poisson-correction distance (3)
      • Estimates the number of amino acid substitutions per site under the assumption that the number of substitutions at each site follows a Poisson distribution:
        • $d = -\ln(1 - p)$, where $p = n_d / n$
        • d = distance (number of amino acid substitutions per site)
        • p = proportion of different amino acids between the two sequences compared
        • nd = the number of amino acid differences
        • n = the total number of amino acids compared
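The Poisson-correction distance can be sketched in a few lines of Python. This is a minimal illustration rather than what MEGA does internally: it assumes the standard formula d = -ln(1 - p) with p = nd / n, takes two already-aligned sequences of equal length, and skips sites where either sequence has a gap. The function names `p_distance` and `poisson_distance` and the gap character `-` are hypothetical choices made for this example.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing amino acids, p = nd / n, over ungapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only aligned positions where neither sequence has a gap
    # (an assumption of this sketch, not taken from the source).
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)                            # total amino acids compared
    nd = sum(1 for a, b in pairs if a != b)   # number of differences
    return nd / n


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-correction distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b, gap)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)
```

For example, two aligned sequences that differ at one site out of four have p = 0.25, so `poisson_distance("MKTA", "MKTV")` is about 0.288, slightly larger than p because the correction accounts for multiple substitutions at the same site.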

UPGMA(4)

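  • Unweighted pair group method with arithmetic mean
  • Algorithm:
    • Compute the distance between each pair of sequences
    • Treat each sequence as a cluster by itself
    • Merge the two closest clusters. The distance between two clusters A and B is the average distance between all pairs of their sequences: $d(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$
      • The updated distance between the joined cluster A∪B and any other cluster X is the mean weighted by cluster size: $d(A \cup B, X) = \frac{|A|\, d(A, X) + |B|\, d(B, X)}{|A| + |B|}$
    • Repeat the merge step until a single cluster remains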
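The UPGMA procedure described above can be sketched as follows. This is an illustrative implementation, not the code used by MEGA. The function name `upgma`, the input format (a dict keyed by `frozenset` pairs of labels), and the nested-tuple output `(left, right, height)` are assumptions made for this example. The node height is taken as half the distance between the merged clusters, which is the usual convention for the ultrametric tree UPGMA produces.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree.

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) to the distance between labels a and b
    Returns a nested tuple (left_subtree, right_subtree, node_height);
    a leaf is represented by its label.
    """
    # Every sequence starts as its own cluster: id -> (subtree, size).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)

    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        height = d.pop(pair) / 2

        # Distance from the joined cluster to every other cluster X:
        # mean of d(A, X) and d(B, X) weighted by cluster size.
        for x in clusters:
            d_ax = d.pop(frozenset((a, x)))
            d_bx = d.pop(frozenset((b, x)))
            d[frozenset((next_id, x))] = (
                size_a * d_ax + size_b * d_bx
            ) / (size_a + size_b)

        clusters[next_id] = ((tree_a, tree_b, height), size_a + size_b)
        next_id += 1

    (tree, _size), = clusters.values()
    return tree
```

Weighting by cluster size keeps every original sequence equally influential in the average, which is what "unweighted" refers to in the method's name.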

Bootstrap value

  • Indicates how reliable the branches of the phylogenetic tree are
  • Bootstrapping is a self-sustaining process based on the hypothesis that the sample represents an estimate of the whole population, and that statistical inference can be drawn from a large number of bootstrap samples to estimate the bias, standard error, and confidence intervals of the parameters of interest. In the QSAR (Quantitative Structure-Activity Relationship) example given in the source, the model developed with the randomly selected n objects is used to predict the activity of the remaining excluded compounds, and the average Q² value is calculated. A high bootstrapped Q² reflects the statistical significance of the developed QSAR model. (5)
  • Steps (6):
    1. Choose the number of bootstrap samples to draw
    2. Choose a sample size
    3. For each bootstrap sample:
      • (a) Draw a sample of the chosen size with replacement
      • (b) Calculate the statistic on the sample
    4. Calculate the mean of the calculated sample statistics
  • Repetition (7)
    • A minimum might be 20 or 30 repetitions
    • The number of repetitions must be large enough that meaningful statistics, such as the mean, standard deviation, and standard error, can be calculated on the sample
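The bootstrap steps listed above can be sketched for a generic statistic as follows. This is a minimal illustration using only the Python standard library. The function name `bootstrap`, its parameters, and the default of 1000 repetitions are assumptions made for this example and are not taken from the source. It shows the general resampling idea only. MEGA's bootstrap test for a phylogenetic tree applies the same resampling idea to the sequence alignment, which this sketch does not attempt to reproduce.

```python
import random
import statistics


def bootstrap(data, statistic, n_samples=1000, sample_size=None, seed=None):
    """Estimate a statistic and its standard error by bootstrapping.

    data:        sequence of observations
    statistic:   function mapping a list of observations to a number
    n_samples:   number of bootstrap samples (step 1); must be at least 2
    sample_size: size of each sample (step 2); defaults to len(data)
    seed:        optional seed so that runs are reproducible
    Returns (mean of the bootstrap statistics, their standard deviation).
    """
    rng = random.Random(seed)
    size = len(data) if sample_size is None else sample_size

    stats = []
    for _ in range(n_samples):                # step 3
        sample = rng.choices(data, k=size)    # (a) draw with replacement
        stats.append(statistic(sample))       # (b) compute the statistic

    # Step 4: summarise the bootstrap statistics. Their standard deviation
    # serves as the bootstrap estimate of the standard error.
    return statistics.mean(stats), statistics.stdev(stats)
```

A call such as `bootstrap(measurements, statistics.mean, n_samples=200, seed=1)` returns the bootstrap estimate of the mean together with its standard error. `measurements` is a placeholder for any list of numbers.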

Protein model

The homology models were generated using the SWISS-MODEL software. They will be used in molecular dynamics simulations in the future.

References

(1) "(PDF) Evolutionary Distance: Estimation - ResearchGate."

(2) "Distance Estimation --MEGA manual - MEGA software."

(3) "Poisson Correction (PC) distance - MEGA software."

(4) https://www.youtube.com/watch?v=c2y9s_E2184&t=615s

(5) https://www.sciencedirect.com/topics/medicine-and-dentistry/bootstrapping

(6) "(PDF) Evolutionary Distance: Estimation - ResearchGate."

(7) https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/