Modeling
Phylogenetic tree
The phylogenetic tree was generated with the MEGA software.
Evolutionary distance (1)(2)
 The number of amino acid substitutions per site between two homologous protein sequences
 Various methods exist for estimating the number of amino acid substitutions

Substitution model
 Poisson correction distance (Poisson model) (3)
 Estimates the number of amino acid substitutions per site under the assumption that the number of amino acid substitutions at each site follows a Poisson distribution:
 $p = n_d / n$
 $d = -\ln(1 - p)$
 $d$ = distance (number of amino acid substitutions per site)
 $p$ = proportion of different amino acids between the two sequences compared
 $n_d$ = the number of amino acid differences
 $n$ = the total number of amino acids compared
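 As a minimal sketch of the Poisson correction, which is usually written as d = -ln(1 - p) with p = n_d / n, the Python snippet below computes the distance for two aligned sequences. The function names and the choice to skip gapped columns are assumptions made for illustration; they are not taken from MEGA.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing amino acids between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption of this sketch: columns with a gap in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    n_d = sum(1 for a, b in pairs if a != b)  # number of amino acid differences
    return n_d / len(pairs)                   # p = n_d / n


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson correction distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)
```

 For example, `poisson_distance("MKVL", "MKIL")` has p = 1/4, so d = -ln(0.75). The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.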
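UPGMA (4)
 Unweighted pair group method with arithmetic mean
 Algorithm:
 Compute the distance between each pair of sequences
 Treat each sequence as a cluster by itself
 Merge the two closest clusters. The distance between two clusters A and B is the average distance between all pairs of their sequences:
 $d(A, B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$
 The updated distance between the joined cluster A∪B and another cluster X is the mean weighted by cluster size:
 $d(A \cup B, X) = \frac{|A| \, d(A, X) + |B| \, d(B, X)}{|A| + |B|}$
 Repeat the merge and update steps until a single cluster remains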
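 The clustering steps above can be sketched in Python as follows. This is an illustrative implementation only, not the code MEGA runs: the function name `upgma`, the `frozenset`-keyed distance dictionary, and the nested-tuple tree are assumptions of the sketch. The update rule weights each old distance by the size of its cluster.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` by UPGMA.

    `dist` maps frozenset({a, b}) to the distance between sequences a and b.
    Returns (tree, merges): the tree as nested tuples, and a list of
    (cluster_a, cluster_b, distance) in the order the merges happened.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of sequences
    d = dict(dist)                       # working copy of the distance table
    merges = []
    while len(sizes) > 1:
        # Find the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merges.append((a, b, d[frozenset((a, b))]))
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance from the merged cluster to every remaining cluster X:
        # mean of d(A, X) and d(B, X), weighted by cluster size.
        for x in sizes:
            d[frozenset((merged, x))] = (
                size_a * d[frozenset((a, x))] + size_b * d[frozenset((b, x))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)), merges


# Hypothetical three-sequence example.
example = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 8.0,
}
tree, merges = upgma(["A", "B", "C"], example)
```

 In this example A and B are merged first at distance 2.0, and the merged cluster is then at distance (6.0 + 8.0) / 2 = 7.0 from C.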
Bootstrap value
 Indicates the reliability of the branches of the phylogenetic tree
 Bootstrapping is a self-sustaining process based on the hypothesis that the sample represents an estimate of the whole population, and that statistical inference can be drawn from a large number of bootstrap samples to estimate the bias, standard error, and confidence intervals of the parameters of interest. In the QSAR (Quantitative Structure-Activity Relationship) setting described in (5), the model developed with the randomly selected n objects is used to predict the activity of the remaining excluded compounds, and the average Q² value is calculated. A high value of bootstrapped Q² reflects the statistical significance of the developed QSAR model. (5)
 Steps (6):
 Choose a number of bootstrap samples to perform
 Choose a sample size
 For each bootstrap sample
 (a)Draw a sample with replacement with the chosen size
 (b)Calculate the statistic on the sample
 Calculate the mean of the calculated sample statistics.
 Repetition (7)
 A minimum might be 20 or 30 repetitions
 The number of repetitions must be large enough that meaningful statistics, such as the mean, standard deviation, and standard error, can be calculated on the sample
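 The bootstrap steps listed above can be sketched in Python for a simple statistic, here the mean. The function name, default values, and fixed random seed are assumptions made for illustration. For a phylogenetic tree the resampled units are typically the columns of the alignment and the statistic is whether a given branch is recovered, which this generic sketch does not attempt.

```python
import random
import statistics


def bootstrap_mean(data, n_samples=1000, sample_size=None, seed=0):
    """Bootstrap estimate of the mean of `data`.

    Returns (mean of the bootstrap means, their standard deviation).
    The standard deviation of the bootstrap means serves as an
    estimate of the standard error of the mean.
    """
    rng = random.Random(seed)        # fixed seed so the sketch is reproducible
    size = sample_size or len(data)  # default: same size as the original data
    sample_stats = []
    for _ in range(n_samples):
        sample = rng.choices(data, k=size)              # (a) draw with replacement
        sample_stats.append(statistics.mean(sample))    # (b) statistic on the sample
    return statistics.mean(sample_stats), statistics.stdev(sample_stats)


# Hypothetical data set.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
estimate, std_error = bootstrap_mean(values)
```

 Increasing `n_samples` makes the estimate and its standard error more stable, which is why a few dozen repetitions is only a lower bound.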
Protein model
The homology models were generated with the SWISS-MODEL software. They will be used in molecular dynamics simulations in the future.
References
(1) "(PDF) Evolutionary Distance: Estimation - ResearchGate."
(2) "Distance Estimation MEGA manual - MEGA software."
(3) "Poisson Correction (PC) distance - MEGA software."
(4) https://www.youtube.com/watch?v=c2y9s_E2184&t=615s
(5) https://www.sciencedirect.com/topics/medicine-and-dentistry/bootstrapping
(6) "(PDF) Evolutionary Distance: Estimation - ResearchGate."
(7) https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/