Team:Technion-Israel/Model

Basic Bootstrap Template



iGEM 2020 – Modeling

As in every great iGEM project, modelling was an extremely important facet of our process. When developing a product such as ACT. and while working in the wet-lab, modeling can be used to visualize and predict different real-world behaviors by making different assumptions and approximations, relying on preliminary knowledge.

We developed two main models: one to help us predict the efficiency of the product in regard to how many decoy proteins do we actually need to include in it. And the other was a protein-affinity prediction model, meant to help us optimize the ACE2 mutations in the final product. Both models enable the constitution of a circular workflow between the dry-lab and the wet-lab. The knowledge acquired in the wet-lab helps us adjust the models, and the predictions derived from the models can help us maximize our productivity in the wet-lab.

A third, minor model which was developed as a sub-routine for the “Product Efficiency” estimation is the “Proteins on Microgel Beads” model. Required due to the need for accurate protein density approximation on the protein-carriers.

Mass Transfer Model

Model motivation

The aim of this model is to quantify the effectiveness of ACT. regarding the adsorption of the viral particles (in “worst-case-scenario” concentrations) onto the gel layer containing protein-carriers (B. subtilis spores and Microgel beads) with “decoy” proteins (ACE2 and Sybodies). This would allow us to calculate the concentration of carriers that we should include in ACT.

Standing on the Shoulders of Giants – Acquiring a solid base for our model

We based our model on results presented in the paper “Mass Transfer Limitations of Porous Silicon-Based Biosensors for Protein Detection”[1] . In this paper Arshavsky-Graham S. et al. developed a theoretical model, which describes the complex mass transport phenomena and reaction kinetics in these porous silicon-based nanomaterials. Two one-dimensional models were thoroughly described: a planar model and a porous model, both describing the concentration of the target analyte as a function of time. The main difference is that the planar model assumes a planar surface and thus neglects the hindered diffusion phenomenon.

Since said article describes the bulk diffusion of a target analyte onto a surface containing tethered proteins that specifically bind the target analyte, it seemed to perfectly fit the mass transfer model we intended for our project (mass transfer and reaction kinetics). Due to time constraints we chose to construct a simulation according to the planar model rather than the porous model.

When considering the planar model for a porous surface, an approximation derives from the “unrolling” of the pores in a manner that allows the neglection of the hindered diffusion within the pores. A visual aid of a pore being “unrolled” and thus becoming a flat capturing surface can be seen in Fig. 1.


Fig. 1: A visual representation of a pore being unrolled to represent a depthless surface to be used in the planar model.


Fig. 2 shows a schematic illustration of the planar model describing the mass transfer and reaction kinetic phenomena onto ACT.


Fig. 2: A schematic illustration of the model. CA0 represents the initial concentration of viral particles in the air. bm represents the initial concentration of decoy proteins on the user’s hands. H is the length of the considered layer of air above the gel.


The Equations

In the bulk solution (infected air), the time evolution of the analyte concentration (virions), cA,bulk (z,t), is governed by Fick’s second law:



Dbulk is the diffusivity of the virions (enveloped by very small water droplets) in the air surrounding the gel-applied skin area.

The boundary conditions are:



Note that at z=0 there is a no-flux boundary condition.

At the gel surface, the following boundary condition is applied, assuming continuity between the diffusive flux and reactive flux:



Where b is the bound virion-protein complex density on the gel surface, dp is the pore diameter and Lp is the porous layer thickness. Since both fluxes relate to a different area, this equation considers the area ratio between the pore entry area and the total area of the pore for reaction while assuming :



The change in the bound complex density is described by:



bm is the initial density of free “capture” proteins in the gel. The initial conditions for the bound complex density:

To implement the boundary conditions into the code, we derived the following expressions for :

based on equation.(2):



We derive from equations (3)&(5):



Preliminary Data

Table 1: Fitting parameters used for the numerical simulations, based on the properties of the different proteins and the different carriers

Parameter

Description

Units

Simulation value

 

H

Considered height of infected air above gel

m

0.01

 

Lp

Layer thickness

m

1×10-4

 

Dbulk

Analyte diffusivity in bulk[2]

m2 s-1

2.5×10-5

 

dp

Average pore diameter[3]

m

2×10-5

 

dA

Hydrodynamic diameter of analyte (virus)

m

1×10-7

 

cA0

Initial concentration of analyte in infected air[4]

Viruses m-3

(1248-7.44×106)

 

NS

Number of spikes on each virion[5][6]

#Spikes

25±9

 

A

**Gel surface area

m2

9.31×10-2

 

bm

Density of immobilized probes

Proteins m-2

***X

 

 

 

 

ACE2

Sb#15

 

PPC

*Protein capacity of carrier[11]

#Proteins

2×104

4×104

Microgel Beads[12]

6.9×102

4.1×103

Bacillus spores[13]

Kon

Reaction Association rate

M-1 s-1

1.9×105

3.0×105

 

Koff

Reaction Dissociation rate

s-1

4.5×10-3

7.0×10-3

 

*For further explanation regarding the estimation of proteins on the Microgel beads, see relevant modelling section.

**The reference surface area chosen is according to the size of average adult male hand palms[14].

***”X” means that this parameter is unknown, and the goal of the model is to find its optimal amount

 


The Code[15][16]

The code is based on the execution of Fick's 2nd Law of Diffusion using the Method of Lines (a PDE solution). The time step is performed with MATLAB’s ODE15s and the space step is realized in the ODE function by using the second-order central difference formula (finite derivative): ; with h being the difference between the air layers (∆z).

The method of lines yields for each zi layer (width of h):

A solution vector is defined:

As is required for calculations (in equation (8) for example), it is implemented as the first-order forward numerical differentiation:

The complete solution vector is ultimately defined as:



Results and Conclusions

Fig. 3.1.A-3.4.B: show the correlation between the number of protein-carriers and the relative number of viruses that would be captured in ACT. for different initial viral concentrations.

The type of protein and type of carrier are described in each one of the Fig. titles. As for the proteins: Fig. 3.1 & 3.3 show ACE2 and Fig. 3.2 & 3.4 show Sybody. As for the carriers: Fig. 3.1 & 3.2 show Microgel Beads and Fig. 3.3 & 3.4 show B. subtilis spores. Fig. with “A” (like 3.1.A) show a 2D plot and Fig. with “B” (like 3.1.B) show a 3D plot, adding a timescale to the result seen in the 2D plot.

We see that in all of the combinations, for the “Safe-Zone” quantity of carriers, even in the worst-case scenario (as seen in table 1), all the viruses (1cm from the skin) will be caught in 30 seconds. The “Safe-Zone” is the ideal concentration of viruses in the gel, in which the product will not be saturated even in the worst-case scenario of viruses in the air or on a surface, but also, isn’t overshooting, meaning the product is not jammed with unnecessary amounts of carriers.

It can be seen in Fig. 3.4.B that 100,000 carriers on two adult hands is enough to create a “Safe-Zone” , which means the product will not be saturated even in the worst-case scenario (as seen in table 1). Two adult hands are approximately the area of 9.31×10-2 m2. For an approximated layer thickness of 1×10-4 m, we get 100,000 carriers per 9.31 ml which is 10,741 particles/ml

1 O.D600 of B. subtilis spores is equivalent to about 1.5×108 CFU/ml[17] . Which brings us to the conclusion that for Sybodies on Bacillus spores, we need to achieve much less than 1 O.D600 (7×10-5 to be exact) in order to declare the product suitable for full user protection.









With this knowledge, once we get to the product-testing phase, we can make thoughtful experiments and know the range of expected efficiency for each protein-carrier combination.

Fig. 4 shows a video illustration of the shift from 2D to 3D:


Fig. 4:A video illustration of ACE2 on Microgel Beads graph shifting from 2D to 3D


For future use and reference, the MATLAB code used for the diffusion model can be found in the attached docx file:

Modeling the Amount of Recombinant Proteins on Microgel Beads

The article discussing the Microgel beads[12], on which we based our design, does not mention the estimated amount of GFP that connected to the beads. Therefore, to assess the amount of recombinant proteins that would attach to the surface of the Microgel beads (ACE2 and Sybodies), we used a mathematical assumption, relying on three premises:

  1. Steric Hindrance is not negligible.
  2. Nickel is not a limiting factor
  3. Proteins bind with 20-50% efficiency

The mathematical assumption is based on the equilateral triangulation of a sphere. Sphere triangulation refers to the geometrical reconstruction of a perfect sphere into a geodesic polyhedron (a convex polyhedron made from triangles). The triangles are not equilateral but rather have sides of approximately equal lengths (referred here as pseudo-equilateral). The sphere that was considered for this model is approximately the size of the Microgel beads (6.5±0.8 µm in diameter according to research[12]). The “beads” was divided into triangles using “as-equilateral-as-possible sphere surface meshing” MATLAB code called “Spheretri”[18] . It was assumed that each triangle represents one incircle which stands for a sterically hindered protein. Due to the steric hindrance, for each protein with a base diameter of 2r, an incircle with a diameter of 4r was assumed. Therefore, the side length of the Pseudo-Equilateral triangles should be [19].


Fig. 5: A visual aid for the concept of using triangulation to assess protein density while considering stearic hindrance.


A script was executed, searching for the number of mesh-points on a bead-sized sphere that would result in the minimal average side length that is equal or greater than . One of the outputs of “Spheretri” is the number of faces in the triangular mesh, which translates directly to the number of triangles and therefore the number of proteins.

ACE2 with a given diameter of 2r = 0.011 µm[8], which is also the inradius of the triangles, resulted in 81,920 proteins per bead. Considering the 3rd premise, the estimated number of ACE2 on each bead is ~1.6×104-4×104.

Sybodies are more problematic since they are smaller than 9 nm (2.5 nm[10]) and therefore can enter the pores of the beads. Since the relation between the size of the recombinant proteins attached and efficacy of protein binding on these beads has not been studied, we decided to make a crude assumption for the number of Sybodies bound.

The number of ACE2 proteins on the beads will be assumed to be ~2×104 and the number of Sybodies will be assumed to be ~4×104.


Fig. 6:2×104 triangles meshed on a surface that is approximately the size of a bead (axes in µm)


Rosetta – Protein Affinity Prediction

As a good scientific methodology, the iGEM foundation suggests the “Design – Build – Test – Learn – Design” cycle for building the connection between the progress of the dry-lab and the wet-lab. This is what we aspire to achieve in this part of our project.

Building the library

Angiotensin-converting enzyme 2 (ACE2) is the cellular entry point for SARS-CoV-2. The entry process starts with binding of the viral Spike protein’s receptor-binding domain (RBD) to the ACE2.[20]

In the development stage of our product, we aimed to find a mutation of the human ACE2 WT protein that will result in a variant with a stronger affinity to the most common Spike protein of SARS-CoV-2. Relying on four different studies[21][22][23][24], we found 16 AA positions composing the ACE2 sequence that are related in some way to the bonding with the Spike protein. Those AA, known as “Hot spots” residues[25], are all located in the extracellular peptidase domain (PD) of the protein. The AA are concentrated in two areas of the PD: positions 20-100 and positions 330-400.

The construction of this sequence library was done (in MATLAB) by changing each of these AA to a different AA. While maintaining their bio-chemical characteristics (Aliphatic, Aromatic, Acidic, Basic, Hydroxylic, Sulfur containing, Amidic), we established two groups in our library designated "Variants group C" and "Variants group D".


Table 2.1: Group C variants

 

Table 2.2: Group D variants

Position

25

27

31

41

42

79

82

83

84

WT AA

A

T

K

Y

Q

L

M

Y

P

Possible changes

V

S

H

W

N

V

C

W

V

L

 

R

F

 

P

 

F

L

I

 

 

 

 

I

 

 

I

P

 

 

 

 

G

 

 

G

G

 

 

 

 

A

 

 

A

 

Position

330

353

354

355

356

357

396

WT AA

N

K

G

D

F

R

A

Possible changes

Q

H

V

E

W

H

V

 

R

L

 

Y

K

L

 

 

I

 

 

 

I

 

 

P

 

 

 

P

 

 

A

 

 

 

G


Those changes gave us 93,313 variants for the C group and 3,889 for the D group, 97,202 variants in total. For future use and reference, the full sequences are attached (in Excel files for easier use).

Decreasing the number of variants in the library was needed prior to testing the variants in the lab (for more information on this process, see our Design Page). Fig. 7 shows a visual illustration of several AA combinations that can be changed, roughly representing several mutation variants.


Fig. 7: A visual representation of the possible changes. Proteins and amino acids are as described in the legend. The 6m0j PDB file[26] was used for the structured complex. The frames were built in Pyrosetta


First screening

The first library screening was done by considering the effect that a single AA might have on the overall folding of the entire protein. As was mentioned, the positions of the mutations were taken from studies that showed their correlation with an increase in the affinity to the Spike protein. Nevertheless, we decided to avoid changing the amino acids that seemed highly important for maintaining the protein folding as exists in human cells. We therefore eliminated those mutation positions that were crucial to the whole folded complex. This choice was made since we wanted to make sure that if a virus can infect human cells, it is still likely to be captured in our product, since one of the reasons we chose the ACE2 protein is the guarantee that as long as the SARS-CoV-2 is infectious to humans, the ACE2 will be effective as a "decoy protein” in ACT.

We used the 6m0j PDB (Protein Data Bank) file[26] of the human ACE2 as a starting point. Then, to see which of the amino acids might affect the folded form and energy of the entire protein, we executed single point AA mutations and searched for radical changes in the folded structure. This kind of changes might disconnect the correlation between the "decoy" effectivity and the viral infectivity, which we desired to avoid.

The selected AA were: Lysine (K) at position 31, Tyrosine (Y) at position 83, Proline (P) at position 84 and Glycine (G) at position 354. Among that, we relinquished the change of Phenylalanine (F) in position 356 to Tryptophane (W). Fig. 8.1-8.5 were generated using PyRosetta and are added to aid in the visualisation of the AA in ACE2. In each Fig., the chosen AA is highlighted with a green halo. The ACE2 can be seen in green ribbons and the Spike as orange sticks.

PyRosetta is an interactive Python-based interface to the powerful Rosetta molecular modeling suite. It enabled us to design our own custom molecular modeling algorithms using Rosetta sampling methods and energy functions.







Table 3.1: Group C variants after reduction

 

Table 3.2: group D variants after reduction

Position

25

27

41

42

79

82

WT AA

A

T

Y

Q

L

M

Possible changes

V

S

W

N

V

C

L

 

F

 

P

 

I

 

 

 

I

 

P

 

 

 

G

 

G

 

 

 

A

 

 

Position

330

353

355

356

357

396

WT AA

N

K

D

F

R

A

Possible changes

Q

H

E

Y

H

V

 

R

 

 

K

L

 

 

 

 

 

I

 

 

 

 

 

P

 

 

 

 

 

G


Those changes reduced the library to only 2162 variants in total, composed of 1729 variants in the C group and 433 in the D group (98% and 88% decrease respectively). Once again with the intention to preserve the knowledge and help others who wish to continue working with the sequences, we attached the Excel files containing both groups.

Second screening

The second screening process was based on homology modeling using SWISS-MODEL[27][28], with PDB file 6m0j as the template, to create a folded variant PDB file. SWISS-MODEL, a service that was suggested by the UIUC iGEM 2020 team, is a fully automated protein structure homology-modeling server with the purpose of making protein modeling accessible to all researchers worldwide.

The next step was using the extracted files to replace the ACE2 chain with 6m0j in Pymol (after aligning the ACE2 variant with the WT). Lastly, the folding energies of the “patched” PDB’s were scored in PyRosetta and used for the generation of a heat-map. This kind of protein heat-map allows us to easily see the severity in which a single change (or multiple) will theoretically affect the folding energy of the binding complex, which might indicate stronger affinity between the two proteins. By comparing the folding score function results, we intended to diminish the library's size even more by selecting the variants with the most negative changes (lowest folding energy compared to the WT).

Because the Rosetta energy function is a combination of physics-based and statistics-based potentials it doesn't attempt to match up with actual physical energy units (e.g. kcal/mol or kJ/mol). Rosetta energies are on an arbitrary scale, sometimes referred to as REU (for "Rosetta Energy Unit")[29]. Fig. 9 shows an exemplary heat-map focusing on the magnitude of the change caused by single point mutations in “hot spots” containing hydrophobic AA.


Fig. 9: A heat-map of Rosetta protein folding energy scores (normalized to WT), units are REU (arbitrary Rosetta energy units). “WT” appear in Positions with the Wild-Type amino acid (A25,L79,A396,P84,G354), in all of which Δ=0. In single-point substitutions, that result in a lower-energy complex, the normalized value (Δ) is mentioned (P79, I396). Positions 84 and 354 are presented but crossed because they were dropped after the first screening (their scoring was performed as a “sanity check”).


The rest of the normalized scores for the single-point mutations are presented in tables 4.1 and 4.2.


Table 4.1: Reduced group C  variants with Normalized Rosetta Energy scores for single-point mutations.

 

Table 4.2: Group D  variants with Normalized Rosetta Energy scores for single-point mutations.

Position

25

27

41

42

79

82

W/T AA

A

T

Y

Q

L

M

Possible changes &

Normalized Rosetta Score

V

33

S

19

W

17

N

8

V

14

C

3

L

52

 

F

1212

 

P

-37

 

I

44

 

I

12

P

76

G

20

G

22

A

891

 

Position

330

353

355

356

357

396

W/T AA

N

K

D

F

R

A

Possible changes &

Normalized Rosetta Score

Q

17

H

48

E

98

Y

17

H

1

V

58

 

R

15

 

 

K

2433

L

52

 

 

I

-87

P

9

G

2396


Tables 4.1 & 4.2, as well as Fig. 9, constitute a proof-of-concept for our variant screening methodology. Just as the above models were created for the 27 possible single-point mutations, they can be created and analyzed for the rest of the 2,135 other variants. Our variant library can be scored and sorted to find the lowest energy variants, with time being the only limiting factor. Those variants will be transitioned from the dry-lab to the wet-lab, where their affinities can be empirically measured.

References
  1. Arshavsky Graham S, Boyko E, Salama R, Segal E. Mass Transfer Limitations of Porous Silicon-Based Biosensors for Protein Detection. ACS Sensors. 2020. doi:10.1021/acssensors.0c00670
  2. Gates DM. Biophysical Ecology. New York, NY: Springer New York; 1980. doi:10.1007/978-1-4612-6024-0
  3. Gong C, Shi S, Dong P, et al. In vitro Drug Release Behavior from a Novel Thermosensitive Composite Hydrogel Based on Pluronic F127 and Poly(ethylene glycol)-Poly(epsilon-caprolactone)-Poly(ethylene glycol) Copolymer. BMC Biotechnol. 2009;9(1):8. doi:10.1186/1472-6750-9-8
  4. Riediker M, Tsai D-H. Estimation of Viral Aerosol Emissions From Simulated Individuals With Asymptomatic to Moderate Coronavirus Disease 2019. JAMA Netw Open. 2020;3(7):e2013807. doi:10.1001/jamanetworkopen.2020.13807
  5. Ke Z, Oton J, Qu K, et al. Structures and distributions of SARS-CoV-2 spike proteins on intact virions. Nature. August 2020. doi:10.1038/s41586-020-2665-2
  6. Zhang L, Jackson CB, Mou H, et al. The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. bioRxiv. 2020. doi:10.1101/2020.06.12.148726
  7. Yan R, Zhang Y, Li Y, Xia L, Guo Y, Zhou Q. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science. 2020;367(6485):1444-1448. doi:10.1126/science.abb2762
  8. Shang J, Ye G, Shi K, et al. Structural basis of receptor recognition by SARS-CoV-2. Nature. 2020;581(7807):221-224. doi:10.1038/s41586-020-2179-y
  9. Wrapp D, Wang N, Corbett KS, et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367(6483):1260-1263. doi:10.1126/science.abb2507
  10. Walter JD, Hutter CAJ, Zimmermann I, et al. Sybodies targeting the SARS-CoV-2 receptor-binding domain. bioRxiv. 2020. doi:10.1101/2020.04.16.045419
  11. Negri A, Potocki W, Iwanicki A, Obuchowski M, Hinc K. Expression and display of Clostridium difficile protein FliD on the surface of Bacillus subtilis spores. J Med Microbiol. 2013;62(9):1379-1385. doi:10.1099/jmm.0.057372-0
  12. Mizrahi B, Irusta S, McKenna M, et al. Microgels for Efficient Protein Purification. Adv Mater. 2011;23(36):H258-H262. doi:10.1002/adma.201101258
  13. Carrera M, Zandomeni RO, Fitzgibbon J, Sagripanti J-L. Difference between the spore sizes of Bacillus anthracis and other Bacillus species. J Appl Microbiol. 2007;102(2). doi:10.1111/j.1365-2672.2006.03111.x
  14. Tikuisis P, Meunier P, Jubenville CE. Human body surface area: measurement and prediction using three dimensional body scans. Eur J Appl Physiol. 2001;85(3-4):264-271. doi:10.1007/s004210100484
  15. Kaisare N, Practical Aspects of ODEs, as part of NPTEL Course: MATLAB Programming for Numerical Computation, India, 2015. (https://nptel.ac.in/content/storage2/courses/103106118/Week%20-%208/8-ODE_IVP2.pdf)
  16. Roche de Guzman (2020). Fick's 2nd Law of Diffusion using Method of Lines (https://www.mathworks.com/matlabcentral/fileexchange/71354-fick-s-2nd-law-of-diffusion-using-method-of-lines), MATLAB Central File Exchange. Retrieved October 22, 2020.
  17. Paidhungat M, Setlow B, Daniels WB, Hoover D, Papafragkou E, Setlow P. Mechanisms of Induction of Germination of Bacillus subtilis Spores by High Pressure. Appl Environ Microbiol. 2002;68(6):3172-3175. doi:10.1128/AEM.68.6.3172-3175.2002
  18. Peter Gagarinov (2020). Spheretri (https://github.com/pgagarinov/spheretri), GitHub. Retrieved October 19, 2020.
  19. Weisstein, Eric W. "Equilateral Triangle." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/EquilateralTriangle.html
  20. Perlman S, Netland J. Coronaviruses post-SARS: Update on replication and pathogenesis. Nat Rev Microbiol. 2009;7(6):439-450. doi:10.1038/nrmicro2147
  21. Procko E, The sequence of human ACE2 is suboptimal for binding the S spike protein of SARS coronavirus 2. bioRxiv Prepr Serv Biol. 2020;2:21-24. doi:10.1101/2020.03.16.994236
  22. Devaux CA, Rolain JM, Raoult D. ACE2 receptor polymorphism: Susceptibility to SARS-CoV-2, hypertension, multi-organ failure, and COVID-19 disease outcome. J Microbiol Immunol Infect. 2020;53(3):425-435. doi:10.1016/j.jmii.2020.04.015
  23. Inducible K, Natesh R, Schwager SLU, Sturrock ED. Crystal structure of the human angiotensin-converting enzyme – lisinopril complex. Nature. 2003;421(1995):551-554.
  24. Wan Y, Shang J, Graham R, Baric RS, Li F. Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. J Virol. 2020;94(7):1-9. doi:10.1128/jvi.00127-20
  25. Ozdemir ES, Gursoy A, Keskin O. Analysis of single amino acid variations in singlet hot spots of protein–protein interfaces. Bioinformatics. 2018;34(17):i795-i801. doi:10.1093/bioinformatics/bty569
  26. Lan J, Ge J, Yu J, et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature. 2020;581(7807):215-220. doi:10.1038/s41586-020-2180-5
  27. Waterhouse A, Bertoni M, Bienert S, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46(W1):W296-W303. doi:10.1093/nar/gky427
  28. Guex N, Peitsch MC, Schwede T. Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis. 2009;30(S1):S162-S173. doi:10.1002/elps.200900140
  29. Units in Rosetta. https://www.rosettacommons.org/docs/latest/rosetta_basics/Units-in-Rosetta. Published 2020. Accessed October 10, 2020




Footer

Department of Biotechnology & Food Engineering
Technion – Israel Institute of Technology
Haifa 32000, Israel

  • igem2020.technion@gmail.com