Harvard College iGEM MOTbox

MOTbox

Our student team members this year are Frank D’Agostino, Rahul Subramaniam, Robert Shekoyan, Maya Razmi, Siva Muthupalaniappan and David Cao. Our student team leaders are Zehan Zhou and Aaron Hodges. All team members are undergraduate students at Harvard College. Our mentors/instructors are Anastasia Ershova and Olivia Young, both graduate students in Dr. William Shih's lab at the Harvard Wyss Institute for Biologically Inspired Engineering and Harvard Medical School. Our principal investigators (PIs) are Dr. Jia Liu at the Harvard John A. Paulson School of Engineering and Applied Sciences (Harvard SEAS) and Dr. Alain Viel in the Molecular and Cellular Biology department of the Harvard Faculty of Arts and Sciences (Harvard FAS).

Abstract

MOTbox is a COVID-19 therapeutic that couples machine learning and DNA origami to design an optimized anti-SARS-CoV-2 antibody and deliver its mRNA sequence to immune cells in infected patients. It is intended to serve as an interim treatment in a pandemic scenario that can be manufactured cheaply and quickly with limited lab access while a vaccine is developed. Using ensemble machine learning and differential evolution algorithms, we optimized anti-SARS-CoV-2 antibody sequences to enhance binding affinity and therapeutic potential. We designed and computationally validated a novel DNA origami nanostructure to selectively deliver the optimized antibody sequences to immune cells for rapid antibody production in vivo. The high potency of the optimized antibodies and the specificity of DNA origami delivery reduce the minimum therapeutic dose, also reducing treatment cost. Our work is a proof-of-concept of a rapid, cost-effective antibody treatment for COVID-19 that can also be extended to treating other emerging diseases.

COVID-19 and Pandemic-Era Research

Due to our own college’s restrictions and our lack of lab access, we were inspired to create a project that allowed other researchers to address the pandemic with only minimal access to labs. In particular, we wanted to create a technology that could help treat the COVID-19 pandemic but could also be generalized to other health crises with minimal wetlab work. Our goal was (and still is) to design our project to be as computationally driven as possible, so minimal wetlab experimentation would be needed to produce a finished treatment. This was the catalyst for our project idea.

After conducting literature reviews of treatments being developed for COVID-19, the concept of antibody therapy caught our eye because of its simplicity and potential. In short, antibody therapy involves the transfusion of antibodies against SARS-CoV-2 into infected patients. The antibody transfusions not only neutralize the virus and slow its rate of spread but also label virions for the immune system to destroy. Reviews by Abraham and Marovich et al. helped convince us of the potential that antibody treatments had in improving health outcomes in infected patients. However, there were a few key challenges to antibody treatment that our project needed to address in order to be viable: identification of therapeutic antibodies, the time-intensive process of antibody expression and purification, and the overall cost of antibody production.

Proposed Implementation

Our iGEM project MOTbox aims to establish a novel therapeutic application of machine learning and DNA origami for the efficient, scalable, and lasting treatment of COVID-19. Given expedited regulatory review and the pre-established efficacy of antibody therapy as a mode of treatment, we envision that our primary beneficiaries will be COVID-19 patients and the medical professionals responsible for delivering treatment. The machine learning component of our project draws on principles of precision medicine and is designed to optimize host immunity against the virus, providing a fast-tracked path towards active immunity. In addition, MOTbox offers an advantage over traditional antibody regimens: because the antibody sequence is delivered to specific cells rather than circulating nonspecifically as free antibody, a smaller dose may be required. All of these qualities will benefit frontline health workers administering rapid point-of-care treatment and help expedite the recovery of those currently suffering from COVID-19 infections.

Looking to the Future

After our partnership with UIUC, we saw a lot of potential for an effective therapeutic that could be rapidly and cheaply available to everyone. Because UIUC is focusing on the mutations of SARS-CoV-2 over time, their model is extremely dynamic, which means it is generalizable to other diseases that mutate quickly, such as influenza or HIV. In our conversations, both teams discussed the potential of doctors keeping an ensemble of DNA origami therapeutics on site, each carrying a different mRNA sequence for a different strain of a virus. A doctor could then easily treat a patient according to the strain they carry, rather than having to obtain a new treatment for that strain. Since mRNA sequences are cheap to produce, this would be a cheap and fast alternative to current methods. Additionally, UIUC’s algorithm would allow us to create many possible designs and let doctors use the ones they think will work best. If a particular design is not effective, we can iterate and further improve the sequences and the DNA origami nanostructure. As we transition into Phase 2, we are working to experimentally validate the antibodies both of our teams produced. We are also looking to validate each other’s sequences with our models, which will allow us to better assess and understand the strengths and weaknesses of our model. We are extremely excited, and working with UIUC thus far has been an amazing experience. Hopefully we’ll be able to get lunch in person next summer!

Overview

In the wake of the pandemic, we wanted to build a robust, scalable, and inexpensive platform that would help researchers in antibody design and validation. As such, we wanted to shift the focus of antibody design from a lab validation problem to a computational optimization problem. By incorporating quantitative techniques to identify and extract patterns from previous lab-validated data, we seek to provide a tool for researchers to efficiently evaluate candidate sequences and make sure that their limited lab resources are used as effectively as possible.

Our project addresses all of these challenges by identifying a therapeutic antibody in silico and then delivering the mRNA sequence for this antibody to a patient’s own B cells, where the antibody can be translated and produced as a treatment. We chose to use machine learning (specifically ensemble learning) to identify a novel therapeutic antibody because of the power that machine learning has to improve on existing antibodies, as described in a review by Graves et al. Our goal was to create a machine learning algorithm that could iterate on existing antibodies and biochemically determine an antibody sequence that would be likely to neutralize the SARS-CoV-2 virus. We also needed to identify a delivery method that could deliver the mRNA sequence for our novel antibody to B cells in vivo. We specifically wanted to deliver the mRNA sequence instead of the fully translated antibody because nucleic acids (such as mRNA) are often significantly cheaper and less labor-intensive to produce and purify compared to full proteins. We eventually decided to use DNA origami as the delivery method for our antibody sequence. DNA origami is a method for constructing interactive nanostructures out of DNA. In short, DNA origami involves folding a long single-stranded piece of DNA known as the “scaffold” into a particular shape, with shorter DNA strands known as “staples” being used to fix the scaffold in the desired shape. There are a wide variety of computational tools that can be used to design and simulate DNA origami nanostructures, and our goal was to employ these tools to design and validate a DNA origami structure that could deliver our antibody sequence to its intended target.

The design process for our DNA origami delivery vehicle was driven by two main requirements: (a) the structure should carry out the desired programmable behaviors and (b) the structure should be able to fit several antibody mRNA molecules inside so each vehicle can deliver multiple payloads.

In terms of programmable behaviors, we needed our structure to be able to respond to two different stimuli:

Bind only to receptors on B cells and not to any other cells.
Disassemble automatically when taken up by the cell and release the mRNA payload.

We addressed each of these requirements using the framework and constraints of DNA origami and nanotechnology.

Antibody Design and Pipeline

Using data from CoV-AbDab, a coronavirus antibody database maintained by the Oxford Protein Informatics Group, in conjunction with some accessory databases, we obtained a set of training and testing data that scales with the data currently available. We fed these sequences into the R package Interpol, which gave us information about 531 different biochemical properties of our sequences in the form of a numerical vector. After drastically reducing the dimensionality of the data with a PCA-based analysis, we trained a random forest regressor to assign a score to any given CDR-H3 sequence. This score was meant to represent the capability of an antibody with the selected CDR-H3 sequence to effectively bind and neutralize SARS-CoV-2.
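The encoding step can be illustrated with a minimal sketch. It assumes that turning a sequence into a numerical vector amounts to looking up one biochemical property per residue and then interpolating the resulting profile to a fixed length, so that CDR-H3 sequences of different lengths become comparable. The Kyte-Doolittle hydropathy scale stands in for a single property, and the function names and example sequences are hypothetical; this is not the Interpol implementation.

```python
# Kyte-Doolittle hydropathy values, used here as one example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KYTE_DOOLITTLE):
    """Replace each residue with its property value."""
    return [scale[aa] for aa in sequence.upper()]


def interpolate(values, length):
    """Linearly resample a profile to exactly `length` points."""
    if length < 2:
        raise ValueError("length must be at least 2")
    if len(values) == 1:
        return [values[0]] * length
    out = []
    for i in range(length):
        # Fractional position of output point i along the input profile.
        pos = i * (len(values) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def featurize(sequence, length=20):
    """Fixed-length numeric profile for one sequence and one property."""
    return interpolate(encode(sequence), length)


# Hypothetical CDR-H3-like sequences of different lengths.
short_vec = featurize("ARGY")
long_vec = featurize("ARDYYGSSYFDYWGQG")
```

Repeating this for many properties and concatenating the profiles would give one long feature vector per sequence, which is the kind of input a dimensionality reduction step would then compress.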

It has been shown in the literature that a stacked-model approach, where the results of one model are continuously fed into another model, performs extremely well in optimization scenarios with latent (hidden) variables. We implemented a stacked-model approach to optimize a set of candidate antibody sequences to bind and neutralize SARS-CoV-2. This was accomplished by running a Differential Evolution simulation (the ‘optimization model’) with our random forest regressor (the ‘scoring model’) as the objective function to optimize. The differential evolution simulation starts with a “population” of seed sequences and simulates directed evolution by iteratively inducing mutations in the sequences and selecting for the ones which improve predicted binding/neutralization. Over many simulated epochs, the fitness of the population increases until convergence is reached.
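The optimization loop described above can be sketched as follows. This is an illustration under stated assumptions, not our production code: sequences are encoded as real-valued vectors (one number per residue, floored to an amino-acid index), the update rule is the classic DE/rand/1/bin scheme, and `toy_score` is a hypothetical placeholder for the trained scoring model. The seed sequences are invented for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def decode(vector):
    """Map a real-valued vector to a sequence, one residue per entry."""
    return "".join(AMINO_ACIDS[min(int(x), 19)] for x in vector)


def toy_score(sequence):
    """Hypothetical scoring model: fraction of aromatic residues."""
    return sum(aa in "FWY" for aa in sequence) / len(sequence)


def differential_evolution(score, seeds, epochs=100, f=0.8, cr=0.9, seed=0):
    """Maximize `score` starting from equal-length seed sequences."""
    rng = random.Random(seed)
    length = len(seeds[0])
    # Encode each seed at the centre of its residue's index bin.
    pop = [[AMINO_ACIDS.index(aa) + 0.5 for aa in s] for s in seeds]
    fitness = [score(decode(v)) for v in pop]
    for _ in range(epochs):
        for i in range(len(pop)):
            others = [j for j in range(len(pop)) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(length)  # guarantees one mutated position
            trial = []
            for k in range(length):
                if rng.random() < cr or k == forced:
                    gene = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    gene = min(max(gene, 0.0), 19.999)  # stay in valid range
                else:
                    gene = pop[i][k]
                trial.append(gene)
            trial_fit = score(decode(trial))
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fit >= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = max(range(len(pop)), key=fitness.__getitem__)
    return decode(pop[best]), fitness[best]


# Hypothetical seed sequences (at least four are needed for DE/rand/1).
SEEDS = ["ARDYYGSSYFDY", "ARGGSGWYLDAF", "AKDLGYSSGWYV",
         "ARELRYFDWLLS", "ARDRGYSYGYFD"]
best_seq, best_fit = differential_evolution(toy_score, SEEDS, epochs=50)
```

Because selection is greedy, the score of each member of the population can only stay the same or improve from one epoch to the next, which is why the population fitness rises until it plateaus.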

DNA Origami Design

In order to transport our antibody mRNA sequence to B cells for translation, we designed a DNA origami structure from scratch that could package the mRNA and protect it in its journey through the bloodstream. We decided to create a large box-shaped structure composed of two identical C-shaped subunits that bind to each other to form the full box structure. When the box enters the acidic cellular endosome, it automatically disassembles because of pH-responsive DNA i-motif strands that hold the box together.

We used caDNAno2 to design our structure and generate staple strands. We also identified staple strands that could be used to anchor the mRNA payload inside the box based on the turning and position of the DNA helix (red and yellow strands in image at left).

Antibody Design Results

By performing a PCA analysis and also qualitatively observing which biochemical properties are most relevant to our goals, we were able to reduce the 531-dimensional AAIndex data given by Interpol into a feature vector of only 9 dimensions. This alone was a notable success, as it made it feasible (from a computational complexity standpoint) to incorporate properties such as hydropathy, free energy, and conformational accessibility into our scoring model.
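As an illustration of the reduction from 531 dimensions to 9, here is a minimal PCA sketch using a singular value decomposition in NumPy. The data are synthetic placeholders (random values built from 9 hidden factors), and the sketch covers only the PCA part; the qualitative selection of properties described above is not modeled.

```python
import numpy as np


def pca_reduce(features, n_components=9):
    """Project rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    reduced = centered @ vt[:n_components].T
    return reduced, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in: 200 sequences x 531 strongly correlated properties.
latent = rng.normal(size=(200, 9))
mixing = rng.normal(size=(9, 531))
features = latent @ mixing + 0.01 * rng.normal(size=(200, 531))

reduced, explained = pca_reduce(features)
```

The `explained` array gives the fraction of total variance captured by each retained component, which is one way to judge how many dimensions are worth keeping.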

Remarkably, this dimensionality reduction combined with our differential evolution algorithm allowed us to directly optimize our sequences in a manner efficient enough to produce visible results in a reasonable time on modest hardware. This means that our platform is accessible to anyone wishing to do a similar optimization task, not just those who have access to a supercomputer.

This is a visualization of our model’s sequence optimization over 500 epochs, with seed sequences taken from the CoV-AbDab database. All the seed sequences were experimentally shown to have some capability in binding and neutralizing SARS-CoV-2. We see that over many iterations, the differential evolution algorithm was able to consistently and meaningfully optimize seed sequences in a computationally efficient manner.

DNA Origami Results

We conducted three different types of simulations to test and validate our DNA origami structure. The first was a Brownian motion/diffusion simulation, intended to determine the probability that a harmful nuclease enzyme enters the structure through one of its cutouts and damages the payload. We found that the nuclease enters within one timestep of the simulation only 40% of the time, and we believe that this percentage is an overestimate due to certain simplifying assumptions made in our simulation.
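A Monte Carlo estimate of this kind can be sketched as below. Every detail here is an assumption made for illustration and does not reproduce our simulation: the wall of the box is the plane z = 0, the cutout is a circular hole centered at the origin, the nuclease is a point particle taking Gaussian random steps, and a step that hits solid wall is simply rejected. Lengths are in arbitrary units.

```python
import random


def nuclease_enters(hole_radius, start_height, step_sigma, n_steps, rng):
    """Random-walk one particle; return True if it passes through the hole."""
    x, y, z = 0.0, 0.0, start_height
    for _ in range(n_steps):
        nx = x + rng.gauss(0.0, step_sigma)
        ny = y + rng.gauss(0.0, step_sigma)
        nz = z + rng.gauss(0.0, step_sigma)
        if nz < 0.0:
            # Fraction of the step at which the wall plane z = 0 is crossed.
            t = z / (z - nz)
            cx = x + t * (nx - x)
            cy = y + t * (ny - y)
            if cx * cx + cy * cy < hole_radius ** 2:
                return True  # crossed inside the cutout
            continue  # hit solid wall: reject the move and stay put
        x, y, z = nx, ny, nz
    return False


def entry_probability(hole_radius, start_height=2.0, step_sigma=1.0,
                      n_steps=100, n_walkers=1000, seed=0):
    """Fraction of independent walkers that enter within `n_steps`."""
    hits = 0
    for i in range(n_walkers):
        # One generator per walker keeps runs comparable across radii.
        rng = random.Random(seed + i)
        if nuclease_enters(hole_radius, start_height, step_sigma, n_steps, rng):
            hits += 1
    return hits / n_walkers


p_small = entry_probability(hole_radius=1.0)
p_large = entry_probability(hole_radius=3.0)
```

Treating the nuclease as a point particle is one example of a simplifying assumption that inflates the estimate, since a real enzyme has a finite size and can only pass a cutout that is wide enough for it.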

We then conducted a finite element analysis of our structure, which determines its mechanical rigidity using empirical material properties of DNA. The simulation treats the DNA structure as a uniform entity that fluctuates randomly due to thermal energy. We used the open-source webserver CanDo to run our simulations. After each simulation, we iteratively improved our staple design to reduce the deformation of the structure. After 13 iterations, we surpassed our goal of reducing deformation by 33%. The image above shows selected iterations and the max deformation for each.

Finally, we conducted a molecular dynamics simulation of our structure, which simulates the electrostatic forces and potentials on our structure at discrete points in time on the near-molecular level. This simulation allows us to gain a finer understanding of the stability of our structure and tells us how stable the structure would be in a physiological environment. We ran our molecular dynamics simulation using the open-source webserver oxDNA.org and our simulation parameters mimicked conditions normally found in blood. We found that at body temperature, our structure is not very stable. In particular, many of the short staple strands in the structure fall off due to thermal fluctuations, leading the structure to unravel as seen in the animation above. We are currently working on lengthening the staples in our structure and modifying its design to improve its stability in physiological conditions.

Future Directions

We recognize that experimental validation is critical for any project. To that end, we have listed a few of the experiments we are currently planning for validation in the lab. There are three key steps to our planned experiments, each of which will be done with lab safety protocol in mind.

1. Antibody Validation

In order to validate the mRNA sequence, we must first show that it produces a viable protein. Thus, we plan to conduct an in vitro translation (IVT) assay on our mRNA sequences. Our machine-learning-optimized sequence will be compared against a literature sequence for viability. We will then test the efficacy of that protein through an enzyme-linked immunosorbent assay (ELISA). Given positive binding results, we then plan to quantify binding affinities (binding curve and estimated dissociation constant) between antibody candidates and spike protein using the direct ligand-receptor interaction ELISA method detailed by Syedbasha et al., 2016.
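To show how a dissociation constant could be estimated from a binding curve, here is a minimal sketch. It assumes a one-site binding model, signal = Bmax * c / (Kd + c), and fits it with a simple grid search over Kd combined with a closed-form least-squares estimate of Bmax. The grid search is a stand-in chosen to keep the example dependency-free, and the concentrations and signals are synthetic; the analysis in the cited method may differ.

```python
def fit_one_site(concentrations, signals, kd_min=1e-3, kd_max=1e3, n_grid=601):
    """Return (Kd, Bmax) minimizing squared error for a one-site model."""
    best = None
    for i in range(n_grid):
        # Candidate Kd values are spaced evenly on a logarithmic scale.
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))
        occupancy = [c / (kd + c) for c in concentrations]
        # For a fixed Kd the model is linear in Bmax, so solve it directly.
        denom = sum(f * f for f in occupancy)
        bmax = sum(f * y for f, y in zip(occupancy, signals)) / denom
        sse = sum((y - bmax * f) ** 2 for f, y in zip(occupancy, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Synthetic, noise-free example: true Kd = 10 and Bmax = 2 (arbitrary units).
conc = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
signal = [2.0 * c / (10.0 + c) for c in conc]
kd_est, bmax_est = fit_one_site(conc, signal)
```

With real ELISA readings, the concentration series should span values well below and well above the expected Kd so that both the rising part and the plateau of the curve constrain the fit.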

2. DNA Origami Assembly and Validation

To experimentally verify the MOTbox design, we first plan to test assembly of the design by running a series of folding assays at varying temperatures and magnesium concentrations (Engelhardt et al. 2019). We will also conduct reactions across a gradient of folding temperatures. We will use gel electrophoresis (2% agarose gel loaded with the DNA) to validate the size of the folded objects. Next, in order to validate the structure, we will use negative-stain transmission electron microscopy (TEM). This will involve adsorbing the origami objects onto a TEM grid, prepared by coating and evaporation of a collodion plastic film, followed by staining the adsorbed objects with 2% aqueous uranyl formate and imaging with the electron microscope.

3. Antibody Sequence Delivery to B Cell Cultures

After validating the structure itself, we plan to test the release mechanisms by conducting a series of assays on the pH-triggered i-motif hinge as well as the temperature-sensitive mRNA handle (Daljit Sing et al. 2019). Finally, we plan to test the entire system in vitro using Ramos-Blue cells.

Human Practices

In creating any therapeutic device, we needed to answer a few critical questions. In particular, we needed to know how people perceive synthetic biology in order to determine whether or not they would ever use a DNA origami therapeutic. Our team sent out a survey to 205 people through SurveyMonkey Audience, asking them about their background knowledge of synthetic biology and whether or not they would consider using a treatment that involved it.

After conducting this survey, we were able to determine that our therapeutic was something that individuals would be likely to use, so we moved forward with our design. Since some individuals voiced that they would not be entirely comfortable using a treatment derived in part from machine learning, we will be sure to validate our experimental design in the lab in the future.

Some participants expressed concerns about using DNA as a delivery system. The idea of using DNA to deliver things in the body can seem strange. A typical education teaches that DNA simply stores the genetic information of the cell, so it can be hard to visualize how an abstract concept such as DNA origami could play out. To address this, we wish to send informational newsletters to people to teach them more about DNA origami and machine learning. Additionally, because of this comment, we worked especially hard to research the effects of DNA origami, and the literature strongly agrees that DNA origami structures in the body are harmless and are passed easily.

Additionally, the use of machine learning is still a new concept to many people, and AI is a widely debated topic. To alleviate concerns, we wrote a small ethical statement on our Wiki about how ML should be used in the context of medical discovery/optimization/implementation, how it can be abused, and how it can be regulated to prevent abuse.

Attributions

We would like to thank our amazing mentors, Ms. Ershova and Ms. Young, for readily providing feedback and guidance in finding project design and simulation resources over the course of the project. Additionally, we would like to thank our PIs, Drs. Liu and Viel, for providing regular feedback and assistance in choosing and honing a project idea.

We would like to thank our team’s student board members (Zehan Zhou, Aaron Hodges, Teagan Steadman, Ralph Estanboulieh and Joshua Lui), who created lectures about synthetic biology and presented one at each regular meeting. We would also like to thank Dr. Melissa Hancock of the Harvard SEAS Active Learning Labs, who trained us in basic synthetic biology wetlab techniques, including PCR and gel electrophoresis.

We would like to thank Harvard SEAS for graciously funding our team’s research this year. We would also like to thank our team coordinator, Ms. Jessa Piaia, for coordinating funding with the Harvard SEAS administration and managing our team’s funds.

We would like to thank our NEGEM volunteer judges, Dr. Nikki Thadani, Dr. Nikhil Gopalkrishnan, Mr. Chris Wintersinger and Ms. Anastasia Ershova for listening to and providing written feedback on teams’ presentations. We would also like to thank UIUC iGEM, Cornell iGEM, MIT iGEM and Purdue iGEM for attending the NEGEM conference and presenting their work to be judged. In terms of project presentation and iGEM deliverables, Frank D’Agostino created our Project Promotion Video and Project Description video. We would like to thank Mr. Casey Cann from the Harvard Derek Bok Center for Teaching and Learning for meeting with us and volunteering assistance and feedback in video editing. Our team wiki content and coding as well as the judging form were completed by all of the student team members as well as the student team leaders. Finally, we would also like to thank Mr. Adam Zewe from Harvard SEAS for interviewing us about our project and publishing an article about our work in the Harvard Inside SEAS newsletter as well as the Harvard Crimson publication.

Selected References

Abraham, J. (2020). Passive antibody therapy in COVID-19. Nature Reviews Immunology, 20(7), 401–403. https://doi.org/10.1038/s41577-020-0365-7
Adamson, A. S., & Smith, A. (2018). Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatology, 154(11), 1247. https://doi.org/10.1001/jamadermatol.2018.2348
Anastassacos, F. (2019). Towards the therapeutic application of DNA origami [Doctoral dissertation]. http://nrs.harvard.edu/urn-3:HUL.InstRepos:42013166
Bujold, K. E., Hsu, J. C. C., & Sleiman, H. F. (2016). Optimized DNA “Nanosuitcases” for Encapsulation and Conditional Release of siRNA. Journal of the American Chemical Society, 138(42), 14030–14038. https://doi.org/10.1021/jacs.6b08369
Galitsky, B. A., Gelfand, I. M., & Kister, A. E. (1998). Predicting amino acid sequences of the antibody human VH chains from its first several residues. Proceedings of the National Academy of Sciences, 95(9), 5193–5198. https://doi.org/10.1073/pnas.95.9.5193
Hein, A., Cole, C., & Valafar, H. (2020). An Investigation in Optimal Encoding of Protein Primary Sequence for Structure Prediction by Artificial Neural Networks. arXiv.org.
Vayena, E., Blasimme, A., & Cohen, I. G. (2018). Machine learning in medicine: Addressing ethical challenges. PLOS Medicine, 15(11). https://doi.org/10.1371/journal.pmed.1002689
Zhang, S., Jiang, H., Xu, M., Hou, J., & Dai, L. (2015). A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models. arXiv.org.
