This page describes the motivations behind our decisions to use machine learning and DNA origami as the central techniques for our project.
In the wake of the pandemic, we wanted to build a robust, scalable, and inexpensive platform that would help researchers with antibody design and validation. To that end, we set out to shift the focus of antibody design from a lab validation problem to a computational optimization problem. By using quantitative techniques to identify and extract patterns from previous lab-validated data, we seek to provide a tool that lets researchers evaluate candidate sequences efficiently and use their limited lab resources as effectively as possible.
Machine Learning (ML) techniques are the current gold standard for these sorts of extensive computational analyses, which is why we gravitated towards them for our project. In addition, the current scientific literature indicates that machine learning can be an extremely powerful tool for antibody design and validation, and remarkable progress has been made in this area over the past few years. The review paper “A Review of Deep Learning Methods for Antibodies” highlights many of these discoveries and was a key source of inspiration for our project.
The term “Machine Learning” describes a wide variety of computer algorithms that iteratively “learn” patterns in input data in order to make predictions about unknown test data. In the context of our project, this works in four key steps:
Training: We give the system input data in the form of antibody sequences that have varying capabilities in binding and neutralizing SARS-CoV-2. We also train it on accessory databases that link an antibody’s sequence to its essential biochemical properties. This “teaches” the system which sorts of antibodies tend to be successful against SARS-CoV-2 and which do not.
Validation: We cross-reference the model’s scores with existing lab-validated data to ensure that our system can correctly apply its learned patterns to unknown sequences.
Prediction: We use our validated model to generate predictions about how well a given antibody sequence will perform against SARS-CoV-2.
Optimization: Now that we have a way to generate a “score” for each sequence, we optimize a given set of sequences by applying targeted mutations to improve SARS-CoV-2 binding and neutralization. We use a differential evolution (DE) approach to accomplish this.
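The optimization step above can be sketched in a few lines of Python. This is only an illustrative sketch of the classic DE/rand/1/bin scheme, not our actual pipeline. The real-valued encoding of residues, the `decode` helper, the control parameters (`f`, `cr`, population size), and especially `toy_score` with its made-up reference string are all assumptions introduced for the example. In practice the score would come from the trained and validated model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def decode(vector):
    """Assumed encoding: floor each real-valued gene to an alphabet index."""
    last = len(AMINO_ACIDS) - 1
    return "".join(AMINO_ACIDS[min(int(x), last)] for x in vector)


def toy_score(sequence, reference="CARDYW"):
    """Hypothetical stand-in for the trained model (higher is better).

    Returns the fraction of positions matching a made-up reference string.
    """
    return sum(a == b for a, b in zip(sequence, reference)) / len(reference)


def differential_evolution(score, length, pop_size=20, generations=100,
                           f=0.5, cr=0.7, seed=0):
    """Maximize `score` over sequences of `length` using DE/rand/1/bin."""
    rng = random.Random(seed)
    upper = float(len(AMINO_ACIDS))
    pop = [[rng.uniform(0, upper) for _ in range(length)]
           for _ in range(pop_size)]
    fitness = [score(decode(v)) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(length)  # at least one gene always crosses over
            trial = []
            for k in range(length):
                if rng.random() < cr or k == forced:
                    # Differential mutation: base vector plus scaled difference.
                    gene = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    gene = min(max(gene, 0.0), upper - 1e-9)  # clip to valid range
                else:
                    gene = pop[i][k]
                trial.append(gene)
            # Greedy selection: keep the trial only if it scores at least as well.
            trial_fitness = score(decode(trial))
            if trial_fitness >= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = max(range(pop_size), key=fitness.__getitem__)
    return decode(pop[best]), fitness[best]


best_seq, best_score = differential_evolution(toy_score, length=6)
print(best_seq, best_score)
```

Because selection is greedy, the best score in the population can never decrease from one generation to the next. This is a convenient property when each evaluation stands in for a costly model call.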
With the ever-increasing breadth of resources regarding computationally driven antibody design available to us, we were excited to implement all of the above steps in a complete pipeline and deliver a proof of concept result. The following sections will expand on each of the above steps and provide some additional insight into our design and implementation process.
DNA origami is a burgeoning technology that uses DNA as a building material to construct nanostructures that can be easily assembled and programmed. In our project, we use DNA origami to deliver an antibody sequence to a specific type of cell in the body. A few other technologies serve similar purposes; they are listed below, along with the reasons we deemed each unsuitable for our project:
Liposomes: Liposomes are small vesicles made out of long-chain phospholipids arranged in a spherical form. They are easily assembled and can encapsulate and transport a wide variety of cargos, ranging from nucleic acids to drug molecules to entire proteins. Liposomes were our primary alternative to DNA origami as a delivery method for our antibody mRNA sequence. However, we decided against using liposomes because we need to deliver our antibody mRNA sequence specifically to B cells. While liposomes can be engineered to display proteins that would allow for cell-specific delivery, the process for doing so is less precise and not as easily controlled as it is for DNA origami.
Viral Vectors: Viruses, especially adenoviruses and lentiviruses, have been used as a means of delivering nucleic acids for decades. Viruses self-assemble, replicate easily, and deliver genetic material with high efficiency, making them tempting for delivering our antibody mRNA. However, there are several drawbacks to viral vectors. Firstly, the potential for viral material to integrate into the host cell’s genome and cause disease is still a major risk factor that has not been fully addressed even today. Secondly, the process of generating viruses with custom genetic material is very expensive and not easily conducted in an undergraduate lab. Finally, although viruses that selectively infect B cells already exist (such as the Epstein-Barr virus), most of them are pathogenic and likely too hazardous to handle in an undergraduate lab.
Metal/silica nanoparticles: These are nanoscale particles made of metals (gold, silver, iron, etc.), polymers (PLGA), or other substrates (silica, carbon, etc.). Nucleic acids can be attached to the surfaces of these nanoparticles via conjugation chemistry, and the particles can then be injected into the bloodstream to deliver genetic material to cells. These nanoparticles are durable and not as easily degraded or destroyed as liposomes or viruses, making them quite efficient at delivering their cargo. However, specifically targeting these nanoparticles to certain cell types is an ongoing challenge, and some types of nanoparticles are known to trigger immune reactions unless coated with another biocompatible polymer.
Further, there are a number of advantages to using DNA origami nanostructures that make them suitable for our specific application:
Ease of self-assembly: DNA origami relies on the sequence specificity of interactions between DNA molecules. If two DNA strands do not have sufficiently complementary sequences, they will not bind to each other. Assembly of a DNA origami structure relies on sequence-based binding between the scaffold strand (the long ssDNA strand that serves as the primary building material) and the staple strands (the short ssDNA strands that hold the scaffold in a particular shape). Because this binding is sequence-specific, a properly designed DNA origami structure assembles itself into its final shape when the scaffold and staples are mixed: the only requirement is that complementary sequences bind to each other. This assembly process can be done over several hours in a test tube with nothing more than a thermal cycler, and assembly can be verified easily using transmission electron microscopy (TEM).
Programmability: Because the interactions between DNA strands are highly predictable, it is easy to design a DNA origami structure that reacts to environmental conditions in a specific fashion. These environmental conditions could range from temperature to pH to proteins to even other nucleic acid sequences. In the case of our project, we want our DNA origami structure to exhibit two distinct programmable behaviors. Firstly, we want it to specifically bind to B cells (and no other cells) and be taken up by these cells to minimize off-target effects and maximize payload delivery. Secondly, we want the structure to release its antibody mRNA payload only once it has entered the cell. Both of these conditional statements can be easily accomplished with DNA origami due to its highly predictable nature and would be quite difficult to accomplish using the alternative techniques listed above.
Low cost: Compared to other polymers or delivery vehicles, nucleic acids are much cheaper and often more easily synthesized, both using chemistry (oligonucleotide synthesis) and using biological enzymes (various DNA/RNA polymerases). This makes a DNA-based delivery vehicle considerably cheaper than the alternatives listed above.
Biocompatibility: Because of the prevalence of DNA in the body, a DNA origami delivery vehicle would be more easily tolerated by the body than some of the alternative delivery methods listed above. DNA is also easily degraded and eliminated by the liver and kidneys with no toxic or harmful byproducts. Finally, DNA origami has been shown to be non-immunogenic in mouse models (Shih lab, unpublished work).
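To make the sequence-specific binding behind self-assembly more concrete, here is a minimal Python sketch that finds where a staple strand could pair with a scaffold strand. It is an illustration only. The sequences are made up, the function names are hypothetical, and the check assumes perfect Watson-Crick pairing. Real design tools also account for mismatches, thermodynamics, and the three-dimensional routing of the scaffold.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(scaffold, staple):
    """Return 0-based start positions on the scaffold (5'->3') where the
    staple can pair perfectly, i.e. where the scaffold contains the
    staple's reverse complement."""
    target = reverse_complement(staple)
    sites = []
    start = scaffold.find(target)
    while start != -1:
        sites.append(start)
        start = scaffold.find(target, start + 1)
    return sites


# Made-up example sequences (not from our design).
scaffold = "ATGCGTACCTGA"
print(binding_sites(scaffold, "GGTAC"))  # staple complementary to one site
print(binding_sites(scaffold, "AAAAA"))  # no complementary site on this scaffold
```

A staple with no complementary region on the scaffold yields an empty list, mirroring the point above that strands without sufficiently complementary sequences do not bind.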
For the reasons listed above, we selected a DNA origami delivery vehicle as the means of delivering our computationally determined antibody sequence to B cells. In the following pages, the DNA origami subheading describes the work done to design and simulate the delivery vehicle.