TECHNICAL DETAILS
An in-depth view of the work done and the organisation of our code.
sbol parser
The SBOL Parser interprets and parses the assembly intent from genetic designs described using the SBOL standard (SBOL Version 2.3), and produces the appropriate data files for downstream script generation software. The SBOL Parser
primarily utilizes pySBOL2 for the processing of SBOL files, and Plateo to plan and simulate the set-up of the
laboratory environment for assembly protocols.
The assembly plan is inferred at the highest level of a hierarchical design - the root Component Definition. Each root Component Definition is treated as the construct to be assembled, and its corresponding Components are treated
as the parts that make up the construct. Currently, the SBOL Parser assumes only one level of assembly, and therefore any Components containing further nested designs are assumed to have been fully assembled beforehand.
A major feature of the SBOL standard (SBOL Version 2.2 and later) is the ability to describe large combinatorial design spaces through the use of Combinatorial Derivations. Based on the enumeration feature supported in SBOL Designer,
the SBOL Parser is capable of expanding Combinatorial Derivations into individual Component Definitions describing each construct variant. This facilitates the construction of large genetic libraries and complex genetic designs,
making experimental workflows such as Design of Experiments more tractable.
Each root Component Definition is distributed into the wells of construct plates, which are Plate objects provided by Plateo. The list of parts used across all assemblies is summarized and their corresponding Component Definitions
are then similarly distributed into part plates. For the purposes of the software pipeline we have developed, the Plateo constructs are converted into CSVs as downstream input to the script generation software.
Generating CSVs
The generate_csv() function generates the input CSVs for the downstream script generation software that produces the scripts for BASIC, GoldenGate (MoClo), and BioBricks assemblies on the Opentrons. The process of generating the CSVs is as follows:
- Get list of constructs from the SBOL Document (enumerating Combinatorial Derivations if necessary)
- (Optional) Remove constructs with repeated parts
- Take a random sample of constructs if the size of the list of constructs is greater than the desired number of constructs to be assembled
- Distribute constructs and parts into their respective Plateo Plate objects
- Create CSVs from Plateo Plate objects
sbol_parser_api.ParserSBOL.generate_csv(
assembly: str,
part_info: Dict[str, Dict[str, Union[str, int, float]]] = None,
repeat: bool = False,
max_construct_wells: int = 96,
num_runs: int = 1
)
Returns:
- Dict[str, List[str]]: Dictionary of constructs and parts/linkers paths. Keys: 'construct_path', 'part_path'
Raises:
- ValueError: If assembly is invalid.
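The validation, sampling, and part-summary steps listed above can be sketched in plain Python. This is a minimal illustration, not the parser's actual code: constructs are represented as a mapping from a construct ID to a list of part IDs instead of SBOL objects, and the names select_constructs, unique_parts, and the assembly strings in VALID_ASSEMBLIES are assumptions made for this example.

```python
import random
from typing import Dict, List, Optional

# Hypothetical assembly identifiers; the strings accepted by generate_csv()
# are not stated in this document.
VALID_ASSEMBLIES = {"basic", "moclo", "bio_bricks"}


def select_constructs(
    constructs: Dict[str, List[str]],
    assembly: str,
    max_constructs: int = 96,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Validate the assembly method and sample constructs if there are too many."""
    if assembly not in VALID_ASSEMBLIES:
        raise ValueError(f"Unknown assembly method: {assembly!r}")
    if len(constructs) <= max_constructs:
        return dict(constructs)
    # Take a random sample when the design space exceeds the available wells.
    rng = random.Random(seed)
    keep = rng.sample(sorted(constructs), max_constructs)
    return {name: constructs[name] for name in keep}


def unique_parts(constructs: Dict[str, List[str]]) -> List[str]:
    """Summarise the parts used across all constructs in first-seen order."""
    seen: List[str] = []
    for parts in constructs.values():
        for part in parts:
            if part not in seen:
                seen.append(part)
    return seen
```

Passing a seed makes the random sample reproducible, which can be useful when the same subset of a large library has to be regenerated later.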
Enumeration
The enumeration functionality is based on the Java implementation of the same functionality used in SBOL Designer, with minor changes to improve the human readability of the Component Definition Display IDs generated from enumeration.
The purpose of enumeration is to expand the condensed SBOL representation of a combinatorial design space into the set of elements it comprises.
sbol_parser_api.ParserSBOL.enumerator(
derivation: CombinatorialDerivation
)
Parameters:
- derivation (CombinatorialDerivation): A Combinatorial Derivation to be enumerated. Enumeration is based on the strategy assigned to the Combinatorial Derivation.
Returns:
- List[ComponentDefinition]: List of Component Definitions specifying the enumerated constructs
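The idea behind enumeration can be illustrated with a simplified sketch. It is not the enumerator described above: it works on plain strings instead of pySBOL2 objects, it assumes every variable position takes exactly one variant, and the function name enumerate_designs and the underscore-joined display IDs are assumptions made for this example.

```python
from itertools import product
from typing import Dict, List


def enumerate_designs(
    template: List[str], variants: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Expand a template into every combination of its variants.

    template: ordered component IDs of the template design.
    variants: maps a variable component ID to the part IDs allowed there;
              components missing from the mapping are kept fixed.
    Returns a mapping from a generated display ID to the ordered part IDs.
    """
    # One list of choices per position; fixed positions have a single choice.
    options = [variants.get(component, [component]) for component in template]
    designs: Dict[str, List[str]] = {}
    for combination in product(*options):
        display_id = "_".join(combination)  # readable ID built from part IDs
        designs[display_id] = list(combination)
    return designs
```

With two promoter variants and three RBS variants in front of a fixed coding sequence, this yields 2 x 3 = 6 designs, which shows how quickly a compact combinatorial description grows once expanded.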
Filter
The purpose of the filter is to constrain the design space of assembly constructs based on user-defined parameter constraints. Currently, the filter is used to remove constructs that contain repeating parts that may lead to homologous
recombination and are therefore undesirable. Future development of the SBOL Parser will focus on an improved adaptive implementation of the filter with more tunable parameters. This will allow the SBOL Parser to be responsive to
upstream learning and modelling applications.
filter_constructs(
all_constructs: List[ComponentDefinition]
)
Parameters:
- all_constructs: List of constructs to filter
Returns:
- List[ComponentDefinition]: List of filtered constructs
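A minimal sketch of the repeated-part check, assuming each construct is reduced to a list of part IDs. The helper names has_repeated_parts and filter_repeats are hypothetical and operate on plain strings, not on ComponentDefinition objects.

```python
from typing import Dict, List


def has_repeated_parts(parts: List[str]) -> bool:
    """Return True if any part ID occurs more than once in the construct."""
    return len(set(parts)) < len(parts)


def filter_repeats(constructs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only constructs in which every part is used at most once."""
    return {
        name: parts
        for name, parts in constructs.items()
        if not has_repeated_parts(parts)
    }
```

For example, a construct made of the parts pA, r1, pA would be dropped because pA appears twice, while pA, r1, cds1 would be kept.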
Filling Plateo Plates
SBOL objects describing the parts or constructs to be assembled are stored in Plateo classes such as Wells and Plates. The objective was two-fold: to provide a standard description of labware and experimental set-up as an alternative
to unstandardized CSV inputs, as well as to pass Plateo objects directly to downstream applications without the need for an intermediary data format such as CSV or JSON. The current implementation of the SBOL Parser contains an
in-built Plateo parser to generate the requisite CSVs for downstream script generation software.
filter_plates(
all_content: List[ComponentDefinition],
content_name: str,
num_plate: int = None,
plate_class: plateo.Plate = None,
max_construct_wells: int = None,
part_info: Dict[str, Dict[str, Union[str,int,float]]] = None
)
Parameters:
- all_content (List[ComponentDefinition]): List of constructs or parts
- content_name (str): Type of well content ("construct" or "part")
- num_plate (int): Number of plates to be filled (default: 1)
- plate_class (plateo.Plate): Class of Plateo Plate (default: Plate96)
- max_construct_wells: Maximum number of filled wells on each plate
- part_info: Dictionary of parts and their associated user-defined information
Returns:
- list: List of Plateo plates
Raises:
- ValueError: If parameters given are not feasible to carry out
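The distribution of content into wells, including the feasibility check that leads to a ValueError, can be sketched without Plateo. This is an illustration under stated assumptions: plates are modelled as dictionaries from well name to content, wells are filled row by row on a 96-well layout, and the names well_names and distribute_into_wells are hypothetical.

```python
from string import ascii_uppercase
from typing import Dict, List


def well_names(rows: int = 8, columns: int = 12) -> List[str]:
    """Return well names row by row: A1, A2, ..., A12, B1, ..., H12."""
    return [
        f"{ascii_uppercase[row]}{column}"
        for row in range(rows)
        for column in range(1, columns + 1)
    ]


def distribute_into_wells(
    content: List[str], num_plate: int = 1, max_wells: int = 96
) -> List[Dict[str, str]]:
    """Spread content over plates, filling at most max_wells wells per plate."""
    names = well_names()
    if not 0 < max_wells <= len(names):
        raise ValueError("max_wells must be between 1 and 96")
    if len(content) > num_plate * max_wells:
        raise ValueError(
            f"{len(content)} items do not fit into {num_plate} plate(s) "
            f"with {max_wells} wells each"
        )
    plates = []
    for index in range(num_plate):
        chunk = content[index * max_wells:(index + 1) * max_wells]
        # zip stops at the shorter sequence, so unused wells stay empty.
        plates.append(dict(zip(names, chunk)))
    return plates
```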
SBOL Parser API
Refer to our API reference for the SBOL Parser for the full documentation.
AUTOMATED SCRIPT GENERATION
To produce executable Opentrons OT-2 version 2 scripts, the output of the SBOL parser was taken as input to Python files responsible for generating assembly-specific scripts. These scripts were created for the BASIC, MoClo, and BioBrick assembly
methods. Previous software tools such as DNABot and the DAMP Lab's MoClo workflow were adopted and
updated to work with Opentrons OT-2 v2. The new scripts were also designed to accommodate more user-defined parameters, such as different labware, which can be specified through the web-app without any user interaction with code. In addition,
detailed metainformation was added as an output of the assembly-specific Python files, informing the user of well positions and intermediate steps in a readable format.
For easy integration into the front end or other tools, the entire script generation procedure of an assembly method can be run by calling a single function. The script generation function takes in constructs as a file of comma-separated values
(a CSV), and one additional CSV of parts, or of parts and linkers in the case of BASIC assembly. Additional input parameters include the folder to save the scripts in, and the labware, pipettes, and modules the user intends to use. The module
option was added after watching the DAMP Lab presentation at the International Workshop on Bio-Design Automation, in which they discussed the thermocycler module introduced by Opentrons and suggested that Opentrons protocols be adapted
to include it. The importance of being able to choose labware and pipettes was emphasised by our wet lab team, who discovered when testing on an Opentrons that we were missing some of the labware and pipettes we needed, and that one of the pipette
mounts was broken.
After taking in the inputs, the script generation function uses the information in the parts and constructs files to specify intermediary products, required reagents, and additional information the user may want.
After these processes are complete, the protocols and metainformation can be created. The protocols are created by inserting transfer instructions and parameters such as labware at the top of a template script. The template scripts themselves
only use the Opentrons OT-2 v2 API, and no other modules, to ensure that they can be run without requiring additional downloads, so that we satisfied our accessibility requirement. Unfortunately, this severely limits the amount of tracking
and debugging we are able to do for the user. The metainformation is stored as a CSV and is entirely meant for the user. We chose to provide very detailed metainformation as a compromise between accessibility and user confidence - we wanted
to show the users the steps we had planned and where each sample would be positioned, without using any additional modules in our Opentrons protocols.
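The template approach described above can be sketched as simple string composition. The template body, the render_protocol helper, and the pipette name used in the usage note are assumptions made for this illustration. The project's actual templates load labware and perform liquid transfers, which this sketch replaces with protocol comments.

```python
from typing import Dict, List

# Hypothetical template: a real template would load labware and pipettes
# and perform each transfer through the Opentrons API.
TEMPLATE = '''\
def run(protocol):
    for source, destinations in TRANSFERS.items():
        for destination in destinations:
            protocol.comment(f"{source} -> {destination} using {PIPETTE}")
'''


def render_protocol(
    template: str, transfers: Dict[str, List[str]], parameters: Dict[str, str]
) -> str:
    """Insert transfers and parameters as constants at the top of a template."""
    header = [f"TRANSFERS = {transfers!r}"]
    # Each parameter becomes an upper-case module-level constant.
    header += [f"{name.upper()} = {value!r}" for name, value in parameters.items()]
    return "\n".join(header) + "\n\n" + template
```

Calling render_protocol(TEMPLATE, {"A1": ["B1", "B2"]}, {"pipette": "p20_single_gen2"}) returns a standalone script whose first lines define TRANSFERS and PIPETTE. Because the values are written as Python literals, the generated file needs no imports beyond what the template itself uses.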
BioBricks
- Digests:
biobricks() must determine the digests that need to be created as intermediary steps between parts and constructs. Each part must be mapped to a digest, and each digest must be mapped to a construct or constructs.
- Master mixes:
Three master mixes must be defined, for use in upstream digests, downstream digests, and plasmid digests. Per digest, each master mix has 1 μL of one enzyme, 1 μL of another enzyme, and 5 μL of NEB Buffer 10X. The enzymes are EcoRI and SpeI for the upstream master mix, XbaI and PstI for the downstream master mix, and EcoRI and PstI for the plasmid master mix. Enough must be made so that there is 7 μL per digest plus two digests' worth of dead volume (14 μL).
- Reagents:
The reagents needed and their volumes and wells are specified in tandem with defining the master mixes. The reagents needed are upstream master mix, downstream master mix, plasmid master mix, T4 Ligase Buffer 10X, and T4 Ligase.
- Transformation reactions:
The function must determine the transformation reactions that need to occur, combining competent cells with assemblies and control cells with water.
- Protocols:
Information must be inserted into templates, producing protocols. This includes dictionaries of transfers, labware information, and other parameters such as whether to use the thermocycler module.
- Metainformation:
Detailed information is saved in a CSV format.
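The master mix arithmetic above can be worked through in a few lines. This is a sketch, assuming the dead volume has the same composition as the per-digest portions. The function name master_mix_volumes and the dictionary layout are hypothetical.

```python
from typing import Dict

ENZYME_UL = 1.0    # volume of each enzyme per digest, in microlitres
BUFFER_UL = 5.0    # volume of NEB Buffer 10X per digest, in microlitres
DEAD_DIGESTS = 2   # extra digests' worth of master mix kept as dead volume

MASTER_MIX_ENZYMES = {
    "upstream": ("EcoRI", "SpeI"),
    "downstream": ("XbaI", "PstI"),
    "plasmid": ("EcoRI", "PstI"),
}


def master_mix_volumes(kind: str, num_digests: int) -> Dict[str, float]:
    """Return the reagent volumes (in microlitres) for one master mix."""
    if kind not in MASTER_MIX_ENZYMES:
        raise ValueError(f"Unknown master mix: {kind!r}")
    if num_digests < 0:
        raise ValueError("num_digests must not be negative")
    portions = num_digests + DEAD_DIGESTS
    first, second = MASTER_MIX_ENZYMES[kind]
    volumes = {
        first: ENZYME_UL * portions,
        second: ENZYME_UL * portions,
        "NEB Buffer 10X": BUFFER_UL * portions,
    }
    volumes["total"] = sum(volumes.values())
    return volumes
```

For four upstream digests this gives 6 μL each of EcoRI and SpeI and 30 μL of buffer, a total of 42 μL, which matches 4 x 7 μL plus the 14 μL of dead volume.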
MoClo
MoClo script generation calls the function moclo_function() and returns five output paths (or one if there is an error).
Script generation for MoClo has several requirements:
- Reagents and master mixes
moclo_function() must define master mixes and reagents, the reagents being ligase, restriction enzyme (e.g. BsaI or BpiI), buffer, and water. Water is stored in the first well of the trough, and given a required volume of 1.5 mL as this is guaranteed to be both significantly above the actual required volume and below the maximum volume per well (2.2 mL). The volumes of ligase, restriction enzyme, and buffer are dependent on which master mixes need to be created, and reagent wells are mapped to master mix wells. More than one master mix for the same number of parts per assembly is often needed, as there is not sufficient master mix per well to supply more than a few assemblies.
- Agar plate
The locations of each assembly's transformation on an agar plate must be specified, and saved as a CSV.
- Protocols
Information must be inserted into templates, producing protocols. This includes dictionaries of transfers, labware information, and other parameters such as whether to use the thermocycler module.
- Metainformation
Detailed information is saved in a CSV format.
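The need for several master mixes per assembly size can be illustrated with a small planning sketch. It assumes that assemblies are grouped by their number of parts and that one master mix well supplies a fixed number of assemblies. The capacity value and the name plan_master_mixes are hypothetical, as the actual per-well capacity is not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_master_mixes(
    constructs: Dict[str, List[str]], assemblies_per_well: int
) -> List[Tuple[int, List[str]]]:
    """Assign assemblies to master mix wells.

    Returns one (number_of_parts, construct_ids) entry per master mix well.
    """
    if assemblies_per_well < 1:
        raise ValueError("assemblies_per_well must be at least 1")
    # Group constructs by how many parts they contain.
    by_size: Dict[int, List[str]] = defaultdict(list)
    for name, parts in constructs.items():
        by_size[len(parts)].append(name)
    plan = []
    for size in sorted(by_size):
        names = by_size[size]
        # Open a new master mix well whenever the previous one is used up.
        for start in range(0, len(names), assemblies_per_well):
            plan.append((size, names[start:start + assemblies_per_well]))
    return plan
```

With five three-part assemblies, one four-part assembly, and a capacity of two assemblies per well, the plan contains three wells of three-part master mix and one well of four-part master mix.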
Future plans
General
- Add optional real-time error tracking and prediction to our Opentrons protocols
- Integrate Plateo, a tool by the Edinburgh Genome Foundry that allows tracking of plates and wells
BioBricks
- Add agar plating to our transformation protocol
- Enable multi-level constructs in one run
- Add alternative BioBricks standards, such as the Silver and Gold standards
MoClo (GoldenGate)
- Enable multi-level constructs in one run, embracing the true potential of MoClo
- Implement variations of MoClo such as CIDAR MoClo - a more efficient version of MoClo
BASIC
- Incorporate DNA methylation, a way of preventing digestion of linkers and thus enabling more complex combinatorial design
frontend
To incorporate SBOL Designer into our frontend, we used the Webswing framework, which allowed us to run multiple sessions of SBOL Designer from the frontend. We also extended SBOL Designer by changing its look and feel to make it
more modern and similar to Material UI.
As part of extending SBOL Designer, we tried to have it interface directly with our frontend through JSLink. This worked during testing; however, Webswing also uses JSLink and runs concurrently with SBOL Designer, so their interaction
became unstable and unreliable: the Webswing server crashed, and errors arose because the JSObject from JSLink was loaded by different class loaders.
We are working to fix this bug in the interaction between the framework and SBOL Designer. However, as we were not able to fix it in time, we decided to have SBOL Designer manage its input and output through Webswing independently of the frontend.
We included the functionality of SBOL Validator by interfacing with the SBOL Validator API and building a user-friendly interface to its powerful functionality. Our interface lets users check the compliance
of an SBOL 2.0 file with the rules of the standard, including URI compliance and best practices, and returns exhaustive error messages so that users can edit and improve their external file.
Our aim of providing functional and intuitive interfaces for our users led us to adopt the Material UI library as well as its principles. We collaborated in the React framework by assigning components to team members and interfacing them, with an attempt to make the
file organisation as intuitive and easy to maintain as possible.
Software development
We used GitHub as our version control provider, as it is easy to use and access and offers the added benefit of its
GitHub Actions feature. We used this feature to create a continuous integration pipeline for the backend that runs the tests in our backend repo on every commit. Furthermore, we had continuous deployment of our frontend through
Heroku.
SBOL Designer and SBOL Validator
We incorporated SBOL Designer, since it was one of the more powerful programs for the creation and manipulation of genetic constructs in the SBOL
format. It uses SBOL Visual symbols and is able to integrate online repositories such as SynBioHub and the iGEM Registry of Standard Biological Parts.
SBOL Designer is released under the Apache 2.0 License.
SBOL Validator enables us to check file validity with respect to SBOL compliance, and allows for data conversion. SBOL Validator/Converter is also made freely available under the Apache 2.0 License.
backend
Django Backend
We chose to build our backend using the Python framework Django because of team members' experience with this framework as well as potential integration possibilities such as pySBOL2. For ease
of connection to the frontend React app, we combined our endpoints using Graphene, a Python library for building GraphQL APIs. For more detailed information please visit our GitHub (submodule
link of our page)