Team:Imperial College/Software

TECHNICAL DETAILS

An in-depth view of the work done and the organisation of our code.


SBOL Parser

The SBOL Parser interprets and parses the assembly intent from genetic designs described using the SBOL standard (SBOL Version 2.3), and produces the appropriate data files for downstream script generation software. The SBOL Parser primarily utilizes pySBOL2 for the processing of SBOL files, and Plateo to plan and simulate the set-up of the laboratory environment for assembly protocols.

The assembly plan is inferred at the highest level of a hierarchical design - the root Component Definition. Each root Component Definition is treated as the construct to be assembled, and their corresponding Components are treated as parts that make up the construct. Currently, the SBOL Parser assumes only one level of assembly, and therefore any Components containing more nested designs are assumed to already be fully assembled beforehand.

A major feature of the SBOL standard (SBOL Version 2.2 and later) is the ability to describe large combinatorial design spaces through the use of Combinatorial Derivations. Based on the enumeration feature supported in SBOL Designer, the SBOL Parser is capable of expanding Combinatorial Derivations into individual Component Definitions describing each construct variant. This facilitates the construction of large genetic libraries and complex genetic designs, making experimental workflows such as Design of Experiments more tractable.
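As an illustrative sketch (not the parser's actual code), expanding a combinatorial design space amounts to taking the Cartesian product of the variants available in each slot of the design; the slot and part names below are hypothetical:

```python
from itertools import product

# Hypothetical design space: each slot maps to its candidate variant parts.
design_space = {
    "promoter": ["J23100", "J23106"],
    "rbs": ["B0034"],
    "cds": ["GFP", "RFP"],
    "terminator": ["B0015"],
}

def enumerate_designs(space):
    """Expand a combinatorial design into all concrete construct variants."""
    slots = list(space)
    return [dict(zip(slots, combo))
            for combo in product(*(space[s] for s in slots))]

variants = enumerate_designs(design_space)
print(len(variants))  # 2 promoters x 1 RBS x 2 CDSs x 1 terminator = 4
```

The real parser performs this expansion over Combinatorial Derivation objects and also generates readable Display IDs for each variant, but the combinatorial core is the same product over variant sets.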

Each root Component Definition is distributed into the wells of construct plates, which are Plate objects provided by Plateo. The list of parts used across all assemblies is summarized and their corresponding Component Definitions are then similarly distributed into part plates. For the purposes of the software pipeline we have developed, the Plateo constructs are converted into CSVs as downstream input to the script generation software.


Generating CSVs

The main workhorse of the SBOL Parser is the generate_csv() function. This is used to generate the input CSVs to downstream script generation software that produces the scripts for BASIC, GoldenGate (MoClo), and BioBricks assemblies on the Opentrons. The process of generating the CSVs is as follows:
  1. Get list of constructs from the SBOL Document (enumerating Combinatorial Derivations if necessary)
  2. (Optional) Remove constructs with repeated parts
  3. Take a random sample of constructs if the size of the list of constructs is greater than the desired number of constructs to be assembled
  4. Distribute constructs and parts into respective Plateo Plate objects
  5. Create CSVs from Plateo Plate objects
sbol_parser_api.ParserSBOL.generate_csv(
     assembly: str,
     part_info: Dict[str, Dict[str, Union[str, int, float]]] = None,
     repeat: bool = False,
     max_construct_wells: int = 96,
     num_runs: int = 1
)

Parameters:

  • assembly (str): Assembly type. Currently accepts the values "basic", "moclo", and "bio_bricks"
  • part_info (Dict[str, Dict[str, Union[str, int, float]]]): Dictionary of information regarding parts to be assembled. If no information is provided, the default value of concentration is 0, and the plates and wells are automatically assigned. Structure: {(display ID): {'concentration':..., 'plate':..., 'well':...}}
  • repeat (bool): If False, removes constructs that contain repeated components. (default: False)
  • max_construct_wells (int): Number of wells to be filled in the constructs plate. (default: 96)
  • num_runs (int): Number of runs (i.e. construct plates) to be created. (default: 1)

Returns:

  • Dict[str, List[str]]: Dictionary of construct and part/linker file paths. Keys: 'construct_path', 'part_path'

Raises:

  • ValueError: If assembly is invalid.

Enumeration

The enumeration functionality is based on the Java implementation of the same functionality used in SBOL Designer, with minor changes to improve the human readability of the Component Definition Display IDs generated from enumeration. The purpose of enumeration is to expand the condensed SBOL representation of a combinatorial design space into the set of elements it comprises.

sbol_parser_api.ParserSBOL.enumerator(
     derivation: CombinatorialDerivation
)

Parameters:

  • derivation (CombinatorialDerivation): A Combinatorial Derivation to be enumerated. Enumeration is based on the strategy assigned to the Combinatorial Derivation.

Returns:

  • List[ComponentDefinition]: List of Component Definitions specifying the enumerated constructs

Filter

The purpose of the filter is to constrain the design space of assembly constructs based on user-defined parameter constraints. Currently, the filter is used to remove constructs that contain repeating parts that may lead to homologous recombination and are therefore undesirable. Future development of the SBOL Parser will focus on an improved adaptive implementation of the filter with more tunable parameters. This will allow the SBOL Parser to be responsive to upstream learning and modelling applications.


filter_constructs(
     all_constructs: List[ComponentDefinition]
)

Parameters:

  • all_constructs: List of constructs to filter


Returns:

  • List[ComponentDefinition]: List of filtered constructs
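As a minimal sketch of the repeated-part check (the real filter operates on ComponentDefinition objects rather than the plain part-name lists used here), a construct is kept only when no part appears in it twice:

```python
def filter_constructs(all_constructs):
    """Keep only constructs whose part list contains no repeated parts.

    Each construct is represented here as a list of part names; repeats
    are detected by comparing the list length against its set of names.
    """
    return [c for c in all_constructs if len(set(c)) == len(c)]

constructs = [
    ["P1", "RBS1", "GFP", "T1"],
    ["P1", "RBS1", "GFP", "P1"],  # repeated promoter -> dropped
]
print(filter_constructs(constructs))  # [['P1', 'RBS1', 'GFP', 'T1']]
```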

Filling Plateo Plates

SBOL objects describing the parts or constructs to be assembled are stored in Plateo classes such as Wells and Plates. The objective was two-fold: to provide a standard description of labware and experimental set-up as an alternative to unstandardized CSV inputs, as well as to pass Plateo objects directly to downstream applications without the need for an intermediary data format such as CSV or JSON. The current implementation of the SBOL Parser contains an in-built Plateo parser to generate the requisite CSVs for downstream script generation software.

fill_plates(
     all_content: List[ComponentDefinition],
     content_name: str,
     num_plate: int = None,
     plate_class: plateo.Plate = None,
     max_construct_wells: int = None,
     part_info: Dict[str, Dict[str, Union[str, int, float]]] = None
)

Parameters:

  • all_content (List[ComponentDefinition]): List of constructs or parts
  • content_name (str): Type of well content ("construct" or "part")
  • num_plate (int): Number of plates to be filled (default: 1)
  • plate_class (plateo.Plate): Class of Plateo Plate (default: Plate96)
  • max_construct_wells (int): Maximum number of filled wells on each plate
  • part_info (Dict[str, Dict[str, Union[str, int, float]]]): Dictionary of parts and their associated user-defined information

Returns:

  • list: List of Plateo plates

Raises:

  • ValueError: If parameters given are not feasible to carry out
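A simplified sketch of the distribution step, without the Plateo dependency: items are chunked across 96-well plates and mapped to well names in row-major order (A1..H12). The helper names here are illustrative, not the parser's actual API:

```python
from string import ascii_uppercase

def well_names(rows=8, cols=12):
    """Well names A1..H12 in row-major order for a 96-well plate."""
    return [f"{r}{c}" for r in ascii_uppercase[:rows]
            for c in range(1, cols + 1)]

def fill_plates(items, max_wells=96):
    """Distribute items across plates, at most max_wells per plate.

    Returns a list of plates; each plate maps a well name to an item.
    """
    names = well_names()
    plates = []
    for start in range(0, len(items), max_wells):
        chunk = items[start:start + max_wells]
        plates.append(dict(zip(names, chunk)))  # zip stops at chunk length
    return plates

plates = fill_plates([f"construct_{i}" for i in range(100)])
print(len(plates))       # 2 plates: 96 wells filled, then 4
print(plates[1]["A1"])   # construct_96 starts the second plate
```

Plateo's Plate96 class provides the same bookkeeping (plus labware metadata), which is why the parser stores constructs and parts in Plateo objects before exporting CSVs.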

SBOL Parser API

Refer to our API reference for the SBOL Parser for the full documentation.


AUTOMATED SCRIPT GENERATION

To produce executable Opentrons OT-2 version 2 scripts, the output of the SBOL parser was taken as input to Python files responsible for generating assembly-specific scripts. These scripts were created for the BASIC, MoClo, and BioBrick assembly methods. Previous software tools such as DNABot and the DAMP Lab's MoClo workflow were adopted and updated to work with the Opentrons OT-2 v2 API. The new scripts were also designed to accommodate more user-defined parameters, such as different labware, which can be specified through the web app without any user interaction with code. In addition, detailed metainformation was added as an output of the assembly-specific Python files, informing the user of well positions and intermediate steps in a readable format.

Fig. 1: Inputs of the Script Generation Function for different assembly types

For easy integration into the frontend or other tools, the entire script generation procedure of an assembly method can be run by calling a single function. The script generation function takes in constructs as a file of comma-separated values (a CSV), and one additional CSV of parts, or of parts and linkers in the case of BASIC assembly. Additional input parameters include the folder to save the scripts in, and the labware, pipettes, and modules the user intends to use. The module option was added after watching the DAMP Lab presentation at the International Workshop on Bio-Design Automation, in which they discussed the thermocycler module introduced by Opentrons and suggested that Opentrons protocols be adapted to include it. The importance of being able to choose labware and pipettes was emphasised by our wet lab team when testing on an Opentrons, where we discovered that we were missing some of the labware and pipettes we needed, and that one of the pipette mounts was broken.

After taking in the inputs, the script generation function uses the information in the parts and constructs files to specify intermediary products, required reagents, and additional information the user may want. After these processes are complete, the protocols and metainformation can be created. The protocols are created by inserting transfer instructions and parameters such as labware at the top of a template script. The template scripts themselves use only the Opentrons OT-2 v2 API, and no other modules, to ensure that they can be run without requiring additional downloads, satisfying our accessibility requirement. Unfortunately, this severely limits the amount of tracking and debugging we can do for the user. The metainformation is stored as a CSV and is entirely meant for the user. We chose to provide very detailed metainformation as a compromise between accessibility and user confidence: we wanted to show users the steps we had planned and where each sample would be positioned, without using any additional modules in our Opentrons protocols.
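The "parameters at the top of a template" approach can be sketched as follows; the parameter names and template contents here are hypothetical stand-ins for what the real generator emits:

```python
# Hypothetical transfers and labware; the real generator derives these
# from the parts and constructs CSVs supplied by the SBOL Parser.
parameters = {
    "transfers": [("A1", "B1", 10), ("A2", "B1", 10)],  # (source, dest, uL)
    "labware": {"source_plate": "corning_96_wellplate_360ul_flat"},
    "use_thermocycler": False,
}

TEMPLATE = '''
from opentrons import protocol_api

metadata = {"apiLevel": "2.0"}

def run(protocol: protocol_api.ProtocolContext):
    # ...template steps read the parameter variables defined above...
    pass
'''

def make_protocol(params, template):
    """Prepend run parameters to a template, yielding a standalone script."""
    header = "\n".join(f"{k} = {v!r}" for k, v in params.items())
    return header + "\n" + template

script = make_protocol(parameters, TEMPLATE)
print(script.splitlines()[0])  # the transfers assignment comes first
```

Because the emitted script is plain Python plus the Opentrons API, it can be uploaded to the OT-2 without any extra dependencies, which is the accessibility trade-off described above.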

BioBricks

  • Digests: 
    biobricks() must determine the digests needed to be created as intermediary steps between parts and constructs. 
    Each part must be mapped to a digest, and each digest must be mapped to a construct or constructs.

  • Master mixes: 
    Three master mixes must be defined, for use in upstream digests, downstream digests, and plasmid digests. Per digest, each master mix has 1 μL of one enzyme, 1 μL of another enzyme, and 5 μL of NEB Buffer 10X. The enzymes are EcoRI and SpeI for the upstream master mix, XbaI and PstI for the downstream master mix, and EcoRI and PstI for the plasmid master mix. Enough must be made so that there is 7 μL per digest plus two digests' worth of dead volume (14 μL).

  • Reagents:
    The reagents needed and their volumes and wells are specified in tandem with defining the master mixes. The reagents needed are upstream master mix, downstream master mix, plasmid master mix, T4 Ligase Buffer 10X, and T4 Ligase.

  • Transformation reactions:
    The function must determine the transformation reactions needed to occur, combining competent cells with assemblies and control cells with water.

  • Protocols:
    Information must be inserted into templates, producing protocols. This includes dictionaries of transfers, labware information, and other parameters such as whether to use the thermocycle module.

  • Metainformation:
    Detailed information is saved in a CSV format. 
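The master mix arithmetic above (1 μL + 1 μL of enzymes and 5 μL of buffer per digest, plus two digests' worth of dead volume) reduces to a small calculation, sketched here:

```python
def master_mix_volumes(num_digests, dead_digests=2):
    """Volumes (in uL) to prepare for one BioBricks digest master mix.

    Each digest uses 1 uL of each of two enzymes and 5 uL of NEB Buffer
    10X (7 uL total); dead_digests extra digests' worth of dead volume
    (14 uL by default) is added on top.
    """
    n = num_digests + dead_digests
    return {"enzyme_1": 1 * n, "enzyme_2": 1 * n,
            "buffer_10x": 5 * n, "total": 7 * n}

print(master_mix_volumes(10))
# {'enzyme_1': 12, 'enzyme_2': 12, 'buffer_10x': 60, 'total': 84}
```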

Steps included in the automated assembly protocol and transformation protocol for BioBricks assembly

MoClo

MoClo script generation calls the function moclo_function(), which returns five output paths (or one if there is an error).
Script generation for MoClo has several requirements:

  • Reagents and master mixes
    moclo_function() must define master mixes and reagents, the reagents being ligase, restriction enzyme (e.g. BsaI or BpiI), buffer, and water. Water is stored in the first well of the trough and given a required volume of 1.5 mL, as this is guaranteed to be both significantly above the actual required volume and below the maximum volume per well (2.2 mL). The volumes of ligase, restriction enzyme, and buffer depend on which master mixes need to be created, and reagent wells are mapped to master mix wells. More than one master mix for the same number of parts per assembly is often needed, as there is not sufficient master mix per well to supply more than a few assemblies.


  • Agar plate
    The locations of each assembly's transformation on an agar plate must be specified, and saved as a CSV.

  • Protocols
    Information must be inserted into templates, producing protocols. This includes dictionaries of transfers, labware information, and other parameters such as whether to use the thermocycle module.

  • Metainformation
    Detailed information is saved in a CSV format. 
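The observation that one master mix well cannot supply more than a few assemblies can be sketched as a ceiling division per parts-per-assembly group; the per-well capacity used here is a hypothetical figure, not the function's actual constant:

```python
from math import ceil

# Hypothetical capacity: assemblies one master mix well can supply.
ASSEMBLIES_PER_MIX = 6

def master_mixes_needed(assembly_counts):
    """Master mix wells needed per parts-per-assembly group.

    assembly_counts maps parts-per-assembly to the number of assemblies
    with that many parts; each group needs its own master mix recipe,
    and large groups need several wells of it.
    """
    return {parts: ceil(n / ASSEMBLIES_PER_MIX)
            for parts, n in assembly_counts.items()}

# 14 two-part assemblies need 3 wells; 5 three-part assemblies need 1.
print(master_mixes_needed({2: 14, 3: 5}))  # {2: 3, 3: 1}
```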



Steps included in the automated protocols for BASIC assembly



Future plans

General

  • Add optional real-time error tracking and prediction to our Opentrons protocols
  • Integrate Plateo, a tool by the Edinburgh Genome Foundry that allows tracking of plates and wells

BioBricks

  • Add agar plating to our transformation protocol
  • Enable multi-level constructs in one run
  • Add alternative BioBricks standards, such as the Silver and Gold standards

MoClo (GoldenGate)

  • Enable multi-level constructs in one run, embracing the true potential of MoClo
  • Implement variations of MoClo such as CIDAR MoClo - a more efficient version of MoClo

BASIC

  • Incorporate DNA methylation, a way of preventing digestion of linkers and thus enabling more complex combinatorial design

Frontend

To incorporate SBOL Designer into our frontend, we used the WebSwing framework, which allowed us to run multiple sessions of SBOL Designer from the frontend. We also extended SBOL Designer, swapping its look and feel to make it more modern and closer to Material UI.
As part of extending SBOL Designer, we tried to have it interface directly with our frontend through JSLink. This worked during testing; however, because WebSwing, which also uses JSLink, runs concurrently with SBOL Designer, their interaction became unstable and unreliable, causing the WebSwing server to crash and producing errors from the JSObject from JSLink being loaded by different class loaders. We are working to fix this bug in the interaction between the framework and SBOL Designer; as we were not able to fix it in time, we decided to have SBOL Designer manage its input and output through WebSwing independently of the frontend.

We included the functionality of SBOL Validator by interfacing with the SBOL Validator API and building a user-friendly interface for its powerful functionality. Our interface lets users check the compliance of an SBOL 2.0 file with the rules of the standard, including URI compliance and best practices, and gives exhaustive error messages so users can edit and improve their file.
Our choice to provide functional and intuitive interfaces led us to adopt the Material UI library and its principles. We collaborated in the React framework by assigning components to one another and interfacing them, aiming to keep the file organisation as intuitive and easy to maintain as possible.

Software development
We utilised GitHub as our version control provider, as it is easy to use and access, and has the added benefit of its GitHub Actions feature. We utilised this feature by creating a continuous integration pipeline for the backend that runs the tests in our backend repo on every commit. Furthermore, we had continuous deployment of our frontend through Heroku.

SBOL Designer and SBOL Validator
We incorporated SBOL Designer, since it is one of the more powerful programs for the creation and manipulation of genetic constructs in the SBOL format. It uses SBOL Visual symbols and is able to integrate online repositories such as SynBioHub and the iGEM Part Registry.
SBOL Designer is released under the Apache 2.0 License.
SBOL Validator enables us to check file validity with respect to SBOL compliance, and allows for data conversion. SBOL Validator/Converter is also made freely available under the Apache 2.0 License.


Backend

Django Backend
We chose to build our backend using the Python framework Django because of team members' experience working with the framework, as well as potential integration possibilities such as pySBOL2. For ease of connection to the frontend React app, we combined our endpoints using Graphene, a Python library that integrates GraphQL technologies. For more detailed information, please visit our GitHub (submodule link of our page)