The idea of using information from different online databases originated from the absence of a single all-inclusive database. While popular databases contain a wealth of information, having to choose between different datasets does not fit the concept of Forbidden FRUITS: if we want to make ‘any compound’ with ‘any microbe’, we should at least be able to choose any compound. Choosing a single database restricts that choice, as not all compounds are available in all databases.
New “standard” format
With this idea in mind, the first step was to create a new format for our new database. The format had to be able to store data from different databases, and we took this into account in its design. While flexibility is an important property, the FF algorithm should always be able to use the stored information. Therefore, every entry of the database has a set of required standard properties.
Parsing of databases
Every database has to be parsed after downloading. As each database has a different format, a separate parser had to be written for each one. We started by making parsers for the following databases: KEGG, BiGG, ModelSeed and MetaNetX. The format consists of four entry types: Compounds, Compartments, Metabolites and Reactions. Each of these entry types stores information in properties. This universal format makes comparing (matching) and joining entries easier.
The Compounds, Compartments and Reactions entry types all have the following properties: identifier (from the original database), source (name of the original database), names (list of possible names) and aliases (dictionary of identifiers in other databases). In addition, the Compounds and Reactions types have a meta property, which can store database-specific data. This meta-information can be used later by the algorithm. The Reactions type also contains a stoichiometry property, which stores the stoichiometric coefficient of every Compound of the Reaction in a dictionary. A Metabolite is the combination of a Compound and a Compartment, and stores the identifiers of both entries. Every Metabolite has an identifier that combines the identifiers of the Compound and the Compartment.
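The entry types described above could be sketched as Python dataclasses. This is a minimal illustration, not the project's actual classes: the class layout, the one-identifier-per-database shape of `aliases`, the underscore used to join the Metabolite identifier, and the toy transport reaction are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Compound:
    identifier: str  # identifier in the original database
    source: str      # name of the original database
    names: List[str] = field(default_factory=list)
    # Assumed shape: other database name -> identifier in that database.
    aliases: Dict[str, str] = field(default_factory=dict)
    meta: Dict[str, object] = field(default_factory=dict)  # database-specific data


@dataclass
class Compartment:
    identifier: str
    source: str
    names: List[str] = field(default_factory=list)
    aliases: Dict[str, str] = field(default_factory=dict)


@dataclass
class Metabolite:
    compound: str     # identifier of the Compound
    compartment: str  # identifier of the Compartment

    @property
    def identifier(self) -> str:
        # Assumed convention: join both identifiers with an underscore.
        return f"{self.compound}_{self.compartment}"


@dataclass
class Reaction:
    identifier: str
    source: str
    names: List[str] = field(default_factory=list)
    aliases: Dict[str, str] = field(default_factory=dict)
    meta: Dict[str, object] = field(default_factory=dict)
    # Identifier -> stoichiometric coefficient (negative = consumed).
    stoichiometry: Dict[str, float] = field(default_factory=dict)


glc = Compound("C00031", "KEGG", names=["D-Glucose"], aliases={"BiGG": "glc__D"})
cytosol = Compartment("c", "BiGG", names=["cytosol"])
extracellular = Compartment("e", "BiGG", names=["extracellular space"])
glc_c = Metabolite(glc.identifier, cytosol.identifier)
glc_e = Metabolite(glc.identifier, extracellular.identifier)

# Hypothetical toy reaction: glucose moves from outside the cell into the cytosol.
uptake = Reaction(
    "toy_glc_uptake",
    "toy",
    stoichiometry={glc_e.identifier: -1, glc_c.identifier: 1},
)
```

Because every parser would emit the same four types, the merging code only has to deal with one shape of entry, whichever database it came from.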
Merging of databases
The next step was to design the merging methods. The first merger merges databases based on identifier and alias information. While already effective, the simplicity of this method led us to extend it with a new function. The (optional) name-matching function uses the name information to find possible matches. This is especially useful when merging databases with little to no alias information. We named this straightforward merger the Simple merger.
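A minimal sketch of how such a merger could work is given here, assuming entries are plain dictionaries with the four shared properties. The function names, the case-insensitive name comparison and the way merged entries are combined are assumptions for illustration, not the project's implementation.

```python
def keys_of(entry):
    """All (database, identifier) pairs under which an entry is known."""
    keys = {(entry["source"], entry["identifier"])}
    keys.update(entry["aliases"].items())
    return keys


def is_match(a, b, match_names=False):
    """Match on identifier/aliases, optionally falling back to names."""
    if keys_of(a) & keys_of(b):
        return True
    if match_names:
        names_a = {n.strip().lower() for n in a["names"]}
        names_b = {n.strip().lower() for n in b["names"]}
        return bool(names_a & names_b)
    return False


def merge_entries(a, b):
    """Keep a's identifier and record b as an alias of the merged entry."""
    aliases = dict(a["aliases"])
    aliases.update(b["aliases"])
    aliases[b["source"]] = b["identifier"]
    names = list(a["names"]) + [n for n in b["names"] if n not in a["names"]]
    return {
        "identifier": a["identifier"],
        "source": a["source"],
        "names": names,
        "aliases": aliases,
    }


def simple_merge(db_a, db_b, match_names=False):
    merged = [dict(entry) for entry in db_a]
    for entry in db_b:
        for i, existing in enumerate(merged):
            if is_match(existing, entry, match_names):
                merged[i] = merge_entries(existing, entry)
                break
        else:
            # No match found: the entry is new to the merged database.
            merged.append(dict(entry))
    return merged


kegg = [
    {"identifier": "C00031", "source": "KEGG", "names": ["D-Glucose"], "aliases": {}},
]
bigg = [
    {"identifier": "glc__D", "source": "BiGG", "names": ["d-glucose"], "aliases": {}},
    {"identifier": "pyr", "source": "BiGG", "names": ["Pyruvate"],
     "aliases": {"KEGG": "C00022"}},
]

without_names = simple_merge(kegg, bigg)
with_names = simple_merge(kegg, bigg, match_names=True)
```

In this toy input the two glucose entries share no alias, so they are only joined when name matching is switched on. That is the situation the optional function is meant for.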
Continuing the effort to use more information to match databases, a new idea was formed. Identifiers and aliases are great for finding matches, but they do not give sufficient evidence to verify that two reactions are similar. This is because many reactions in different databases are highly similar, but have different stoichiometric coefficients or use slightly different compounds. Therefore, we needed a different method to verify the reaction matches. Using the stoichiometry property of the Reactions entry type, we can check that the compounds and coefficients of two reactions are similar. We were even able to extend this method to use the stoichiometry to find matching reactions that do not share an identifier or alias. This new merger was called the Stoichiometric merger. While it applies a stricter filter to reactions, it is able to pick up matches that are impossible to find with the Simple merger.
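The stoichiometric check can be sketched as follows. The sketch assumes that compounds have already been matched, so that a (hypothetical) `compound_map` translates every database-specific compound identifier to one shared identifier. All names and the toy data are illustrative; the exact similarity criteria of the Stoichiometric merger are not reproduced here.

```python
from collections import defaultdict


def canonical(stoichiometry, compound_map):
    """Rewrite a stoichiometry dictionary in terms of shared compound ids."""
    return {compound_map.get(c, c): coeff for c, coeff in stoichiometry.items()}


def same_stoichiometry(stoich_a, stoich_b, compound_map, tol=1e-9):
    """Verify a candidate match: same compounds and same coefficients."""
    a = canonical(stoich_a, compound_map)
    b = canonical(stoich_b, compound_map)
    if a.keys() != b.keys():
        return False
    return all(abs(a[c] - b[c]) <= tol for c in a)


def find_stoichiometric_matches(reactions_a, reactions_b, compound_map):
    """Find matches without using reaction identifiers or aliases."""
    index = defaultdict(list)
    for rid, stoich in reactions_a.items():
        signature = frozenset(canonical(stoich, compound_map).items())
        index[signature].append(rid)
    matches = []
    for rid, stoich in reactions_b.items():
        signature = frozenset(canonical(stoich, compound_map).items())
        for other in index.get(signature, []):
            matches.append((other, rid))
    return matches


# Toy data: two databases that name the same three compounds differently.
compound_map = {"a1": "M1", "b1": "M1", "a2": "M2", "b2": "M2", "a3": "M3", "b3": "M3"}
reactions_a = {"R1": {"a1": -1, "a2": -1, "a3": 1}}
reactions_b = {
    "rxn_x": {"b1": -1, "b2": -1, "b3": 1},  # same reaction as R1
    "rxn_y": {"b1": -1, "b2": -2, "b3": 1},  # similar, but a different coefficient
}

matches = find_stoichiometric_matches(reactions_a, reactions_b, compound_map)
```

Here `rxn_y` uses the same compounds as `R1` but a different coefficient, so it is rejected. An identifier-based match alone could not detect that difference.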
An important part of the design and build process of both the parsers and mergers was testing. The database files that we use are extensive, and it is impossible to check every single match for mistakes. Therefore, we designed custom toy databases that include every exception we were able to find in the databases or could think of.
The current version of the merger is able to load and merge JSON datafiles of the KEGG, BiGG, ModelSeed and MetaNetX databases, as well as SBML datafiles containing computational models of biological systems. In Forbidden FRUITS these are the metabolic networks of the microbe of choice.
Network transformation based on Gene-Protein-Reaction Associations
Forbidden FRUITS (FF) obtains its genetic strategies by running constraint-based analysis on a computational network, represented as a stoichiometric matrix of reactions. The two methods built in this part enable FF to parse the Gene-Protein-Reaction (GPR) associations out of SBML models and to extend a stoichiometric network of reactions with the GPR associations obtained. As a result, FF is able to generate more accurate results from constraint-based analysis and better strategies in later steps.
Systems Biology Markup Language (SBML) is an XML-based biochemical reaction network format that computationally standardizes the information of models. The GPR association parsing method built in this project currently only supports version 2 of the SBML Level 3 Flux Balance Constraints (“fbc”) package, and relies on the Python API library libSBML. Since no public package was available to read out the GPR associations, the method was built from scratch. The core of this method is a recursive function that goes level by level while keeping track of the current node, the gene products in each branch, and the output. This parsing method can be used not only by FF but also by any user of SBML files (see the Contribution page for more detail).
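The recursive idea can be illustrated with a short, self-contained sketch. Note that it does not use libSBML: to stay runnable without extra dependencies, it reads an fbc version 2 `geneProductAssociation` fragment with Python's standard XML parser instead. The function name, the gene product ids and the choice to return one gene set per alternative association are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

# XML namespace of the SBML Level 3 "fbc" package, version 2.
FBC = "{http://www.sbml.org/sbml/level3/version1/fbc/version2}"


def gene_sets(node):
    """Return the alternative gene-product sets encoded below `node`.

    Each returned frozenset is one GPR association: the gene products
    that are needed together to catalyse the reaction.
    """
    tag = node.tag.replace(FBC, "")
    if tag == "geneProductAssociation":
        return gene_sets(node[0])  # wrapper around a single child
    if tag == "geneProductRef":
        return [frozenset([node.get(FBC + "geneProduct")])]
    branches = [gene_sets(child) for child in node]
    if tag == "or":
        # Alternatives: collect the sets of every branch.
        return [genes for branch in branches for genes in branch]
    if tag == "and":
        # Complex: combine one alternative from every branch.
        return [frozenset().union(*combo) for combo in product(*branches)]
    raise ValueError(f"unexpected element: {tag}")


# Hypothetical association: G1 or (G2 and G3).
fragment = """
<fbc:geneProductAssociation
    xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2">
  <fbc:or>
    <fbc:geneProductRef fbc:geneProduct="G1"/>
    <fbc:and>
      <fbc:geneProductRef fbc:geneProduct="G2"/>
      <fbc:geneProductRef fbc:geneProduct="G3"/>
    </fbc:and>
  </fbc:or>
</fbc:geneProductAssociation>
""".strip()

associations = gene_sets(ET.fromstring(fragment))
```

For this fragment the function yields two associations, one containing only `G1` and one containing `G2` together with `G3`.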
The transformation method divides the stoichiometry of a reaction into multiple pseudo-reactions by taking its reversibility and GPR associations into account, and adds the corresponding gene products to every resulting stoichiometry as compounds that are consumed. To keep the matrix balanced, pseudo-reactions that produce the gene products are also added for all gene products present. The figure above shows an example of the transformation of the stoichiometry of a reversible reaction R3, given its three GPR associations.
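A minimal sketch of this transformation is shown here, under stated assumptions: each gene product is consumed with a coefficient of -1, the pseudo-reaction naming scheme is invented for the example, and the three GPR associations given to R3 are hypothetical rather than those in the figure.

```python
def transform(reaction_id, stoichiometry, reversible, gprs):
    """Split one reaction into direction- and GPR-specific pseudo-reactions.

    `gprs` is a list of gene-product sets, one set per GPR association.
    """
    directions = [("fwd", 1)]
    if reversible:
        directions.append(("rev", -1))

    # A reaction without GPR information keeps one gene-free variant.
    alternatives = gprs or [frozenset()]

    pseudo = {}
    for label, sign in directions:
        for i, genes in enumerate(alternatives, start=1):
            stoich = {m: sign * coeff for m, coeff in stoichiometry.items()}
            for gene in genes:
                stoich[gene] = -1  # assumed: gene product consumed 1:1
            pseudo[f"{reaction_id}_{label}_gpr{i}"] = stoich

    # Supply pseudo-reactions keep the matrix balanced for each gene product.
    for gene in sorted(set().union(*alternatives)):
        pseudo[f"supply_{gene}"] = {gene: 1}
    return pseudo


# Hypothetical example: reversible R3 (A <-> B) with three GPR associations.
r3 = transform(
    "R3",
    {"A": -1, "B": 1},
    reversible=True,
    gprs=[frozenset({"G1"}), frozenset({"G2"}), frozenset({"G3", "G4"})],
)
```

With two directions and three associations, R3 becomes six pseudo-reactions, plus one supply pseudo-reaction for each of the four gene products.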
We have validated the methods with Parsimonious Flux Balance Analysis on the iAF1260 model (see the results).
The concept of a cheap-lunch strategy (see design; results) was born from ongoing research at our host group. In 2019, the group published their research on growth-coupled production of fumarate in Synechocystis PCC6803, which confirmed that growth-coupled production of native compounds could be sustainably implemented. From these results, a new research project was started that focuses on the production of malate using fumarate as a substrate. Here, the concept was born of having a growth-coupled compound that could be used as a substrate for producing a different target compound.
Growth-coupled “cheap-lunch” strategies
Combining the idea of growth-coupling non-native metabolites with this new concept results in what we call cheap-lunch strategies. These are also inspired by the idea that not many chemical reactions are biologically available in microorganisms, but that with this type of strategy they can be incorporated into a genetic engineering strategy for growth-coupled production.
Incorporation of “cheap-lunch” strategy finder
Once direct growth-coupled strategy-finding was implemented in Forbidden FRUITS, it became possible to design an approach for cheap-lunch strategies. Following the process conceptualised in the design phase of this project led to the successful incorporation of a cheap-lunch strategy-finder into our API. This was only achieved by progressively adding steps to the functions created within the class, for which a debugging tool was used.
From each debugging step, improvements could be identified that would make the cheap-lunch strategy-finder achieve its goal efficiently by incorporating information from other classes in the API. The current version of the finder is the result of this iterative process and it is ready to search for strategies in the universal database provided by the merger and the network transformation tool.