Pathway Finder, one of the functions of Synthesis Navigator, is built upon Pathlab, the project of the Tongji-Software 2019 iGEM team. Building on Pathlab, we updated the search algorithm, improved search efficiency, and used statistical methods to filter out incorrect results, which makes the results more credible.
This year, our team expanded the data sources and added new and important properties to the database. The amount of data has increased significantly compared with last year's database, especially for enzyme and compound properties, of which there are now 20 times as many as last year. We also added new columns to the tables to support a new function, Metabolism Simulation. In addition, we removed severely incomplete records and filled in missing data to reduce the search space. As a result, the database now contains more useful data and less redundancy.
Compared with last year's project, we added several new functions that can work together as a pipeline covering the design, engineer, test, and improve cycle. The most significant of these is Metabolism Simulation, which allows users to explore the metabolism of a cell, find what could be changed in the cell to improve their designed pathway, and evaluate whether their design gives the desired result. More details can be found in the description and design sections. Other improvements are faster search and an upgraded reverse search function. Compared with last year's project, a search takes about half as long, and the improvement grows with larger datasets. The reverse search can now return results more than one step deep, which is more useful for real-world analysis.
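To illustrate what a reverse search more than one step deep means, here is a minimal sketch that walks a toy reaction list backwards from a target compound and records every precursor found within a given number of reaction steps. The reaction list, compound names, and the `reverse_search` function are hypothetical and do not reflect the Pathway Finder database schema or its actual implementation.

```python
from collections import deque

# Hypothetical toy reactions, each written as (substrates, products).
REACTIONS = [
    (("A",), ("B",)),
    (("B", "C"), ("D",)),
    (("D",), ("E",)),
    (("F",), ("C",)),
]


def reverse_search(reactions, target, max_depth):
    """Return {compound: depth} for every compound that appears as a precursor
    of `target` within `max_depth` reaction steps, using breadth-first search
    that runs from products back to substrates."""
    # Index reactions by product so each step moves from a product to its substrates.
    producers = {}
    for substrates, products in reactions:
        for product in products:
            producers.setdefault(product, []).append(substrates)

    depth = {target: 0}
    queue = deque([target])
    while queue:
        compound = queue.popleft()
        if depth[compound] == max_depth:
            continue  # do not expand beyond the requested depth
        for substrates in producers.get(compound, []):
            for s in substrates:
                if s not in depth:  # BFS order: the first visit is the shallowest
                    depth[s] = depth[compound] + 1
                    queue.append(s)
    return depth


print(reverse_search(REACTIONS, "E", 1))  # {'E': 0, 'D': 1}
print(reverse_search(REACTIONS, "E", 3))  # {'E': 0, 'D': 1, 'B': 2, 'C': 2, 'A': 3, 'F': 3}
```

Setting `max_depth=1` corresponds to a single-step reverse search, while larger values expand the search one reaction layer at a time. The sketch deliberately ignores co-substrate requirements (D needs both B and C); a real pathway tool would have to account for them.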
For Metabolism Simulation, we did not adopt traditional approaches such as flux-based or dynamics-based algorithms. Instead, we designed stochastic methods to simulate real metabolism in a cell. We built a random walk model on our metabolic network, which gives sufficiently accurate and reasonable results while requiring little of the data, such as rate constants, that traditional methods depend on. In this way, the information contained in the metabolic network itself is fully used, and the algorithm is less data-dependent, which makes our tool more robust. (Of course, with more data, the results could be more accurate.) The improvements in the speed and stability of our search rely heavily on the A*, Dijkstra, and Yen's k-shortest-paths algorithms, which perform well on large datasets: the larger the dataset, the greater the speed-up over last year's search.
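To make the search step more concrete, below is a minimal textbook-style sketch of Yen's k-shortest-paths algorithm, using Dijkstra's algorithm for each shortest-path subproblem. It assumes the network is a directed graph whose non-negative edge weights stand for a hypothetical per-reaction cost; the graph, weights, and function names are illustrative and are not taken from Pathway Finder. When an admissible heuristic for the remaining cost is available, A* could replace the `dijkstra` call.

```python
import heapq

# Hypothetical toy graph: node -> {neighbour: non-negative edge cost}.
GRAPH = {
    "C": {"D": 3, "E": 2},
    "D": {"F": 4},
    "E": {"D": 1, "F": 2, "G": 3},
    "F": {"G": 2, "H": 1},
    "G": {"H": 2},
}


def dijkstra(graph, source, target, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Return (cost, path) of a shortest source->target path, or None if unreachable.
    Nodes in banned_nodes and (u, v) pairs in banned_edges are skipped."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:  # first pop of the target is final with non-negative costs
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return d, path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def yen_k_shortest_paths(graph, source, target, k):
    """Return up to k loopless paths as (cost, path), cheapest first."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted = [first]
    candidates = []  # min-heap of (cost, path)
    seen = {tuple(first[1])}
    while len(accepted) < k:
        _, last_path = accepted[-1]
        for i in range(len(last_path) - 1):
            spur_node = last_path[i]
            root = last_path[: i + 1]
            # Block the next edge of every accepted path that shares this root.
            banned_edges = {
                (p[i], p[i + 1])
                for _, p in accepted
                if len(p) > i + 1 and p[: i + 1] == root
            }
            # Block root nodes before the spur node so new paths stay loopless.
            banned_nodes = set(root[:-1])
            spur = dijkstra(graph, spur_node, target, banned_nodes, banned_edges)
            if spur is None:
                continue
            root_cost = sum(graph[root[j]][root[j + 1]] for j in range(i))
            path = root[:-1] + spur[1]
            if tuple(path) not in seen:
                seen.add(tuple(path))
                heapq.heappush(candidates, (root_cost + spur[0], path))
        if not candidates:
            break  # fewer than k loopless paths exist
        accepted.append(heapq.heappop(candidates))
    return accepted


print(yen_k_shortest_paths(GRAPH, "C", "H", 3))
# [(5.0, ['C', 'E', 'F', 'H']), (7.0, ['C', 'E', 'G', 'H']), (8.0, ['C', 'D', 'F', 'H'])]
```

Returning several ranked routes instead of a single best one lets a user fall back on an alternative when the top-ranked pathway is impractical in the lab.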
One feature of this year's project is the scalability of our software. Modern CPUs often have six or more cores, but many commonly used tools can only use one of them, which wastes available computing power. Most of the algorithms in our tool (the stochastic methods, A*, Dijkstra, and Yen's k-shortest-paths algorithm) are scalable; that is, they can run in parallel with little or no modification. Users with a workstation or server can therefore deploy our tool on their own machines and obtain very accurate results in a short time.
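Because independent random walks share no state, they parallelize naturally. The following minimal sketch assumes a simple transition rule (move to a neighbouring compound with probability proportional to a hypothetical edge weight) and counts where each walk ends; batches of walks run on separate processes via Python's `concurrent.futures`. The network, weights, endpoint counting, and function names are illustrative assumptions, not the actual model behind Metabolism Simulation.

```python
import random
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical toy network: compound -> {next compound: relative transition weight}.
# "P" and "Q" have no outgoing edges, so walks that reach them stop there.
NETWORK = {
    "S": {"A": 3.0, "B": 1.0},
    "A": {"P": 1.0},
    "B": {"P": 1.0, "Q": 1.0},
}


def walk_batch(network, start, n_walks, max_steps, seed):
    """Run n_walks independent random walks from `start` and count where each ends.
    Each step moves to an outgoing neighbour chosen with probability proportional
    to its weight; a walk stops at a compound with no exits or after max_steps steps."""
    rng = random.Random(seed)  # private RNG: batches are independent and reproducible
    endpoints = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(max_steps):
            exits = network.get(node)
            if not exits:
                break
            node = rng.choices(list(exits), weights=list(exits.values()))[0]
        endpoints[node] += 1
    return endpoints


def parallel_walks(network, start, n_walks, max_steps=50, n_workers=4,
                   base_seed=0, executor_cls=ProcessPoolExecutor):
    """Split n_walks into n_workers batches, run them concurrently, merge the counts.
    ProcessPoolExecutor spreads batches over CPU cores; ThreadPoolExecutor also works
    for quick checks but typically gives no multi-core speed-up for pure-Python code."""
    sizes = [n_walks // n_workers + (1 if i < n_walks % n_workers else 0)
             for i in range(n_workers)]
    total = Counter()
    with executor_cls(max_workers=n_workers) as pool:
        futures = [pool.submit(walk_batch, network, start, size, max_steps, base_seed + i)
                   for i, size in enumerate(sizes)]
        for future in futures:
            total.update(future.result())
    return total


if __name__ == "__main__":  # keeps spawned worker processes from re-running the demo
    n = 40_000
    counts = parallel_walks(NETWORK, "S", n)
    for compound, c in counts.most_common():
        print(compound, c / n)
```

For this toy network, the fraction of walks ending at P should be close to the exact value 0.75 + 0.25 * 0.5 = 0.875, and the fraction ending at Q close to 0.125. Because each batch has its own seed, the merged counts are identical whichever executor runs the batches, so adding cores changes only the run time.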
The improvements to Pathway Finder and Metabolism Simulation described above make an innovative contribution for future teams that want to continue development in these areas.