Searching pathway from metabolism graph is the basis of the later work, and the graph algorithm is the core of the pathway search, which determines the speed and accuracy of the pathway search. We have changed our path search algorithm thoroughly in this year with NetworkX, a python library to replace the hand written DFS and Greedy search in the past year’s project, since standard algorithm libraries are more stable and fine-tuned.Despite Yen’s k shortest way algorithm , A* and Dijkstra algorithm are also used in the simple search function, where the starting and target compound are given.
Dijkstra algorithm is an algorithm for finding a single source shortest path with perfect time complexity. The main idea of Dijkstra is to divide the nodes into two sets: those whose shortest path length has been determined and those whose path length has not been determined. So in the first set, we started with only one. Then repeat these operations: Relaxation is performed on all outgoing edges of nodes that have just been added to the first set. From the second set, select the node with the shortest length and move to the first set. The algorithm ends when the second set is empty. The time complexity of Dijkstra supported by the priority queue is O(mlogm).Dijkstra algorithm is also used in reverse search, when only the target compound is given.
figure1.Graphical representation of Dijkstra
The A* algorithm defines a valuation function f(x) = g(x) + h(x) for the current state x, where g(x) is the actual cost of getting from the initial state to the current state, and h(x) is the estimated cost of the optimal path from the current state to the target state.
We start by adding the initial state (s,0) to the priority queue. Each time we take the smallest state of the valuation function f(x), enumerate all outgoing edges of the node x to which the state reaches, and add the corresponding child states to the priority queue. When we visit a node the k-th time, the corresponding state of g(x) is the k-th shortest path from x to that node.
figure2.Graphical representation of A^*
1.4 Yen’s algorithm
First, we solve the shortest path A^1 from start point to destination with other algorithms, such as Dijkstra. For each pair (s , t) in A^1 ,note that distance from s to t (a sub-path of A^1) is set to positive infinity for computations, and then repeating calculate the shortest path from start point to destination. Shortest path found after each pair(s , t) has been assumed to be positive infinity is A^2. Repeat this operation from A^i to A^(i+1),and finally we can get A^k, as the k-th shortest path.
We need to rank the pathways that we find to help us decide which synthetic pathway to finally choose.
The method used here to determine the weights is the more objective entropy weighting method, which at its core is based on the degree of change in the indicator values.
According to the entropy theorem, if an indicator has the lowest entropy value, i.e., the more significant the difference between the values, then it provides the most useful information to the decision maker and should be given a higher weight.
2.2 Normalization of data
2.2.1 For the treatment of KM and Sequence indicators：
Since KM and toxicity are negative indicators, they are normalized as follows.
figure3.Normalization of KM and toxicity
When the KM_value and toxicity_value are larger, the normalization result for this item is smaller.
Since KKM is a positive indicator, it is normalized as follows.
figure4.Normalization of KKM
The larger the KKM_value, the larger the normalization result for that item.
2.2.2 For the treatment of temperture and PH indicators:
Since temperature and pH are not simply greater or lesser for a given reaction, the optimum temperature and pH corresponding to each reaction are related to the reaction itself. Therefore, when normalizing the two indicators, temperature and PH, we choose to use frequency to reflect the degree of appropriateness.
The suitability of a path is measured by the number of paths whose temperature and pH fall in the same interval as its temperature and pH.
Therefore, we choose
figure5.Normalization of temperature
figure6.Normalization of PH
as a normalization for temperature and PH.
The result of such a count is represented on the bar graph as 9 rectangles of different heights, with the nodes on the x-axis being the equivalents of temperature and PH, and the y-axis being the number of paths whose temperature (or PH) falls in this interval.
figure7.normalization for temperature and PH
2.3 Calculating weights
The normalized results form the normalized matrix: Y, with dimensions m*n,m is the number of paths and n is the number of indicators. Use the following flowchart to represent the process of solving the weights. Where w denotes the final indicator weight obtained.
figure8.steps of Calculating weights
3.The Metabolism Simulation
Finding the pathways is only the first step, there are still gaps before we can finally get started with experiments, such as how to create the optimal growth environment for the chassis organisms to maximize the yield of the target product. By reviewing the literature, we finally decided to use Random Walking Model
3.2 Random Walking Model
The approximate steps of the model are as follows：
For a simple organism, there are four reactions.
figure9.four reactions in a simple organism
These four reaction can be represented as a follow:
figure10.a straightforward representation of reactions
Each cycle, a reaction occurs (provided that the product of the logarithm of the number of reactants divided by the product of the logarithm of the number of the products is greater than the weight of the reaction), and the number of the corresponding compound changes.
figure11.Reactions occur with changes in molecular number.
After multiple cycles, the metabolism of the organism will be simulated.
In practice, the reactions will be recorded in a matrix, where each row represents a reaction, each column represents a compound, and each element is the number of molecules that the compound reacts through this reaction. This is then added to the original compound molecular number.
figure12.Changes in the number of various compounds can be recorded in matrix
FBA does not require dynamics and can be computed very quickly even for large networks.
3. Jeffrey D Orth , Ines Thiele, Bernhard Ø Palsson. What is flux balance analysis?[J]. Nat Biotechnol.2010 Mar;28(3):245-8.
4. Steinn Gudmundsson , Ines Thiele. Computationally efficient flux variability analysis[J].BMC Bioinformatics. 2010 Sep 29;11:489.
6. Jin Y. Yen, Finding the K Shortest Loopless Paths in a Network, Management Science, Vol. 17, No. 11, Theory Series (Jul., 1971), pp. 712-716