Attractive field Model
This part will show you a comprehensive simulation of our project. Considering that we are not able to act the real experiment now to simulate how much benefit it could bring to the system if we try to degrade PAN produced by locusts, we still gave a model for the whole process.
Again, our project design is to establish an attractive pit by placing engineering strain A to produce 4VA to attract locusts. Due to the negative effect made by PAN on locusts’ aggregation, we have also designed engineering strain B to decompose it.
The first part of this model is about how effective strain A is or how much more attractive our pit is. First of all, we need to answer how pheromone diffuses in our pit. By researching we have found that scientists usually use Gauss plume model to simulate the process of pollution diffusion . Comparing those models about pollution diffusion to the pheromone diffusion process, we decided to apply gaussian as well to fit the real situation of pheromone diffusion.
Figure 2 shows the Gaussian we used to describe the pheromone diffusion. In this equation we give σ x and σ y value of 1 to fit the shape of real scene diffusion. The equality of these two constants or the standard error on x and y direction indicate that we neglect the influence of wind, temperature or other environmental disturbance, which means this diffusion will be in the same scale in any direction.
In this equation we use constant A to represent the central concentration of our pit. In another word, A indicate the amount of 4VA our engineering strain could produce constantly whose unit is concentration (ng/μL)
Based on our research on fermentation made by engineering strains, we chose 5000 as the central concentration since it is more likely to cause a favorable gradient distribution in our case. Finally, we obtain the diffusion model with A=5000 and both σ=1. This model will likely show us the real diffusion scene of 4VA. The unit of x and y due to the value of A. In our diffusion model, which A is setted to be 5000, we set x and y as 150m per unit based on the known data .
First, we defined the factor used to describe the word 'attractive'. We have obtained the experiment data that about how attractive, which is represented by how much more time locusts are willing to spend in 4VA area rather than controlled area, 4VA was setted to be in different concentration as F igure 4 shows .
By applying the definition formula that Figure 5 shows where T0 and T1 are the time locusts spent in controlled area and 4VA (treated) area. data calculated as table 2 shows.
|attractive factor A||0.03||0.63||0.48||0.48||0.6||0.42|
We find that the attractive factor we defined seemingly reached a ceiling after concentration of 0.5ng/μL. This trend reminds us of the function form like Michaelis-Menten equation. By applying the same form of equation with independent variable setted as concentration of 4VA and dependent variable as attractive factor. The fitted function and graph are shown in Figure 6 and 7.
Now applying the relationship between concentration of 4VA and attractive factor to the initial diffusion model. We sketch the Figure of our attractive pit in representation of attractive factors. As Figure 8 shows our pit illustrates great ability of attraction in given range.
If we count the area that gives an attractive factor more than 0.1 as the area that is effective to attract locusts, This will give us the area of 1608000㎡ or the area of 225 standard soccer field that has an attraction effect for locusts.
Decompose PAN produced by locusts
According to our prediction, generally locusts will be attracted and aggregate in our attractive pit. The specific number of locusts that we could control in this area depends on the total number of locusts plague in local surrounding areas. As locusts aggregate, great amount of PAN they produce become a problem both for environment and our attraction effect, because PAN is tested to be the compelling pheromone in locusts. Here mainly consider the attraction effect, we target to figure out that with our engineering strains used to decompose PAN, how much will our attraction pit be more effect than it is full of PAN produced by locusts.
Wanting to estimate the compelling effect caused by PAN, similar as attractive factor, we act the same formula as attractive factor then we could gain the compelling factor of PAN as table 3 shows.
|compelling factor C||-0.2||-0.48||-0.53||-0.75|
Indeed, locusts could also produce 4VA when aggregate together, but the amount of it is simply too small (0.1ng/locusts/0.5h) which is about 1/50000 comparing to our central concentration so we just neglect the extra effect made by 4VA. Extra 4VA could barely add attraction factor or reduce it.
The calculated compelling factor shows the similar trend that it seems to reach a ceiling after certain concentration. Therefore, we use the same method that fitting the curve using equation shows in Figure 12 and fitting graph is shown in Figure 11.
By calculating the final attractive factor like Afinal=A+C, we sketch the new graph as Figure 13 shows.
We could clearly see that as time goes by PAN will have tremendous negative effect on our attractive ability. This again prove our thought to decompose PAN with designed engineering strain.
To be more specific we sum up the effective attraction area before and after PAN added as table 2 shows.
|Before PAN Added||After PAN added|
|Effective attraction area (㎡）||1608000||1211500|
This way we could say that with PAN produced by locusts we attracted, the attractive pit will be less and less effective. Finally, the attractive ability could only maintain about 3/4 attractivity compared to its initial status. The difference between these two status is about 400000㎡ which is about the area of 56 standard soccer field, which is the last thing we want.
This model gives us an insight into the benefit of decomposing PAN, which could enhance and maintain the attractivity of our attraction pit at a relatively high level.
Although we are not able to complete engineering strain that produces 4VA, by applying our model we could still make the conclusion that our designed attractive pit is effective in attracting and controlling locusts.
After aggregation of locusts, PAN will be primarily served as compelling pheromone which could curtain the attraction ability of our attractive pit. By applying our model, we could know that our pit is only less than half effective than it is initially as locusts aggregating, which the lost is too much too bear. So, if we use our engineering strains to decompose PAN, we could maintain the attractive ability in a high level rather than decrease to one half. This proves that our project design is correct and effective.
Considering further development of this model, we are about to make our diffusion model more complete by adjusting constants if we could access experiments to gain data about our pheromone’s accurate property. Then we could add more environmental factors like wind, temperature and landform....... This way will make our model much more consistent with real scene and increase its accuracy a lot.
We assumed that the affected variables are the number of bacteria produced, the production cost of the bacteria, the labor cost during the period, the transportation cost during the period, the maintenance cost during the period, and the farmers' production input. The total cost is Y.
First, we established a model based on LASSO regression to predict the change of the variables in each year, and get the annual independent variable cost value, so that the total cost can be obtained. The profit model is established based on the total cost and the impact of the locust plague on the total income. Its locust plague coefficient used a logistic regression model to simulate the change of the coefficient and obtain a coefficient in order to calculate the total return.
Cost prediction model based on LASSO regression:
In statistics and machine learning, the Lasso algorithm (least absolute shrinkage and selection operator) is a regression analysis method that performs feature selection and regularization at the same time, and aims to enhance the prediction accuracy and interpretability of statistical models.
The Lasso regression method can be used to improve the accuracy of prediction and the interpretability of the regression model. The model modifies the process of model fitting. Only a subset of the covariates is selected and applied to the final model instead of all covariates.
In statistics, linear regression is a linear method of modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of an explanatory variable is called simple linear regression. For multiple explanatory variables, this process is called multiple linear regression.
Lasso regression can use regularization to make the sum of the absolute values of the regression coefficients less than a certain fixed value, that is, force some regression coefficients to 0, effectively choosing a simpler model that does not include the covariates corresponding to these regression coefficients, and has better accuracy and robustness, and easy to select important features in high-dimensional space.
Suppose a sample includes various events, and each event includes a covariate and an output value. Among them is the output value, which is the predicted target value. The calculation goals are——L1 regularization.
In a linear regression model, in order to optimize the target function (minimize the sum of squares of errors), the data must meet many assumptions to obtain unbiased regression coefficients and minimize model variance. However, in reality, the data is very likely to have multiple characteristic variables, which makes the model assumptions false and causes over-fitting problems. In this case, regularized regression is required to control the regression coefficients to reduce model variation and out-of-sample errors., Improve model generalization ability.
LASSO regression uses L1 regularization, which can increase the penalty equal to the absolute value of the coefficient. This regularization can result in sparse models with very few coefficients, and some coefficients can become zero and eliminated from the model. Larger penalties result in coefficient values close to zero, which is ideal for generating simpler models. On the other hand, L2 regularization (such as ridge regression) does not lead to the elimination of coefficients or sparse models. This makes LASSO regression more robust than ridge regression.
In order to add L1 regularization to the model, we modified the cost function in the model as Figure 15 shows.
Among them, the data matrix represents the result that needs to be predicted: the observation value, the parameter weight of the model, and the weight value for regularization.
Through processing, the biggest feature of using L1 regularization is to sparse the matrix and perform feature selection under a larger sample size, which can effectively improve the generalization ability of the model. The sparsity produced by L1 can better make long-term reliable predictions for the model.
In order to solve the problem, we collected the data in recent years to substitute and fit each independent variable, and analyzed the relationship between each variable. According to the calculation, the total prediction accuracy was 91%. The sub-prediction results are as follows: Figure 16 shows the changes in the number of bacteria produced, and Figure 17 shows the changes in labor costs during the period.
After calculation, the total cost is 15.5, in tens of millions RMB, for a total of 155 millions RMB.
In order to establish a fitting model based on the logit function of logistic regression, we can establish: logistic regression, which can help to predict classification.
Logistic regression can help predict classification.
Logistic regression prediction is discrete (only specific values or categories are allowed). We can also view the probability score of the model classification. In order to map predicted values to probabilities, we used the sigmoid function.
The function maps any real number to another value between 0 and 1.
In machine learning, we used sigmoid to map predictions to probabilities.
The graph of the function is shown by Figure 18.
Its shape is similar to the development and reproduction the engineering strain, so it can be fitted with a function to solve its influencing factors on profit. In order to fit and optimize the results, the gradient descent algorithm can be used to solve the problem.
The algorithm flow is:
The gradient descent method is an optimization algorithm that minimizes a function by iteratively moving in the steepest descent direction defined by the negative gradient. In machine learning, we used gradient descent to update the parameters of the model. Parameters refer to the coefficients in linear regression and the weights in neural networks.
Use the following three-dimensional graph in the context of the cost function. The optimization goal is to move from the high mountain in the upper right corner (location of high cost function) to the deep blue ocean in the lower left corner (location of low-cost function). The arrow indicates the steepest descent direction (negative gradient) from any given point, which reduces the cost function as quickly as possible.
From the top of the mountain, take the first step downhill in the direction specified by the negative slope. Next, recalculate the negative gradient (via the coordinates of the new point) and take another step in the direction it specifies. By repeating this process, until we reach the bottom of the graph, or reach a point where we can no longer move down-a local minimum.
Learning rate: The size of these steps is called the learning rate. With a high learning rate, we can cover more land at each step, but we may exceed the lowest point because the slope of the hillside is constantly changing. The learning rate is very low, and we can confidently move in the direction of the negative gradient, recalculating it often. A low learning rate is more accurate, but calculating the gradient is very time-consuming, so it takes a long time to reach the bottom.
Loss function: The loss function tells us how good our model is at predicting a given set of parameters. The cost function has its own curve and gradient. The slope of this curve tells us how to update the parameters to make the model more accurate.
Finally, let's run gradient descent with the new cost function. There are two parameters in the cost function that can be controlled during the training process: m (weight) and b (bias). Therefore, we need to consider the impact of each factor on the final prediction, so we need to use partial derivatives. Calculate the partial derivative of the cost function for each parameter, and store the result in the gradient.
Finally, after multiplying the coefficient and the profit, the fitting prediction is solved, and the profit is about 315 millions RMB, which is 315 millions RMB in total.
Therefore, the profit becomes positive, so our experiment is of high value on the economic impacts.
in this part we establish groups of ODE to simulate our sensor which is able to respond to PAN and express GFP protein. Step by step we give the predict of GFP expression with change of PAN concentration.
First, without any assistance from other proteins, GFP, which is represented by DNA_RNAP or NahR_DNA_RNAP or PAN_NahR_DNA_RNAP in this model since all of them should be proportional to GFP, could still express itself through this primary route. This part should only be a tiny part of its expression though. There k2:k1 is given value of 3500 which is a little greater than 3100 that is discovered to be the dissociation constant of nonspecific DNA_RNAP.
Then considering that there are some of the transcription factor which is NahR in this case could be expressed and bond with DNA to promote the expression of GFP. Thus, the dissociation k6 could be supposed to be much less than k2 in previous case since with the help of NahR even though there are little amount, the rate of dissociation will be much smaller meaning more GFP could be expressed.
Figure above shows that with the increase in concentration of NahR how DNA_RNAP(D in figure) NahR_DNA_RNAP(N in figure) and the sum of them change. This trend is consistent with our expectations which predict that the higher the concentration of transcription factor, which is NahR in this case, is the more GFP will expressed.
Finally, we include PAN into this model which will is able to activate the expression of NahR. By applying process above we lastly gain figure like this
Now we consider concentration of PAN as the independent variable. Plot as
This figure is consistent with our expectation that with the increasing concentration of PAN, the expression of GFP will increase in total.
|k1||2||Klumpp S, Hwa T. Growth-rate-dependent partitioning of RNA polymerases in bacteria. Proc Natl Acad Sci U S A. 2008 Dec 23 105(51):20245-50. doi: 10.1073/pnas.0804953105. p.20247 left column bottom paragraphPubMed ID19073937|
|k2||7000||Klumpp S, Hwa T. Growth-rate-dependent partitioning of RNA polymerases in bacteria. Proc Natl Acad Sci U S A. 2008 Dec 23 105(51):20245-50. doi: 10.1073/pnas.0804953105. p.20247 left column bottom paragraphPubMed ID19073937|
In this part, team RDFZ_China offered us crucial help in producing the code and completing it. We appreciate the selfless help from team RDFZ_China.