**Modeling**

**The neural network model for promoter activity prediction**

The activity of pheromone-responsive promoters varies greatly. Currently there is a lack of algorithms for predicting the expression levels of these promoters at different pheromone concentrations. Therefore, we constructed a neural network model, used the backpropagation algorithm (BP-ANN Model) to predict the promoter activity after pheromone treatment

^{1}.The neural network model requires much data, while the characterization database of pheromone-responsive promoters has not yet been created.

We found ten pheromone-responsive promoter sequences of Saccharomyces cerevisiae and the corresponding fluorescence intensity of GFP characterized by 2014 iGEM team UCSF_UCB (http://parts.igem.org/Part:BBa_K1346009). In the construction of this model, the Jiangnan_China team offered help (Please visit our Collaborations page for details: collaborations).

Firstly, we did a clustal sequence alignment of 10 promoter sequences to make them the same length for subsequent processing (https://www.ebi.ac.uk/Tools/msa/clustalo). (attachment：Result of sequence alignment.txt )

**Fig. 1 Result of sequence alignment.txt**

Then we used python and C++ to process the data, and finally converted ATCG bases into numbers. (attachments: matlab_readlines.py, convert into numbers.cpp, processed sequences.docx）

**Fig. 2 Processed sequences.docx**

Then we used the neural network toolbox of Matlab 2018a to determine the number of hidden layers and run the model.

We are interested in the highest expression activity of pheromone-responsive promoters. The highest pheromone treatment concentration in the data set is 3000nM. We used the promoter characterization data at this concentration as the output of the training model, and the program ran as follows:

**Fig. 3 The number of hidden layers is 1500.**

**Fig. 4 The well-trained model1.**

The well-trained model1 shows the correlation coefficient value of the training set is 0.77936, and the correlation coefficient value of the test set is -1. The result is not ideal because our data set only has 10 promoters, and the lack of data may affect the accuracy of the neural network.

We also considered the expression activity of the promoter when induced by different concentrations of pheromone. The output of the model above was changed to a 1*7 vector, which represents the fluorescence intensity when the pheromone concentration goes from low to high. The results are as follows:

**Fig. 5 The number of hidden layers is 2000.**

**Fig. 6 The well-trained model2.**

After changing the output to a 1*7 vector, the result is better than single output. The well-trained model2 shows the correlation coefficient value of the training set is 0.91044, and the correlation coefficient value of the test set is 0.72843.

We have characterized three natural promoters and two engineered promoters. The predicted activity of these five promoters are shown in the figure below. (p

*prm1*is the natural p*prm1*charaterized by 2014 UCSF_UCB. )**Fig. 7 Predicted results and pprm1 in the training set**

According to the predicted results and p

*prm1*in the training set, we got the conclusion: p*fus2*>p*fig1*>p*prm1*, p*prm1*Ultra > p*prm1*> p*prm1*Pro. The experimental data is shown in the figure below:**Fig. 8 Experimental results: p**

*fus2*> p*fig1*> p*prm1*, p*prm1*> p*prm1*Ultra > p*prm1*ProThe experimental results showed that in the natural promoters, from strong to weak is p

*fus2*>p*fig1*>p*prm1*. The experimental results were consistent with the model prediction results. In the engineered promoters, from strong to weak, however, p*prm1*> p*prm1*Ultra > p*prm1*Pro. Besides, some points of data are obviously not in line with the experiment. The model prediction results have defects, and some of the results do not conform to the general laws of characterization and induction. After we find a suitable promoter library in the future, the neural network model will become more accurate.**Molecular dynamics model of pheromone-responsive promoter expression**

Considering that the pheromone response of

*S. cerevisiae*is regulated by signaling cascade, the molecular dynamics model can be used to simulate the process of activation of GPCR by pheromone and the expression of GFP reporter gene^{2}. The mathematical model guides the promoter characterization experiments, allowing us to obtain a relatively accurate pheromone concentration profile of the promoter through as few experiments as possible.According to MWC Model

^{3}：*G*is the concentration of G protein,

*G*is the concentration of activated G protein,

^{*}*Phe*is the concentration of pheromone，K

_{1}, K

^{*}

_{1}are the binding constants of pheromone to G protein and activated G protein respectively. Equation (1) describes the process of activation of GPCR by pheromone.

We ignored the intermediate process of the signaling pathway for it does not affect fluorescence intensity

^{4}.Transcription of GFP gene can be described as:

is the concentration of mRNA of GFP, is the transcription rate without pheromone induction, is the transcription rate after activation of signaling pathway，

*N*is the concentration of DNA of GFP， is the degradation rate of mRNA of GFP._{GFP}GFP translation can be described as:

*P*is the concentration of GFP， is the translation rate of GFP， is the degradation rate of GFP.

_{GFP}When the reaction reaches equilibrium，，we can get the relationship between

*P*and_{GFP}*Phe*:We successfully applied the molecular dynamics model to the drawing of the pheromone characteristic curve of p

*prm1*. In order to determine the parameter values, we designed a set of experiments to obtain the fluorescence intensity of GFP when induced by different concentrations of pheromone. There is a linear relationship between GFP concentration and fluorescence intensity, therefore fluorescence intensity is used to characterize the activity of promoters. (The fitting used cftool toolbox in MATLAB, the fitting file is in the attachment: Fitting.sfit, and the drawing code is in the attachment: Fitting.m）**Fig. 9 Experimental data.**

**Fig. 10 Parameter value.**

**Fig. 11 Fitted curve.**

Signaling pathways constructed based on different promoters have different fluorescence intensity characteristic curves. For p

*prm1*, we have obtained the characteristic curve of its fluorescence intensity. This result can guide the experimental design to determine how to design the pheromone concentration gradient when we want to obtain a specific fluorescence intensity.**Kinetic modeling to describe CRISPRa zygotic regulation system**

In the future, we will try to introduce CRISPRa components into type-a and type-α yeasts to achieve gene expression regulation after cell fusion. We have used the ordinary differential equations to model the process. The purpose of this model is to explore how to set the initial state of cell fusion to obtain greater GFP expression and display fusion efficiency.

**Fig. 12 CRISPRa zygotic regulation strategy.**

Type-a cell：The constitutively expression of dCas9 and scRNA consists of simple production and degradation. The expression of MCP-VP64 is induced by pheromone.
The transcription of mRNA of dCas9 and scRNA：

、 is the concentration of mRNA of dCas9 and scRNA， is their transcription rate， is their DNA concentration， is their degradation rate of mRNA.

The transcription rate of mRNA of MCP-VP64：

is the transcription rate without pheromone induction. is the transcription rate after activation of signaling pathway,

*M*is the concentration of pheromone._{phe}The translation rate of dCas9 and VP64：

Type-α cell：dCas9、scRNA、MCP-VP64 constitutes a trimer, which binds to the promoter of the GFP gene.

The transcription rate of GFP:

*P*is the concentration of the trimer.

_{Trimer}The translation rate of GFP:

When the reaction reaches equilibrium, making (1)~(7) equal to 0，we can get these equations：

(8)~(14) describe the relationship among the dCas9, MCP-VP64, scRNA, pheromone and GFP concentration.

By setting different initial protein, scRNA and pheromone concentrations, a series of experimental data can be obtained. And the least squares fitting method can be used to fit the relationship between initial protein, scRNA, pheromone concentration and GFP concentration.

After completing relevant experiments, we will verify the accuracy of the model and make necessary modifications.

**References**

[1] Meng, H., Wang, J., Xiong, Z., Xu, F., Zhao, G., and ONE, Y. W. J. P. (2013) Quantitative Design of Regulatory Elements Based on High-Precision Strength Prediction Using Artificial Neural Network.

[2] Alon, U. (2006) An Introduction to Systems Biology --Design Principles of Biological Circuits.

[3] Marzen, S., Garcia, H. G., and Phillips, R. (2013) Statistical mechanics of Monod-Wyman-Changeux (MWC) models,

*J Mol Biol 425*, 1433-1460.[4] Shaw, W. M., Yamauchi, H., Mead, J., Gowers, G. F., Bell, D. J., Oling, D., Larsson, N., Wigglesworth, M., Ladds, G., and Ellis, T. (2019) Engineering a Model Cell for Rational Tuning of GPCR Signaling,

*Cell 177*, 782-796 e727.