Team:DTU-Denmark/Software

Summary

Here we present Morphologizer and Mycemulator, two integrated but modular software tools that together form a full pipeline for the morphological characterisation of fungi, going from microscopic images of fungal strains to growth simulations. Our graph-based image analysis tool, Morphologizer, supports many ways of analyzing mycelial structure, while our growth simulation tool, Mycemulator, gives insight into local substrate consumption and time dynamics.

In addition, we provide a large image data set with growth rates to use as a baseline. The flowchart below outlines the inputs, outputs and inner workings of the two tools.


Flow chart illustrating the central steps in our two tools and their key inputs and outputs.
(a) Morphologizer requires microscopic images of mycelia in which the hyphae are sufficiently pigmented (e.g. by a fluorescent marker) to be separated from the background with a threshold. The mycelium should be relatively planar for best results. We refer to our novel microscopy protocol [ref experiments].
(b) Before a graph can be constructed, Morphologizer processes the image. First the image is binarized with a threshold, then edges are smoothed with a median filter. The binary image is then skeletonized, making each hypha 1 pixel wide. From the skeleton, the graph is constructed using the sknw library by Yan Xiaolong. From the graph, various morphological parameters are calculated; see the Model page for details.
(c) Morphologizer outputs a .csv file containing the branching frequency and the distribution parameters for the angle and curvature distributions. Other parameters are calculated, but
(d) Our growth model integrates growth rates from our bioreactor data. The user can use the default growth rate or provide their own.
(e) Mycemulator takes in parameters from Morphologizer and a growth rate, which in our project is based on bioreactor data. Based on these parameters, the model runs the shown loop until a set number of iterations is reached.
(f) Depending on the user's choice, Mycemulator outputs either a gif and an mp4 video of the simulation or a pdf of time-dependent simulation images. Position coordinates of the hyphal elements can also be extracted as a dataframe.
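The first two preprocessing steps in (b), thresholding and median smoothing, can be illustrated on a toy image. This is a minimal sketch and not Morphologizer's actual code: it uses plain Python lists instead of an image library, the threshold of 128 and the 3x3 window are assumptions, and skeletonization and graph construction (done with sknw in the tool) are left out.

```python
from statistics import median

def binarize(image, threshold):
    """Return a 0/1 image: 1 where the pixel intensity exceeds the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def median_filter_3x3(binary):
    """Smooth a binary image with a 3x3 median filter (borders handled by clamping)."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                binary[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = int(median(window))
    return out

# Toy grayscale image (made-up intensities): a bright three-row "hypha" band,
# one dark speck inside the band and one bright speck in the background.
image = [
    [10, 12, 11, 10, 13],
    [11, 10, 220, 12, 10],       # bright speck in the background
    [12, 11, 10, 13, 11],
    [200, 210, 205, 198, 202],
    [201, 199, 20, 204, 207],    # dark speck inside the band
    [203, 206, 202, 200, 199],
    [10, 13, 12, 11, 10],
]

binary = binarize(image, threshold=128)
smoothed = median_filter_3x3(binary)

for row in smoothed:
    print("".join("#" if px else "." for px in row))
```

After smoothing, both specks are gone, which is the point of the median filter: isolated pixels would otherwise turn into spurious holes or branches once the image is skeletonized.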


Key usability aspects

Input flexibility
Mycemulator is very flexible with regard to input: the user can supply as many parameters as they have available to make the growth simulation more accurate. If the user is unable to estimate one or more parameters, our estimates based on ATCC 1015 are used as default values.

Modularity
Our two tools are modular and can be run together for the most accurate simulation or individually to suit other purposes. Running Morphologizer separately enables extraction of features of fungal growth for characterisation or comparison of different strains. If the user cannot obtain microscopic images of the fungus of interest, Mycemulator can be run on its own; in that case the default parameters are based on our data for ATCC 1015, including the parameters we estimated with Morphologizer.

Full pipeline
Together with our microscopy protocol, Morphologizer and Mycemulator comprise a full pipeline for characterising the morphology of a fungal strain. This pipeline can help scientists characterise their own strains, from the initial microscope slide set-up, through analysis of the resulting images, to growth predictions.

Easy to use
To accommodate wet-lab scientists, both Morphologizer and Mycemulator are well documented, written in a user-friendly manner, and have README files associated with each script. Furthermore, both tools are written exclusively in the high-level Python programming language and use common input and output formats (csv). We also provide our own data set as an example data set in our GitHub repository. Because both tools include a command-line argument parser, they can easily be embedded into analysis pipelines.
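To give an idea of what such a command-line interface can look like, here is a minimal sketch using Python's standard argparse module. All option names and default values below are hypothetical and chosen for illustration only; the README files of the tools list the real arguments.

```python
import argparse

def build_parser():
    """Build a toy command-line parser; every option name here is hypothetical."""
    parser = argparse.ArgumentParser(description="Toy growth-simulation CLI")
    parser.add_argument("--params",
                        help="csv file with morphological parameters")
    parser.add_argument("--growth-rate", type=float, default=None,
                        help="growth rate; a built-in default is used if omitted")
    parser.add_argument("--iterations", type=int, default=100,
                        help="number of simulation iterations")
    parser.add_argument("--output", choices=["gif", "mp4", "pdf"], default="pdf",
                        help="output format")
    return parser

# Parsing an explicit list instead of sys.argv keeps the example self-contained.
args = build_parser().parse_args(["--params", "morphology.csv", "--iterations", "50"])
print(args.params, args.growth_rate, args.iterations, args.output)
```

Options that are left out fall back to their defaults, which mirrors the idea that the user supplies only the parameters they have available.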


Key features

Data-based analysis
Our tools are based on experimental data. We found that most existing simulations of mycelia did not closely resemble microscopic images of fungi, so our main aim was to base our simulations on real image data, which we succeeded in doing.

Graph-based analysis
Morphologizer converts the mycelium into a graph and estimates parameters based on this graph. By conducting the analysis this way, we ensure that the tool is not restricted to images of specific fungal species, or even to fungi. In addition, graphs are informative data structures that make many different types of data analysis possible. Many natural patterns could be analyzed by Morphologizer.
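As a small illustration of why a graph is a convenient representation, the sketch below counts tips and branch points and computes a branching density on a toy Y-shaped graph. It is not Morphologizer's implementation: the node coordinates are made up, edges are treated as straight lines (a skeleton graph would follow the pixel path), and defining branching density as branch points per unit hyphal length is our assumption for this example.

```python
from collections import defaultdict
from math import hypot

def build_adjacency(edges):
    """Map each node id to the set of its neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def graph_summary(nodes, edges):
    """nodes: {id: (x, y)}; edges: list of (id, id) pairs."""
    adjacency = build_adjacency(edges)
    tips = [n for n in nodes if len(adjacency[n]) == 1]            # degree 1
    branch_points = [n for n in nodes if len(adjacency[n]) >= 3]   # degree >= 3
    total_length = sum(
        hypot(nodes[a][0] - nodes[b][0], nodes[a][1] - nodes[b][1])
        for a, b in edges
    )
    return {
        "tips": len(tips),
        "branch_points": len(branch_points),
        "total_length": total_length,
        "branches_per_length": len(branch_points) / total_length,
    }

# Toy Y-shaped mycelium: one stem of length 10 splitting into two branches of length 5.
nodes = {0: (0, 0), 1: (0, 10), 2: (-3, 14), 3: (3, 14)}
edges = [(0, 1), (1, 2), (1, 3)]

summary = graph_summary(nodes, edges)
print(summary)
```

Nothing in this computation is specific to fungi: any branching pattern that can be skeletonized into nodes and edges can be summarised the same way.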

Novel morphological parameters
Some parameters extracted from microscopic images by Morphologizer are, to the best of our knowledge, not extracted by any other tool. An example is the distribution of hyphal curvatures, which we use to model chemotropic behaviour.
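One simple way to obtain a curvature distribution from a traced hypha is to measure how much the path turns per unit length at each point. The sketch below illustrates this idea under our own assumptions: the discrete curvature definition (turning angle divided by the mean length of the two adjacent segments) and the summary by mean and standard deviation are illustrative choices, not necessarily the definitions used in Morphologizer.

```python
from math import atan2, cos, hypot, pi, sin
from statistics import mean, pstdev

def turning_curvatures(points):
    """Discrete signed curvature at each interior point of a polyline.

    curvature ~ turning angle / mean length of the two adjacent segments
    """
    curvatures = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        angle_in = atan2(y1 - y0, x1 - x0)
        angle_out = atan2(y2 - y1, x2 - x1)
        # Wrap the turning angle to [-pi, pi) so direction changes near +/-pi
        # are not counted as full turns.
        turn = (angle_out - angle_in + pi) % (2 * pi) - pi
        mean_len = (hypot(x1 - x0, y1 - y0) + hypot(x2 - x1, y2 - y1)) / 2
        curvatures.append(turn / mean_len)
    return curvatures

# Toy hypha: points on a circular arc of radius 10, so the curvature
# should come out close to 1/10 everywhere.
radius = 10.0
arc = [(radius * cos(0.1 * k), radius * sin(0.1 * k)) for k in range(20)]

curvatures = turning_curvatures(arc)
summary = {"mean": mean(curvatures), "sd": pstdev(curvatures)}
print(summary)
```

A straight hypha gives zero curvature at every point, while a consistently bending one gives values of constant sign, so the spread of the distribution says something about how strongly hyphae deviate from straight growth.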

Industrial relevance
Our tools enable the user to simulate the growth of mycelia with different morphologies, and under different growth conditions relevant to industry. An example is the possibility to use different substrate availability models. Solid-state growth can be simulated by lowering or removing diffusion. Continuous submerged fermentations can be emulated with high diffusion and constant substrate infusion at the perimeter.
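The difference between these substrate scenarios can be sketched with a toy diffusion step on a small grid. This is an illustration only and not Mycemulator's substrate model: the explicit 4-neighbour update, the grid size, the diffusion values and the `perimeter_value` argument are all assumptions made for the example.

```python
def diffuse_step(grid, diffusion, perimeter_value=None):
    """One explicit finite-difference diffusion step on a 2D substrate grid.

    diffusion: fraction exchanged with each 4-neighbour per step
               (the explicit scheme is stable for values up to 0.25).
    perimeter_value: if given, perimeter cells are reset to this value after
               the step (constant infusion); if None, the boundary is closed.
    """
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    new[r][c] += diffusion * (grid[rr][cc] - grid[r][c])
    if perimeter_value is not None:
        for r in range(h):
            for c in range(w):
                if r in (0, h - 1) or c in (0, w - 1):
                    new[r][c] = perimeter_value
    return new

# Toy substrate field: uniform substrate, fully consumed in the centre cell.
grid = [[1.0] * 5 for _ in range(5)]
grid[2][2] = 0.0

solid_state = diffuse_step(grid, diffusion=0.0)   # no diffusion: the hole stays
submerged = diffuse_step(grid, diffusion=0.2, perimeter_value=1.0)

print(solid_state[2][2], round(submerged[2][2], 3))
```

With diffusion switched off, the depleted cell stays empty, as on a solid medium. With diffusion and a refilled perimeter, substrate flows back towards the consumed region and the supply at the edge never runs out.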

SignalPrepper – Computation of Synthetic Signal Peptides

In order to take advantage of the beneficial compounds produced by filamentous fungi, it is crucial to know which signal peptides are able to export these products. Commonly, a given signal peptide is known to function for the export of not only one, but several proteins within a species. However, there is still a bottleneck when working with the secretion of multiple proteins in the lab, because many different signal peptides must be kept in stock in order to test the secretion of different proteins. Therefore, it would be very beneficial to know of a single signal peptide able to transport all, or at least the majority, of the proteins within a species.

Several attempts have been made at either computing signal peptide sequences for a specific protein of interest or validating proteins as signal peptides (Wu et al., 2020). Yet, to the best of our knowledge, computing synthetic signal peptides that function for the majority of proteins within a species has not been tried. We decided to make a model that, for a given species, can compute synthetic signal peptides approximating typical signal peptides found in the genome of this species. The aim of these signal peptides is thus to direct the secretion of the majority of proteins within the species for which they were computed. Of relevance to our project, the model was used to find typical signal peptides for Aspergillus niger.

To avoid replicating work which has already been done, SignalPrepper applies the established SignalP algorithms. SignalP 3.0 (Nielsen and Krogh, 1998) was applied for annotating the N, H and C regions of signal peptides, and the newest version of the algorithm, SignalP 5.0 (Almagro Armenteros et al., 2019), was applied for predicting both the probability of computed synthetic sequences being signal peptides and the certainty of their cleavage sites.



Structure of SignalPrepper


Composition of a signal peptide (SP) preceding a protein sequence. The signal peptide consists of an N-terminal region, followed by an H-region and a C-region (Moog, 2018).


SignalPrepper is given the entire genome of the species of interest as input. SignalP 3.0 is run on the entire genome in order to extract sequences from proteins marked as putative signal peptides. SignalP 3.0 annotates the N, H and C regions of the signal peptides, which are then used to collect the frequencies of the observed amino acids in each region. Furthermore, for the critical positions of the C region right before the cleavage site, the most frequent amino acid at each position was noted and used in all synthetic signal peptides. The remaining positions were sampled randomly from the distributions of the three regions and joined together to form potential synthetic signal peptides. The distribution of the amino acids for each of the three regions is shown in the figure below. Note that the critical positions of the C region right before the cleavage site are not included in the histograms. The most frequent length of the three regions was used for all synthetic sequences, which for A. niger was 18 amino acids.
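The sampling step described above can be sketched as follows. This is a minimal illustration and not SignalPrepper's code: the amino acid frequencies, the split of the 18 residues into regions, and the fixed residues before the cleavage site are invented toy values, and we assume here that the fixed positions count towards the length of the C region.

```python
import random

def sample_signal_peptide(region_freqs, region_lengths, fixed_c_tail, rng):
    """Sample one synthetic signal peptide.

    Positions in the N, H and C regions are drawn from per-region amino acid
    frequencies; the last positions of the C region are fixed (fixed_c_tail).
    """
    parts = []
    for region in ("N", "H", "C"):
        residues = list(region_freqs[region])
        weights = [region_freqs[region][aa] for aa in residues]
        n_sampled = region_lengths[region]
        if region == "C":
            n_sampled -= len(fixed_c_tail)   # the tail is not sampled
        parts.append("".join(rng.choices(residues, weights=weights, k=n_sampled)))
    return "".join(parts) + fixed_c_tail

# Toy inputs, NOT the A. niger distributions from the histograms.
toy_freqs = {
    "N": {"M": 0.4, "K": 0.3, "R": 0.3},
    "H": {"L": 0.5, "A": 0.3, "V": 0.2},
    "C": {"A": 0.4, "S": 0.3, "T": 0.3},
}
toy_lengths = {"N": 5, "H": 8, "C": 5}   # sums to 18; the split is an assumption
toy_tail = "ALA"                         # hypothetical fixed positions

rng = random.Random(0)                   # seeded for reproducibility
peptide = sample_signal_peptide(toy_freqs, toy_lengths, toy_tail, rng)
print(peptide, len(peptide))
```

Calling the function repeatedly yields a pool of candidate sequences that share the fixed cleavage-site positions but differ elsewhere, which can then be scored with a signal peptide predictor.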

The histograms of the native signal peptides showed a composition rich in hydrophobic amino acids (leucine (L), alanine (A), methionine (M), etc.), which corresponds well with signal peptides found in other eukaryotic species (Moog, 2018). Many signal peptides found in nature have a hydrophobic stretch, typically of alanine (A) or leucine (L), in the H- and C-regions. Since these amino acid distributions are used to generate the synthetic signal peptides, the synthetic signal peptides will likely have a similar composition.

To assess the likelihood that the computed sequences are signal peptides, the newest and most accurate version of SignalP, SignalP 5.0, was applied. Each peptide sequence was joined to the glucoamylase enzyme to form a combined signal peptide and protein sequence, which is required for SignalP analysis. Only sequences predicted to be signal peptides with more than 90% certainty and with correctly predicted cleavage sites (between the synthetic sequence and the glucoamylase protein) were considered. These were then sorted by the probability of their most likely cleavage site.
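The selection rule in this paragraph can be written down compactly. The sketch below is an illustration and not SignalPrepper's code: the `Prediction` record and its field names are hypothetical, and it assumes the SignalP 5.0 results have already been parsed into such records. The first two entries use probabilities from our results table; their cleavage position of 18 is assumed, and the two `toy_` entries are invented to show rejected candidates.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    name: str
    sp_probability: float        # probability of being a signal peptide
    cleavage_probability: float  # probability of the most likely cleavage site
    cleavage_position: int       # residues before the predicted cleavage site

def select_candidates(predictions, peptide_length, min_sp_probability=0.9):
    """Keep confident predictions with the cleavage site right after the peptide,
    sorted by cleavage-site probability (highest first)."""
    kept = [
        p for p in predictions
        if p.sp_probability > min_sp_probability
        and p.cleavage_position == peptide_length
    ]
    return sorted(kept, key=lambda p: p.cleavage_probability, reverse=True)

predictions = [
    Prediction("SigPilot", 0.979291, 0.8716, 18),
    Prediction("toy_wrong_site", 0.95, 0.90, 16),   # invented: wrong cleavage site
    Prediction("SigPower", 0.983838, 0.8784, 18),
    Prediction("toy_low_prob", 0.85, 0.80, 18),     # invented: below 90%
]

selected = select_candidates(predictions, peptide_length=18)
print([p.name for p in selected])
```

Checking the cleavage position against the peptide length is what enforces that the predicted cut falls exactly between the synthetic sequence and the attached protein.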






Histograms showing the distribution of the different amino acids in each region of the native signal peptide (n=1482).



Results

The SignalP 5.0 predictions of the top 5 signal peptides are seen in the table below.
Signal peptide | Signal peptide probability | Probability of cleavage site
SigPower       | 0.983838                   | 0.8784
SigPilot       | 0.979291                   | 0.8716
SigPineapple   | 0.970783                   | 0.8662
SigPeanut      | 0.966822                   | 0.8629
SigPuppy       | 0.966085                   | 0.8620

These 5 signal peptides were selected for further analysis, and you can read more about their story on our Design page.




References

  1. Wu, Z., Yang, K., Liszka, M., Lee, A., Batzilla, A., Wernick, D., Weiner, D. and Arnold, F., 2020. Signal Peptides Generated by Attention-Based Neural Networks. ACS Synthetic Biology, 9(8), pp.2154-2161.
  2. Nielsen, H. and Krogh, A., 1998. Prediction of signal peptides and signal anchors by a hidden Markov model. Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB), 6, pp.122-130.
  3. Almagro Armenteros, J., Tsirigos, K., Sønderby, C., Petersen, T., Winther, O., Brunak, S., von Heijne, G. and Nielsen, H., 2019. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nature Biotechnology, 37(4), pp.420-423.
  4. Moog, D., 2018. In Silico Tools for the Prediction of Protein Import into Secondary Plastids. Methods in Molecular Biology, pp.381-394.