# Project modeling

## Our model is on the iGEM 2020 Software GitHub!

A model of fish schools has been central to our project. The path that led us here is described on the Engineering page and the Integrated Human Practices page. Here we discuss the model we implemented, the results, and how we obtained them.

We use a discrete, stochastic, individual-based model. It extends an existing model [1], which is itself based on [2]. A continuous version of this model also exists [3]. The model in [1] is two-dimensional; ours follows the same principles but works in three dimensions. This transition is a concern, since three dimensions introduce more degrees of freedom. However, the exact parameters of this model are not “scientific” in the sense of being measured; rather, they are tuned so that the model exhibits behaviors seen in real fish schools, for example fish migrating in the same direction. The initial values are random, so the steady state varies greatly between runs. For example, the number of schools that form can differ.

## Description of model

Our model simulates a fish school. It is well suited to our purposes because of its tunable parameters. Each fish has a radius of attraction, a radius of repulsion and a radius of orientation, each with an associated weight. With these parameters we can approximately represent the symptoms of many different diseases. Lethargy, for example, can be represented by less reactive fish, obtained by increasing the self weight parameter, which is the weight a fish gives to its own current direction. Avoidance behavior can be simulated by increasing the repulsion radius, and erratic swimming by increasing the randomness. These are only approximations of symptoms, but they may indicate how best to detect such symptoms, how severe the illness must be before it can be detected, and how many fish need to be sick.
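To make the roles of the radii and weights more concrete, here is a minimal sketch of how one fish's heading could be updated. This page does not state the exact update rule, so the rule below (a weighted sum of the fish's own heading, a repulsion term, an orientation term, an attraction term and Gaussian noise, normalized afterwards) is an assumption in the spirit of zone-based schooling models, not our actual implementation. The function and parameter names are hypothetical; the numerical values are the “healthy” settings listed further down this page.

```python
import math
import random

# Parameter names are hypothetical; values are the "healthy" settings from this page.
HEALTHY = {
    "r_attraction": 150.0, "r_orientation": 55.0, "r_repulsion": 14.0,
    "w_attraction": 1.4, "w_orientation": 2.0, "w_repulsion": 0.7,
    "w_self": 1.0, "noise": 0.3,
}

def unit(v):
    """Scale a 3-vector to length 1; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v) if norm > 0 else tuple(v)

def new_heading(i, positions, headings, p, rng):
    """Assumed update rule: weighted sum of own heading, three zone terms and noise."""
    rep = [0.0, 0.0, 0.0]  # accumulates directions away from very close fish
    ori = [0.0, 0.0, 0.0]  # accumulates headings of fish at intermediate distance
    att = [0.0, 0.0, 0.0]  # accumulates directions towards distant fish
    for j, (pos_j, head_j) in enumerate(zip(positions, headings)):
        if j == i:
            continue
        diff = [b - a for a, b in zip(positions[i], pos_j)]
        dist = math.sqrt(sum(c * c for c in diff))
        if dist == 0.0 or dist > p["r_attraction"]:
            continue  # coincident or out of range: no influence
        toward = [c / dist for c in diff]
        if dist <= p["r_repulsion"]:
            rep = [r - t for r, t in zip(rep, toward)]
        elif dist <= p["r_orientation"]:
            ori = [o + h for o, h in zip(ori, head_j)]
        else:
            att = [a + t for a, t in zip(att, toward)]
    rep_u, ori_u, att_u = unit(rep), unit(ori), unit(att)
    combined = [
        p["w_self"] * headings[i][k]
        + p["w_repulsion"] * rep_u[k]
        + p["w_orientation"] * ori_u[k]
        + p["w_attraction"] * att_u[k]
        + p["noise"] * rng.gauss(0.0, 1.0)
        for k in range(3)
    ]
    return unit(combined)

# Two fish: fish 1 sits 100 units away along y, inside fish 0's attraction zone.
positions = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0)]
headings = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(new_heading(0, positions, headings, HEALTHY, random.Random(0)))
```

In this sketch, raising `w_self` makes the fish's own heading dominate the sum, which is one way to picture the "less reactive" fish described above.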

In our simulations we have tried to use simple measurements, so that they hopefully contain little information. If our system needs little information to detect differences, that is a good indication that the same might hold in the real world. All parameters that are kept constant are listed at the bottom of this page.

## Existing extensions

Some authors have extended models of this kind in the literature to make them more accurate. [4] is a good example, introducing models of perception, maximum turning angles, variable speed, etc. Unfortunately we have not had time to implement these extensions, but we hope they can be implemented and extended further in the future.

## Detection system

The structure of what we call a detection system is seen in the graphic below.

Here the sensors could be cameras, and feature extraction could be an estimate of school velocity over time. The classifier then receives a vector of these estimates, called a feature vector. Given a set of classes, for example “sick” and “healthy” fish, the classifier tries to assign the feature vector to one of them.

A classifier is an algorithm that implements classification: it takes in some data and assigns it a category, or class. The basic principle of statistical classification is seen below.

Here x_1 and x_2 are features, and the space spanned by their Cartesian product is referred to as feature space. Each point is a feature vector, where blue and green points belong to two different classes. The goal of a classifier is to find a decision boundary, here the red line. This allows us to classify new data by checking on which side of the line it falls in feature space. In our case the feature vectors have 50 dimensions, which makes a graphical representation impossible.

The same principle is used by the classifier we implemented, a neural net. In machine learning, classification is typically done by supervised learning, where we have a set of training data with known classes. When we feed some of this data, consisting of feature vectors, to the classifier (for example a neural net), it gives some output. Since we know the class of each training sample, we have a preferred output, say 0 for blue and 1 for green. If one defines a loss function such as (target - output)^2, one can use algorithms called optimizers, which adjust the classifier to minimize this loss. The appeal is that a human does not need to know exactly how the classifier tells the two classes apart. While insight into the data is important, the concept can be applied almost blindly. In our case we can easily create training data and train a classifier without knowing how it tells the classes apart.
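As an illustration of minimizing a squared-error loss, the following sketch trains a single sigmoid unit on a one-dimensional toy data set. It uses plain gradient descent rather than the optimizer used in our project, and the data, learning rate and function names are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse(xs, targets, w, b):
    """Mean of (target - output)^2 over all samples."""
    return sum((t - sigmoid(w * x + b)) ** 2 for x, t in zip(xs, targets)) / len(xs)

def train_unit(xs, targets, lr=0.5, epochs=2000):
    """Fit output = sigmoid(w * x + b) by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, targets):
            out = sigmoid(w * x + b)
            # Chain rule: d/dz (t - out)^2 = -2 (t - out) * out * (1 - out)
            delta = -2.0 * (t - out) * out * (1.0 - out)
            grad_w += delta * x / n
            grad_b += delta / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: small feature values belong to class 0, large ones to class 1.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
targets = [0, 0, 0, 1, 1, 1]
w, b = train_unit(xs, targets)
print("loss before:", mse(xs, targets, 0.0, 0.0), "loss after:", mse(xs, targets, w, b))
```

The training loop never encodes a rule such as "large x means class 1"; it only follows the gradient of the loss, which is the point made in the paragraph above.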

Our detection system consists of a simple neural net whose number of inputs equals the number of measurements per simulation. The structure of this neural net was found by trying different configurations ourselves. All data is normalized to the interval [0,1] with min-max feature scaling. Our measurement is defined as

$$m = \sum_{i=1}^{N} \left( |x_i| + |y_i| + |z_i| \right)$$

where (x_i, y_i, z_i) is the position of fish i. Containing relatively little information, this measurement is a single scalar: the sum of all positional components of all N fish. All positional components enter as absolute values, which places them on the positive side of their axis. Our neural net uses the standard sigmoid activation function, the mean squared error loss function and the ADAM optimizer. ADAM has done well in the literature, and it outperformed classic gradient descent in our case. More information on ADAM can be found in [5]; we used the parameters that the authors recommend on page 2 of their article.
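For readers unfamiliar with ADAM, the update rule from [5] can be sketched in a few lines. The default values below (step size 0.001, decay rates 0.9 and 0.999, epsilon 1e-8) are the settings suggested in [5]; we assume here that these are the recommended parameters referred to above. The toy objective and the function names are purely illustrative and unrelated to our network.

```python
import math

def adam_minimize(grad, theta, steps, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a list of parameters, given a gradient function."""
    theta = list(theta)
    m = [0.0] * len(theta)  # running average of gradients (first moment)
    v = [0.0] * len(theta)  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1.0 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1.0 - beta2) * g[k] ** 2
            m_hat = m[k] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[k] / (1.0 - beta2 ** t)  # bias-corrected second moment
            theta[k] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective f(x, y) = (x - 1)^2 + (y + 2)^2 with its gradient in closed form.
def toy_grad(theta):
    x, y = theta
    return [2.0 * (x - 1.0), 2.0 * (y + 2.0)]

result = adam_minimize(toy_grad, [0.0, 0.0], steps=5000)
print(result)  # should end up close to the minimum at (1, -2)
```

Because each step is scaled by the running gradient statistics, the parameters move by roughly `alpha` per step early on, regardless of how large the raw gradient is.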

Each node calculates an activation function while the arrows represent weights. For the final output we simply classify on the intervals (-infinity, 0.5] and (0.5, infinity).

During the simulation, we take 50 measurements which are fed straight to the neural net after normalization, so our input layer has 50 nodes.
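The measurement and the normalization step can be sketched as follows. The function names are hypothetical, and the text does not say whether the minimum and maximum are taken per simulation or across the whole data set, so the scaling function below simply rescales whatever list of values it is given.

```python
def measurement(positions):
    """Sum of the absolute values of all position components of all fish."""
    return sum(abs(c) for pos in positions for c in pos)

def min_max_scale(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

# Two fish in 3D: 1 + 2 + 3 + 4 + 0.5 + 0 = 10.5
print(measurement([(1.0, -2.0, 3.0), (-4.0, 0.5, 0.0)]))  # 10.5
print(min_max_scale([2.0, 4.0, 6.0]))                      # [0.0, 0.5, 1.0]
```

Taking 50 such measurements during one simulation and scaling them gives one 50-dimensional feature vector for the neural net.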

Neural net architecture

| Layer | Input layer | Hidden 1 | Hidden 2 | Hidden 3 | Hidden 4 | Hidden 5 | Output layer |
|---|---|---|---|---|---|---|---|
| Number of nodes | 50 | 40 | 30 | 15 | 10 | 5 | 1 |
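A forward pass through a fully connected network with these layer sizes can be sketched as below. This is only an illustration: the weights are random and untrained, the use of bias terms and the initialization range are assumptions, and which class corresponds to output 1 is not specified on this page.

```python
import math
import random

LAYER_SIZES = [50, 40, 30, 15, 10, 5, 1]  # from the architecture table

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(sizes, rng):
    """One (weights, biases) pair per layer transition; random weights, zero biases."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Propagate a feature vector through all layers and return the single output."""
    for weights, biases in layers:
        x = [sigmoid(sum(w * a for w, a in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x[0]

def classify(output, threshold=0.5):
    """Outputs above the threshold map to class 1, all others to class 0."""
    return 1 if output > threshold else 0

layers = init_network(LAYER_SIZES, random.Random(0))
features = [0.5] * 50  # placeholder for one normalized feature vector
print(classify(forward(layers, features)))
```

With one bias per node, a network of this shape has 3956 trainable parameters, which gives a feeling for how small it is.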

Our classifier is trained on 1600 samples, which requires 1600 simulations: 800 “sick” and 800 “healthy”. The number of training iterations over these samples is set to 1000. The number of test samples is 10% of the 1600 samples, meaning we had 160 test samples. The set of feature vectors is shuffled before being divided into training and test sets.
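The shuffle-then-split step can be sketched like this. We assume here that the 160 test samples are held out from the 1600 simulated samples, leaving 1440 for training; the placeholder feature vectors, the label encoding (0 and 1) and the function name are made up for the example.

```python
import random

def shuffled_split(samples, test_fraction=0.1, seed=0):
    """Shuffle a copy of the samples, then split off the first part as the test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder data: 800 samples per class, each a (feature_vector, label) pair.
samples = [([0.0] * 50, 0) for _ in range(800)] + [([1.0] * 50, 1) for _ in range(800)]
train, test = shuffled_split(samples)
print(len(train), len(test))  # 1440 160
```

Shuffling before splitting matters because the samples are generated class by class; without it the test set would contain only one class.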

## Results

The most natural part of our simulations to classify is the steady state, since at this point the fish are actually schooling. But there is another phase, the transient one, that turns out to be very interesting as well. The two phases differ greatly in how easy they are to classify. At the start of a simulation there is no order; the system of fish then undergoes a phase transition to an ordered, schooling state. After this phase transition the fish are organized; for example, they could be migrating in a common direction. It turns out that the fish are much easier to classify if one has the additional information contained in this phase transition.

### Steady state

The steady-state part of the simulation was determined visually and found to span 30-100% of the simulation, meaning we take our measurements between the 600th and 2000th iteration. We then take 50 samples of our measurement, evenly distributed throughout this part of the simulation.
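The sampling points can be computed as in the sketch below. It assumes that both ends of the window are included and that iterations are counted from 1, neither of which is stated explicitly; the function name is hypothetical.

```python
def sample_iterations(n_steps, start_frac, end_frac, n_samples):
    """Return n_samples iteration numbers spread evenly over a fraction of the run."""
    first = round(n_steps * start_frac)
    last = round(n_steps * end_frac)
    step = (last - first) / (n_samples - 1)
    return [round(first + k * step) for k in range(n_samples)]

steady_state = sample_iterations(2000, 0.3, 1.0, 50)
print(steady_state[0], steady_state[-1], len(steady_state))  # 600 2000 50
```

Since the window is passed in as fractions of the run, the same helper works for any other part of the simulation.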

Parameters for “healthy” fish

| Parameter | Value |
|---|---|
| Radius of attraction | 150 |
| Radius of orientation | 55 |
| Radius of repulsion | 14 |
| Attraction weight | 1.4 |
| Repulsion weight | 0.7 |
| Self weight | 1 |
| Orientation weight | 2 |

In our model these parameters give a migratory solution; in other words, in the steady state most or all fish move together in a defined direction. These “healthy” fish are our baseline, and we want to see whether we can tell them apart from fish whose parameters we have changed, which we call “sick” fish.

Parameters for “sick” fish during steady state classification

| Parameter | Value |
|---|---|
| Radius of attraction | 150 |
| Radius of orientation | 45 |
| Radius of repulsion | 10 |
| Attraction weight | 0.7 |
| Repulsion weight | 0.35 |
| Self weight | 3 |
| Orientation weight | 1.5 |

The sick fish in this simulation are less reactive and have a lower tendency to orient with others and to move towards others. While the choice of parameters might seem somewhat arbitrary, the important thing is that the change in parameters does not change the type of steady state we end up in: the system with these parameters still has a migratory solution. This is the most important requirement for a meaningful comparison. Other parameter choices might give circular or stationary solutions, and detecting the difference between those and a migratory solution has no value. To clarify, if the parameters change in such a way that the steady state becomes stationary or circular, then these are simply two different legitimate solutions of the system, so telling them apart does not mean that schooling performance has changed.

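Our classification results are evaluated with the terms accuracy, precision and recall. Accuracy is defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is true positive, TN is true negative, FP is false positive and FN is false negative. True refers to a correct classification and false refers to a wrong classification.

Precision is defined as

$$\text{Precision} = \frac{TP}{TP + FP}$$

and recall as

$$\text{Recall} = \frac{TP}{TP + FN}$$

In the tables, precision is given per row and recall per column, so the values for the “sick” class are obtained by treating that class as the positive one. We then construct what is called a confusion matrix, as seen below.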

| Classified\True | Healthy | Sick | Precision |
|---|---|---|---|
| Healthy | TP | FP | TP / (TP + FP) |
| Sick | FN | TN | TN / (FN + TN) |
| Recall | TP / (TP + FN) | TN / (FP + TN) | |

With this setup we routinely get an accuracy of over 65%. In the case shown below we had a good run, with an accuracy of 71.25%. This run-to-run variation is caused by differences in the initialization of the neural network, whose initial weights are always random.

| Classified\True | Healthy | Sick | Precision |
|---|---|---|---|
| Healthy | 61 | 33 | 0.65 |
| Sick | 13 | 53 | 0.80 |
| Recall | 0.82 | 0.62 | Accuracy: 0.7125 |
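The numbers in this confusion matrix can be recomputed from the four counts. The helper below is a small sketch with hypothetical names; it follows the layout used on this page, with rows for the assigned class and columns for the true class.

```python
def confusion_metrics(hh, hs, sh, ss):
    """Metrics for a 2x2 confusion matrix laid out as in the tables on this page.

    hh: classified healthy, truly healthy    hs: classified healthy, truly sick
    sh: classified sick, truly healthy       ss: classified sick, truly sick
    """
    total = hh + hs + sh + ss
    return {
        "accuracy": (hh + ss) / total,
        "precision_healthy": hh / (hh + hs),  # along the "Healthy" row
        "precision_sick": ss / (sh + ss),     # along the "Sick" row
        "recall_healthy": hh / (hh + sh),     # down the "Healthy" column
        "recall_sick": ss / (hs + ss),        # down the "Sick" column
    }

steady = confusion_metrics(61, 33, 13, 53)
for name, value in steady.items():
    print(name, round(value, 4))
```

Rounded to two decimals, the precision and recall values agree with the table, and the accuracy comes out as 114 / 160 = 0.7125.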

### Transient

Here we sample between 0-70% of the simulation, using the same parameters for sick and healthy fish as above. The interval has the same size as before, but it now contains the additional information of the phase transition. While the accuracy here also varies, we routinely see it exceed 74%.

| Classified\True | Healthy | Sick | Precision |
|---|---|---|---|
| Healthy | 68 | 23 | 0.75 |
| Sick | 15 | 54 | 0.78 |
| Recall | 0.82 | 0.70 | Accuracy: 0.7625 |

These differences in accuracy suggest that the same may hold for real fish: in the real world, too, there may be extra information contained in a phase transition. A phase transition here is a transition between two different global behaviors. In our case we go from not schooling to schooling; in the real world it might be interesting to look at other phase transitions, such as disruptions caused by feeding. We do not know which characteristics of the phase transition the “black box” classifier uses; it might, for example, be the time it takes to form a school. One could perhaps induce a phase transition in real fish by feeding them or by routinely making them change habitat. This could, for example, be a system where gates open and the fish are slowly pushed into another tank; doing this back and forth might allow frequent observation of phase transitions.

While our accuracy is not perfect, this is not of great concern. In the real world one could measure and evaluate many times, and if the number of “sick” classifications is high, one could contact a veterinarian or follow some other procedure. It also makes sense that in the early phases of a disease there is little information to indicate its presence, so early classification will come with some unreliability. We are sure others can take this much further.

Another result that we believe to be important is that these changes are very hard for humans to spot, even with such drastic parameter changes. As mentioned in our other texts, most of the behavioral symptoms described in the literature are quite obvious to humans once disease is present. They have been documented and are used by, for example, veterinarians. We believe that some smaller changes might occur much earlier, and our project suggests that technology can pick up these changes at an early stage much better than humans can.
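The argument above that repeated evaluation can compensate for an imperfect classifier can be illustrated with a simple calculation. The sketch below assumes that successive classifications are independent and equally accurate, which is an idealization that real observations of the same school would not satisfy, and it uses a majority vote as one possible decision rule. The function name is hypothetical.

```python
from math import comb

def majority_correct_probability(p, n):
    """Probability that more than half of n independent classifications are correct,
    when each one is correct with probability p."""
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(needed, n + 1))

for n in (1, 3, 11, 31):
    print(n, round(majority_correct_probability(0.7, n), 3))
```

Under these assumptions the probability of a correct overall decision grows with the number of evaluations whenever a single classification is right more often than not.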

## Parameters that are kept constant

| Name | Value |
|---|---|
| Length of the sides of cubic tank (boundary) | 1000 |
| Scalar multiple for initial normally distributed fish position | 100 |
| Amplitude for normally distributed velocity noise (mu=0, sigma=1) | 0.3 |
| Amplitude for normally distributed position noise (mu=0, sigma=1) | 1 |
| Number of fish | 100 |
| Number of dimensions | 3 |
| Time step | 0.3 |
| Number of steps | 2000 |
| Mean velocity | 1 |
| Max velocity | 3 |
| Initial variance of velocity | 1 |
| Number of measurements | 50 |

## Bibliography

[1] Alethea Barbaro, Bjorn Birnir, Kirk Taylor (2006). “Simulating The Collective Behavior of Schooling Fish With A Discrete Stochastic Model”. Funded by The National Science Foundation and The Research Fund of The University of Iceland. https://www.mrl.ucsb.edu/sites/default/files/mrl_docs/ret_attachments/research/KTaylor.pdf

[2] T. Vicsek, A. Czirok, E. Ben-Jacob, I. Cohen, and O. Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6): 1226-1229, 1995

[3] Uchitane, T., Ton, T. V., & Yagi, A. (2015). An Ordinary Differential Equation Model for Fish Schooling. arXiv, 1508.05597. Retrieved from https://arxiv.org/abs/1508.05597v2

[4] Lisette de Boer (2010). “What makes fish school?”. Master Thesis, Mathematical Institute, Leiden University.

[5] Kingma, D., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations. Retrieved from https://www.researchgate.net/publication/269935079_Adam_A_Method_for_Stochastic_Optimization