Team:IISER-Pune-India/Software

DRY LAB

Overview


Malaria is a life-threatening disease caused by Plasmodium parasites, which are steadily gaining resistance to antimalarial drugs. The current gold standard for malaria diagnosis is microscopy performed by trained pathologists and lab technicians, who analyze blood smear images to reach a diagnosis. These methods, however, are time-consuming, expensive, poorly accessible, and elaborate; they require the presence of an expert and are prone to human error.

Our objective is to present a far-reaching, robust solution that involves minimal human intervention without compromising on accuracy, keeping in mind the fragilities of the global healthcare system. Our solution is to incorporate artificial intelligence to overcome these hindrances: modern deep learning techniques can automate the process with high accuracy and reduce the need for trained personnel. This would make the diagnosis of malaria easier and accessible to more people worldwide. We propose an entirely automated Convolutional Neural Network (CNN) based model for the diagnosis of malaria from microscopic blood smear images.

Our deep learning-based model can detect malarial parasites from microscopic images with an accuracy of 95.43%.

For practical validation of model efficiency, we deployed the miniaturized model in a server-backed web application. Data gathered from this environment show that the model can perform inference in under 1 s per sample in online (web application) mode, giving us confidence that such models can be deployed in efficient, practical inference systems. We envision that, using this software, batch processing of blood smear images could be performed to evaluate and report results in just seconds, a task that would otherwise take days.

We call our software DeLeMa Detect: DEep LEarning for MAlaria Detection.


Fig 1 : A Brief Overview of how we built DeLeMa Detect

Our Goals


  1. Build a Deep Learning Classification model for Malaria Diagnosis

  2. Develop a Web API that is fast, secure and reliable for processing blood smear images and communicating results.

  3. Deploy the Deep Learning model at the backend of the Web API.

Some Characteristics of the Software

The model must be scalable and accessible across all platforms (smartphones and websites across the world). This imposes a constraint on the size of our model and the processing power it requires.

We wanted the model to provide results as accurately as possible without unduly compromising on processing power, usability, or the false-positive rate.

The model runs at the backend of a web app deployed on the cloud. For easy and fast access, the application must be lightweight and have a low boot time.


Describing the Dataset


The data for our model and software come from researchers at the Lister Hill National Center for Biomedical Communications (LHNCBC), part of the National Library of Medicine (NLM), who carefully collected and annotated a publicly available dataset of healthy and infected blood smear images. They used Giemsa-stained thin blood smear slides from 150 P. falciparum-infected and 50 healthy patients, collected and photographed at Chittagong Medical College Hospital, Bangladesh. [1] Images of the slides were acquired with a smartphone's built-in camera for each microscopic field of view, and were manually annotated by an expert slide reader at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand. The dataset is balanced, with 13,779 parasitized and 13,779 uninfected cell images.
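As an aside, this dataset is also mirrored in the TensorFlow Datasets catalog under the name "malaria" (an assumption here is that the tensorflow_datasets package is installed and this catalog entry is available), which makes it convenient to load for experiments:

```python
import tensorflow_datasets as tfds

# Load the LHNCBC/NLM cell image dataset from the TFDS catalog
ds, info = tfds.load("malaria", split="train", as_supervised=True, with_info=True)
print(info.splits["train"].num_examples)  # 27,558 images (13,779 per class)
```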

Rajaraman et al. evaluated six pre-trained models on this dataset and obtained an impressive accuracy of 95.9% in detecting malaria-infected vs. non-infected samples. [1] Our plan was to start with some simple machine learning algorithms for binary classification, then move on to simple CNN models built from scratch and a couple of pre-trained models using transfer learning, to see what results we could get on the same dataset. We used open-source tools and frameworks, including Python and TensorFlow, to build our models.


Preliminary Exploratory Data Analysis


Some images from the dataset

Fig 2(a) : Infected images


Fig 2(b) : Uninfected images

To understand the distribution of our high-dimensional dataset, we used dimensionality-reduction algorithms like Principal Component Analysis (PCA) and visualization algorithms like t-SNE (t-distributed stochastic neighbor embedding) to visualize the image dataset in two and three dimensions. The Python notebooks can be found in our GitHub repository. We found the dataset to be very noisy: even advanced algorithms like t-SNE could not resolve definite clusters for each class of images (infected vs. uninfected). Since points of different classes were separated by very small, nearly indistinguishable distances in the embedding space, we hypothesized that the dataset might contain misclassified images.

Fig 3 : PCA results
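For reference, a minimal sketch of this kind of embedding using scikit-learn (here X is assumed to be an array of flattened cell images and y their 0/1 labels; the component count and perplexity are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Reduce dimensionality with PCA first to denoise before running t-SNE
X_pca = PCA(n_components=50).fit_transform(X)
X_2d = TSNE(n_components=2, perplexity=30).fit_transform(X_pca)

# Color each embedded point by its class (infected vs. uninfected)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=2, cmap="coolwarm")
plt.title("t-SNE embedding of blood smear images")
plt.show()
```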

Machine Learning Model



Fig 4 : A light-hearted comic on building models. [8]

We believe that a good model explains most, if not all, of the variance in the data and is the result of a continuous process of (Design → Analyse → Test → Repeat). Therefore, we started with the simplest machine learning classification algorithms, namely Logistic Regression, K-Nearest Neighbours, Random Forests, and Naive Bayes, to classify the images. We resized each image to 32x32x3 pixels and flattened it into a 3072-dimensional vector. We then trained our classification algorithms to see how well they performed. (Link to our Python Notebooks)
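A minimal sketch of this baseline pipeline with scikit-learn (the to_vector helper is hypothetical, and X_train, y_train, X_val, y_val are assumed to be built by applying it to the dataset):

```python
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def to_vector(path):
    # Resize to 32x32 RGB and flatten into a 3072-dimensional vector
    img = Image.open(path).convert("RGB").resize((32, 32))
    return np.asarray(img, dtype=np.float32).flatten() / 255.0

# Train and score each baseline classifier on the flattened images
for clf in [LogisticRegression(max_iter=1000), KNeighborsClassifier(),
            RandomForestClassifier(), GaussianNB()]:
    clf.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_val, clf.predict(X_val)))
```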

Our results are summarized in Table 1. We found that the ML models could not capture the complex features of the dataset and suffered from high bias (underfitting). To increase accuracy and goodness of fit, we had to either develop a more complex model or tune every hyperparameter of each model for better scores. We chose the former because of the richness and precision that deep learning models provide.


The Deep Learning Model


Having estimated baseline performance on the complex dataset with simple machine learning classifiers, we moved on to deep learning models. We split the original dataset into 80/20 training and validation sets and performed data augmentation to prevent overfitting. Since our task was binary classification, we used the binary cross-entropy loss function:

Binary Cross-Entropy / Log Loss:

$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right]$

where $y_i$ is the true label of sample $i$ and $\hat{y}_i$ is the predicted probability that the sample is infected.

We then built a simple Convolutional Neural Network for the same classification task. We performed hyperparameter tuning using a random search algorithm to find the hyperparameters in the search space that yielded the best performance on the validation dataset. We found that the Adam optimization algorithm [2] performed the best.

Architecture of Custom CNN:

We then developed a custom convolutional neural network to classify blood smear images and obtained a validation accuracy of 94.89%. We noticed that even though the model size was small, it did not perform well on test data. One way to achieve higher accuracy is to build a more complex model and train for longer. However, our dataset of roughly 27,500 images is very small compared to conventional deep learning datasets (containing millions of images). Hence, we decided to use transfer learning, wherein we adapt an existing network, extensively trained on millions of other images, to our small dataset.

Fig 5 : Performance Results of our Custom Convolutional Neural Network
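For illustration, a small Keras CNN along these lines might look as follows (a hedged sketch; our custom network's exact layer counts, filter sizes, and input resolution differ):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),                    # regularization (discussed below)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # single neuron for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```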


Why transfer learning?


Just as humans have an inherent capability to transfer knowledge across tasks, transfer learning enables us to utilize knowledge from previously learned tasks and apply it to newer, related ones. Transfer learning consists of taking features learned on one problem and leveraging them on a new, similar problem. [3] For instance, features from a model that has learned to identify protein motifs and binding regions from structure images may be useful to kick-start a model meant to classify blood smear images.

Our goal was to use a model that had a small size of a few megabytes, very good accuracy and precision, and low processing-power requirements. Based on these constraints, we trained multiple well-known deep learning models, namely ResNet50, VGG16, VGG19, and MobileNetV2. After analyzing each model's results, we decided to use MobileNetV2 for deployment.

A convolutional neural network architecture identical to ResNet50 [4] was used by Team Heidelberg 2017 to develop DeeProtein (a deep neural network trained on ~10 million protein sequences and able to infer sequence-function relationships) [5]. We used the same model architecture and added a single dense neuron for malaria blood smear image classification. In our Python notebook, we used the original weights to classify images.

The pre-trained MobileNetV2 belongs to a family of general-purpose computer vision neural networks designed by Google AI specifically with mobile devices in mind, supporting classification and detection. It was trained on the ImageNet database, which contains a large number of diverse image categories, so the model should have learned a robust hierarchy of features that are spatially, rotationally, and translationally invariant. “The ability to run deep networks on personal mobile devices improves user experience, offering anytime, anywhere access, with additional benefits for security, privacy, and energy consumption using depthwise separable convolution as efficient building blocks.” [6] Hence, the model can act as a good feature extractor for new images in computer vision problems like malaria detection.


Understanding the MobileNetV2 Model


MobileNetV2 is a convolutional neural network that is 53 layers deep (much deeper than our custom CNN) and introduces two new features to the architecture [6]:

  1. Linear bottlenecks between the layers
  2. Shortcut connections between the bottlenecks

The basic structure is shown below. The intuition is that the bottlenecks encode the model’s intermediate inputs and outputs while the inner layer encapsulates the model’s ability to transform from lower-level concepts such as pixels to higher-level descriptors such as image categories. Finally, as with traditional residual connections, shortcuts enable faster training and better accuracy.

Fig 6 : Overview of MobileNetV2 architecture. Blue blocks represent composite convolutional building blocks as shown above. [6]
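To make the block structure concrete, here is a rough Keras sketch of one inverted residual block (batch normalization is omitted for brevity, and the expansion factor and filter counts are illustrative, not taken from our deployed model):

```python
import tensorflow as tf

def inverted_residual_block(x, filters, expansion=6, stride=1):
    in_channels = x.shape[-1]
    # Expand: 1x1 convolution widens the representation
    h = tf.keras.layers.Conv2D(expansion * in_channels, 1, padding="same")(x)
    h = tf.keras.layers.ReLU(6.0)(h)
    # Depthwise 3x3 convolution: cheap spatial filtering
    h = tf.keras.layers.DepthwiseConv2D(3, strides=stride, padding="same")(h)
    h = tf.keras.layers.ReLU(6.0)(h)
    # Linear bottleneck: 1x1 projection with NO activation
    h = tf.keras.layers.Conv2D(filters, 1, padding="same")(h)
    # Shortcut connection between bottlenecks when shapes match
    if stride == 1 and in_channels == filters:
        h = tf.keras.layers.Add()([x, h])
    return h

# Example usage: inp = tf.keras.Input((56, 56, 32)); out = inverted_residual_block(inp, 32)
```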

The transfer learning workflow that we followed to build our model is as follows:

  1. Take layers from a previously trained model (MobileNetV2, ResNet50, etc.).
  2. Freeze them, so as to avoid destroying any of the information they contain during future training rounds.
  3. Add some new, trainable layers on top of the frozen layers. They will learn to turn the old features into predictions on a new dataset.
  4. Train the new layers on your dataset.

A final step is fine-tuning, which consists of unfreezing part of the model (in our case, layers 50-53) and re-training it on the new data with a very low learning rate. This can potentially achieve meaningful improvements by incrementally adapting the pre-trained features to the new data. A sketch of the whole workflow follows.
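A hedged Keras sketch of this workflow (input size, dropout rate, learning rates, and the number of unfrozen layers are illustrative; how many layers to unfreeze depends on how layers are counted in the Keras model object):

```python
import tensorflow as tf

# Steps 1-2: load MobileNetV2 pre-trained on ImageNet and freeze it
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base_model.trainable = False

# Step 3: add a new trainable head for infected/uninfected classification
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Step 4: train only the new head
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# Fine-tuning: unfreeze the top of the base network, retrain with a low learning rate
base_model.trainable = True
for layer in base_model.layers[:-10]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```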


Solving the Problems of Modern Deep Learning

How did we avoid overfitting on the training data?

An overfit model becomes very well adapted to the training data but fails to generalize to new data. We used two techniques to prevent this (a sketch of the augmentation pipeline follows this list):

  1. Data Augmentation
    • To prevent overfitting on the training dataset, we introduced random flips, rotations, zooming, and shearing to increase the diversity of the training dataset.
  2. Dropout layers
    • Dropout is a regularization technique that randomly drops certain neurons in a network during every step of training. [7]
    • When we drop different sets of neurons, it is equivalent to training different neural networks, so the dropout procedure is like averaging the effects of a large number of different networks. The different networks overfit in different ways, so the net effect of dropout is to reduce overfitting.
    • With dropout, the network cannot rely on any one feature and instead has to learn robust features that are broadly useful.
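A sketch of such an augmentation pipeline with Keras (the directory layout and parameter values are illustrative assumptions, not the exact settings we used):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random flips, rotations, zooming, and shearing applied on the fly during training
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    zoom_range=0.2,
    shear_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
)
train_generator = train_datagen.flow_from_directory(
    "data/train",            # hypothetical folder with one subfolder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
)
```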

Hyperparameter tuning

Hyperparameter tuning is an important aspect of modern deep learning, where many hyperparameters (learning rate, mini-batch size, number of hidden units, number of layers, etc.) determine overall model accuracy and performance. We performed a random search over the hyperparameter space to determine the best hyperparameters, using Keras Tuner.
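A minimal Keras Tuner random-search sketch (the search space and the small model inside build_model are illustrative; our actual search space was larger):

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(hp.Choice("filters", [16, 32, 64]), 3,
                               activation="relu", input_shape=(64, 64, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hp.Int("units", 32, 128, step=32), activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # The learning rate is itself a tuned hyperparameter
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(train_ds, validation_data=val_ds, epochs=5)  # datasets assumed prepared
```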


Evaluating Performance metrics


The Receiver Operating Characteristic (ROC) curve summarises the prediction performance of a classification model across all classification thresholds. It plots the False Positive Rate (FPR) on the X-axis against the True Positive Rate (TPR) on the Y-axis.

  • TPR (Sensitivity) = TP / (TP + FN)
  • FPR (1 - Specificity) = FP / (TN + FP)

The “steepness” of a ROC curve is very important, since the ideal is to maximize the true positive rate while minimizing the false positive rate.
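In practice, the curve and its area can be computed directly with scikit-learn (y_true and y_score are assumed to hold the ground-truth labels and predicted probabilities):

```python
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve
auc = roc_auc_score(y_true, y_score)                # area under the curve
```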




Table 1 : Performance of our machine learning models

ML Model                   Accuracy    F1-score    AUC
Logistic Regression        0.67        0.67        0.73
K-Nearest Neighbours       0.61        0.60        0.62
Random Forest              0.75        0.75        0.85
Naive Bayes                0.62        0.55        0.62



Fig 7 : ROC curves for our Machine Learning Models


Deep Learning Results

Model          Accuracy (%)    F1-score    Model Size
Custom CNN     94.89           0.95        9 MB
ResNet50       95.20           0.95        91 MB
VGG16          95.81           0.92        56 MB
MobileNetV2    95.43           0.96        16 MB




Fig 8 : Results of training the MobileNetV2 model (final validation accuracy of 95.43%)

DeLeMa Detect


Complex models such as our neural networks take more time to compute, more time to load into memory on a cold start, and may prove more expensive to run across all kinds of devices. So, we decided to deploy the lightest model, MobileNetV2, as a web application for testing. We built a miniaturized backend using Flask (a micro web framework written in Python).
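A minimal sketch of such a Flask inference endpoint (the route, model filename, and input size are hypothetical, not the exact code we deployed):

```python
from flask import Flask, request, jsonify
import numpy as np
import tensorflow as tf
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("mobilenetv2_malaria.h5")  # hypothetical filename

@app.route("/predict", methods=["POST"])
def predict():
    # Read the uploaded blood smear image and resize it to the model's input size
    img = Image.open(request.files["image"].stream).convert("RGB").resize((224, 224))
    x = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)
    prob = float(model.predict(x)[0][0])
    return jsonify({"label": "infected" if prob >= 0.5 else "uninfected",
                    "probability": prob})

if __name__ == "__main__":
    app.run()
```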

We have deployed this miniaturized testing version of the MobileNetV2-based model as a web application on Heroku, a cloud platform-as-a-service, where blood smear images can be uploaded for immediate results. Since it is a testing site run free of cost, the application may not always load in every browser. To tackle this issue, we have uploaded all the files required to test and use DeLeMa Detect locally to our GitHub repository.

App deployed on Heroku: DeLeMa Detect

Here is a set of infected and uninfected images that you can use to test the app!
Images


Future Thoughts


A model with 95-96% accuracy for blood smear classification may not be the best on the market for medical diagnosis, but our model's small size and processing speed give it an edge over heavyweight classifiers. To improve performance, we would need more data from the Foldscope device we built, which could not be tested rigorously due to the COVID-19 pandemic. New data from this device will play a vital role in improving validation and test set accuracy.

We envision that, in the future, batch processing of blood smear images will speed up testing many-fold, without any loss of accuracy and with minimal human intervention.

We also aim to build a Web API, on top of our already deployed application, that can be accessed from any mobile device, laptop, or desktop around the world. A batch of blood smear images could be sent to the API, and the results, along with probability scores, would be delivered to the sender in a matter of seconds.


(Since the model sizes of ResNet50 and VGG16 exceed the file sizes accepted on GitHub, we have uploaded all models to a public Google Drive link. The .h5 files can be downloaded and used for future purposes.)

Tutorial on how to upload images and test our software:




Some resources that we found particularly helpful for building and testing our software:

  1. Transfer Learning
  2. ROC Curves
  3. Deploying Models
  4. Transfer Learning
  5. FLASK

References


[1]: Rajaraman, S., Antani, S. K., Poostchi, M., Silamut, K., Hossain, M. A., Maude, R. J., Jaeger, S., & Thoma, G. R. (2018). Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ, 6, e4568. doi.org/10.7717/peerj.4568

[2]: Zhang, Z. (2018). Improved Adam optimizer for deep neural networks. 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS). doi.org/10.1109/iwqos.2018.8624183

[3]: Keras Team. (2020, May 12). Keras documentation: Transfer learning & fine-tuning. keras.io. Link

[4]:He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778)

[5]: Team:Heidelberg/Software/DeeProtein - 2017.igem.org. (2018, October 27). IGEM.Org. Link

[6]:MobileNetV2: The Next Generation of On-Device Computer Vision Networks. (2018, April 3). Google AI Blog. (Link)

[7]:Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research, 2014.

[8]: XKCD