Software
Overview
To find a suitable protease that cleaves the SARS-CoV-2 spike protein while leaving the hACE2 protein unharmed, we developed a program that combines web scraping with regex sequence matching to shortlist candidates for our wet lab experiments. Although our wet lab results are not yet conclusive, we believe this software can help anyone who is looking for suitable proteases for their own experiments.
Web Crawler
Using easy-to-implement Python packages, we wrote a Python program that crawls data from the MEROPS website, currently the main protease database worldwide. MEROPS collects findings from different studies and organizes them in a structured form that lists all experimentally confirmed cleavage sites of each protease. We crawled and parsed this data and stored only the information we need in a .csv file: the MEROPS ID of the protease, the name of the protease, and the 8-amino-acid cut site sequences of that protease. This file is used later for sequence matching, which is introduced in the next section.
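The storage step can be sketched as follows. This is a minimal illustration and not our actual crawler: the column names (`merops_id`, `protease_name`, `cleavage_site`), the helper names, and the records in the usage example are hypothetical placeholders, and the crawling and HTML parsing themselves are left out.

```python
import csv
import io

# Hypothetical column names. The text above only states that the file holds the
# MEROPS ID, the protease name, and the 8-residue cut site sequences.
FIELDNAMES = ["merops_id", "protease_name", "cleavage_site"]


def write_cleavage_sites(records, handle):
    """Write (merops_id, name, site) tuples as CSV rows.

    Only sites that are exactly 8 characters long are kept.
    Returns the number of rows written (header excluded).
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(FIELDNAMES)
    kept = 0
    for merops_id, name, site in records:
        site = site.strip().upper()
        if len(site) != 8:
            continue  # skip malformed or truncated entries
        writer.writerow([merops_id, name, site])
        kept += 1
    return kept


def read_cleavage_sites(handle):
    """Read the CSV back into a list of (merops_id, name, site) tuples."""
    return [
        (row["merops_id"], row["protease_name"], row["cleavage_site"])
        for row in csv.DictReader(handle)
    ]


if __name__ == "__main__":
    # Placeholder records for illustration only; these are not real MEROPS entries.
    parsed = [
        ("ID-1", "example protease A", "aaaagggg"),
        ("ID-2", "example protease B", "KKK"),  # too short, dropped
    ]
    buffer = io.StringIO()
    write_cleavage_sites(parsed, buffer)
    print(buffer.getvalue())
```

Keeping one cut site per row means a protease with many known sites simply appears on many rows, which keeps the file easy to stream through in the matching step.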
Sequence Matcher
Because speed is important in sequence matching, we developed this program in C++. The user passes two .fasta sequences as arguments, and the program searches them against the MEROPS data crawled in the previous section. It then outputs all the proteases that cut the two proteins, together with their cut sites on each protein. It also separately stores the proteases that cut only one of the two sequences, which is what most users, like us, are looking for. Implemented this way, the program is versatile and portable: it can handle different sequences and different protease databases (as long as they are stored in the same format), making it a useful tool for future iGEM teams or anyone interested in protein structure-related investigations.
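The matching logic can be sketched in a few lines. The sketch below is written in Python for brevity and is not the C++ implementation itself. It rests on several assumptions that are not stated above: that an 8-residue site spans positions P4 to P4' so the cut falls after its fourth residue, that `X` or `-` in a stored site means "any residue", and that results are grouped into `both`, `only_a`, and `only_b`. All function names are hypothetical.

```python
import re


def parse_fasta(text):
    """Return the first sequence in FASTA-formatted text as one uppercase string."""
    parts = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts).upper()


def site_to_regex(site):
    """Turn an 8-residue site into a regex (assumption: X or - is a wildcard)."""
    body = "".join("." if ch in "X-" else re.escape(ch) for ch in site.upper())
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile("(?=" + body + ")")


def find_cuts(sequence, site):
    """Return 1-based positions of the residue just before each cut (assumed P1)."""
    return [m.start() + 4 for m in site_to_regex(site).finditer(sequence)]


def compare(seq_a, seq_b, proteases):
    """Group proteases by whether they cut both sequences or only one.

    proteases is an iterable of (merops_id, name, site) tuples.
    """
    hits = {}
    for merops_id, name, site in proteases:
        cuts_a, cuts_b = hits.setdefault((merops_id, name), (set(), set()))
        cuts_a.update(find_cuts(seq_a, site))
        cuts_b.update(find_cuts(seq_b, site))

    result = {"both": {}, "only_a": {}, "only_b": {}}
    for key, (cuts_a, cuts_b) in hits.items():
        if cuts_a and cuts_b:
            result["both"][key] = (sorted(cuts_a), sorted(cuts_b))
        elif cuts_a:
            result["only_a"][key] = sorted(cuts_a)
        elif cuts_b:
            result["only_b"][key] = sorted(cuts_b)
    return result
```

In our use case, `seq_a` would be the spike protein and `seq_b` would be hACE2, so the `only_a` group holds the candidates of interest: proteases with a known cut site on the spike protein and none on hACE2.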
Check out our software on GitHub!