Modular Design of Multi-Epitope DREP Vaccines: TNBC Experience

Introduction

This page describes our step-by-step approach to building a modular pipeline.

Selecting the epitopes for our vaccine design was a contested issue. We had to carefully identify and select TNBC hotspot neoantigens, then analyze the mutations of these genes expressed in the TNBC cases presented. We then predicted the immunogenic T-cell and B-cell epitopes using Custommune. Further steps included validating these epitopes with several computational approaches to assess the immune response they elicit (e.g., the IFN-gamma response).

HOTSPOT Neoantigens Retrieval

Inspiration

As tumors develop, they acquire mutations that later give rise to neoantigens, which can be recognized by T cells.

Figure (1)
T-Cell recognizing neoantigen on tumor

Main problem

Cancer cells, regardless of their type, have hundreds of thousands of neoantigens.

That enormous diversity has limited our ability to develop a "one fits all" drug without resorting to very costly measures such as whole cancer genome sequencing, which is only useful for personalized neoepitope targeting in that specific patient.

Figure (2)
Tumor cells obtaining mutations during their development

Our Focus

This year, our main focus was on breast cancer, and more specifically on the triple-negative breast cancer (TNBC) subtype. We therefore looked for a specific group of neoantigens and epitopes that are highly expressed in, and preferably limited to, TNBC.

Data retrieval

First, we used the Gene Expression Omnibus (GEO) to search for datasets containing samples from patients with triple-negative breast cancer. This yielded 21 datasets, listed in figure (6), containing genomic samples from TNBC patients.

Then we assembled a large dataset from multiple databases in the literature. This dataset contained neoantigens associated with all breast cancer types.

After that, we collected data on the most frequently mutated genes in TNBC from TCGA.

Figure (3)
Hotspot TNBC neoantigen pipeline

Method

These datasets were filtered according to the more aggressive tumor traits to identify the most highly expressed genes in the samples, and then according to logFC and t value, which allowed us to reduce the data they contain.
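The logFC and t-value filtering step can be sketched as follows. This is a minimal illustration, not the exact procedure we ran: the cutoffs `LOGFC_CUTOFF` and `T_CUTOFF`, the row layout, and the example values are all assumptions made for the sketch.

```python
# Hypothetical cutoffs: the text does not state the thresholds actually used.
LOGFC_CUTOFF = 1.0
T_CUTOFF = 2.0


def filter_genes(rows, logfc_cutoff=LOGFC_CUTOFF, t_cutoff=T_CUTOFF):
    """Keep genes whose |logFC| and |t| both reach the cutoffs.

    The result is sorted by |logFC|, largest first.
    """
    kept = [
        r for r in rows
        if abs(r["logFC"]) >= logfc_cutoff and abs(r["t"]) >= t_cutoff
    ]
    return sorted(kept, key=lambda r: abs(r["logFC"]), reverse=True)


# Made-up illustrative rows, not results from our datasets.
example_rows = [
    {"gene": "GENE_A", "logFC": 2.4, "t": 5.1},
    {"gene": "GENE_B", "logFC": 0.3, "t": 0.8},
    {"gene": "GENE_C", "logFC": -1.7, "t": -3.9},
]

top_genes = [r["gene"] for r in filter_genes(example_rows)]
print(top_genes)
```

Filtering on absolute values keeps both up- and down-regulated genes; a one-sided filter would be an equally reasonable choice depending on the question asked.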

Finally, we looked for intersections among the final 3 datasets using tools such as GeneVenn, complemented by literature mining, and obtained the following list of shared neoantigens to work on:

(CD79A, CDH1, NCOA1, PDE4DIP, MYC, JAK1, NCOA2, PICALM, TRRAP, ATRX, MSH6, PRKAR1A, FLI1, MAP2K2, MET, SPEN, TFE3, ARNT, SMARCB1, IKZF1, NR4A3, RUNX1, AR, NTRK1, RBM10, MSH2, MUC1, Lck, PSA, PAP, MRP3, PTHrP, HNRPL, WHSC2, SART3, CypB, UBE2V, EGFR)
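The intersection step can be reproduced programmatically with plain set operations. The sketch below is only an illustration of the idea: the dataset names and the placeholder gene symbols are assumptions, and GeneVenn itself is a web tool rather than a Python library.

```python
from itertools import combinations


def normalize(genes):
    """Upper-case and strip gene symbols so that 'Lck' and 'LCK' match."""
    return {g.strip().upper() for g in genes if g.strip()}


def shared_genes(gene_lists):
    """Return the genes present in every list."""
    sets = [normalize(genes) for genes in gene_lists.values()]
    return set.intersection(*sets) if sets else set()


def pairwise_overlaps(gene_lists):
    """Return the overlap for each pair of lists, as shown in a Venn diagram."""
    return {
        (a, b): normalize(gene_lists[a]) & normalize(gene_lists[b])
        for a, b in combinations(gene_lists, 2)
    }


# Placeholder lists with invented membership, for illustration only.
example_lists = {
    "geo_filtered": ["GENE_A", "GENE_B", "GENE_C"],
    "literature": ["gene_b", "GENE_C", "GENE_D"],
    "tcga_mutated": ["GENE_C", "GENE_B", "GENE_E"],
}

common = shared_genes(example_lists)
print(sorted(common))
```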

Figure (4)
Datasets intersection using the GeneVenn tool

GeoDataset	Top Genes	GeoDataset	Top Genes
GSE135565	SCUBE2 GRB14 MPP7 TPD52 MMP16	GSE83937	MIR3663HG FGFR2 ATAD3B GSTT1 WIPF3
GSE27447	EGR2 CHRM3 PTCHD1 GFRA3 EPHA7	GSE86839	ADGRF1 AC005219.1 CALB2 NRK DMBT1
GSE59614	FOS PMEPA1 ACAT2 LSS INSIG1	GSE88715	EPCAM DHCR7 KRT15 CREB3L4 CD24
GSE79332	DMBT1 ACSL5 IFITM1 TLR3 CTSS	GSE90505	CXCL9 MMP1 IL4I1 S100A8 MMP9
GSE79721	ZBTB4 DENND4B EVA1A RNF214 BIVM	GSE90564	MAGEC2 PDE1C OR6X1 PDE1C CFI
GSE96653	C2CD4C ASB17 OR10H4 SH2D3C ZNF266	GSE95700	TRAPPC10 ITCH PRRC2C FAM160B2 STRN3
GSE103091	AFDN GNAQ JAK1 MIR1248 TMX2-CTNND1	GSE103668	DNAJC22 TPD52L3 ZDHHC11B DST ZNF107
GSE106100	DCDC2B GALR2 TMPRSS11F IGBP1P1 PIP	GSE114269	AFAP1-AS1 GZMB BIRC3 CXCL13 UBD///GABBR1
GSE97342	NXPH1 GREM2 SLC30A2 SIX2 IL17C	GSE118539	RBM5-AS1 MARS1 ETFA FAM151A TSNARE1
GSE88847	NELL2 IFIT1	GSE106782	CFHR1 C11orf49 CTDSP1 WDR89 SEPP1
GSE114359	LINC01191 AC110769.1

TNBC Neoantigens Retrieval Process

T-cell epitope prediction is widely recognized as a major challenge because of the high degree of MHC polymorphism and the disparity in the volume of data available on the various steps involved in generating and presenting T-cell epitopes in living systems.

After searching different datasets for the most common TNBC neoantigens and filtering them for those relevant to the Egyptian population, we identified a set of promising TNBC neoantigens to work on.

Identifying common mutations in TNBC neoantigens

We examined each neoantigen in TCGA to reveal its usual mutation sites in specific TNBC cases, which were selected under defined conditions.

Figure: Showing sequence of antigen selection, neomutation identification and corresponding epitopes prediction.

TNBC Cases Presentation and neomutation retrieval using GDC Data portal:

We worked on a group of 173 TNBC cases that have been classified into four groups: BL1, BL2, M, and LAR.

Figure: Representing the total 173 TNBC cases retrieved for further identification of common neomutations present in each case

Sequence analysis and selection of Immunodominant regions of each TNBC neoantigen using Custommune:

The Custommune tool was used to analyze the DNA sequences of these neoantigens, predict candidate epitopes, and screen them for a high recognition score across multiple HLA alleles.

Custommune simulates predicted escape mutations and calculates the HLA affinity of the mutated epitopes, providing a final, ready-to-use list of epitopes. It also builds a consensus sequence for each input TNBC neoantigen, which is later used for B-cell epitope prediction.
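To illustrate what a consensus sequence is, the sketch below takes a simple majority vote per column over pre-aligned sequences. This is an assumption made for illustration; we do not claim it is how Custommune computes its consensus, and the input sequences are toy examples.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Majority-vote consensus over equal-length, pre-aligned sequences.

    Ties are resolved in favour of the symbol seen first in the column.
    """
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_sequences)
    )


# Toy aligned sequences, not real neoantigen data.
print(consensus(["ACGT", "ACGA", "ACTT"]))
```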

Neoepitope-Discovery and Computational Validation.

Figure: Showing in-silico approaches and computational tools for epitope prediction.

Identification Of MHC I & MHC II Epitopes And Their Prediction Using Custommune

Figure: HTL & CTL Epitopes Prediction Using Custommune.

Neo-epitopes discovery by Custommune

Custommune predicts the T-cell epitopes of the presented neomutations and ranks them by their custoscore, which combines the IC50 (a measure of binding affinity, where lower values indicate stronger binding) with other filtration and scoring parameters, as shown in the equation below:

S = 10000 * (IC50)^(-1) - DFIRE + EscapeM * 500 + CScore * 1000 + LocationScore * 500 - SDaffinities + DOverlap * 500

Custommune assesses the Location Score of the probable mutations in each epitope, locates them within evolutionarily conserved regions to assign a C-score, and analyzes the previously reported escape mutations for each peptide. It then calculates D-Overlap by comparing these escape mutations with the literature and other clinical trials.

Finally, it estimates the standard deviation of each epitope's affinity across a set of HLA alleles (its affinity robustness) and the docking of each epitope to those alleles, which we call the D-Fire score.
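The scoring equation above translates directly into a small function. This is a sketch under stated assumptions: the function name `epitope_score`, the argument names, and the example inputs are ours, and the units of each term are assumed to match those used by Custommune.

```python
def epitope_score(ic50, dfire, escape_m, c_score,
                  location_score, sd_affinities, d_overlap):
    """Evaluate S term by term, following the equation in the text.

    S = 10000 / IC50 - DFIRE + 500 * EscapeM + 1000 * CScore
        + 500 * LocationScore - SDaffinities + 500 * DOverlap
    """
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return (
        10000.0 / ic50
        - dfire
        + 500.0 * escape_m
        + 1000.0 * c_score
        + 500.0 * location_score
        - sd_affinities
        + 500.0 * d_overlap
    )


# Invented inputs, chosen only to make the arithmetic easy to follow:
# 10000/50 - 0 + 0 + 1000 + 500 - 10 + 0
example_score = epitope_score(
    ic50=50.0, dfire=0.0, escape_m=0.0, c_score=1.0,
    location_score=1.0, sd_affinities=10.0, d_overlap=0.0,
)
print(example_score)
```

Because IC50 enters as a reciprocal, a stronger binder (lower IC50) raises the score, while a larger spread of affinities across HLA alleles lowers it.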

Figure: Predicted Cytotoxic T-Cell epitopes of the 17 TNBC antigens.

Figure: Predicted Helper T-Cell epitopes of the 17 TNBC antigens.

Prediction of linear B-cell epitopes

Predicting B-cell antigenicity is an important part of our circuit design. However, predicting B-cell epitopes with computational tools is difficult. In some cases, defining a T-cell epitope can lead to the identification of a B-cell epitope, since B-cell epitopes have been shown to colocalize with T-helper epitopes. Using in silico methods and B-cell epitope prediction tools, we were able to obtain high-scoring B-cell epitopes from IEDB.

Figure: Illustrating the steps of B-cell epitopes retrieval and analysis using IEDB.

Custommune analyzes the DNA sequence and returns a consensus sequence, from which we predict the B-cell epitopes using a random forest algorithm on IEDB that is trained on both epitope and non-epitope amino acid sequences. Residues scoring above a threshold of 0.5, shown in yellow, are the promising B-cell epitopes compatible with our vaccine.

After retrieving the consensus sequence from Custommune, we submit it to iedb.org to obtain the predicted linear B-cell peptides, which are then evaluated for inclusion in our circuit.
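The thresholding step can be sketched as extracting contiguous stretches of residues whose score exceeds 0.5. The function name, the `min_length` parameter, and the example scores are assumptions; the real per-residue scores come from the IEDB output.

```python
def epitope_segments(sequence, scores, threshold=0.5, min_length=1):
    """Return (start, end, peptide) for each run of residues above threshold.

    Positions are 1-based and inclusive.
    """
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i, sequence[start:i]))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(sequence) - start >= min_length:
        segments.append((start + 1, len(sequence), sequence[start:]))
    return segments


# Toy sequence and invented scores, for illustration only.
toy_sequence = "ACDEFGHIK"
toy_scores = [0.2, 0.6, 0.7, 0.4, 0.55, 0.9, 0.8, 0.1, 0.6]
print(epitope_segments(toy_sequence, toy_scores, min_length=2))
```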

Figure: Evaluating the score of predicted B-cell Linear peptides using IEDB.

Validation of the Epitope prediction algorithm “Custommune” using Published Experimental & Clinical Data.

Figure: This (Design-Build-Test) Cycle Illustrates the validation steps for epitopes prediction algorithm through Integrated practices.

The way peptide-MHC complexes stimulate the adaptive immune response by producing clonotypic T cells illustrates the importance of performing various assays to better characterize the role of different immunogenic peptides. This in turn emphasizes the importance of using HLA-transgenic mouse lines over in silico or in vitro methods to validate the previously predicted peptide epitopes.

Figure: Experimental correlation and Validation of MUC1 epitopes immunogenicity

Evaluation of epitopes immunogenicity and cytotoxicity using Pearson Correlation Coefficient

We calculated the R-value of our predicted epitopes using the Pearson correlation coefficient, plotting the IFNγ ELISpot results showing CD8-specific MUC1 responses against the IC50 of the corresponding peptide pool (RKNYGQLDI-RRKNYGQLDI-RRKNYGQLDIF-RRKNYGQLDIFPARD) from Custommune.
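For reference, the Pearson correlation coefficient can be computed with the standard library alone. The sketch below uses made-up, perfectly linear toy values to show the calculation; it does not contain our ELISpot or IC50 measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: a response that falls linearly as the toy IC50 rises
# gives r = -1.
toy_ic50 = [1.0, 2.0, 3.0, 4.0]
toy_response = [8.0, 6.0, 4.0, 2.0]
r_value = pearson_r(toy_ic50, toy_response)
print(round(r_value, 3))
```

Since a lower IC50 means stronger predicted binding, a negative R-value between IC50 and the measured response is the direction one would expect if the prediction tracks immunogenicity.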

Other Neoantigens:

Implication for immunomonitoring of new (Lck, PSA, PAP, MRP3, PTHrP, HNRPL, WHSC2, SART3, CypB, UBE2V and EGFR) epitopes in vitro and in clinical trials

We collected data on a mixture of 19 peptides from an early phase II clinical trial in 14 patients with advanced metastatic triple-negative breast cancer (mTNBC) refractory to systemic chemotherapy, which aimed to develop a new type of cancer vaccine.

In this trial, the expression levels of the 11 vaccine antigens that encode these 19 peptides were examined by IHC staining of primary breast cancer (n = 20, including 5 TNBC) and metastatic breast cancer tissues (n = 20, including 5 TNBC).

From that review, we noted that 4 of these antigens (PTHrP, HNRPL, WHSC2, and SART3) were expressed in all breast cancers tested.

CypB, UBE2V, EGFR, Lck, and MRP3 were expressed in 70%, 60%, 50%, 10%, and 0% of primary tumors, and in 100%, 100%, 30%, 10%, and 10% of metastatic tissues, respectively. Interestingly, neither PSA nor PAP was expressed in any of the breast cancers tested.

Figure: Early phase II clinical study testing 19‐peptide cancer vaccine monotherapy on 14 advanced metastatic triple‐negative breast cancer (mTNBC) patients. DOI 10.1111/cas.14510

In addition, it was reported that Lck, PSA, PAP, and MRP3 were expressed in breast cancer tissues, although the frequency of expression was lower than that of other TAAs.

Our Final Proposed Vaccines’ structure

We combined these epitopes into a single design. We grouped all the collected epitopes, together with their linkers, into 3 vaccine models, and then added the following essential fragments:

an SGP promoter, a beta-defensin adjuvant, the pan HLA-DR binding epitope (PADRE), our sets of epitopes for each class, and finally heat shock proteins as an innate immunity stimulant, all joined by our optimized linkers.
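The protein-level assembly of such a construct can be sketched as simple string concatenation. Several things here are assumptions for illustration: the module order, the choice of (EAAAK)3 between modules and GPGPGPG between epitopes, and the bracketed placeholders standing in for the adjuvant, PADRE, and heat shock protein sequences, which are not reproduced here. The SGP promoter acts at the DNA level and is therefore left out.

```python
RIGID_LINKER = "EAAAK" * 3    # (EAAAK)3
FLEXIBLE_LINKER = "GPGPGPG"


def assemble_construct(adjuvant, padre, epitopes, hsp,
                       module_linker=RIGID_LINKER,
                       epitope_linker=FLEXIBLE_LINKER):
    """Join adjuvant, PADRE, the epitope block and HSP with linkers.

    Which linker sits at which junction is an assumption of this sketch.
    """
    if not epitopes:
        raise ValueError("at least one epitope is required")
    epitope_block = epitope_linker.join(epitopes)
    return module_linker.join([adjuvant, padre, epitope_block, hsp])


# Bracketed strings are placeholders, not real sequences.
construct = assemble_construct(
    adjuvant="[adjuvant]",
    padre="[padre]",
    epitopes=["FLSFHISNL", "KFLGLSNIKF"],
    hsp="[hsp]",
)
print(construct)
```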

Figure: The immune-modulating adjuvant beta-defensin and the PADRE (pan HLA-DR epitope) sequence were added to the epitope sequences to enhance immunogenicity. All the epitopes, adjuvants, and the PADRE sequence were joined by linkers.

Computational Simulation of the Immune Profiles of Our Final 3 Forms of TNBC Vaccines According to Their IFN-γ Response.

Figure: This (Design- Build-Test) Cycle depicts the Computational assessment and Immunogenicity validation of the proposed vaccine versions.

We scanned the IFN-gamma response of our proposed vaccine versions using a Support Vector Machine algorithm, which yields what we call the SVM score.
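The percentage of epitopes predicted to induce IFN-gamma can be derived from the SVM scores as sketched below. The cutoff of 0 for a positive prediction and the example scores are assumptions; the cutoff used by the prediction server should be substituted.

```python
def percent_positive(svm_scores, cutoff=0.0):
    """Percentage of epitopes whose SVM score exceeds the cutoff."""
    if not svm_scores:
        raise ValueError("at least one score is required")
    positives = sum(1 for s in svm_scores if s > cutoff)
    return 100.0 * positives / len(svm_scores)


# Invented scores for illustration, not our prediction output.
toy_scores = [0.5, -0.2, 1.1, -0.7]
print(percent_positive(toy_scores))
```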

Figure: Percentage of epitopes predicted to have a positive IFN-γ response.

We also assessed the immunogenicity and efficacy of each of our 3 proposed vaccines using the C-IMMSIM simulation model, which describes both the humoral and cellular responses to our epitopes at a mesoscopic scale.

Figure: These diagrams show serum immunoglobulins, including the IgM and IgG isotypes; the 3rd vaccine produced the highest immunoglobulin levels.

We can also observe the cytokines and interleukins released in response to each version; the 2nd vaccine is associated with the lowest IL-2 levels of the 3 vaccines.

Figure: Total count of B lymphocytes, classified here into memory and non-memory B cells.

The 1st vaccine recorded a remarkable increase in IgM-secreting plasma cells, reaching its peak within one week of injection.

Figure: This plot shows the total and memory T-helper cells, classified into active, resting, anergic, and duplicating CD4 T cells, with no difference among the 3 versions. For regulatory T cells, however, the 2nd vaccine shows the lowest count of active T-reg cells.

Meanwhile, comparing the 3 vaccines with respect to the stimulated CD8 cell populations, macrophages, NK cells, and dendritic cells, we found no significant variation among them.

Figure: -CD8 T-cytotoxic lymphocytes count. Total and memory shown.
-CD8 T-cytotoxic lymphocytes count per entity-state.

Figure: -Natural Killer cells (total count).
-Macrophages: total count, internalized, presenting on MHC class-II, active and resting.

Figure: -Dendritic cells. DC can present antigenic peptides on both MHC class-I and class-II molecules. The curves show the total number broken down to active, resting, internalized and presenting the antigen.
-Epithelial cells. Total count broken down to active, virus-infected and presenting on class-I MHC.

Linker Optimization

Designing A Multi-Epitope Based Vaccine Construct.

Figure (1): selected peptide fragments fused to each other by (EAAAK)3 & (GPGPGPG) linkers.

Exploitation of in-silico approaches for Our Vaccine design

This (Design-Build-Test) Cycle Shows our workflow through structural refinement of our Vaccine based on Linker optimization.

The use of vaccination as a tool against widespread diseases has led to substantial strides in the fight against many types of cancer. The one we focused on was triple-negative breast cancer (TNBC).

The main concern was finding the most immunogenic epitopes for our DNA vaccine design, with a view to disease prevention, diagnosis, and therapy.

In our project, we started with a collection of TNBC-specific epitopes from Custommune, an automated tool for designing personalized and population-targeted peptide vaccines, and joined them with linkers.

PEP-FOLD 3.5 was used to optimize linker selection for these peptide epitopes.

Linker selection parameters

We started by collecting a group of linkers from the iGEM Registry (Protein domains/Linker).

PEP-FOLD 3.5, which is based on a new Hidden Markov Model sub-optimal conformation sampling approach, enabled us to choose the most suitable linker for every 2 successive epitopes according to linker selection parameters, including the sOPEP energy and TM-score discussed under the assessment of linker optimization.

This made it possible to generate candidate models of the peptide-protein complexes, which then underwent further refinement to arrive at the most appropriate linker for every 2 sets of peptides.

Figure: PEP-FOLD 3.5 best model of linker optimization for peptide epitopes, where (EAAAK)3, represented by the green helix, works as the best linker fitting both the red strand FLSFHISNL and the magenta strand KFLGLSNIKF.

Assessment of linker optimization using PEPFOLD 3.5

As a vital component of recombinant fusion proteins, linkers have shown great significance in the development of stable, bioactive fusion proteins. They are classified into 3 categories according to their structures: flexible linkers, rigid linkers, and in vivo cleavable linkers.

We submitted each of these epitope and linker sequences separately to the PEP-FOLD 3.5 tool, an approach aimed at predicting peptide structures from amino acid sequences. Starting from a single sequence of 5 to 50 standard amino acids, PEP-FOLD3 runs a series of 100 simulations.

Each simulation samples a different region of the conformational space. The tool returns an archive of all the models generated, the details of the clusters, and the best conformation of each of the 5 best clusters. After that, we determined the best sequence and linker combination based on sOPEP energy and TM-score.
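Before submission, the input constraint stated above (5 to 50 standard amino acids) can be checked with a few lines of code. The function name is our own and the check is a convenience sketch, not part of PEP-FOLD.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def is_valid_pepfold_input(sequence, min_len=5, max_len=50):
    """True if the sequence has 5-50 residues, all of them standard."""
    seq = sequence.strip().upper()
    return (
        min_len <= len(seq) <= max_len
        and all(residue in STANDARD_AMINO_ACIDS for residue in seq)
    )


print(is_valid_pepfold_input("FLSFHISNLEAAAKEAAAKEAAAKFLGLSNIKF"))
```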

The lower the sOPEP energy of a sequence, the more stable it is and the better suited its linker is to our project. For instance, the attached picture shows several epitopes with linkers. The sequence (FLSFHISNLEAAAKEAAAKEAAAKFLGLSNIKF) has the lowest sOPEP energy of all the candidates, -64.0417, so it is the most suitable one.
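Choosing the candidate with the lowest sOPEP energy amounts to taking a minimum over the PEP-FOLD results. In the sketch below, only the first entry and its energy come from the text; the other two entries are placeholders with invented energies, and the TM-score is ignored for simplicity.

```python
def best_by_sopep(candidates):
    """Return the (sequence, sOPEP) pair with the lowest energy."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return min(candidates.items(), key=lambda item: item[1])


sopep_by_sequence = {
    "FLSFHISNLEAAAKEAAAKEAAAKFLGLSNIKF": -64.0417,  # value reported above
    "[candidate_b]": -40.0,                          # placeholder, invented
    "[candidate_c]": -25.5,                          # placeholder, invented
}

best_sequence, best_energy = best_by_sopep(sopep_by_sequence)
print(best_sequence, best_energy)
```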

Figure: Evaluating sOPEP energy & TM-score of each linker using PEPFOLD3.5

Team:AFCM-Egypt/Design