Team:SJTU-software/Implementation





Proposed Implementation
Our project provides a database that lets rice researchers search for a gene and its related genes, whether the goal is to modify a gene or to select samples for a specific group. When a user enters a gene name or alias, the database returns all the information related to that gene.

First, we provide gene annotations, which include the reference sequence, source, sequence ontology term, start and end positions, strand, phase, and attributes. Second, we provide gene-gene and gene-phenotype relations generated by our novel algorithm, a modified Greedy Equivalence Search (GES), so that entering a gene name returns all related genes and phenotypes. This is valuable for experimenters: for example, researchers who find a stress-resistant gene can investigate related genes that might perform similar functions, and if a certain gene is inconvenient to modify, it may be appropriate to investigate and modify related ones instead. In addition, we performed a phenotype association analysis for flag leaf angle, grain length, grain weight, grain width, height, leaf angle, leaf length, and leaf width. Given data on further traits, the same analysis could be performed for them rapidly and accurately. Finally, our sample-sample interaction part, which contains clustering and global search, can offer new insight into how a specific plant relates to other samples.
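The annotation fields listed above match the columns of a GFF3 record, so a lookup by gene name or alias can be sketched as follows. This is only a minimal illustration that assumes the annotations are stored as tab-separated GFF3 lines; the names `GeneAnnotation`, `parse_gff3_line`, and `matches_gene`, as well as the example record, are hypothetical and not part of our database code.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class GeneAnnotation(NamedTuple):
    """One annotation record (hypothetical container, GFF3-style fields)."""
    seqid: str            # reference sequence, e.g. a chromosome name
    source: str           # program or database that produced the record
    feature_type: str     # sequence ontology term, e.g. "gene"
    start: int            # 1-based start position
    end: int              # 1-based inclusive end position
    strand: str           # "+", "-", or "."
    phase: Optional[int]  # 0, 1, 2, or None when not applicable
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GeneAnnotation:
    """Parse one tab-separated GFF3 line into a GeneAnnotation."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, phase, attrs = cols
    attributes: Dict[str, str] = {}
    for field in attrs.split(";"):
        if field:
            key, _, value = field.partition("=")
            attributes[key] = unquote(value)  # GFF3 values are URL-escaped
    return GeneAnnotation(
        seqid, source, ftype, int(start), int(end), strand,
        None if phase == "." else int(phase), attributes,
    )


def matches_gene(record: GeneAnnotation, query: str) -> bool:
    """Return True if the query equals the record's ID, Name, or an Alias."""
    names = {record.attributes.get("ID", ""), record.attributes.get("Name", "")}
    names.update(record.attributes.get("Alias", "").split(","))
    return query.lower() in {n.lower() for n in names if n}


# Made-up example record, for illustration only.
example = "Chr1\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=GeneA;Alias=GA1,GA-1"
record = parse_gff3_line(example)
```

With this sketch, `matches_gene(record, "ga1")` is true because the comparison is case-insensitive and covers aliases as well as the primary name.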

All in all, rather than offering only basic information about rice genes, we give users additional information about gene and phenotype interactions, which can broaden experimenters' view when they look for target genes to implement innovative ideas.





Results
We selected 453 out of 3010 samples to construct a presence/absence variation (PAV) matrix. Each row of the PAV matrix represents a gene, and each column represents a sample. Based on the sequence data, we calculated the coverage of each gene in each sample and converted it into a binary value according to our threshold. The following figure displays the different kinds of genes in the rice pan-genome of the 453 samples, in which over 20,000 genes were identified as core genes.
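The conversion from coverage to a PAV matrix and the grouping of genes can be sketched as follows. This is a minimal illustration, not our pipeline: the coverage threshold of 0.5 and the soft-core cut-off of 0.9 are placeholder assumptions (the actual values are not stated here), and the function names and toy data are hypothetical.

```python
from typing import Dict, List


def coverage_to_pav(
    coverage: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, List[int]]:
    """Binarize per-sample gene coverage: 1 = present, 0 = absent.

    `threshold` is an assumed placeholder value.
    """
    return {
        gene: [1 if c >= threshold else 0 for c in row]
        for gene, row in coverage.items()
    }


def classify_gene(pav_row: List[int], soft_core_fraction: float = 0.9) -> str:
    """Label a gene by the share of samples in which it is present.

    `soft_core_fraction` is an assumed placeholder cut-off.
    """
    if not pav_row:
        raise ValueError("PAV row must contain at least one sample")
    present = sum(pav_row)
    if present == len(pav_row):
        return "core"          # present in every sample
    if present / len(pav_row) >= soft_core_fraction:
        return "soft-core"     # present in nearly all samples
    if present > 0:
        return "distributed"   # present in only some samples
    return "absent"


# Toy coverage values for three genes across ten samples (rows = genes).
toy_coverage = {
    "geneA": [0.9] * 10,
    "geneB": [0.9] * 9 + [0.1],
    "geneC": [0.9] * 3 + [0.1] * 7,
}
toy_pav = coverage_to_pav(toy_coverage)
labels = {gene: classify_gene(row) for gene, row in toy_pav.items()}
```

In the toy data, `geneA` is present in all ten samples, `geneB` in nine, and `geneC` in three, so the three categories are each represented once.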

We manually collected 24 resistance genes from the literature and asked whether any of them were present in the 453 samples. Surprisingly, we identified 14 of the 24 genes in the PAV matrix, and 9 of them were classified as core genes. For all 14 resistance genes present in the 453 samples, we provide an online tool based on lastz (Harris 2007) that enables users to compare the resistance genes with the CDS of all other plants in plant.ensembl.org. For the other 5 non-core genes, we also studied the relationship between phenotype and genotype. For the soft-core and distributed genes, a multivariate analysis was conducted on different traits based on whether the genes were expressed or not, and several significant genes were found. *X21* was found to have a significant effect on grain length and grain weight, while *ZFP252* was found to have a significant effect on plant height. These genes may influence rice stress resistance in this way.
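To give a feel for how a trait can be compared between samples with and without a gene, the sketch below splits trait values by a gene's PAV row and computes Welch's t statistic for a single trait. This is a deliberately simplified stand-in, not the multivariate analysis described above, and the function names and toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def split_by_presence(
    pav_row: Sequence[int], trait_values: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Split trait values into (gene present, gene absent) groups."""
    if len(pav_row) != len(trait_values):
        raise ValueError("PAV row and trait values must have equal length")
    present = [t for p, t in zip(pav_row, trait_values) if p == 1]
    absent = [t for p, t in zip(pav_row, trait_values) if p == 0]
    return present, absent


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / standard_error


# Toy data: one gene's PAV row and one trait measured on six samples.
toy_pav_row = [1, 0, 1, 0, 1, 0]
toy_trait = [2.0, 1.0, 4.0, 2.0, 6.0, 3.0]
present, absent = split_by_presence(toy_pav_row, toy_trait)
t_stat = welch_t(present, absent)
```

A large absolute value of `t_stat` suggests the trait differs between the two groups; turning it into a p-value and correcting for the number of genes and traits tested would be needed before calling any gene significant.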