The tutorial for exoRBase 2.0 is as follows:
A total of 905 RNA-seq datasets of extracellular vesicles (EVs) from human blood, urine, cerebrospinal fluid (CSF) and bile samples were collected and compiled into the exoRBase 2.0 database. The blood samples were associated with diverse biological conditions, including healthy state, benign disease and 13 types of cancer.
Building on the ASJA (Assembling Splice Junctions Analysis, https://github.com/HuangLab-Fudan/ASJA) and CIRI2 (CircRNA Identifier, https://sourceforge.net/projects/ciri/files/CIRI2/) bioinformatics tools, we employed an improved exLR-seq analysis pipeline to reanalyze the raw sequencing data of all samples. The annotation information and expression profiles of 19,643 mRNAs, 15,645 lncRNAs and 79,084 circRNAs in EVs were obtained. We also include the enrichment scores of 11,536 MSigDB (Molecular Signatures Database) pathways for each sample, generated by ssGSEA (single-sample Gene Set Enrichment Analysis) on exLR expression profiles. In addition, exoRBase 2.0 provides the relative and absolute distributions of 16 types of tissue cells and 23 types of blood cells estimated by the modified EV-origin approach. ExoRBase 2.0 enables users to browse, search, visualize and compare mRNAs, lncRNAs, circRNAs, pathways and origins of EVs.
In the browse section, there are four web pages for users to browse mRNAs and lncRNAs, circRNAs, pathways, or tissue/cell origins. Users can browse targets in the database with different filter criteria. Taking the ‘mRNA and lncRNA’ page as an example:
By clicking the ‘Search’ tab on the top navigation menu, users can jump to the search section from any page. Users can search for mRNAs, lncRNAs, circRNAs, pathways, or tissue/cell origins of interest by entering a comma-separated list of gene symbols, circRNA IDs, pathway IDs/names, or tissue/cell names on the corresponding search page. The search and results for mRNA and lncRNA genes are illustrated in the following figure:
Clicking a gene symbol, circID, pathway ID or tissue/cell name in either the browse or the search result tables links to the corresponding detail section, which currently includes three pages: ‘Summary’, ‘Profile’ and ‘Comparison’. For exLRs, the ‘Summary’ page shows more comprehensive and detailed annotation information, as well as the expression frequencies (sample numbers) and mean expression values in tumor, benign, healthy, urine, CSF and bile samples. A few manual options are provided on the ‘Profile’ and ‘Comparison’ pages:
ExoRBase 2.0 supports intuitive visualization of both the absolute proportions of tissue/cell origins, for comparison between different cohorts, and the relative proportions of tissue/cell origins, for comparison between different tissue/cell origins within each sample. By clicking ‘Origin’ on the navigation bar or ‘Tissue/Cell origin’ on the home page, the ‘Relative tissue/cell origin proportions’ page is displayed:
(1) The calculation of tissue specificity for an mRNA or lncRNA gene:
To assess the tissue specificity of an mRNA or lncRNA gene across human tissues, we downloaded the latest release (RNA-seq analysis V8) of gene expression TPM profiles (GENCODE version 26) across 30 tissues from the Genotype-Tissue Expression (GTEx) project. Genes that were lowly expressed (<0.1 TPM) in all tissues were removed. The tissue specificity score was defined as the difference between the logarithm of the total number of tissues and the Shannon entropy of the expression values for a gene. The score for one gene was calculated as follows:
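The verbal definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the database's own code: the logarithm base (2) and the normalization of TPM values into proportions p_i = x_i / sum(x) before computing the entropy are assumptions, and the function name `tissue_specificity` is hypothetical. With these choices the score is log2(N) - H, where N is the number of tissues and H = -sum(p_i * log2(p_i)).

```python
import math


def tissue_specificity(tpm_by_tissue, min_tpm=0.1):
    """Return log2(N) - H for one gene, or None if the gene is filtered out.

    tpm_by_tissue maps tissue name -> TPM value (hypothetical input layout).
    Assumptions: base-2 logarithm; TPMs are normalized to proportions
    before the Shannon entropy H is computed.
    """
    values = list(tpm_by_tissue.values())
    n_tissues = len(values)

    # Genes lowly expressed (< min_tpm) in all tissues are removed.
    if all(v < min_tpm for v in values):
        return None

    total = sum(values)
    entropy = 0.0
    for v in values:
        if v > 0:  # a zero proportion contributes nothing to the entropy
            p = v / total
            entropy -= p * math.log2(p)

    return math.log2(n_tissues) - entropy


# Toy values, not GTEx data: one tissue-restricted and one uniform gene.
restricted = tissue_specificity({"liver": 50.0, "brain": 0.0, "lung": 0.0, "heart": 0.0})
uniform = tissue_specificity({"liver": 5.0, "brain": 5.0, "lung": 5.0, "heart": 5.0})
```

Under these assumptions the score ranges from 0 (identical expression in every tissue) to log2(N) (expression confined to a single tissue), so larger values indicate higher tissue specificity.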
(2) Annotating circRNAs against the circBase database:
All human circRNA records were downloaded from the circBase database (http://www.circbase.org/). The genomic coordinates of these circRNAs were converted to hg38 using the UCSC liftOver tool, and the converted circRNAs were then combined with our collections.
(3) The definition of significant difference between the healthy group and other groups:
To avoid the impact of sample imbalance on differential analysis, we randomly sampled 35 healthy samples for comparison with groups that have relatively few samples (CHD, CRC, GC, KIRC, ML, OV and SCLC). ExLRs with |log2FC| > 1 (FC, fold change) and p-value < 0.05 were considered significantly differential targets, and the enrichment results from pathway analyses and the EV-origin strategy with p-value < 0.05 were also included in this study. For groups with larger sample sizes (benign, BRCA, HCC, PAAD), the q-values (adjusted using the Benjamini–Hochberg method) of differential targets were less than 0.05.
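The selection criteria above can be illustrated with a short Python sketch. It is not the pipeline used to build the database: the statistical test that produces the p-values is not specified here, so p-values are taken as given input, and the function names, the input layout (`id`, `log2fc`, `p`) and the fixed random seed are all hypothetical. The sketch also assumes that, for the larger groups, q-value < 0.05 is applied in addition to the fold-change and p-value cutoffs.

```python
import random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each q is the minimum over all higher ranks).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def subsample_healthy(sample_ids, k=35, seed=0):
    """Randomly draw k healthy samples without replacement (seed is an assumption)."""
    return random.Random(seed).sample(sample_ids, k)


def select_targets(results, require_q=False):
    """Keep targets with |log2FC| > 1 and p < 0.05; optionally also q < 0.05."""
    q_values = benjamini_hochberg([r["p"] for r in results])
    selected = []
    for r, q in zip(results, q_values):
        passes = abs(r["log2fc"]) > 1 and r["p"] < 0.05
        if require_q:
            passes = passes and q < 0.05
        if passes:
            selected.append(r["id"])
    return selected


# Toy input, not real exoRBase results.
toy = [
    {"id": "A", "log2fc": 2.0, "p": 0.001},
    {"id": "B", "log2fc": 0.5, "p": 0.001},
    {"id": "C", "log2fc": -1.5, "p": 0.04},
    {"id": "D", "log2fc": 3.0, "p": 0.2},
]
small_group_hits = select_targets(toy)                  # p-value criterion only
large_group_hits = select_targets(toy, require_q=True)  # additionally q < 0.05
healthy_subset = subsample_healthy([f"N{i}" for i in range(100)])
```

In this toy input, target C passes the p-value cutoff but not the q-value cutoff, which shows how the stricter criterion for larger groups narrows the list.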
The exLR-seq data of ESCC, GBM and MEL blood samples were produced by different teams and have low mapped read counts. Hence, these data are used only to annotate exLRs detected in these sample types, and the differences between these three groups and the healthy group were not taken into consideration.