The Cancer Genome Atlas (TCGA) miRNome

33 cancer types, 10,554 samples


Circulating miRNome Profiles of Human Cancer

32 cancer types, 40 studies, 28,633 samples

News

2021-09-09: CancerMIRNome has been published in Nucleic Acids Research! https://doi.org/10.1093/nar/gkab784

The role of microRNAs (miRNAs) in human cancer

miRNAs are a class of small endogenous non-coding RNAs of ~22 nt in length that negatively regulate the expression of their target protein-coding genes. miRNAs are reported to be involved in many biological processes, such as cell proliferation, differentiation, and apoptosis. Mounting evidence has demonstrated that miRNAs are dysregulated in various types of human cancer, and that these dysregulated miRNAs can be leveraged as expression biomarkers/signatures for cancer diagnosis and prognosis.

Circulating miRNAs as promising diagnostic biomarkers

Circulating miRNAs are miRNAs that are secreted into extracellular body fluids, where they are incorporated in extracellular vesicles (EVs), such as shed microvesicles (sMVs) and exosomes, or in apoptotic bodies, or form complexes with RNA-binding proteins, such as Argonautes (AGOs). These protected circulating miRNAs remain remarkably stable, making them potential cancer biomarkers for non-invasive early detection or tissue-of-origin localization.

About CancerMIRNome

CancerMIRNome is a comprehensive database with the human miRNome profiles of 33 cancer types from The Cancer Genome Atlas (TCGA), and 40 public cancer circulating miRNome profiling datasets from NCBI Gene Expression Omnibus (GEO) and ArrayExpress.

CancerMIRNome provides a user-friendly interface and a suite of advanced functions for: (I) the pan-cancer analysis of a miRNA of interest across multiple cancer types; and (II) the comprehensive analysis of cancer miRNome profiles to identify biomarkers/signatures for cancer diagnosis and prognosis.


Citation

Please cite the following publication: Li, R., et al. (2021) CancerMIRNome: an interactive analysis and visualization database for miRNome profiles of human cancer. Nucleic Acids Research, gkab784. https://doi.org/10.1093/nar/gkab784



miRNA Expression in Tumor and Normal Samples in TCGA
(Wilcoxon rank-sum test, ***: P < 0.001; **: P < 0.01; *: P < 0.05; ns: P > 0.05)


ROC Analysis Between Tumor and Normal Samples in TCGA


Kaplan Meier Survival Analysis of Overall Survival in TCGA
(Low- and high-expression groups were separated by median values)



Box Plot of miRNA Expression

ROC Analysis (Tumor vs. Normal)

Kaplan Meier Survival Analysis



Pearson Correlation Analysis of the miRNA and its Targets (miRTarBase 2020)


miRNA-Target Correlation Plot

miRNA-Target Correlation Across All TCGA Projects
(Pearson Correlation, ***: P < 0.001; **: P < 0.01; *: P < 0.05)


Functional Enrichment Analysis of the Targets (miRTarBase 2020)


Bar Plot of the Top 30 Enriched Pathways


Bubble Plot of the Top 30 Enriched Pathways



miRNA Expression in the Selected Circulating miRNome Dataset


Comprehensive Analysis & Visualization of TCGA miRNome Profiles




Sample Type

Pathological Stage

Clinical Stage

Age at Diagnosis

Overall Survival Status

Overall Survival



Highly Expressed miRNAs
(CPM > 1 in more than 50% of the samples)


Bar Plot of the Top 50 Highly Expressed miRNAs




Volcano Plot of Differentially Expressed miRNAs


Differential Expression Analysis of Highly Expressed miRNAs


ROC Analysis Between Tumor and Normal Samples


Selection of Features for the Classification of Tumor and Normal Samples using LASSO




2D Principal Component Analysis using Highly Expressed miRNAs
3D Principal Component Analysis using Highly Expressed miRNAs



Kaplan Meier (KM) Survival Analysis of the Highly Expressed miRNAs
(Low- and high-expression groups were separated by median values)


Cox Proportional-Hazards (CoxPH) Survival Analysis of the Highly Expressed miRNAs
Pre-built Prognostic Model using Univariate CoxPH & Cox-Lasso



Kaplan Meier Survival Analysis of the Prognostic Model
(Low- and high-risk groups were separated by median values)
Time-dependent ROC Analysis of the Prognostic Model
(NNE method, span = 0.01)




                                      

Prognostic Model
Kaplan Meier Survival Analysis in the Training Dataset
(Low- and high-risk groups were separated by median values)

Comprehensive Analysis & Visualization of Circulating miRNome Profiles




Summary

GEO/ArrayExpress Accession:

Platform:

Cancer Types:

Disease Status

Subgroups



Top 500 Highly Expressed miRNAs


Bar Plot of the Top 50 Highly Expressed miRNAs




Volcano Plot of Differentially Expressed miRNAs


Differential Expression Analysis of Highly Expressed miRNAs




ROC Analysis Between Case and Control Samples




Selection of Features for the Classification of Case and Control Samples using LASSO



2D Principal Component Analysis using Highly Expressed miRNAs
3D Principal Component Analysis using Highly Expressed miRNAs

Tutorial for Cancer miRNome Data Analysis & Visualization

Introduction


CancerMIRNome is a comprehensive database that facilitates the use of publicly available cancer miRNome data for miRNA research in various cancers. It integrates the miRNome sequencing data of 33 cancer types from the TCGA program and circulating miRNA profiling data from a collection of 40 public datasets. A suite of advanced functions is provided to facilitate the interactive analysis and visualization of large-scale cancer miRNome data (Figure 1).


When querying a miRNA of interest, by default, the results will be automatically generated for pan-cancer analyses, including differential expression (DE) analysis, receiver operating characteristic (ROC) analysis, survival analysis, miRNA-target correlation analysis, and functional enrichment analysis based on TCGA projects, as well as the analysis of circulating miRNA expression profiles in public circulating miRNome datasets.


Users may choose to perform various comprehensive analyses at the dataset level, including identification of highly expressed miRNAs, DE analysis between two customized groups, ROC analysis, feature selection using a machine learning algorithm, principal component analysis (PCA), and survival analysis in a TCGA project or in a circulating miRNome dataset.


Advanced visualizations are supported to produce downloadable vector images of publication-quality in PDF format. All the data and results generated are exportable, allowing for further analysis by the end users.


Figure 1. Overview of the CancerMIRNome database

Query a miRNA

1. Overview

Users can query a miRNA of interest by typing the miRNA accession number, the miRNA ID of miRBase release 22.1 [1], or a previous miRNA ID in the 'Search a miRNA' field and selecting the miRNA from the dropdown list. In addition to general information, including the IDs and sequence of the queried miRNA, links to five miRNA-target databases are provided: ENCORI [2], miRDB [3], miRTarBase [4], TargetScan [5], and DIANA-TarBase [6].


A suite of advanced analyses can be interactively performed for a selected miRNA of interest (Figure 1), including:

(1) Pan-cancer differential expression (DE) analysis, receiver operating characteristic (ROC) analysis, and Kaplan Meier (KM) survival analysis in TCGA;

(2) DE analysis, ROC analysis, and KM survival analysis in a selected TCGA project;

(3) miRNA-target correlation analysis;

(4) Functional enrichment analysis of miRNA targets;

(5) Circulating miRNA expression analysis


Figure 1. Query a miRNA of interest

2. TCGA Pan-cancer Analysis

Pan-cancer DE analysis and ROC analysis of a miRNA between tumor and normal samples can be performed in 33 cancer types from TCGA (Figure 2).

(1) The Wilcoxon rank-sum test is used for DE analysis. The expression levels and statistical significance of the miRNA in all the TCGA projects can be visualized in a box plot.

(2) ROC analysis is performed to measure the diagnostic ability of the miRNA in classifying tumor and normal samples. A forest plot with the number of tumor and normal samples, area under the curve (AUC), and 95% confidence interval (CI) of the AUC for each TCGA project is used to visualize the result.

(3) Prognostic ability of a miRNA can be evaluated by performing KM survival analysis of overall survival (OS) between tumor samples with high and low expression of the miRNA of interest defined by its median expression value. A forest plot displaying the number of tumor samples, hazard ratio (HR), 95% CI of the HR, and p value for each cancer type in TCGA is used to visualize the result of pan-cancer survival analysis.
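The rank-sum test in (1) and the AUC in (2) are closely related, which a small sketch can illustrate. The Python function below is not CancerMIRNome's implementation; the name auc_from_pairs and the expression values are made up for illustration. It computes the AUC as the fraction of tumor-normal pairs in which the tumor sample has the higher expression (ties count one half), which equals the Mann-Whitney U statistic divided by the number of pairs.

```python
def auc_from_pairs(tumor, normal):
    """AUC as P(tumor > normal) + 0.5 * P(tumor == normal) over all pairs."""
    if not tumor or not normal:
        raise ValueError("both groups need at least one sample")
    wins = 0.0  # this running total is the Mann-Whitney U statistic
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


# Hypothetical log-scale expression values of one miRNA
tumor_expr = [8.1, 7.4, 9.0, 6.8]
normal_expr = [5.2, 6.8, 6.1]

# 11.5 of the 12 tumor-normal pairs favor the tumor group
print(auc_from_pairs(tumor_expr, normal_expr))
```

An AUC near 0.5 would mean the miRNA does not separate the two groups; values near 0 or 1 indicate strong separation in one direction or the other.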

Figure 2. Pan-cancer DE analysis, ROC analysis, and KM survival analysis for the selected miRNA

3. miRNA Analysis in individual TCGA projects

CancerMIRNome provides functions to focus the DE analysis, ROC analysis, and KM survival analysis for the miRNA of interest in a selected TCGA project.

When a TCGA project is selected from the dropdown list, (1) a box plot with the miRNA expression and the p value of the Wilcoxon rank-sum test between tumor and normal samples, (2) an ROC curve, and (3) a KM survival curve for the selected project will be displayed (Figure 3).
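As a rough illustration of the KM analysis, the following Python sketch splits samples at the median expression and computes a Kaplan-Meier curve for one group. It is a minimal sketch, not the code behind the site: the function names and data are hypothetical, and assigning samples that sit exactly at the median to the low group is an assumption.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression.

    Assumption: samples exactly at the median go to the 'low' group.
    """
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored samples both leave the risk set
    return curve


# Made-up follow-up times (months) and death indicators for one expression group
print(split_by_median([1.0, 2.0, 3.0, 4.0]))
print(kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0]))
```

Comparing the curves of the high and low groups (for example with a log-rank test) is what yields the p value shown alongside a KM plot.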

Figure 3. miRNA analysis in a selected TCGA project

4. miRNA-Target Correlation Analysis

Pearson correlation between a miRNA and its targets in tumor and normal tissues of TCGA projects can be queried in CancerMIRNome. The miRNA-target interactions are based on miRTarBase 2020 [4], an experimentally validated miRNA-target interactions database.

The expression correlations between a miRNA and all of its targets in a selected TCGA project are listed in an interactive data table. Users can select a miRNA-target interaction of interest in the data table to visualize a scatter plot showing the expression pattern and correlation metrics of the miRNA and its mRNA target.

An interactive heatmap is also available to visualize and compare such miRNA-target correlations across all TCGA projects.
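For readers who want to reproduce a correlation value from exported data, the Pearson coefficient can be computed as in the Python sketch below. The function name and the expression values are hypothetical and only serve as an illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up expression values of a miRNA and one of its targets in five samples
mirna = [2.0, 4.0, 6.0, 8.0, 10.0]
target = [9.0, 7.5, 6.0, 4.0, 3.5]
print(pearson_r(mirna, target))  # negative: target falls as the miRNA rises
```

Because miRNAs negatively regulate their targets, a negative coefficient is the pattern one would typically look for in the data table and heatmap.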

Figure 4. miRNA-target correlation analysis

5. Functional Enrichment Analysis of miRNA Targets

Functional enrichment analysis of the target genes for a miRNA can be performed using clusterProfiler [7] in CancerMIRNome. CancerMIRNome supports functional enrichment analysis with many pathway/ontology knowledgebases including:

(1) KEGG: Kyoto Encyclopedia of Genes and Genomes

(2) REACTOME

(3) DO: Disease Ontology

(4) NCG: Network of Cancer Genes

(5) DisGeNET

(6) GO-BP: Gene Ontology (Biological Process)

(7) GO-CC: Gene Ontology (Cellular Component)

(8) GO-MF: Gene Ontology (Molecular Function)

(9) MSigDB-H: Molecular Signatures Database (Hallmark)

(10) MSigDB-C4: Molecular Signatures Database (CGN: Cancer Gene Neighborhoods)

(11) MSigDB-C4: Molecular Signatures Database (CM: Cancer Modules)

(12) MSigDB-C6: Molecular Signatures Database (C6: Oncogenic Signature Gene Sets)

A data table is produced to summarize the significantly enriched pathways/ontologies in descending order based on their significance levels, as well as the number and proportion of enriched genes and the gene symbols in each pathway/ontology term. The top enriched pathways/ontologies are visualized using both bar plot and bubble plot.
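Over-representation analysis of this kind is commonly based on a hypergeometric test. The Python sketch below shows how such a p value could be computed for one pathway, assuming that test; the function name, argument names, and numbers are hypothetical and are not taken from CancerMIRNome or clusterProfiler.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_targets, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of background genes
    n_pathway:  background genes annotated to the pathway
    n_targets:  target genes of the miRNA (within the background)
    n_overlap:  target genes that are annotated to the pathway
    """
    total = comb(n_universe, n_targets)
    upper = min(n_pathway, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the pathway, 3 targets, 2 overlapping
print(enrichment_p_value(10, 4, 3, 2))
```

In practice the p values of all tested pathways would also be adjusted for multiple testing before ranking them by significance.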

Figure 5. Functional enrichment analysis of miRNA targets

6. Circulating miRNA Expression Profiles of Cancer

Expression of the miRNA of interest in whole blood, serum, plasma, extracellular vesicles, or exosomes of healthy individuals and patients with different cancer types can be conveniently explored in CancerMIRNome on the basis of 40 circulating miRNome datasets. Users can select one or more datasets for an analysis, for which violin plots are displayed to visualize and compare circulating miRNA expression between samples or datasets.

Figure 6. Expression of circulating miRNAs in cancer

References

[1] Kozomara, A., Birgaoanu, M. and Griffiths-Jones, S. (2019) miRBase: from microRNA sequences to function. Nucleic acids research, 47, D155-D162.

[2] Li, J.-H., Liu, S., Zhou, H., Qu, L.-H. and Yang, J.-H. (2014) starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic acids research, 42, D92-D97.

[3] Chen, Y. and Wang, X. (2020) miRDB: an online database for prediction of functional microRNA targets. Nucleic acids research, 48, D127-D131.

[4] Huang, H.-Y., Lin, Y.-C.-D., Li, J., Huang, K.-Y., Shrestha, S., Hong, H.-C., Tang, Y., Chen, Y.-G., Jin, C.-N. and Yu, Y. (2020) miRTarBase 2020: updates to the experimentally validated microRNA-target interaction database. Nucleic acids research, 48, D148-D154.

[5] Agarwal, V., Bell, G.W., Nam, J.-W. and Bartel, D.P. (2015) Predicting effective microRNA target sites in mammalian mRNAs. elife, 4, e05005.

[6] Karagkouni, D., Paraskevopoulou, M.D., Chatzopoulos, S., Vlachos, I.S., Tastsoglou, S., Kanellos, I., Papadimitriou, D., Kavakiotis, I., Maniou, S. and Skoufos, G. (2018) DIANA-TarBase v8: a decade-long collection of experimentally supported miRNA-gene interactions. Nucleic acids research, 46, D239-D245.

[7] Yu, G., Wang, L.-G., Han, Y. and He, Q.-Y. (2012) clusterProfiler: an R package for comparing biological themes among gene clusters. Omics: a journal of integrative biology, 16, 284-287.

TCGA miRNome Analysis

1. Overview

CancerMIRNome is equipped with well-designed functions which can perform comprehensive dataset-level analysis of cancer miRNome for each of the 33 TCGA projects (Figure 1), including:

(1) Identification of highly expressed miRNAs

(2) DE analysis between two user-defined subgroups

(3) ROC analysis between tumor and normal samples

(4) Selection of diagnostic miRNA markers

(5) Principal component analysis

(6) Identification of prognostic miRNA biomarkers and construction of prognostic models

Figure 1. Overview of TCGA cancer miRNome analysis

2. Summary of the TCGA Project

When a TCGA project is selected, the summary of important clinical features of patients in this dataset, including sample type, tumor stages, ages, and overall survival will be displayed (Figure 2).

Figure 2. Summary of the TCGA project

3. Highly Expressed miRNAs

miRNAs with counts per million (CPM) greater than 1 in more than 50% of the samples in a TCGA project of interest are reported as highly expressed miRNAs. The miRNAs are ranked by the median expression values and the top 50 of the highly expressed miRNAs are visualized with a bar plot (Figure 3).
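The filtering and ranking rule above can be sketched in a few lines of Python. This is only an illustration of the stated criteria (CPM > 1 in more than 50% of samples, ranking by median); the function name, the placeholder miRNA labels, and the CPM values are made up.

```python
from statistics import median


def highly_expressed(cpm_by_mirna, cpm_cutoff=1.0, min_fraction=0.5):
    """Keep miRNAs with CPM > cutoff in more than min_fraction of samples.

    Returns (miRNA, median CPM) pairs sorted by median CPM, highest first.
    """
    kept = []
    for name, values in cpm_by_mirna.items():
        n_above = sum(1 for v in values if v > cpm_cutoff)
        if n_above > min_fraction * len(values):
            kept.append((name, median(values)))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Made-up CPM values for three placeholder miRNAs across four samples
cpm = {
    "miR-A": [500.0, 800.0, 650.0, 700.0],
    "miR-B": [120.0, 90.0, 0.5, 200.0],
    "miR-C": [0.0, 2.0, 0.3, 0.9],  # above 1 CPM in only 1 of 4 samples
}
print(highly_expressed(cpm))  # miR-C is filtered out
```

Taking the first 50 entries of the returned list would give the miRNAs shown in the bar plot.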

Figure 3. Identification of highly expressed miRNAs

4. Differentially Expressed miRNAs

The DE analysis of highly expressed miRNAs at the dataset-level allows users to identify miRNAs that are differentially expressed between two user-defined subgroups in a TCGA project.

Metadata such as sample type, tumor stage, and gender may be used to group the samples. For example, the DE analysis can be performed not only between tumor and normal samples, but also between patients at early and late tumor stages.

Both limma [1] and the Wilcoxon rank-sum test are used for the identification of differentially expressed miRNAs (Figure 4).

Figure 4. Identification of differentially expressed miRNAs

5. ROC Analysis

The ROC analysis is carried out to screen the highly expressed miRNAs in a selected TCGA dataset for the diagnostic biomarkers that can distinguish tumor samples from normal samples. All the miRNAs are ranked in a data table based on their AUC values (Figure 5).

Figure 5. ROC analysis to identify diagnostic biomarkers

6. Feature Selection

The least absolute shrinkage and selection operator (LASSO) [2], a machine-learning method, can be used to analyse the entire set of miRNAs in a selected TCGA project for the identification of diagnostic miRNAs. The resulting miRNA signature is then used to develop a classification model for differentiating tumor and normal samples (Figure 6).

Figure 6. Identification of diagnostic miRNAs using LASSO

7. Principal Component Analysis

Principal component analysis can be used to analyse the highly expressed miRNAs in a selected TCGA project such that all patient samples, including tumor and/or normal samples, may be visualized in 2D and 3D interactive plots using the first two and three principal components, respectively (Figure 7).

Figure 7. Principal component analysis

8. Survival Analysis

Three survival analysis modules were developed in CancerMIRNome for the identification of prognostic miRNA biomarkers and development of miRNA expression-based prognostic models (Figure 8), including:

(1) Univariate Survival Analysis: univariate CoxPH regression analysis and KM survival analysis

(2) Pre-built Prognostic Model: development of pre-built prognostic models using the regularized Cox regression model with Lasso penalty (Cox-Lasso)

(3) User-provided Prognostic Signature: development of prognostic models for the user-provided miRNA signatures

Figure 8. Overview of survival analysis modules

CancerMIRNome supports both Cox Proportional-Hazards (CoxPH) regression analysis and Kaplan-Meier (KM) survival analysis to identify prognostic miRNA biomarkers in a TCGA project (Figure 9).

Figure 9. Univariate CoxPH and KM survival analysis

The pre-built prognostic model for each cancer type in TCGA was developed by jointly analyzing the significant miRNAs (p < 0.05) in the univariate CoxPH analysis using the regularized Cox regression model with LASSO penalty (Cox-Lasso) [3]. The prognostic model, which is a linear combination of the finally selected miRNA variables with the LASSO-derived regression coefficients, will be used to calculate a risk score for each patient. All the patients will be divided into either high-risk group or low-risk group based on the median risk value in the cohort. The KM survival analysis and time-dependent ROC analysis can be performed to evaluate the prognostic ability of the miRNA-based prognostic model.
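The risk score described above is a weighted sum of miRNA expression values. The following Python sketch shows that calculation and the median split, using hypothetical coefficients, miRNA labels, and patients; assigning a patient whose score equals the median to the low-risk group is an assumption.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Risk score per patient: sum of coefficient * expression over the signature.

    expression:   {patient: {miRNA: expression value}}
    coefficients: {miRNA: regression coefficient}
    """
    return {
        patient: sum(coefficients[m] * values[m] for m in coefficients)
        for patient, values in expression.items()
    }


def assign_risk_groups(scores):
    """Split patients into 'high' and 'low' risk at the median risk score."""
    cutoff = median(scores.values())
    return {p: "high" if s > cutoff else "low" for p, s in scores.items()}


# Hypothetical two-miRNA signature and four patients
coefficients = {"miR-A": 0.5, "miR-B": -0.25}
expression = {
    "P1": {"miR-A": 4.0, "miR-B": 2.0},
    "P2": {"miR-A": 2.0, "miR-B": 4.0},
    "P3": {"miR-A": 6.0, "miR-B": 0.0},
    "P4": {"miR-A": 1.0, "miR-B": 8.0},
}
scores = risk_scores(expression, coefficients)
print(scores)
print(assign_risk_groups(scores))
```

A positive coefficient means higher expression raises the risk score, while a negative coefficient lowers it.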

Figure 10. Pre-built prognostic model

CancerMIRNome also provides a module that allows users to submit their own miRNA expression signatures of interest to build prognostic models using three survival analysis methods (Figure 11): multivariate CoxPH, Cox-Lasso, and the Cox regression model regularized with a ridge penalty (Cox-Ridge) [3,4].

Figure 11. User-provided prognostic signature

References

[1] Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research, 43(7), e47.

[2] Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267-288.

[3] Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33, 1.

[4] Li, R. and Jia, Z. (2021) PCaDB - a comprehensive and interactive database for transcriptomes from prostate cancer population cohorts. bioRxiv, 10.1101/2021.06.29.449134.

Circulating miRNome Analysis

1. Overview

A set of similar functions is available for the comprehensive analysis of circulating miRNome at the dataset level to identify diagnostic miRNA biomarkers for non-invasive early cancer detection. Users can perform various analyses of circulating miRNome in CancerMIRNome (Figure 1), including:

(1) Identification of highly expressed miRNAs

(2) Differential expression analysis

(3) ROC analysis

(4) Selection of diagnostic circulating miRNA markers

(5) Principal component analysis

Figure 1. Overview of the cancer circulating miRNome analysis

2. Summary of the Circulating miRNome Dataset

The summary of a selected dataset includes the distribution of cancer types, the distribution of subgroups of the patients (if available), and an embedded webpage (from either GEO or ArrayExpress) housing the public dataset (Figure 2).

Figure 2. Summary of the cancer circulating miRNome dataset

3. Highly Expressed miRNAs

Since almost all the circulating miRNome datasets are based on microarray assays, for which a CPM cutoff does not apply, all the miRNAs in a dataset are ranked by their median expression values and the top 500 miRNAs are considered highly expressed in this dataset. The top 50 highly expressed miRNAs in a selected dataset are visualized in a bar plot (Figure 3).

Figure 3. Identification of highly expressed miRNAs

4. Differential Expression Analysis

Both limma and the Wilcoxon rank-sum test can be used to identify DE miRNA biomarkers between two user-defined subgroups in the dataset. Similar to the DE analysis in TCGA projects, only the highly expressed circulating miRNAs are included in the DE analysis (Figure 4).

Figure 4. Identification of differentially expressed miRNAs

5. ROC Analysis

The ROC analysis of the highly expressed circulating miRNAs between two user-defined subgroups of samples in a selected dataset can be performed to identify diagnostic biomarkers for non-invasive early cancer detection or cancer type classification. The circulating miRNA biomarkers are ranked in a data table by their AUC values (Figure 5).

Figure 5. ROC analysis to identify diagnostic miRNAs

6. Feature Selection

Similar to the feature selection function for miRNome analysis in TCGA projects, LASSO can also be used in a selected dataset to identify circulating miRNA biomarkers for non-invasive early cancer detection or cancer type classification (Figure 6).

Figure 6. Identification of diagnostic miRNAs using LASSO

7. Principal Component Analysis

Principal component analysis can be used to analyse the highly expressed circulating miRNAs in a selected dataset such that all the subjects, including healthy individuals and patients with various types of cancer, may be visualized in 2D and 3D interactive plots using the first two and three principal components, respectively (Figure 7).

Figure 7. Principal component analysis

Data Download

1. Download Processed miRNome Datasets

All the processed data deposited in CancerMIRNome can be downloaded directly by clicking the corresponding links on the Download page. These include the ExpressionSet object with the normalized miRNA expression data and metadata for each dataset, as well as the miRNA annotation data (from miRBase release 10.0 to release 22) in .RDS format.

Figure 1. Download processed miRNA expression data, sample metadata, and miRNA annotation data

2. Export Data Analysis & Visualization Results

We provide two download buttons (PDF and CSV) under each figure generated by CancerMIRNome, allowing users to download the high-resolution, publication-quality vector image in PDF format and the data used to generate the figure in CSV format.

Figure 2. Download the publication-quality vector image in PDF format

In this example, clicking the download button saves the expression data of the selected miRNA hsa-let-7a-5p for all the samples across the 33 TCGA projects to the CSV file hsa-let-7a-5p.TCGA_PanCancer_Expression_Data.csv.

Figure 2. Download the data that is used to generate the figure to a CSV file

All the data tables generated for the data analysis outputs are also exportable in CSV or EXCEL format. The output can also be copied to the clipboard.

Figure 3. Download the data table for the data analysis output