Ontolomics-P

I.1 Input data preparation: Type in a keyword

In the 'Import Data' part (Step 1), users can type in a gene name (e.g. TNFSF10), a UniProt ID (e.g. P50591), or a noun (e.g. Liver). For instance, when users input the gene name 'TNFSF10', the parameters should be configured as follows:
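
The three keyword types above can be told apart with simple pattern checks. The Python sketch below is a hypothetical illustration, not Ontolomics-P's own code: it uses the published UniProt accession format to detect IDs such as P50591, and it assumes that anything written only in upper-case letters, digits and hyphens is a gene symbol. Under that assumption an all-caps noun such as 'LIVER' would be misread as a gene name.

```python
import re

# UniProt accession format (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Hypothetical heuristic: gene symbols start with an upper-case letter and
# contain only upper-case letters, digits and hyphens.
GENE_SYMBOL = re.compile(r"^[A-Z][A-Z0-9-]*$")


def classify_keyword(keyword: str) -> str:
    """Return 'uniprot_id', 'gene_name' or 'noun' for one keyword (heuristic sketch)."""
    token = keyword.strip()
    if UNIPROT_ACCESSION.match(token):
        return "uniprot_id"
    if GENE_SYMBOL.match(token):
        return "gene_name"
    return "noun"


for kw in ["TNFSF10", "P50591", "Liver"]:
    print(kw, classify_keyword(kw))
```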

I.2 Results

I.2.1 Ontology-guide Results

Here, Ontolomics-P uses a topics database built from the GO database (please refer to the 'III. Topics analysis database' part) to help retrieve relevant protein/gene functional information. It outputs two kinds of results:

I.2.1.A Original GO functions

'Original GO functions' means that this tool retrieves the GO functions of the input keyword (e.g. TNFSF10) and plots word clouds of its BPs, MFs and CCs, respectively. The results include:
A.1. Original GO function plots based on the keyword that users type in: shows the distribution of GO term numbers and the word clouds of BPs, MFs and CCs for TNFSF10, respectively.
A.2. Original GO function table based on the keyword that users type in: shows the GO functions of TNFSF10 retrieved from the GO database.
A.3. Word frequency table of the word clouds in A.1. above: shows the word frequencies behind the word clouds (BP, CC, MF) in A.1. above.
The results are shown below:
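
A word frequency table like the one in A.3 can be derived by splitting GO term names into words and counting them. This Python sketch assumes a small hypothetical stop-word list and a min_count cut-off; the tool's actual tokenization and filtering rules are not described in this manual, and the GO term names in the example are only illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the tool's real filtering rules are not documented here.
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "the", "to", "via"}


def word_frequencies(go_term_names, min_count=1):
    """Count words across GO term names, keeping words seen at least min_count times."""
    counts = Counter()
    for name in go_term_names:
        for word in re.findall(r"[a-z0-9]+", name.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    # most_common() sorts by count (descending), ties in first-seen order.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


terms = [
    "apoptotic process",
    "positive regulation of apoptotic process",
    "extrinsic apoptotic signaling pathway via death domain receptors",
]
print(word_frequencies(terms, min_count=2))  # [('apoptotic', 3), ('process', 2)]
```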

I.2.1.B Topics analysis results

'Topics analysis results' means that this tool performs topics analysis based on the pre-established topics database (please refer to the 'III. Topics analysis database' part). The results include:
B.1. The correlation plots between the original GOs and those in each topic: shows the distribution of semantic similarity between the keyword's (e.g. TNFSF10) GO functions and each topic derived from BPs, MFs and CCs, respectively, together with a histogram of the 125 semantic similarity values.
B.2. The correlation table between the original GOs and those in each topic: shows the table of semantic similarity values between the keyword's (e.g. TNFSF10) GO functions and each topic derived from BPs, MFs and CCs, respectively. These values are used to draw the figures in B.1. above.
The results are shown below:
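
This manual does not specify which semantic similarity measure is used. As a much simpler stand-in, the Python sketch below scores the overlap between the keyword's GO IDs and each topic's GO IDs with the Jaccard index; the topic names and GO ID sets are hypothetical. An ontology-aware measure (for example, one based on information content) would also credit related but non-identical GO terms, which Jaccard does not.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of GO IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_to_topics(keyword_go_ids, topics):
    """Score the keyword's GO IDs against each topic's GO IDs (Jaccard stand-in)."""
    return {name: jaccard(keyword_go_ids, ids) for name, ids in topics.items()}


# Illustrative GO IDs only; not the actual annotations of TNFSF10 or real topics.
keyword_ids = {"GO:0006915", "GO:0043065"}
topics = {
    "topic_1": {"GO:0006915", "GO:0008625"},
    "topic_2": {"GO:0005886"},
}
print(similarity_to_topics(keyword_ids, topics))
```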

I.2.2 Data-driven Results

Here, Ontolomics-P integrates quantitative proteomic data for ten cancer types from the CPTAC database. These expression data were downloaded from https://proteomic.datacommons.cancer.gov/pdc/explore-quantitation-data with PDC study identifiers COAD (PDC000116), BRCA (PDC000120), UCEC (PDC000125), ccRCC (PDC000127), LUAD (PDC000153), HECA (PDC000198), HNSCC (PDC000221), PADA (PDC000270), LSCC (PDC000219) and OV (PDC000360). This integration lets researchers efficiently review the expression profiles of proteins of interest (POIs). For example, when users type in the gene name TNFSF10, the results include:
A.1. The protein(s) that users type in or upload in Step 1 is/are displayed in a boxplot or heatmap. Because only one gene name is typed in here, boxplots of this protein are shown across all selected cancer datasets.
A.2. The protein(s) that users type in or upload in Step 1 is/are displayed in a volcano plot. Because only one gene name is typed in here, volcano plots for this protein are shown across all selected cancer datasets.
A.3. The gene co-expression network analysis based on expression profiles. This gene co-expression network analysis reveals relationships between the input protein and other strongly correlated proteins (based on absolute Spearman correlation coefficients) across selected cancer datasets.

II.1 Input data preparation: Upload data

In the 'Import Data' part (Step 1), users can also upload or paste a list of POIs instead of a single keyword. For instance, when users click 'C. Load example data', which loads a list of protein UniProt IDs, the parameters should be configured as follows:
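
Pasted POI lists often mix line breaks, commas and tabs. The hypothetical helper below shows one way to turn such text into a clean, de-duplicated ID list; it is a sketch of the idea, not the tool's parser, and its separator argument is an assumption.

```python
import re


def parse_pasted_ids(text, separator=None):
    """Split pasted text into IDs, dropping blanks and duplicates (first occurrence kept).

    separator: hypothetical option; None splits on any whitespace, commas or semicolons.
    """
    if separator is None:
        tokens = re.split(r"[\s,;]+", text)
    else:
        tokens = text.split(separator)
    cleaned = (t.strip() for t in tokens)
    # dict.fromkeys keeps insertion order, so this de-duplicates without reordering.
    return list(dict.fromkeys(t for t in cleaned if t))


print(parse_pasted_ids("P50591\nP01375, P50591\n\nQ9Y243"))
```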

II.2 Results

II.2.1 Ontology-guide Results

Here, Ontolomics-P performs classical GO enrichment analysis based on the original GO database and topics enrichment analysis based on the pre-built topics database. It therefore outputs two kinds of results:

II.2.1.A Classical GO enrichment analysis

'Classical GO enrichment analysis' means that this tool performs classical GO enrichment analysis and then applies semantic similarity analysis to simplify the results. The results include:
A.1. Classical GO enrichment plots based on the IDs/Names that users upload: shows a boxplot of the top 20 enriched GO terms and, for the uploaded POIs, the simplified BP, MF and CC results as heatmaps with word clouds, respectively.
A.2. Classical GO enrichment table based on the IDs/Names that users upload: shows the classical GO enrichment result table.
The results are shown below:
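
Classical GO enrichment results are commonly filtered by adjusted P value and ranked by GeneRatio, i.e. k/n, where k is the number of input genes annotated to a term and n is the number of input genes with any GO annotation (the convention used, for example, by clusterProfiler). The Python sketch below applies the Benjamini-Hochberg adjustment and keeps the top terms. That Ontolomics-P uses exactly this adjustment method, cut-off and ranking is an assumption, and the term records are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_terms(results, padj_cutoff=0.05, top_n=20):
    """Keep terms with adjusted P <= padj_cutoff, ranked by GeneRatio (k/n).

    results: list of dicts with hypothetical keys 'term', 'k' (input genes in
    the term), 'n' (input genes with any GO annotation) and 'pvalue'.
    """
    padj = benjamini_hochberg([r["pvalue"] for r in results])
    kept = [
        {**r, "gene_ratio": r["k"] / r["n"], "p_adjust": q}
        for r, q in zip(results, padj)
        if q <= padj_cutoff
    ]
    kept.sort(key=lambda r: r["gene_ratio"], reverse=True)
    return kept[:top_n]


results = [
    {"term": "term_1", "k": 5, "n": 50, "pvalue": 0.01},
    {"term": "term_2", "k": 8, "n": 50, "pvalue": 0.04},
    {"term": "term_3", "k": 3, "n": 50, "pvalue": 0.03},
    {"term": "term_4", "k": 2, "n": 50, "pvalue": 0.2},
]
print([r["term"] for r in select_terms(results, padj_cutoff=0.06)])
```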

II.2.1.B Topics enrichment analysis

'Topics enrichment analysis' means that this tool performs topics enrichment analysis with Fisher's exact test based on the pre-established topics database (please refer to the 'III. Topics analysis database' part). The results include:
B.1. The topics enrichment analysis plots: shows meteoric plots of the topics enrichment results for BPs, MFs and CCs of the uploaded POIs, respectively. Users can adjust the '3. Object number for barplot' parameter in the left panel to control the number of topics shown in the plot.
B.2. The topics enrichment analysis table: shows the topics enrichment result table for BPs, MFs and CCs of the uploaded POIs. The meteoric plots in B.1. are drawn from this table.
The results are shown below:
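
For a single topic, Fisher's exact test compares how many POIs fall in the topic with how many background proteins do. The sketch below computes the one-sided (over-representation) p-value of a 2x2 table from hypergeometric probabilities. The one-sided alternative and the choice of background are assumptions, since this manual does not state them, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test p-value for the table [[a, b], [c, d]].

    a: POIs annotated to the topic        b: POIs not annotated to the topic
    c: non-POI background proteins in it  d: non-POI background proteins not in it
    """
    n_poi = a + b        # row total: number of POIs
    n_topic = a + c      # column total: proteins annotated to the topic
    total = a + b + c + d
    # Sum hypergeometric numerators for overlaps at least as large as observed.
    numerator = sum(
        comb(n_topic, x) * comb(total - n_topic, n_poi - x)
        for x in range(a, min(n_poi, n_topic) + 1)
    )
    return numerator / comb(total, n_poi)


# 3 of 5 POIs are in the topic, versus 1 of 11 background proteins.
print(fisher_exact_greater(3, 2, 1, 10))
```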

II.2.2 Data-driven Results

Here, Ontolomics-P integrates quantitative proteomic data for ten cancer types from the CPTAC database. These expression data were downloaded from https://proteomic.datacommons.cancer.gov/pdc/explore-quantitation-data with PDC study identifiers COAD (PDC000116), BRCA (PDC000120), UCEC (PDC000125), ccRCC (PDC000127), LUAD (PDC000153), HECA (PDC000198), HNSCC (PDC000221), PADA (PDC000270), LSCC (PDC000219) and OV (PDC000360). This integration lets researchers efficiently review the expression profiles of proteins of interest (POIs). For example, when users upload a list of protein UniProt IDs in the 'Import Data' part (Step 1), the results include:
A.1. The protein(s) that users type in or upload in Step 1 is/are displayed in a boxplot or heatmap. Because a list of proteins is uploaded here, heatmaps of the uploaded proteins are shown across all selected cancer datasets.
A.2. The protein(s) that users type in or upload in Step 1 is/are displayed in a volcano plot. Because a list of proteins is uploaded here, volcano plots for all uploaded proteins are shown across all selected cancer datasets.
A.3. The gene co-expression network analysis based on expression profiles. This gene co-expression network analysis reveals relationships between the uploaded proteins and other strongly correlated proteins (based on absolute Spearman correlation coefficients) across selected cancer datasets.
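
Each point in a volcano plot is a protein placed at its log2 fold change (x axis) and -log10 P value (y axis). The Python sketch below assumes per-protein fold changes and P values (e.g. tumor vs. normal) have already been computed, and labels points as up, down or not significant. The cut-offs and input values are hypothetical, not the tool's defaults.

```python
import math


def volcano_points(stats, log2fc_cutoff=1.0, p_cutoff=0.05):
    """stats: {protein: (log2_fold_change, p_value)} -> list of (protein, x, y, label)."""
    points = []
    for protein, (lfc, p) in stats.items():
        y = -math.log10(max(p, 1e-300))  # floor avoids log10(0) for p == 0
        if p < p_cutoff and lfc >= log2fc_cutoff:
            label = "up"
        elif p < p_cutoff and lfc <= -log2fc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((protein, lfc, y, label))
    return points


stats = {"P1": (2.0, 0.001), "P2": (-1.5, 0.01), "P3": (0.5, 0.001), "P4": (3.0, 0.2)}
for point in volcano_points(stats):
    print(point)
```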

III. Topics analysis database: This tool built a topics database from the GO database using the Latent Dirichlet allocation (LDA) model. Every topic was then re-annotated using the GPT-4o language model. Users can download the database with the 'Download' button.
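
To make the LDA step more concrete, here is a minimal collapsed Gibbs sampler in Python. It assumes each GO term's text (e.g. its name or definition) is treated as one tokenized document; this manual does not say which text fields, hyperparameters or number of topics were used, so alpha, beta, n_topics and the toy documents are placeholders. A production pipeline would more likely rely on an established LDA implementation.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists (one list per document).
    Returns (doc_topic, topic_word): smoothed per-document topic proportions
    and per-topic word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = range(n_topics)

    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in topics]           # word counts per topic
    nk = [0] * n_topics                             # token counts per topic
    z = []                                          # topic of every token

    # Random initial topic for each token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (nkw[t][v] + beta) / (nk[t] + n_vocab * beta) for v in range(n_vocab)}
        for t in topics
    ]
    return doc_topic, topic_word


docs = [
    ["apoptotic", "process", "regulation"],
    ["apoptotic", "signaling", "pathway"],
    ["plasma", "membrane"],
    ["membrane", "raft"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_iter=50)
for t, words in enumerate(topic_word):
    print(t, sorted(words, key=words.get, reverse=True)[:3])
```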

IV. GO terms used for topics analysis: This database was built with the GO.db package (Version 3.17.0, https://doi.org/doi:10.18129/B9.bioc.GO.db). The database is shown below and can be downloaded with the 'Download' button.
