iMeta | Songfeng Wu/Jingli Li/Yunping Zhu-Jointly developed an online proteomics data analysis platform

author:Seishin Treasure Book

BioLadder: A bioinformatics platform focused on proteomics data analysis

iMeta Homepage: http://www.imeta.science

Methods Papers

● Journal: iMeta [IF 23.7]

● Original link DOI: https://doi.org/10.1002/imt2.215

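● On June 20, 2024, teams from Beijing Qinglian Baiao Biotechnology Co., Ltd. and the National Protein Science Center (Beijing) jointly published an article online in iMeta entitled “BioLadder: A bioinformatic platform primarily focused on proteomic data analysis”.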

● In this study, an online proteome data analysis platform, BioLadder (https://www.bioladder.cn/), was developed, including 3 types of experimental data analysis modules and 4 types of routine data analysis modules. It allows users to perform a variety of proteomic data analysis easily and efficiently.

● First authors: Zhang Yupeng, Yang Chunyuan

● Corresponding Authors: ZHU Yunping ([email protected]), LI Jingli ([email protected]), WU Songfeng ([email protected])

● Co-authors: Wang Jinhao, Wang Lixin, Zhao Yan, Sun Longqin, Sun Wei

● Main affiliations: Beijing Qinglian Baiao Biotechnology Co., Ltd.; National Protein Science Center (Beijing), Beijing Proteomics Research Center, Beijing Institute of Life Omics

Highlights

● BioLadder includes 3 types of experimental data analysis modules and 4 types of routine data analysis modules;

● BioLadder allows for easy and efficient analysis of various proteomics data;

● BioLadder provides 4 auxiliary methods to help users quickly and accurately use the relevant analysis module.

Summary

BioLadder (https://www.bioladder.cn/) is an online data analysis platform designed for proteomics research, including 3 types of experimental data analysis modules and 4 types of routine data analysis modules. It allows users to perform a variety of proteomic data analysis easily and efficiently. In addition, most of the modules can also be used for the analysis of other omics data. In order to facilitate the user experience, we have carefully designed 4 auxiliary methods to help users quickly and accurately use the relevant analysis module.

Video Interpretation

Bilibili:https://www.bilibili.com/video/BV1xy411q7Ci/

Youtube:https://youtu.be/VciXv84LjHc

Download extended materials such as Chinese translation, PPT, Chinese/English video interpretation, etc

Please visit the journal's official website: http://www.imeta.science/

Full text interpretation

INTRODUCTION

In recent years, the vigorous development of multi-omics research has generated a large amount of data, and in-depth data analysis and mining have become an important feature of life science research. Bioinformatics has become one of the most commonly used research tools and plays a key role in life science research.

However, bioinformatics research requires programming training, which may not be the strong point of those researchers who focus on scientific problems. In addition, even if some researchers have coding skills, they still need to invest a lot of time and effort in coding to complete the analysis, which will undoubtedly lead to delays in related work.

Online analysis platforms are undoubtedly the preferred choice for researchers as they do not require additional installation and preparation. Simply opening a web page and uploading data for analysis can dramatically accelerate the pace of life science research. Currently, there are many similar online data analysis platforms, including some dedicated to omics data analysis, such as ImageGP, Sangerbox, Majorbio Cloud, OmicStudio, OmicsSuite, OmicsAnalyst, etc. However, most of these analytical platforms are developed based on the needs of genomics and transcriptomics, and few are specifically designed for proteomics.

The proteome is translated from the transcriptome and has not only the expression features of the transcriptome but also additional features such as modifications and interactions. Proteomics is much more complex than genomics and transcriptomics when it comes to qualitative and quantitative experimental techniques, which creates additional requirements for data analysis. In recent years, with the advancement of technology, proteomics has gradually played an increasingly important role in medical research, resulting in an increasing and diverse demand for protein data analysis.

Here, we provide the BioLadder Bioinformatics Platform (https://www.bioladder.cn/), which provides not only some traditional analysis tools, but also commonly used proteomics analysis tools, including experimental result visualization, sequence-level analysis, expression data analysis, and functional analysis.

Results

Modular design for proteomic data analysis

Proteomic data analysis can be divided into two groups (Figure 1): (1) experimental data analysis, which works on data from proteomics experiments and includes visualization of experimental results and analysis of expression matrix data (categories 1-3); and (2) routine data analysis, which does not rely on proteomics experimental data and includes protein sequence analysis and some general classification and functional analysis (categories 4-7).

The seven categories are outlined below:

Category 1. ExpDataVisualization

Experimental data visualization currently includes two modules (CoverageBar and Pep2ProMap) that show how much of each protein is covered by the peptides identified in a proteomics experiment, together with information on protein cleavage sites.
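
To illustrate what a coverage view of this kind conveys, the following minimal sketch computes per-residue coverage from a list of identified peptides. It is not BioLadder's implementation: the function name `sequence_coverage`, the toy sequence, and the exact-substring matching rule are assumptions made for this example.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> tuple[float, list[bool]]:
    """Return (fraction of residues covered, per-residue coverage mask)."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # an empty peptide would match everywhere; skip it
        start = protein.find(pep)
        # A peptide can occur more than once; mark every occurrence.
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    fraction = sum(covered) / len(protein) if protein else 0.0
    return fraction, covered


# Hypothetical protein sequence and identified peptides.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = ["TAYIAK", "SHFSR"]

fraction, mask = sequence_coverage(protein, peptides)
bar = "".join("#" if c else "-" for c in mask)
print(f"coverage: {fraction:.1%}")
print(protein)
print(bar)
```

The text bar printed under the sequence is a crude stand-in for a graphical coverage bar: covered residues are marked with `#`, uncovered ones with `-`.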

Category 2. DataPreProcessing

Data preprocessing includes data format conversion, standardization, imputation, etc. It is an important prerequisite for subsequent analysis.

Category 3. QuantitativeAnalysis

Quantitative analysis, which works on the quantitative results for each protein, is the most common type of analysis module and can be subdivided into five subcategories:

(1) DifferenceAnalysis: Difference analysis includes difference calculation, false discovery rate (FDR) correction, and visualization of the results, such as volcano plots and receiver operating characteristic (ROC) curves. These modules calculate differences and display the results at the same time.

(2) QuantitativeDes: Quantitative data description includes scatter plots, density plots, distribution bar charts or line plots, and the coefficient of variation (CV). These modules describe the distribution, density, and other characteristics of quantitative data.

(3) QuantitativeComp: Quantitative data comparison includes bar charts, heat maps, box plots, etc. These modules are primarily used to compare quantitative differences or changes between samples or genes.

(4) QuantitativeCorr: Quantitative data correlation includes correlation heat maps, correlation matrix plots, etc. These modules calculate quantitative correlations to reveal relationships between samples or genes.

(5) QuantitativeCluster: Quantitative clustering includes principal component analysis (PCA), t-SNE, UMAP, trend analysis of multiple datasets, tree plots, etc. These modules typically use dimensionality reduction algorithms or distance-based methods to cluster and analyze samples or genes.
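
The FDR correction step mentioned under DifferenceAnalysis can be sketched as follows. The source does not say which correction procedure BioLadder applies, so this example assumes the widely used Benjamini-Hochberg procedure; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-value increases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
for p, q in zip(raw_p, adj_p):
    print(f"p={p:<6} adjusted={q:.3f}")
```

Each raw p-value is multiplied by n/rank, and the backward pass keeps the adjusted values monotonic, which is why two proteins with different raw p-values can end up with the same adjusted value.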

Category 4. SeqAnalysis

Sequence analysis refers to the analysis that can be done based on protein sequences, including multi-sequence alignment, sequence motif analysis, protein physicochemical property calculation, etc.

Category 5. AbundanceMap

Abundance plots provide a convenient way to query and display reference quantitative data on body fluids, which currently include blood and urine.

Category 6. ClassificationAnalysis

Classification analysis consists of two subcategories: (1) classification display, which uses scatter charts, pie charts, area charts, etc. to show how results differ across classes; and (2) classification comparison, which uses visualization tools such as Venn diagrams, Sankey diagrams, and radar charts to compare different types of results.

Category 7. FunctionAnalysis

Functional analysis focuses on the visualization of enrichment results based on gene ontology (GO), as well as the mapping of interaction networks.

In summary, the analysis modules included in BioLadder cover experimental data analysis in proteomics research, as well as multiple common sequence data analysis modules. These analysis modules can meet most of the data analysis needs of researchers in the field of proteomics.

Figure 1. BioLadder module categories in the framework of proteomic data analysis

BioLadder consists of more than 50 analysis modules that fall into two main groups (experimental data analysis and routine data analysis) and 7 categories (C1-C7 in the figure).

Proteomics-specific data analysis modules

To meet the needs of proteomics research, we have developed several proteomic data visualization modules: (1) coverage analysis of peptides in protein sequences, including the CoverageBar and Pep2ProMap modules. These modules are primarily designed to present LiP-MS experimental results, but can also display identification data from any proteomics experiment; (2) analysis and visualization of quantitative data distribution, including the CV curve and SumCurve modules. Users can use these modules to examine the variability and abundance curves of quantitative data; and (3) display of quantitative data with labeled proteins, including the AbundancePoint and BodyFluidMap modules. The former lets users enter their own quantitative data and specify the proteins to label, while the latter lets users query quantitative information for specific proteins in a body fluid database (currently covering blood and urine).
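
The two quantities behind a variability curve and an abundance curve can be sketched as below. This is an illustration under assumptions, not the platform's code: CV is taken as the sample standard deviation divided by the mean, and the abundance curve is assumed to be the cumulative share of total signal over proteins ranked by abundance. The function names and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV of one protein across replicates: sample std dev / mean."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")


def cumulative_abundance(abundances: list[float]) -> list[float]:
    """Rank proteins from most to least abundant and return the running
    fraction of the total signal they account for."""
    ranked = sorted(abundances, reverse=True)
    total = sum(ranked)
    running, curve = 0.0, []
    for a in ranked:
        running += a
        curve.append(running / total)
    return curve


# Hypothetical replicate intensities for one protein.
replicates = [90, 100, 110]
print(f"CV = {coefficient_of_variation(replicates):.2f}")

# Hypothetical abundances for three proteins in one sample.
print(cumulative_abundance([1, 3, 6]))
```

A steep cumulative curve indicates that a few highly abundant proteins dominate the total signal, which is typical for body fluids such as plasma.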

We believe that these proteomic data visualization modules will meet the needs of proteomics research and provide valuable insights for researchers.

Convenient and user-friendly design

To let omics researchers use our online analysis platform as conveniently and efficiently as possible, we have carefully designed several aspects of it, including the input file format (Figure 2A), parameter settings (Figure 2B), and color scheme (Figure 2C). We also provide help documentation, WeChat customer service, and real-time tooltips so that users can easily access relevant help information (Figure 2D). Existing online cloud platforms currently implement these functions only partially (Table S3).

Simplified input formatting

Many data analysis methods are generic and have their own input data formats, which may not be commonly used in the field of proteomics, so proteomics data may need some transformation before the corresponding analysis can be run. Therefore, we provide conversion modules for different types of data (e.g., conversion between long and wide formats) and have designed some modules to support common proteomics formats directly. For example, in the Venn Plot module, users can enter data in the commonly used Venn format, or directly upload a quantitative matrix table (commonly used in proteomics) for analysis. In addition, the module can filter out values below a minimum quantitative value, which helps to eliminate results that may be caused by noise.
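
The idea of feeding a quantitative matrix straight into a Venn analysis can be sketched as follows. This is a minimal illustration, not BioLadder's code: the helper names `long_to_wide` and `matrix_to_sets`, the `min_value` parameter, and the toy records are all assumptions.

```python
def long_to_wide(records):
    """Convert (protein, sample, value) records into a nested wide table:
    {protein: {sample: value}}."""
    wide = {}
    for protein, sample, value in records:
        wide.setdefault(protein, {})[sample] = value
    return wide


def matrix_to_sets(matrix, min_value=0.0):
    """Derive one protein set per sample from a wide quantitative matrix.
    Missing values (None) and values below min_value are treated as
    not detected."""
    sets = {}
    for protein, row in matrix.items():
        for sample, value in row.items():
            sets.setdefault(sample, set())
            if value is not None and value >= min_value:
                sets[sample].add(protein)
    return sets


# Hypothetical long-format quantitative data.
records = [
    ("P1", "A", 120.0), ("P1", "B", 95.0),
    ("P2", "A", 3.0),   ("P2", "B", 80.0),
    ("P3", "A", 50.0),  ("P3", "B", None),
]

sets = matrix_to_sets(long_to_wide(records), min_value=5.0)
shared = sets["A"] & sets["B"]
print(sets)
print("shared:", shared)
```

With the threshold set to 5.0, the low-intensity value of P2 in sample A is treated as noise, so P2 is counted only for sample B.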

Proteomics-specific default parameters and diverse, extensive adjustment options

In order to meet the specific requirements of proteomics data analysis, we have established suitable default parameters for some modules to minimize the need for parameter adjustments.

First, in terms of algorithms, we adjusted the default parameters according to the characteristics of proteomic data. For example, in correlation calculations, a few highly abundant proteins may dominate the commonly used Pearson correlation because of the nature of expression data. Therefore, modules that involve correlation calculations use Spearman's rank correlation by default, which is also employed in many proteomics-related studies. In addition, because the number of proteins identified often varies considerably between samples, traditional standardization methods may introduce bias. To address this, we have introduced a method called consensus protein median normalization in the standardization module.
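
Both points can be illustrated with a small sketch. The first part shows how a single highly abundant protein pushes the Pearson coefficient towards 1 while Spearman's rank correlation stays moderate. The second part is only one plausible reading of "consensus protein median normalization", which the text names but does not define: each sample is scaled so that the median of the proteins quantified in every sample is the same. All function names and data are hypothetical.

```python
from statistics import mean, median


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def consensus_median_normalize(samples):
    """samples: {sample: {protein: quantity}}. Assumed definition: scale
    each sample so the median over proteins present in ALL samples
    (the consensus set) matches across samples."""
    consensus = set.intersection(*(set(q) for q in samples.values()))
    medians = {s: median(q[p] for p in consensus) for s, q in samples.items()}
    target = mean(medians.values())
    return {s: {p: v * target / medians[s] for p, v in q.items()}
            for s, q in samples.items()}


# Two hypothetical samples; the last protein is far more abundant.
x = [1, 2, 3, 4, 1000]
y = [2, 1, 4, 3, 900]
print(f"Pearson  = {pearson(x, y):.4f}")
print(f"Spearman = {spearman(x, y):.4f}")

# S1 has one extra protein (P4) that S2 lacks.
samples = {
    "S1": {"P1": 10, "P2": 20, "P3": 30, "P4": 5},
    "S2": {"P1": 20, "P2": 40, "P3": 60},
}
normalized = consensus_median_normalize(samples)
print(normalized)
```

Restricting the median to the consensus set means that proteins detected in only some samples, such as P4 here, do not shift the scaling factor.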

Second, in terms of data preprocessing, we have made adjustments according to the characteristics of proteomic data. For example, since most genes are expressed at relatively low levels, plotting the quantitative distribution directly often squeezes most proteins into the low-abundance end of the axis, making differences between samples difficult to discern. Therefore, in modules such as box plots, violin plots, and kernel density plots, logarithmic transformation is applied by default, so that changes in quantitative data between samples are clearly visible without any parameter modifications.
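
The effect of the logarithmic transformation can be seen in a small sketch. The base (2) and the pseudocount of 1 are assumptions chosen for the example, since the text does not state which base or offset the modules use; the intensities are made up.

```python
import math
from statistics import median


def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical raw intensities: most are low, a few are very high.
raw = [1, 3, 7, 15, 31, 1023, 65535]
logged = log2_transform(raw)

# Where does the median sit relative to the maximum on each scale?
raw_position = median(raw) / max(raw)
log_position = median(logged) / max(logged)
print(f"raw scale:  median at {raw_position:.4f} of the maximum")
print(f"log2 scale: median at {log_position:.4f} of the maximum")
```

On the raw scale the median sits almost at the bottom of the axis, so a box plot collapses into a thin line near zero; after the transformation it sits a quarter of the way up and the spread becomes readable.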

In addition, we have set some special default parameters in the data presentation. For example, in heatmap analysis, genes are often numerous on the y-axis, and the names of the displayed genes are often illegible. Therefore, by default, we only display the sample name and omit the gene name for better clarity.

Finally, to meet user preferences, we have included easy-to-adjust parameters in several modules that allow users to customize how results are displayed. For example, the volcano plot module offers two methods for protein labeling: (1) custom labeling based on a label column specified in the uploaded file; and (2) batch labeling based on p-value and fold-change thresholds. Similarly, in box plot analysis, users can choose whether to add hypothesis test labels between groups. We have also designed custom options that let users add hypothesis test labels only for specific group comparisons (e.g., annotating only significant results or comparisons of particular interest).
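
Batch labeling by thresholds can be sketched as follows. The default cutoffs (a two-fold change and p < 0.05), the function name, and the protein names are assumptions for illustration; the source does not state which defaults the volcano plot module uses.

```python
import math


def classify_volcano(fold_change, p_value, fc_threshold=2.0, p_threshold=0.05):
    """Return 'up', 'down', or 'ns' (not significant) for one protein.
    fold_change is a ratio and must be positive."""
    if p_value < p_threshold:
        log_fc = math.log2(fold_change)
        cutoff = math.log2(fc_threshold)
        if log_fc >= cutoff:
            return "up"
        if log_fc <= -cutoff:
            return "down"
    return "ns"


# Hypothetical results: (protein, fold change, p-value).
results = [
    ("ALB", 4.0, 0.001),
    ("APOA1", 0.25, 0.01),
    ("TF", 4.0, 0.2),
    ("HP", 1.2, 0.001),
]

classes = {name: classify_volcano(fc, p) for name, fc, p in results}
labeled = [name for name, cls in classes.items() if cls != "ns"]
print(classes)
print("labeled:", labeled)
```

The three classes returned here also map naturally onto a three-color scheme for up-regulated, down-regulated, and non-significant points.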

Powerful color scheme

Color schemes are a key aspect of data visualization, and improper color combinations can significantly reduce the effectiveness of visualization.

To solve this problem, we have configured a default color scheme in every module, including some taken from R packages such as ggplot2 (https://github.com/tidyverse/ggplot2), so that users can create clear graphics quickly and without extra steps.

In addition, more than half of the modules incorporate excellent color schemes (ggsci: https://github.com/nanxstats/ggsci) commonly used in literature or journals such as Nature, Science, and Lancet.

For users with specific requirements, we offer the option to customize colors. Users can select colors directly using the palette or precisely modify the color configuration by adjusting the color code, allowing them to customize colors for each sample or group based on their individual needs and aesthetics.

These three features provide our module with powerful color customization capabilities to meet a variety of user needs and allow users to quickly complete color customization according to their preferences.

In addition, some modules with unique characteristics use special color schemes. For example, the Volcano Plot module typically requires only three colors, for up-regulated, down-regulated, and non-significant proteins, so it uses color pickers to set a three-color scheme.

Comprehensive help information, convenient real-time help

In order to ensure that users can smoothly use our modules for data analysis, we provide helpful information from multiple perspectives in the "User Guide". First, we provide an introduction to give an overview of the website structure and functionality. Second, we have an FAQ page that summarizes most of the frequently asked questions. Third, we provide detailed documentation for each module. In addition, we also provide a WeChat communication group where users can directly consult our staff about their problems.

Beyond the commonly used parameter settings, we have added tooltips that provide immediate assistance, so that users can access help information about a parameter at any time and configure it accurately. For example, in the Heatmap module, four types of tooltips are provided: (1) a tooltip for the input file, explaining the file contents, maximum file size, and file format; (2) a tooltip for each drop-down selection box, explaining the meaning of each option; (3) a tooltip for the download format, providing download instructions and graphical explanations of the download settings; and (4) in the upper left corner of the result graph in most modules, a link to a "text tutorial" together with a tooltip that explains the chart, so that the user can quickly understand what the chart means. These tooltips make it easy for users to access help information and continue with configuration and data analysis without interruption.

Figure 2. Four convenient and user-friendly designs in BioLadder

(A) An example of a commonly used proteomics file format accepted as the default input format (Venn plot). (B) Proteomics-specific default parameters, covering algorithm selection, data preprocessing, and presentation, together with diverse and extensive adjustment options (using the volcano plot as an example). (C) Three different color configuration methods. (D) Comprehensive help information (three types of documentation and WeChat communication), as well as convenient real-time assistance (using the heatmap as an example).

Citation Format:

Yupeng Zhang, Chunyuan Yang, Jinhao Wang, Lixin Wang, Yan Zhao, Longqing Sun, Wei Sun, Yunping Zhu, Jingli Li, Songfeng Wu. 2024. BioLadder: A bioinformatic platform primarily focused on proteomic data analysis. iMeta 3: e215. https://doi.org/10.1002/imt2.215

About the author's affiliation

Beijing Qinglian Baiao Biotechnology Co., Ltd

Beijing Qinglian Baiao Biotechnology Co., Ltd. is an innovative platform enterprise focusing on proteomics detection and analysis. Guided by clinical needs and driven by original innovation, the company provides one-stop solutions for proteomic detection and for the clinical translation of protein markers for diagnosis and treatment. Focusing on blood, exosomes, tissue sections, single cells, and other sample types, the company has built a new generation of proteomics technology and bioinformatics platform characterized by full automation, micro-scale detection, deep coverage, and accurate quantification, providing practical solutions for the clinical application of proteomics.

National Protein Science Center (Beijing Base)

The National Protein Science Center (Beijing Base) is located in Zhongguancun Life Science Park and was jointly built by the Academy of Military Medical Sciences, Tsinghua University, Peking University, and the Institute of Biophysics of the Chinese Academy of Sciences. It focuses on building proteome analysis systems and functional proteome research systems, as well as support systems centered on bioinformatics, protein/antibody preparation, biological resource banks, and model animals. The center also undertakes the research work of the "Chinese Human Proteome Program (CNHPP)" of the Ministry of Science and Technology. The center building covers an area of nearly 40,000 square meters and houses a high-throughput, high-precision proteomics research platform, a protein function analysis platform, a bioinformatics platform, a translational medicine platform, a laboratory animal center, a biobank, and other supporting facilities, forming a systematic and complete protein research system. The center mainly carries out innovative research on proteomics, metabolomics, and protein function, has world-leading proteomics big data output capacity, and aims to provide high-level, one-stop proteomic analysis services worldwide. Its research results are intended for wide application in the development of disease diagnostic markers, new drug discovery, infectious disease prevention and control, crop improvement, bioenergy conversion, and other fields.
