The transformational role of GPU computing and deep learning in drug discovery

2022-05-02 12:49:34

Editor: Green Lotus

Deep learning (DL) has upended almost all areas of research, including drug discovery. Much of this revolution is due to the unprecedented advancement of highly parallelizable graphics processing units (GPUs) and the development of GPU-enabled algorithms.

Recently, researchers from the University of British Columbia, the University of North Carolina at Chapel Hill, and Nvidia collaborated to publish a review article titled "The Transformational Role of GPU Computing and Deep Learning in Drug Discovery."

In the review, the researchers provide a comprehensive overview of historical trends and recent advances in GPU algorithms and discuss their direct impact on the discovery of new drugs and drug targets. They also introduce state-of-the-art deep learning architectures that have been applied in early drug discovery and the subsequent hit-to-lead optimization phases, including accelerated molecular docking, assessment of off-target effects, and prediction of pharmacological properties. Finally, they discuss the impact of GPU acceleration and deep learning models on the global democratization of drug discovery, which could enable efficient exploration of the ever-expanding chemical space and accelerate the discovery of new drugs.

Figure 1: Computer-aided drug discovery (CADD) workflow. (GPU accelerators have applications at every step of the drug discovery and development process)

GPU computing and deep learning for molecular simulations

GPU acceleration comes from massive data parallelism, which arises when similar, independent operations are performed on many elements of the data. In molecular simulations, data parallelism can be applied to the independent calculation of atomic potential energies. Similarly, DL model training involves forward and backward passes, often expressed as matrix transformations that are easy to parallelize (Figure 2).
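To make the idea of independent per-element work concrete, here is a minimal sketch in plain Python. It assumes a Lennard-Jones pair potential, which the article does not name, and the function names and `n_workers` parameter are hypothetical. Each pair energy depends only on its own two atoms, so the pairs can be split into chunks, summed separately, and combined; on a GPU each chunk would map to parallel threads.

```python
import math
from itertools import combinations

def lj_pair_energy(p, q, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of one atom pair (assumed potential, reduced units)."""
    r = math.dist(p, q)
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(positions, n_workers=1):
    """Sum pair energies after splitting the pairs into independent chunks."""
    pairs = list(combinations(positions, 2))
    # Each chunk could be handed to a separate worker; none depends on another.
    chunks = [pairs[i::n_workers] for i in range(n_workers)]
    partial_sums = [sum(lj_pair_energy(p, q) for p, q in chunk) for chunk in chunks]
    return sum(partial_sums)

atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)]
serial = total_energy(atoms, n_workers=1)
chunked = total_energy(atoms, n_workers=3)
```

The serial and chunked totals agree up to floating-point rounding, which is the property that lets the work be distributed.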

Figure 2: Parallelization of the DL architecture in single-GPU and multi-GPU environments.

Accelerating molecular dynamics simulations on GPUs

Over the past decade, the development of GPU-centric molecular dynamics codes has cut the computational cost of simulations by a factor of hundreds compared with central processing unit (CPU)-based algorithms. GPUs are not only well suited to accelerating molecular dynamics simulations; they also scale well with system size through spatial domain decomposition. As a result, molecular dynamics simulations now cover a wider range of biomolecular phenomena, approaching the viral and cellular level and moving closer to experimental timescales. Recent advances in methods and algorithms have made it possible to simulate molecular assemblies of up to 2 × 10^9 atoms, with total simulation times of microseconds or even milliseconds.
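Spatial domain decomposition can be illustrated with a simple cell grid. The sketch below is an assumption-laden toy (non-periodic box, cell size equal to the cutoff, hypothetical function names), not the scheme of any particular molecular dynamics package. Atoms are binned into cells, and each cell only needs to exchange information with its 26 neighbors, so cells can be distributed across GPUs.

```python
import math
from collections import defaultdict
from itertools import product

def assign_domains(positions, cell_size):
    """Map each 3D grid cell (one spatial domain) to the atom indices inside it."""
    domains = defaultdict(list)
    for idx, pos in enumerate(positions):
        cell = tuple(int(math.floor(c / cell_size)) for c in pos)
        domains[cell].append(idx)
    return domains

def neighbor_pairs(positions, cutoff):
    """Find atom pairs within the cutoff by checking a cell and its 26 neighbors."""
    domains = assign_domains(positions, cell_size=cutoff)
    pairs = set()
    for cell, members in domains.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                # .get avoids creating empty cells while iterating
                for j in domains.get(other, ()):
                    if i < j and math.dist(positions[i], positions[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs

points = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
close_pairs = neighbor_pairs(points, cutoff=1.0)
```

Because two atoms within the cutoff can differ by at most one cell index along each axis, the local search finds the same pairs as an all-against-all comparison while touching far fewer candidates in large systems.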

Figure 3: Timeline of the complexity of biological systems that can be simulated with molecular dynamics.

Free-energy simulation is another area that benefits from advances in GPU development. Methods such as relative binding free energy calculations, thermodynamic integration, and free energy perturbation can now compute reliable binding affinities for a large number of protein-ligand complexes.
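As a rough illustration of free energy perturbation, the sketch below implements the standard Zwanzig exponential-averaging estimator, dF = -kT ln < exp(-dU / kT) >, over energy differences sampled in the reference state. The function name, the kcal/mol units, and the sample values are assumptions for illustration; production codes use more robust estimators and many intermediate states.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def fep_free_energy(delta_u, temperature=300.0):
    """Zwanzig estimator for the free energy difference between two states.

    delta_u holds U_target - U_reference (kcal/mol) for configurations
    sampled from the reference state.
    """
    kt = K_B * temperature
    exponents = [-du / kt for du in delta_u]
    # Shift by the largest exponent so the exponentials cannot overflow.
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))

samples = [0.5, 1.0, 2.0]  # hypothetical energy differences
delta_f = fep_free_energy(samples)
```

The estimate is never larger than the plain average of the sampled energy differences, and it equals that average when every sample is identical.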

Quantum mechanics and GPUs

TeraChem was the first quantum chemistry code written specifically for GPUs. Mixed-precision arithmetic allows very efficient calculation of the Coulomb and exchange matrices. TeraChem's latest algorithms allow entire proteins to be simulated using density functional theory (DFT).

Future exascale supercomputers will provide a high level of parallelism in heterogeneous CPU and GPU environments. Scaling to such machines requires new hybrid algorithms and, in effect, a complete rewrite of scientific code. These developments are now being implemented as part of the NWChemEx package, which is intended to enable quantum mechanics and molecular mechanics simulations of systems orders of magnitude larger than those that can be handled with canonical formulations of the theoretical methods.

GPU-accelerated protein structure determination

High throughput and automation are becoming increasingly important for cryo-EM, one of the most advanced experimental techniques for protein structure determination and structure-based drug design.

DL-based approaches, such as DEFMap and DeepPicker, have been developed to speed up the processing of cryo-EM images.

Beyond accelerating the experimental characterization of protein structures by cryo-EM, DeepMind's recent breakthrough with the AlphaFold-2 method in the Critical Assessment of Protein Structure Prediction (CASP) challenge hints at the future impact of DL algorithms on the structural characterization of proteins and on the expansion of the druggable proteome.

The advent of DL in CADD

The development of deep learning, particularly in computer vision and language processing, has rekindled CADD researchers' interest in neural networks.

The advent of GPU-enabled DL architectures, together with the proliferation of chemogenomics data, has led to the CADD-enabled discovery of clinical drug candidates. In addition, artificial intelligence (AI)-driven companies such as BenevolentAI, Insilico Medicine, and Exscientia, among others, are reporting success in enhancing drug discovery. These recent success stories suggest that wider adoption of AI-driven approaches powered by GPU computing can greatly accelerate the discovery of new and improved drugs.

DL architectures for CADD

DL has found applications ranging from discriminative neural networks for virtual screening of existing or synthetically feasible chemical libraries to generative DL models, whose recent success has inspired their use in de novo drug design. Figure 4 depicts a general scheme for commonly used state-of-the-art DL architectures, and Table 1 lists their adoption in CADD.

Figure 4: Architecture of several popular neural networks.

Table 1: State-of-the-art DL categories and their use in drug discovery.

Using GPUs and DL to scale up virtual screening

Structure-based virtual screening ranks compounds by their computed binding affinity to a target, while ligand-based virtual screening extrapolates from structural similarity between small molecules to functional equivalence. As libraries of purchasable ligands grow exponentially to include tens of billions of synthesizable molecules, there is growing interest in scaling up traditional virtual screening through parallelization of docking computations or DL-based acceleration.
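The ligand-based side of this can be sketched with a similarity ranking. The example below assumes molecules are already encoded as fingerprints (sets of on-bit indices) and uses the Tanimoto coefficient, a common similarity measure that the article itself does not name. The molecule names and bit values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def rank_library(query_fp, library):
    """Sort library entries from most to least similar to the query."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    return sorted(scored, reverse=True)

query = {1, 4, 7, 9}  # fingerprint of a known active (hypothetical)
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}
ranking = rank_library(query, library)
```

Each similarity is computed independently of the others, which is why this kind of screen parallelizes naturally across very large libraries.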

A number of structure-based virtual screening methods have recently been developed to efficiently sift through chemical libraries of billions of entries. However, the computational costs remain high and can be prohibitive for drug discovery organizations that don't have access to elite supercomputing clusters.

On the other hand, alternative structure-based virtual screening platforms have recently emerged that combine DL prediction with molecular docking to select active compounds from large libraries using limited computational resources. Compared with brute-force methods, these DL-based approaches may play an important role in giving academic research groups and small and medium-sized companies, among others, access to large chemical spaces.
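The general pattern of pairing a cheap predictor with expensive docking can be sketched as follows. Everything here is a stand-in: `toy_dock` is a hypothetical scoring function rather than a real docking program, each molecule is reduced to a single number, and a nearest-neighbor lookup takes the place of a trained DL model. The point is only the workflow: dock a small sample, predict the rest, and dock just the most promising predictions.

```python
def toy_dock(x):
    """Hypothetical stand-in for an expensive docking run (lower is better)."""
    return (x - 0.7) ** 2 - 1.0

def surrogate_screen(descriptors, dock, sample_step=50, keep_fraction=0.1):
    """Dock a sample, predict the remainder, then dock only the top predictions."""
    docked = {i: dock(descriptors[i]) for i in range(0, len(descriptors), sample_step)}
    sampled = list(docked)

    def predict(i):
        # 1-nearest-neighbor surrogate standing in for a trained DL model
        nearest = min(sampled, key=lambda s: abs(descriptors[s] - descriptors[i]))
        return docked[nearest]

    rest = [i for i in range(len(descriptors)) if i not in docked]
    rest.sort(key=predict)
    n_keep = int(keep_fraction * len(descriptors))
    for i in rest[:n_keep]:
        docked[i] = dock(descriptors[i])
    best = min(docked, key=docked.get)
    return best, len(docked)

library = [i / 999 for i in range(1000)]  # one toy descriptor per molecule
best_index, n_docked = surrogate_screen(library, toy_dock)
```

In this toy run only a fraction of the library is ever docked, yet the best-scoring molecule is still recovered, which is the trade-off such platforms aim for on real libraries.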

GPU-enabled DL promotes the democratization of open science and drug discovery

The integration of DL with CADD has greatly contributed to the global democratization of drug discovery and to open science efforts. The growing demand of DL models for large data sets will naturally encourage data sharing practices and call for broader open data policies. In addition, GPU acceleration in cloud-native computing and microservices-oriented architectures can make CADD methods free and widely available, helping to standardize compute modules and tools, architectures, platforms, and user interfaces.

While these new DL-enabled modeling opportunities are exciting, CADD scientists need to be cautious about the expected impact of DL technology.

Open science work benefits from recent end-to-end DL models that can be implemented using GPUs at all stages of drug discovery.

Because of legal complexities, sharing proprietary data between institutions remains a bottleneck for streamlining drug discovery research. Federated learning allows participating institutions to train models locally on their own data without sharing it. The trained local models are then aggregated on a central server for broader accessibility. Federated learning thus supports democratization by mitigating the challenge of data exchange to some extent, although effective model aggregation remains an active area of research.
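One simple aggregation rule is a weighted average of the local model parameters, often called federated averaging. The article does not say which rule is used, so the sketch below is only an assumed example with hypothetical site names and weights. Only parameters and sample counts reach the server; the underlying data stay at each institution.

```python
def federated_average(local_weights, sample_counts):
    """Average per-site weight vectors, weighted by each site's sample count."""
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [
        sum(w[k] * n for w, n in zip(local_weights, sample_counts)) / total
        for k in range(n_params)
    ]

site_a = [0.0, 2.0]  # parameters trained locally at institution A (hypothetical)
site_b = [4.0, 6.0]  # parameters trained locally at institution B (hypothetical)
global_weights = federated_average([site_a, site_b], sample_counts=[100, 300])
```

Here institution B contributes three times as many samples, so the aggregated parameters sit three quarters of the way toward its local model.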

Conclusions and Outlook

Modern drug discovery has benefited from the recent explosion of DL models and GPU parallel computing. Driven by advances in hardware, DL excels at drug discovery problems ranging from virtual screening and QSAR analysis to generative drug design. The growing availability of increasingly powerful GPU architectures, together with the development of advanced DL strategies and GPU-accelerated algorithms, is expected to help make drug discovery affordable and accessible to the broader scientific community around the world.

Another key driver of DL algorithms is the availability of "big data". As gene sequencing and high-throughput screening become easier, data-driven computational chemistry researchers now have easy access to large amounts of raw data. However, the cost of curating high-quality labeled data, which is critical to supervised learning methods, remains high. Therefore, the hypothesized advantage of deep learning on centralized, curated, and well-labeled data repositories remains an open area of research.

Overall, researchers in drug discovery and machine learning have collaborated effectively to identify CADD subproblems and the corresponding DL tools. We believe these applications will be fine-tuned and will mature in the coming years, and that this collaboration will expand into other untapped areas of the life sciences. Federated learning and collaborative machine learning are also gaining more and more attention, and we believe they will be the precursors of a revolution that democratizes drug discovery.

Artificial Intelligence × [ Biological Neuroscience Mathematics Physics Materials ]

"ScienceAI" focuses on the intersection and integration of artificial intelligence with other cutting-edge technologies and basic sciences.
