He watched his daughter break the entire newly built lighthouse.
“What’s wrong with it?”
“There’s no balcony,” she muttered.
As she crushed the beautiful white tower, her dad pointed out, “It resembled Fastnet. You could’ve just added a few circular railings at the top and...”
“I want two balconies.”
She’d made up her mind. The dad shook his head and sipped his morning coffee, reading the newspaper. He wondered if using LEGO instead of CADD (computer-aided drug design) would help him in his quest for better drug designs. Maybe LEGO wasn’t such a bad idea after all.
He imagined the day when designing new drugs would be as simple and creative as building with those tiny pieces.
He’d still have to work out multiple permutations of different blocks, hoping to find the perfect combination for a drug that could help millions around the globe.
This is where the latest breakthrough in structure-based generative chemistry comes into play.
The Puzzle of Drug Design
Have you ever played one of the most influential games created by Alexey Pajitnov? Well, designing drugs has always been like playing a gigantic game of Tetris. Scientists take molecules from existing databases, which sample only a sliver of a chemical space estimated at 10^60 to 10^100 molecules depending on the size of the required drug, try to fit them into protein target areas, and see what fits best. This method, called virtual screening, is laborious and limited by what’s already known. Kind of like building the same structures over and over from a LEGO manual, right?
Deep learning methods have allowed us to explore such a vast space. Methods such as variational autoencoders, generative adversarial networks, normalizing flows, and diffusion models learn the underlying hidden distribution of molecules. Yet they fail to provide substantial 3D information, since they represent molecules as simple SMILES strings or graphs. Capturing the 3D structure of a molecule is perhaps one of the most important tasks: a single molecular graph can adopt various conformations with different properties in 3D space. With this in mind, 3D molecule generation was introduced to account for the spatial information of molecules. However, whether these molecules would bind well to the target proteins was not considered.
Hence, using the 3D structure of the target pocket as conditional information, and learning the interactions between molecules and proteins, helped researchers model the conditional density of the desired molecular data.
Variational autoencoders (VAEs) are generative models designed to learn the underlying probability distribution of a given dataset and generate new, similar samples. At the heart of a VAE is an encoder-decoder architecture. In this setting, voxelized atomic density images are fed to the VAE, and molecules are then reconstructed from the images it produces. However, a VAE compresses the pocket structure information and fails to generate accurate target-specific molecules.
What about auto-regressive models?
Think of it like this: imagine you're trying to draw a detailed picture, but you can only draw one tiny part at a time, and you have to start from scratch with each stroke of your pen. That's kind of what these auto-regressive frameworks do when generating molecules: they build them up atom by atom, step by step. Sure, they're trying to explore this vast chemical space, but it's like trying to navigate through a maze blindfolded. You're bound to make mistakes, especially when you can't see the bigger picture. And let's face it, molecular structures are as messy and unpredictable as trying to untangle a ball of yarn: every twist and turn adds another layer of complexity.
Consequently, achieving accurate and efficient 3D sampling of molecules within pocket cavities remains a significant challenge in the field. Recently, diffusion models have garnered significant attention in computer vision tasks, especially in point cloud generation, which is quite similar to 3D molecule generation. These methods can fill in 3D objects by learning the joint distribution of the data. Maybe they could be used for molecule generation?
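To make the idea a little more concrete, here is a minimal sketch of the "noising" half of a diffusion model applied to a 3D point cloud. It assumes the standard DDPM-style closed form with a linear noise schedule; the schedule values, the function names, and the toy coordinates are illustrative assumptions, not details taken from any specific molecule model.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas, t):
    """Product of (1 - beta_s) for s = 1..t: the fraction of signal left at step t."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def noise_points(points, betas, t, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I) per 3D point."""
    a_bar = alpha_bar(betas, t)
    signal, sigma = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in p) for p in points]

# Toy "molecule": three atoms on a line (made-up coordinates, in angstroms).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
betas = linear_beta_schedule(1000)
rng = random.Random(0)
slightly_noisy = noise_points(ligand, betas, 10, rng)       # still close to the input
almost_pure_noise = noise_points(ligand, betas, 1000, rng)  # structure is washed out
```

A generative model is then trained to undo these steps one at a time, so that starting from pure noise it ends up with a plausible set of points.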
Enter PMDM: The Master Builder
The Pocket-based Molecular Diffusion Model (PMDM) is like having a magical LEGO set that can create the perfect piece for any spot in your castle along with providing information about its binding efficiency. PMDM uses advanced diffusion-based techniques to generate molecules with fixed pocket information. Lei Huang and her team have developed this novel conditional deep generative model for 3D molecule generation fitting specified target proteins. Let us take a look at the simplified steps below!
- Protein Point Cloud Encoding
  - Let's imagine each atom as a point in 3D space. A molecule or protein contains many such points, which together form a “cloud” of data points. An invariant encoder called SchNet is used to capture the semantic and spatial context of the protein.
- Drawbacks
  - If 3D molecular geometries are represented as plain 3D point clouds, regular point cloud methods cannot use edge information such as chemical bonds.
- The Dual Diffusion Method
  - PMDM uses two kinds of connections:
    - Covalent Localized Edges: These represent strong chemical bonds between atoms that are close to each other (pairs of atoms with interatomic distances below a certain threshold).
    - Global Edges: These represent weaker forces, such as van der Waals forces, acting between atoms that are further apart.
- Diffusion Process
  - This is where the points spread out and move around. Think of this as adding "noise" or randomness to the molecule’s data: the process iteratively corrupts the original molecule (ligand) by adding Gaussian noise.
  - The goal of PMDM is to learn how to reverse this process to create molecules that fit specific criteria. To do so, it models a conditional data distribution from which it can generate accurate molecules with high affinity to the targets.
- Kernels and Experiments
  - No matter how a molecule is oriented, its identity should remain the same. Hence, researchers at the City University of Hong Kong and Tencent AI Lab in China designed an equivariant dynamic kernel that respects the translation, rotation, reflection, and permutation equivariance of molecular geometry systems. The team tested their model on the Synthetic CrossDocked Dataset, which includes molecules designed to fit into certain pockets.
- Results of PMDM:
  - PMDM was able to create new molecules that are similar to real drugs.
  - These new molecules could be made in a lab (synthesis-accessible).
  - They can bind well to specific proteins (high binding affinity), which is important for making effective drugs.
  - PMDM performed better than the best existing models in multiple tests.
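The two kinds of connections in the dual diffusion step can be pictured with a small sketch that sorts atom pairs by distance. Both cutoff values, the function name, and the toy coordinates are assumptions made for illustration; the text above only says that local edges use a distance threshold, and this is not the authors' implementation.

```python
import math
from itertools import combinations

def build_dual_edges(coords, local_cutoff=2.0, global_cutoff=6.0):
    """Split atom pairs into short-range ("local") and longer-range ("global") edges.

    Both cutoffs (in angstroms) are illustrative assumptions, not values from the paper.
    """
    local_edges, global_edges = [], []
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if d < local_cutoff:
            local_edges.append((i, j))    # close enough to look like a covalent bond
        elif d < global_cutoff:
            global_edges.append((i, j))   # further apart: weaker, non-bonded interaction
    return local_edges, global_edges

# Four made-up atoms on a line, at x = 0, 1.4, 4.0 and 10.0 angstroms.
atoms = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
local_edges, global_edges = build_dual_edges(atoms)
```

With these toy coordinates, only atoms 0 and 1 are linked by a local edge, atoms 0-2 and 1-2 get global edges, and the far-away fourth atom is not connected at all.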
Let’s take a deeper dive into the results. After all, we need to see why PMDM works as well as it does, don’t we?
Metrics for Evaluation
PMDM’s performance was measured using several metrics:
- Vina Score: Estimates the binding affinity between the ligand and the protein pocket.
- High Affinity: Percentage of molecules with better binding affinity than the ground truth molecules.
- QED (Quantitative Estimate of Drug-likeness): Assesses how drug-like a molecule is based on several properties.
- SA (Synthetic Accessibility): Measures how easily the molecule can be synthesized in a lab.
- Lipinski's Rule of Five: Determines if the molecule meets key criteria for drug-likeness.
- LogP: The octanol-water partition coefficient, which reflects the molecule’s solubility and permeability. It should be between -0.4 and 5.6 for a good drug candidate.
- Diversity: Measures the variety of generated molecules.
- Generation Time: The time taken to generate 100 samples for each target pocket.
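As a rough illustration of the rule-based metrics in this list, the sketch below checks Lipinski's four classic criteria and the LogP window mentioned above. The `Descriptors` class and the example values are hypothetical, and the exact scoring used in the PMDM evaluation may differ (for instance, some variants add a rotatable-bond rule).

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed properties of one molecule (values supplied by the caller)."""
    mol_weight: float       # daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_rules_passed(d):
    """Count how many of Lipinski's four classic criteria the molecule satisfies."""
    checks = [
        d.mol_weight <= 500,
        d.log_p <= 5,
        d.h_bond_donors <= 5,
        d.h_bond_acceptors <= 10,
    ]
    return sum(checks)

def log_p_in_range(d, low=-0.4, high=5.6):
    """True if LogP falls inside the window quoted in the metric list."""
    return low <= d.log_p <= high

# Illustrative, approximate values; not taken from any dataset.
small_polar = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
greasy_giant = Descriptors(mol_weight=720.0, log_p=7.1, h_bond_donors=2, h_bond_acceptors=12)
```

Here `small_polar` passes all four rules and sits inside the LogP window, while `greasy_giant` passes only the donor rule and falls outside it.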
For every target protein in the test set, the group generated 100 molecules, for a total of 10,000 molecules. The sizes of the generated molecules were sampled from the size distribution of the training set. Several kinds of metrics were then used to compare PMDM with other models.
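Sampling molecule sizes from the training distribution can be sketched in a few lines. The list of training sizes below is made up, and `sample_sizes` is a hypothetical helper, so treat this as one plausible way to do it rather than the procedure the authors used.

```python
import random
from collections import Counter

def sample_sizes(training_sizes, n_samples, rng):
    """Draw atom counts with probability proportional to their training-set frequency."""
    counts = Counter(training_sizes)
    sizes = list(counts)
    weights = [counts[s] for s in sizes]
    return rng.choices(sizes, weights=weights, k=n_samples)

# Made-up atom counts standing in for the training set.
training_sizes = [18, 20, 20, 22, 22, 22, 25, 25, 30]
rng = random.Random(42)

# 100 molecule sizes for a single target pocket.
sizes_for_pocket = sample_sizes(training_sizes, 100, rng)
```

Each sampled size then tells the generator how many atoms to place for that molecule, so the generated set has a realistic spread of small and large molecules.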
PMDM performed better than rivals such as CVAE and AR-SBDD in several areas. For example, it achieved a better Vina score (lower, that is, more negative, scores mean stronger predicted binding), which suggests tighter binding to the target proteins. It also showed faster generation times and better scores on measures like QED and the Lipinski score, indicating that its compounds were not only effective binders but also had promise as potential drugs.
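Since Vina scores are estimated binding energies in kcal/mol, "better" means lower. A minimal sketch of the High Affinity metric under that convention follows; the scores and the function name are invented for illustration.

```python
def high_affinity_fraction(generated_scores, reference_score):
    """Fraction of generated molecules whose Vina score beats the reference.

    Vina scores are estimated binding energies in kcal/mol, so lower is better.
    """
    if not generated_scores:
        raise ValueError("need at least one generated score")
    better = sum(1 for s in generated_scores if s < reference_score)
    return better / len(generated_scores)

# Made-up scores for four generated molecules and one reference ligand.
generated = [-8.1, -6.4, -7.9, -5.2]
reference = -7.0
fraction = high_affinity_fraction(generated, reference)
```

In this toy case two of the four molecules score below -7.0, so the fraction is 0.5.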
When whole generated molecules are considered, PMDM wasn't simply good; it set a new benchmark for molecular design.
What about evaluating each LEGO piece, not just the final model?
When evaluating a LEGO model, it's not enough to look at the finished build; each brick must fit well and contribute to the structure's stability. Similarly, to understand the quality of generated molecules, it's not just the overall structure that matters but also the quality of their sub-structures. Several pocket proteins were investigated to understand the substructures.
AR-SBDD often creates unstable three-atom rings, indicating it gets stuck in local optima.
In terms of ring structures, PMDM generates fewer unstable three- and four-atom rings and more stable five- and six-atom rings (cue the organic chemistry flashbacks), which are crucial in drug design due to their frequent hydrogen bonds. This balance suggests that PMDM has a much better understanding of the molecular data distribution, producing molecules that are more representative of real-world drugs.
Further, bond angles and dihedral angles were assessed to check that the molecules' local geometry was accurate. PMDM outperformed all other models in maintaining these geometric properties, which indicates that it can capture the local atom geometry of the data.
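For readers who want to see what these two quantities are, here is a small sketch that computes a bond angle and a dihedral (torsion) angle straight from atom coordinates. It uses the standard atan2 formula for the dihedral; sign conventions vary between tools, and the helper names are my own.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    u, v = _sub(a, b), _sub(c, b)
    cos_theta = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def dihedral_angle(a, b, c, d):
    """Signed torsion a-b-c-d in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of planes a-b-c and b-c-d
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Right angle at the origin.
right_angle = bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))

# Four atoms with the end atoms on the same side (cis) and on opposite sides (trans).
cis = dihedral_angle((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
trans = dihedral_angle((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, -1.0, 0.0))
```

Comparing the distributions of such angles between generated and reference molecules is one way to judge whether the local geometry looks realistic.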
So many LEGO worlds!
After checking out the local geometry of molecules from PMDM, it's important to consider the broader chemical space they occupy. We need to focus on the overall shape and distribution of molecules.
PMDM stands out by accurately capturing both 2D and 3D molecular fingerprints. Using Morgan, RDKit, and USRCAT fingerprints, the researchers mapped the chemical space of the generated molecules and compared it with that of the test-set molecules. The Extended-Connectivity Fingerprints (ECFP), based on the Morgan algorithm, consider atom types, connectivity, and chemical features. RDKit fingerprints capture 2D substructures, while USRCAT descriptors capture 3D shape.
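A common way to compare bit-style fingerprints such as Morgan or RDKit fingerprints is Tanimoto similarity. The text above does not say which similarity measure was used, so this is only a sketch of the general idea, with fingerprints represented as sets of "on" bit positions and made-up bit values.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(union)

# Made-up bit positions for a generated molecule and a reference molecule.
generated_fp = {3, 17, 42, 128, 511}
reference_fp = {3, 17, 99, 128}
similarity = tanimoto(generated_fp, reference_fp)
```

The two toy fingerprints share 3 of 6 distinct bits, giving a similarity of 0.5; a value of 1.0 would mean identical fingerprints and 0.0 no shared bits.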
Visualizing chemical space distribution with t-SNE (t-distributed stochastic neighbor embedding) shows that PMDM-generated molecules can cover the test-set molecules in 2D substructure space, showing an accurate modeling of the training space. The 3D chemical space is also well captured by PMDM, indicating no significant mismatches between generated and test-set molecules.
Since the shape of molecular targets is important, researchers use Principal Moments of Inertia (PMI) and Plane of Best Fit (PBF) descriptors to characterize these shapes. PMI descriptors reflect whether a molecule's geometry is rod-shaped, disc-shaped, or sphere-shaped. A ternary plot of Normalized Principal Moment of Inertia ratios (NPR) shows that PMDM-generated molecules exhibit similar patterns to test set molecules, gathering around the rod corner and even touching the disc and sphere corners, suggesting PMDM's ability to explore novel shapes beyond the dataset.
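The NPR coordinates behind that triangle plot are simple ratios of the principal moments of inertia. The sketch below assumes the three principal moments have already been computed elsewhere and only shows the ratio step, plus a toy "nearest corner" lookup that is my own addition for illustration.

```python
import math

# Corners of the NPR triangle as (I1/I3, I2/I3).
SHAPE_CORNERS = {"rod": (0.0, 1.0), "disc": (0.5, 0.5), "sphere": (1.0, 1.0)}

def npr(principal_moments):
    """Return (I1/I3, I2/I3) with the moments sorted so that I1 <= I2 <= I3."""
    i1, i2, i3 = sorted(principal_moments)
    if i3 <= 0:
        raise ValueError("largest principal moment must be positive")
    return i1 / i3, i2 / i3

def nearest_shape(principal_moments):
    """Name of the triangle corner closest to the molecule's NPR point."""
    point = npr(principal_moments)
    return min(SHAPE_CORNERS, key=lambda name: math.dist(point, SHAPE_CORNERS[name]))

# Idealized examples: a thin rod, a flat disc, and a ball.
rod_like = nearest_shape((0.0, 10.0, 10.0))
disc_like = nearest_shape((5.0, 5.0, 10.0))
sphere_like = nearest_shape((7.0, 7.0, 7.0))
```

Real molecules land somewhere inside the triangle rather than exactly on a corner, which is why the plot is read as a distribution.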
The PBF values, which measure the distance of heavy atoms from the plane of best fit, also show a great match between the test set and generated molecules. This indicates that PMDM can model both 2D and 3D molecular structures accurately, guiding the exploration of novel drug-like structures.
Is your LEGO model better than ours?
How well does PMDM work in practice? To test this, the trained model was applied to generate high-affinity molecules targeting SARS-CoV-2-related proteins. Specifically, the focus was on designing non-covalent inhibitors for the SARS-CoV-2 main protease (Mpro), which is crucial for viral replication and is therefore a viable drug target.
The researchers aimed to generate molecules with novel scaffolds. Using three atoms as the seed fragment, they generated 40,000 molecules. Filtering by Vina score left 10,627 high-affinity molecules. None of these were in the training set, indicating that PMDM can generate novel molecules that bind well to target proteins.
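The filter-then-check-novelty step can be pictured like this. The cutoff, the SMILES strings, and the scores are all invented, and the function is a hypothetical helper rather than the pipeline the researchers ran.

```python
def filter_by_vina(scored_molecules, training_smiles, max_vina_score):
    """Keep molecules scoring at or below the cutoff; also list hits unseen in training.

    Plain string comparison only detects duplicates if every SMILES string was
    canonicalized the same way beforehand (assumed here).
    """
    hits = [(smiles, score) for smiles, score in scored_molecules
            if score <= max_vina_score]
    training = set(training_smiles)
    novel = [smiles for smiles, _ in hits if smiles not in training]
    return hits, novel

# Made-up molecules and scores (kcal/mol); the -7.0 cutoff is hypothetical.
scored = [("CCO", -3.1), ("c1ccccc1O", -7.4), ("CC(=O)Nc1ccccc1", -8.2)]
hits, novel = filter_by_vina(scored, training_smiles={"c1ccccc1O"}, max_vina_score=-7.0)
```

In this toy run two molecules pass the score cutoff, and one of them is absent from the training set and so counts as novel.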
Building the Perfect LEGO Tower: Pharmacophore Analysis
Imagine building a LEGO tower, where each piece has to fit just right to complete the structure. In the same way, designing effective drugs requires molecules to have specific features that fit perfectly with their target. To see if the molecules generated by PMDM had these features, the team used Align-It software to visualize the distribution of hydrophobic groups, such as aromatic rings and lipophilic regions.
The results were promising. The hydrophobic groups in the generated molecules clustered in key areas (S1’, S1, S2, S3, and S4) just like the reference compound, showing that PMDM can create molecules with similar binding properties.
Further analysis revealed that the hydrogen bond acceptors in the generated molecules interacted well with crucial residues like HIS163 and GLU166, and the hydrogen bond donors were in the right spots. Additionally, new clusters suggested that these molecules could form hydrogen bonds with other parts of the protein pocket.
Admiring your final build
As the dad watched his daughter build her LEGO tower (including two balconies) with crazy determination, he couldn't help but feel a sense of pride. Just like her, he knew that creating something remarkable often required breaking down a few things along the way. It requires patience, determination, and the ability to plunge yourself into a vast ocean of opportunities.
PMDM makes things easier and faster while delivering better results than the top methods out there. We believe it's going to revolutionize how new drugs are designed, especially for targeting specific proteins. It could be a ground-breaking approach for faster drug discovery and delivery.