Docking simulations
Simulation of the binding between a small molecule and a protein; the output is a set of protein-molecule complexes with assigned scores
Docking simulations differ by
Scoring function
Force field based
Empirical
Knowledge based
Algorithm
Deterministic = predefined procedure with reproducible outcome
pros: fast, with a reproducible outcome
cons: may miss solutions
examples
Brute force
Shape fitting
incremental construction
Stochastic: includes randomness so different experiments can give different results
pros: better exploration of search space
examples
genetic docking
Monte Carlo
Tabu list search
Small molecule docking:
1 protein vs 1 ligand (class)
To understand how a ligand binds
Pose = different ways for a ligand to bind to a pocket
method works best when you use several similar ligands whose binding is known
Docking them all creates many poses that can be clustered to identify common binding modes
Analyze
is there a correlation between the score and the experimental affinity
Do differences in binding modes explain differences in affinity
Small molecule docking:
1 protein vs many different ligands
For drug discovery
virtual screen of 1k to 10M compounds to identify which will bind and select those with the highest predicted affinity
Challenges:
as the number of ligands increases, the score distribution widens, so the top-scoring compounds are not always active and you might miss hits
Solution: Consensus scoring = combine multiple scoring functions and select those that score well in all of them
Different ligands might prefer different poses and it might be hard to select the best ones
some scoring functions might give a high score to a pose that is not biologically relevant
Solution: combine docking with other methods (MIFs, Pharmacophore modelling, similarity to other compounds)
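Consensus scoring can be sketched as an intersection of top-ranked lists. This is a minimal illustration, not a specific program's method: the function name, the ligand ids and the scores are made up, and it assumes a higher score means better predicted binding.

```python
def consensus_hits(scores_by_function, top_n):
    """Return the ligands that rank in the top_n of every scoring function.

    scores_by_function maps a scoring-function name to {ligand_id: score}.
    Assumption: a higher score means better predicted binding.
    """
    selected = None
    for scores in scores_by_function.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        top = set(ranked[:top_n])
        # Keep only ligands that were also in the top list of earlier functions.
        selected = top if selected is None else selected & top
    return selected or set()


# Made-up scores for five hypothetical ligands under two scoring functions.
scores = {
    "force_field": {"lig1": 9.1, "lig2": 7.4, "lig3": 8.8, "lig4": 3.0, "lig5": 5.5},
    "empirical": {"lig1": 3.5, "lig2": 8.9, "lig3": 7.5, "lig4": 2.1, "lig5": 4.0},
}
hits = consensus_hits(scores, top_n=3)
```

Here lig1 scores best with one function but poorly with the other, so it is dropped from the consensus list.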
Incremental construction algorithm
Deterministic: fast and reproducible
fragment the ligand
choose core fragment (largest/most rigid)
Place core fragment in pocket and try all different orientations using shape complementarity
Incrementally (1 by 1) add the remaining fragments back to their original positions, trying different torsions around the bonds
at each step only conformations that fit and avoid steric clashes are kept -> tree diagram
when fully reassembled: you have a set of ligand poses that can be scored
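The pruning idea behind incremental construction can be sketched on a toy 2D model. Everything here is a simplifying assumption: the ligand is a chain of point-like fragments, the core is fixed at the origin (no orientation search), the pocket is a circle, and `fits` and `grow_poses` are hypothetical names.

```python
import math


def fits(new, pose, pocket_radius, protein_atoms, clash_dist):
    """Keep a fragment only if it lies inside the pocket and clashes with nothing."""
    if math.hypot(*new) > pocket_radius:
        return False
    others = list(pose) + list(protein_atoms)
    return all(math.dist(new, other) >= clash_dist for other in others)


def grow_poses(n_fragments, torsions, pocket_radius, protein_atoms,
               clash_dist=1.0, bond=1.5):
    """Grow the ligand fragment by fragment, pruning bad partial poses."""
    partial = [[(0.0, 0.0)]]  # the core fragment sits at the pocket centre
    for _ in range(n_fragments - 1):
        extended = []
        for pose in partial:
            x, y = pose[-1]
            for angle in torsions:  # try each allowed torsion for the new bond
                new = (x + bond * math.cos(angle), y + bond * math.sin(angle))
                if fits(new, pose, pocket_radius, protein_atoms, clash_dist):
                    extended.append(pose + [new])
        partial = extended  # next level of the tree
    return partial


torsions = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
poses = grow_poses(3, torsions, pocket_radius=2.5, protein_atoms=[(1.5, 0.0)])
```

Without pruning, 4 torsions over 2 growth steps would give 16 candidates; the clash and pocket checks cut branches of the tree early, and the same input always gives the same poses.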

Genetic docking algorithm
Stochastic: evolutionary principles with random variations
Treat poses as a population of organisms that evolve over time
Each pose (1 solution) is encoded as a chromosome containing genes that encode information like atom coordinates, orientation, torsion angles
Start with a random population of chromosomes (diverse ligand poses)
Evaluation: score each pose
bad ones die
good ones survive
all poses where ligand binds outside of active site die, only the ones that bind in active site survive and reproduce
Reproduction: surviving chromosomes can reproduce via:
crossover: combining parts of 2 good solutions
mutation: randomly changing genes = local search
Repeat evaluation and reproduction for many generations
Stop when:
top solutions don’t change significantly (RMS difference)
preset number of generations is reached
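A minimal sketch of the loop above, under several assumptions: `score` is a hypothetical stand-in for a real scoring function (higher is better), a chromosome is just a list of three numbers, and convergence is judged on the change in the best score rather than on an RMS difference between poses.

```python
import random


def score(chromosome, target=(1.0, -2.0, 0.5)):
    """Hypothetical stand-in for a docking score: higher is better."""
    return -sum((gene - t) ** 2 for gene, t in zip(chromosome, target))


def genetic_search(pop_size=30, n_genes=3, max_generations=200,
                   patience=20, tol=1e-6, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    # Start with a random population of chromosomes (diverse poses).
    population = [[rng.uniform(-5, 5) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best score of each generation
    stalled = 0
    for _ in range(max_generations):
        # Evaluation: the better half survives, the rest dies.
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]
        best = score(survivors[0])
        # Count generations in which the top solution barely changed.
        if history and abs(best - history[-1]) < tol:
            stalled += 1
        else:
            stalled = 0
        history.append(best)
        if stalled >= patience:
            break
        # Reproduction: crossover of two parents, then an optional mutation.
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes)
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_rate:
                child[rng.randrange(n_genes)] += rng.gauss(0.0, 0.3)
            children.append(child)
        population = survivors + children
    return max(population, key=score), history
```

Because survivors are carried over unchanged, the best score can never get worse from one generation to the next; a different `seed` can give a different final pose, which is the stochastic part.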
Similarities and differences: Genetic algorithm (GA) and Monte Carlo (MC)
Similarities
Both are stochastic methods (use random sampling to explore the search space)
Both are used to sample the PES to find low-energy protein folds, and in docking to find low-energy protein-ligand poses
Iterative processes: both generate and evaluate multiple candidate solutions over time
Differences
Goal
GA: find the best possible solution (optimization)
output = best found solution
MC: random sampling
output = distribution of potential outcomes
Approach:
GA works on populations of solutions
MC works on a single solution at a time
Mechanism
GA: solutions can share information via crossover
MC: No information exchange between samples
Evaluation
GA: uses score to compare and rank multiple solutions within a population (determines which survive and reproduce)
MC: uses scoring function to probabilistically accept or reject a move relative to the current state (not to rank solutions)
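The MC acceptance step can be sketched as follows. The notes do not name the acceptance rule; the Metropolis criterion used here is the common choice and is an assumption, as are the function names and the 1D "pose".

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Accept or reject a move with energy change delta_e (lower energy is better)."""
    if delta_e <= 0:
        return True  # downhill moves are always accepted
    # Uphill moves are accepted with probability exp(-delta_e / kT).
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(energy, start, n_steps=1000, step=0.5, kT=1.0, seed=0):
    """Random walk over a single solution; returns the list of visited states."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)  # small random change
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, kT, rng):
            x, e = trial, e_trial
        samples.append(x)
    return samples


samples = monte_carlo(lambda x: x * x, start=3.0)
```

The output is the whole list of visited states (a distribution), not a single best solution, and each trial move is compared only with the current state.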
other algorithms
stochastic
monte carlo
tabu list search
deterministic
brute force
shape fitting
Tabu list search
several initial ligand placements are generated randomly near the binding site
The molecule is moved by small changes
Each new position is scored
moves or positions that appear in the Tabu List are not allowed
move selection
better-scoring moves are preferred
worse moves may be chosen if no better move is available
memory update
the selected move is added to the tabu list
so it is temporarily forbidden, which prevents revisiting it
the process repeats until convergence
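A toy version of the steps above, assuming positions on an integer grid, a single given start instead of several random placements, a fixed number of steps instead of a convergence test, and a hypothetical `score` where higher is better.

```python
from collections import deque


def tabu_search(score, start, n_steps=50, tabu_size=10):
    """Toy tabu search on an integer grid; higher score is better."""
    current = best = start
    tabu = deque([start], maxlen=tabu_size)  # recently visited positions
    for _ in range(n_steps):
        x, y = current
        # Small moves to neighbouring positions, skipping anything in the tabu list.
        neighbours = [(x + dx, y + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0) and (x + dx, y + dy) not in tabu]
        if not neighbours:
            break
        # Take the best allowed move, even if it is worse than the current position.
        current = max(neighbours, key=score)
        tabu.append(current)  # memory update; the oldest entry drops out
        if score(current) > score(best):
            best = current
    return best


best = tabu_search(lambda p: -((p[0] - 3) ** 2 + (p[1] + 2) ** 2), start=(0, 0))
```

The bounded `deque` makes the ban temporary: once `tabu_size` newer positions have been added, an old position becomes available again.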
Brute force
try every possible orientation/position of the ligand
shape fitting
generate a low energy conformation of the ligand
fit the conformation in the pocket based on geometric constraints
optimize and score the fit
often used for fast screening
Scoring function
Biophysical formulas that describe the quality of the fit of a docked molecule to its receptor
quality score during docking to guide algorithm towards better poses
quantitative score to rank docked molecules according to binding strength after docking
Force Field based scoring function
Only takes non-bonded interactions between protein and ligand into account (gives energetic values)
cons:
no entropic contribution
no inclusion of water models
water can mediate key ligand-receptor interactions (H-bonds)
FFs are hard to parameterize for a specific target
FFs rely on parameters like partial charges, atom types, … which are general and not tailored to a specific protein, so the FF may not describe interactions accurately for a particular target
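As an illustration of a non-bonded score, here is a sketch that sums a Lennard-Jones and a Coulomb term over all protein-ligand atom pairs. It assumes one shared `epsilon` and `sigma` for every pair and a constant dielectric, which real force fields do not do; atoms are hypothetical `(x, y, z, charge)` tuples.

```python
import math

# Converts e^2/Å to kcal/mol (commonly used value).
COULOMB_CONSTANT = 332.06


def nonbonded_energy(protein_atoms, ligand_atoms,
                     epsilon=0.1, sigma=3.5, dielectric=4.0):
    """Sum Lennard-Jones and Coulomb terms over all protein-ligand atom pairs."""
    total = 0.0
    for px, py, pz, pq in protein_atoms:
        for lx, ly, lz, lq in ligand_atoms:
            r = math.dist((px, py, pz), (lx, ly, lz))
            lennard_jones = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
            coulomb = COULOMB_CONSTANT * pq * lq / (dielectric * r)
            total += lennard_jones + coulomb
    return total


r_min = 3.5 * 2 ** (1 / 6)  # distance of the Lennard-Jones minimum
neutral = nonbonded_energy([(0, 0, 0, 0.0)], [(r_min, 0, 0, 0.0)])
charged = nonbonded_energy([(0, 0, 0, 0.5)], [(r_min, 0, 0, -0.5)])
```

Nothing in this sum accounts for entropy or water, which is exactly the limitation listed under cons.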
Empirical scoring function
= estimates protein-ligand binding by summing weighted interaction terms (H-bonds, hydrophobic, …), where the weights are obtained by fitting experimental binding data of known protein-ligand complexes using regression methods (= adjusting weights to match exp data)
Free energy terms
\Delta Gx =contains all unknown contributions learned through regression
polar interactions: H-bonds and ionic
apolar interactions: aromatic and lipophilic
entropic effects:
desolvation effects (removal of water upon binding)
loss of ligand flexibility: as the ligand binds, the number of freely rotatable bonds decreases ~Nrot
cons
need a training set
scores are only good if problem resembles training set
Interaction term = weight \cdot \sum f\left(\Delta R,\Delta\alpha\right)
f\left(\Delta R,\Delta\alpha\right) -> the more a contact deviates from the ideal distance (\Delta R) or angle (\Delta\alpha), the worse the score will be; how much worse is determined by the weights
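A sketch of how such a weighted sum could be evaluated. The weights, the tolerance and cutoff values in `contact_quality`, and all names are illustrative assumptions, not fitted parameters; the score is treated like a free energy, so lower is better.

```python
def ramp(deviation, tolerance, cutoff):
    """1.0 inside the tolerance, 0.0 beyond the cutoff, linear in between."""
    if deviation <= tolerance:
        return 1.0
    if deviation >= cutoff:
        return 0.0
    return 1.0 - (deviation - tolerance) / (cutoff - tolerance)


def contact_quality(delta_r, delta_alpha):
    """f(dR, d_alpha); thresholds in Å and degrees are illustrative assumptions."""
    return ramp(delta_r, 0.2, 0.6) * ramp(delta_alpha, 30.0, 80.0)


def empirical_score(hbonds, ionic, lipophilic_area, n_rot, weights):
    """Weighted sum of interaction terms; lower means better predicted binding."""
    total = weights["constant"]
    total += weights["hbond"] * sum(contact_quality(dr, da) for dr, da in hbonds)
    total += weights["ionic"] * sum(contact_quality(dr, da) for dr, da in ionic)
    total += weights["lipophilic"] * lipophilic_area
    total += weights["rotor"] * n_rot  # penalty for frozen rotatable bonds
    return total


# Hypothetical weights; in practice they come from regression on known complexes.
weights = {"constant": 5.0, "hbond": -4.0, "ionic": -8.0,
           "lipophilic": -0.2, "rotor": 1.5}
ideal = empirical_score([(0.0, 0.0)], [], 0.0, 0, weights)
distorted = empirical_score([(0.4, 0.0)], [], 0.0, 0, weights)
```

An H-bond with ideal geometry contributes its full weight, while the same H-bond stretched by 0.4 Å contributes only part of it, so `distorted` is less favorable than `ideal`.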

Knowledge based
uses statistics from known crystal structures to determine how favorable an interaction is
Get statistics from the Protein Data Bank: how often does atom type i interact with atom type j at a certain distance
build histograms of distance vs frequency of interaction
convert frequency to energy (more common interactions = lower energy)
apply a scaling factor: some atom types occur more often in crystal structures than others
Resulting score ranks the quality of the predicted poses (not true energies)
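The frequency-to-energy conversion is often done with an inverse Boltzmann relation, E = -kT ln(observed / reference). The sketch below assumes that form and assumes the reference counts play the role of the scaling factor; the function name and the histogram counts are made up.

```python
import math


def pair_potential(observed_counts, reference_counts, kT=0.593):
    """Turn distance-bin counts into pseudo-energies (kT in kcal/mol near 298 K)."""
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(None)  # no statistics for this distance bin
            continue
        ratio = (obs / n_obs) / (ref / n_ref)
        # More common than the reference -> ratio > 1 -> negative (favorable).
        energies.append(-kT * math.log(ratio))
    return energies


# Made-up histogram for one atom-type pair over three distance bins.
energies = pair_potential([10, 30, 60], [1, 1, 1])
```

The values only rank how typical a contact is compared with the reference; they are not true energies.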
Rosetta score is a scoring function for folding and design and is a knowledge based scoring function
Challenges for scoring functions
water
Water can form up to 4 H-bonds, and a lot of protein-ligand interactions are mediated by water
but this is hard to model
In general, important water molecules form >2 H-bonds
Induced fit
Most docking software keeps the protein rigid while flexible ligands are docked, but in reality the protein can change conformation as a ligand binds = induced fit
Solution:
Allow rotation of polar H’s or alternative rotamers (= alternative side chain conformations)
Energy minimize after docking
run MD for realistic flexibility
Over- and underscoring
scoring functions are additive, so more interactions means a higher score
Big molecules in general have higher scores because they have more contact points
So it’s hard to compare molecules of different sizes
Docking produces many possible poses
false positives= bad pose with high score
false negatives = good pose with low score
solution: consensus scoring: combine multiple scoring functions
solvation and desolvation are not taken into account
Advanced fix to scoring function challenges
Use MM-PBSA after scoring:
includes solvent
accounts for some induced fit through MD simulations
less dependent on the size of the ligand (less overscoring)
includes entropy and desolvation effects (the energetic consequences of removing water from both the ligand and the receptor when they bind)
MM-PBSA: concept
= post-processing method used to estimate the binding free energy of a ligand to a receptor more accurately than simple scoring functions
MM = molecular mechanics: calculates internal energies of ligand receptor and complex
bonded+non-bonded terms
PBSA = poisson boltzmann surface area
estimates 2 solvation effects
polar solvation = electrostatic stabilization by water, computed with the Poisson-Boltzmann equation
apolar solvation = energetic penalty of creating a cavity in water (related to solvent accessible surface area)
the binding free energy is computed
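The usual way to write this out, as a sketch in the standard MM-PBSA form (the subscript names are chosen here for readability and are not from the notes):

```latex
\begin{align*}
\Delta G_{\text{bind}} &= G_{\text{complex}} - G_{\text{receptor}} - G_{\text{ligand}} \\
G &= E_{\text{MM}} + G_{\text{polar}} + G_{\text{apolar}} - TS
\end{align*}
```

Each G is evaluated for the complex, the receptor and the ligand, typically averaged over MD snapshots; E_MM is the molecular mechanics energy, G_polar the Poisson-Boltzmann term, G_apolar the surface-area term and -TS the entropic contribution.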
