Molecular Docking Overview and Applications
Design of molecules conforming to the desired requirements.
Molecular docking tries to predict the structure of the intermolecular complex formed between two or more
constituent molecules. Molecular docking has become an increasingly important tool for drug discovery.
• Docking is the identification of the low-energy binding modes of a small molecule, or ligand, within the active
site of a macromolecule, or receptor, whose structure is known.
• Docking is the computational determination of binding affinity between molecules (protein structure and ligand).
• Given a protein and a ligand, find the binding free energy of the complex formed by docking them.
o Provides visual and quantitative insights into binding modes and stability.
o Helps in virtual screening of compound libraries for potential candidates in drug discovery.
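The binding free energy mentioned above can be related to a measurable quantity. A minimal sketch, assuming the standard thermodynamic relation ΔG° = RT ln(Kd) at a 1 M standard state; the function names and the default temperature of 298.15 K are illustrative choices, not part of any docking program:

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant (mol/L)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Inverse relation: dissociation constant (mol/L) from a binding free energy."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    for kd in (1e-3, 1e-6, 1e-9):
        print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

A tighter binder (smaller Kd) gives a more negative free energy, which is why docking scores are usually ranked from most to least negative.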
Here is a tabular representation of the types of docking based on the system, flexibility, and focus:

Type of Docking | Description | Key Features | Use Cases
Rigid Docking | Both the ligand and receptor are treated as rigid, with no flexibility in their conformations. | Simplest form of docking; faster and less computationally intensive. | Initial screening and simple interactions.
Flexible Docking | Both the ligand and receptor are flexible, adapting their conformations. | Most realistic but highly resource-intensive. | Complex systems where both interact dynamically.
Protein-Ligand Docking | Small molecule ligand interacts with a protein receptor. | Focused on drug discovery and small molecule design. | Drug-target interactions; inhibitor design.
Protein-Protein Docking | Large-scale docking for protein-protein interactions. | Larger interfaces; models surface complementarity and hydrophobic patches. | Understanding signaling or metabolic pathways.
1- RIGID DOCKING / THE LOCK AND KEY THEORY
[Figure: lock-and-key model, with the substrate as the key and the enzyme as the lock.]
(a) In rigid docking, the internal geometry of both the receptor and the ligand is kept fixed while docking is performed.
(b) In flexible docking, an enumeration of the rotations of one of the molecules is performed. For every rotation, the surface cell occupancy and the energy are calculated; afterwards the most optimal pose is selected.
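The enumerate-and-score idea described above can be sketched in a few lines. This is a toy 2D illustration, not the algorithm of any docking package: the scoring function below (squared deviation of each ligand atom's nearest receptor distance from a hypothetical ideal contact distance) stands in for the surface cell occupancy and energy calculation, and all names and parameters are assumptions.

```python
import math


def rotate(points, angle_deg):
    """Rotate 2D points about the origin by angle_deg (counter-clockwise)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def toy_score(receptor, ligand, ideal=1.5):
    """Toy score: lower is better. Penalises ligand atoms whose nearest
    receptor atom is not at the (hypothetical) ideal contact distance."""
    score = 0.0
    for lx, ly in ligand:
        nearest = min(math.hypot(lx - rx, ly - ry) for rx, ry in receptor)
        score += (nearest - ideal) ** 2
    return score


def best_rotation(receptor, ligand, step_deg=30):
    """Enumerate rotations of the ligand, score each pose, return (angle, score)."""
    poses = [(angle, toy_score(receptor, rotate(ligand, angle)))
             for angle in range(0, 360, step_deg)]
    return min(poses, key=lambda pose: pose[1])


if __name__ == "__main__":
    receptor = [(0.0, 3.0)]   # one receptor atom
    ligand = [(1.5, 0.0)]     # one ligand atom, rotated about the origin
    print(best_rotation(receptor, ligand))
```

Real programs enumerate translations and three-dimensional rotations as well, and use far richer scoring functions; the loop structure is the part this sketch is meant to show.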
Docking Assessment:
The spheres become potential locations for ligand atoms.
Step 4 Matching: Sphere centers are then matched to the ligand atoms to determine possible orientations for the ligand.
Step 5 Scoring: Find the top scoring orientation.
[Figure: ligand docked into the protein's active site.]
Types of views
Stick view
• In this view the bonds are displayed as sticks. One can specify the
colour and width of the sticks. Different bonds are represented in
different colours.
Ball and stick model
• In this kind of display the atoms are displayed as small spheres or balls
and bonds are displayed as sticks. One can specify the colour and
width or radius of the sticks or balls. Different atoms are represented
in different colours.
Spacefill view
• This view is used to represent all of the currently selected atoms as
solid spheres. It can also display both union-of-sphere and ball and
stick models of a molecule. One can provide the sphere radius in
RasMol units (1/250th of an angstrom).
Wire frame
• This view represents each bond within the selected region of the
molecule as a cylinder, a line, or a depth-cued vector. One can specify
the cylinder radius in angstroms or RasMol units, colour, etc. By default,
non-bonded atoms become invisible and can be marked by a special
command.
Ribbon view
• This view displays the selected molecule of protein or nucleic acid as
a smooth solid 'ribbon' surface passing along the backbone of the
protein. The ribbon is drawn between each amino acid whose alpha
carbon is currently selected. One can specify the width and colour of
the ribbon.
Cartoon view
Backbone view
• This representation displays the polypeptide backbone as a series of
bonds connecting the adjacent alpha carbons of each amino acid in a
chain. Like the wireframe view, one can specify the cylinder radius,
colour, etc. One can also render a smoothed backbone or a backbone
displayed with dashed lines.
Strands view
[Table: atom colour scheme; legible entries: nitrogen - sky blue; bromine, zinc - brown.]
LIGPLOT
LIGPLOT is a computer program designed to generate schematic 2D diagrams of protein-ligand interactions based on
data from Protein Data Bank (PDB) files. It focuses on visualizing key molecular interactions such as hydrogen bonds
and hydrophobic contacts.
• Hydrophobic contacts are shown using arcs with spokes pointing toward the ligand atoms they interact with.
These arcs give a visual representation of contact points, making it easier to interpret binding sites and their
environments.
DIMPLOT
• It visualizes hydrogen bonds, salt bridges, and other critical contacts at the interaction interface.
• This tool is particularly valuable for studying large macromolecular complexes such as enzyme complexes,
signaling protein interactions, or structural proteins with multi-domain architectures.
DIMPLOT is commonly used to:
3. Provide insight into cooperative functions of protein domains.
• The program automatically generates schematic
diagrams of protein-ligand interactions from the 3D
coordinates in a PDB file (edited by PDB-Editor).
• Eyelash symbols represent hydrophobic interactions: one colour marks
interactions with the receptor, while pink eyelashes mark hydrophobic
interactions with ligands.
Applications
• Virtual screening (hit identification)
Docking with a scoring function can be used to quickly screen large
databases of potential drugs in silico to identify molecules that are likely to
bind to a protein target of interest.
• Drug discovery (lead optimization)
Docking can be used to predict where and in which relative orientation a ligand
binds to a protein (binding mode or pose). This information may in turn be
used to design more potent and selective analogs.
• Bioremediation
Protein-ligand docking can also be used to predict pollutants that can be
degraded by enzymes.
The Antigenicity Plot is a computational tool used to predict antigenic regions of a protein or peptide
sequence. These regions can trigger an immune response, making them crucial for vaccine or
antibody research. Here's how it works, step by step:
Mode of Operation:
1. Input:
o The sequence of interest is provided in single-letter amino acid notation, and any non-
IUPAC characters are ignored.
o The program requires the amino acid sequence itself, so headers or comments from sequence
files should be removed.
2. Algorithm:
o The algorithm used is based on the Hopp and Woods (1981) method or, in some cases, the
Kolaskar and Tongaonkar (1990) prediction model, which assesses each amino acid in the
sequence based on its likelihood of being antigenic.
o These algorithms compute a numerical antigenicity index for each amino acid in the
sequence. Higher values suggest regions more likely to be antigenic.
3. Visualization:
o After computation, the program generates a graphical plot of antigenicity indices across the
sequence.
o Each point on the plot represents the antigenicity value of a corresponding amino acid
residue, forming a line graph or bar chart that visualizes high and low antigenic sites.
4. Window Size:
o Since the program is designed for use on small screens, the sequence input window and the
output plot window are often smaller by default. Users can adjust the view by using the
scroll-bars for long sequences.
o Suggestion: Use a text editor to prepare your sequence, and copy-paste it into the input
window. This method works best if the sequence is long and requires scrolling.
5. Computation and Output:
o After pasting the sequence into the input area, the "COMPUTE" button is pressed to trigger
the antigenicity prediction.
o The system will compute the antigenic index for each residue and then visually represent this
data on the plot.
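The steps above can be sketched as a sliding-window average over a per-residue scale. This is a minimal illustration, not the program's actual implementation: the scale values are the Hopp and Woods hydrophilicity values as commonly tabulated (check them against the original publication before relying on them), and the window length of 6 and the function names are assumptions.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (assumed here).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def clean_sequence(raw):
    """Keep only the 20 standard one-letter codes; everything else is ignored."""
    return "".join(ch for ch in raw.upper() if ch in HOPP_WOODS)


def window_profile(sequence, window=6):
    """Return (1-based window start, mean scale value) for every window."""
    seq = clean_sequence(sequence)
    if window < 1 or len(seq) < window:
        return []
    values = [HOPP_WOODS[aa] for aa in seq]
    return [(start + 1, sum(values[start:start + window]) / window)
            for start in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # Invented example sequence, for illustration only.
    profile = window_profile("MKDEERKLLVAFWYG")
    best_start, best_value = max(profile, key=lambda item: item[1])
    print(f"Most hydrophilic window starts at residue {best_start}: {best_value:.2f}")
```

Plotting the second element of each tuple against the first gives the kind of line graph described in the visualization step; peaks mark candidate antigenic regions.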
Applications
• Vaccine Design: Identifying potential antigenic epitopes for the development of vaccines.
• Epitope Mapping: Detecting regions of proteins that could bind with antibodies.
• Immunology Studies: Enhancing understanding of immune system recognition mechanisms.
Usage Strategy
• For optimal user experience, especially on devices with small screens, preparing the sequence in an
external text editor and copying/pasting it into the input area is highly recommended.
• After computation, the antigenicity plot provides a visual overview of where the protein sequence
might interact with the immune system, enabling researchers to highlight potential epitopes or
antigenic regions.
Motifs (Supersecondary Structures)
A motif is a recognizable folding pattern involving two or more elements of
secondary structure and the connection(s) between them.
Or
"The connectivity between secondary structure elements and the type of
secondary structure elements involved define the level of structural
organization called structural motif."
• Motifs do not allow us to predict the biological functions: they are found in
proteins and enzymes with dissimilar functions.
• In proteins, a structural motif describes the connectivity between secondary
structural elements.
Types of Motifs
• A motif can be very simple, such as two elements of secondary structure
folded against each other, and represent only a small part of a protein. An
example is a β-α-β loop.
• A motif can also be a very complex structure involving scores of protein
segments folded together, such as the β barrel.
Motif mediated protein-protein interactions as drug targets
There are several diseases and syndromes related to the disruption of specific
DMI (domain-motif interaction) motifs.
- For instance, Liddle's, Noonan's and Usher's hereditary syndromes can
be caused by mutations in the recognition motif (WW, 14-3-3 and PDZ
recognition motifs, respectively), leading to the deregulation of important
signalling pathways.
- It has also been recognized that several viruses, e.g., Ebola and Rabies
viruses, hijack the cell machinery using modified domain-motif
interactions.
- In addition, numerous oncogenic proteins either contain a motif, or
recognise motif interaction sequences for which inhibition is a potential
cancer treatment.
Domains
• A protein domain is a conserved part of a given protein sequence and
(tertiary) structure that can evolve, function, and exist independently of the rest
of the protein chain.
• Each domain forms a compact three-dimensional structure and often can be
independently stable and folded.
• A domain usually contains between 40 and 350 amino acids, and it is the
modular unit from which many larger proteins are constructed.
• The different domains of a protein are often associated with different
functions.
MOTIF | DOMAIN
Supersecondary structure | Tertiary structure
Formed by α-helices and β-sheets connected through loops | Formed by disulfide bridges, ionic bonds, and hydrogen bonds between amino acid side chains
[Figure: self-association; amyloid fibril core structure.]
I- Protein Families
• A protein family is a group of proteins that share a common evolutionary
origin, reflected by their related functions and similarities in sequence or
structure.
• Protein families are often arranged into hierarchies, with proteins that share
a common ancestor subdivided into smaller, more closely related groups.
• The terms superfamily (describing a large group of distantly related
proteins) and subfamily (describing a small group of closely related proteins)
are sometimes used in this context.
• One set of proteins that comprise a superfamily are the G protein-coupled
receptors (GPCRs).
• These are a large and diverse group of proteins that are involved in many
biological processes, including photoreception, regulation of the immune
system, and nervous system transmission.
• At the superfamily level, GPCRs share two common properties: they
have seven transmembrane domains, and interact with specialised
proteins (called G proteins) to influence intracellular pathways after binding
extracellular signals.
[Figure: the G protein-coupled receptor superfamily and some of its families, including secretin-like GPCRs, cAMP receptors, metabotropic glutamate receptors, etc.]
• As we group the GPCRs into smaller families, the individual groups have
more properties in common.
• For example, the protein short-wave-sensitive opsin 1 belongs to a
specialised family, known as the rhodopsin-like GPCRs.
• The rhodopsin-like GPCRs themselves can be further broken down into
smaller families that respond to different signals.
• Short-wave-sensitive opsin 1 proteins belong to the opsin family (opsins
being the photoreceptors of animal retinas), but more specifically, they
are members of the blue-sensitive opsin subfamily, all of which are
activated by a particular wavelength of light.
II- Family- and domain-based protein classification
Protein families and domain composition - an example
• Regulators of G-protein signalling (RGS) domains are protein structural units
that activate GTPases.
• They are found in sequences that belong to the RGS protein family,
which are multi-functional GTPase-accelerating proteins.
• RGS protein family members contain an RGS domain, but while some
(such as RGS1) consist of little more than the domain, others (such as RGS3
and RGS6) contain additional domains that confer further functions, such as
DEP domains, which are involved in membrane targeting.
• RGS domains are also found in proteins belonging to other families
such as beta-adrenergic receptor kinases, and some members of
the sorting nexin family.
[Figure: domain composition of RGS1 and sorting nexin-13.]
[Figure: building a protein signature; an initial model is refined by iterative database searches into a mature model, which is then used for protein analysis.]
• When building the initial model, the level of amino acid conservation at
different positions in the alignment is taken into account.
• The model is then used to search a protein database in an iterative manner,
refining the model as more distantly related sequences in the database are
identified.
• Once the model is mature, the signature is ready and can be used for
protein sequence analysis.
How do protein signatures compare to other ways of
classifying proteins?
• Multiple sequence alignments can provide us with valuable information for
protein classification since they allow us to identify the (often few) amino acid
residues that are conserved in distantly related proteins.
• It is not possible to identify such important residues with pairwise alignment
techniques, such as BLAST. As a consequence, protein signatures built
from multiple sequence alignments are usually better at detecting
divergent homologues than pairwise comparison methods.
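Finding the residues that are conserved across an alignment, as described above, amounts to scanning the alignment column by column. A minimal sketch, assuming the alignment is already available as equal-length strings with "-" for gaps; the function name and the toy sequences are invented for illustration:

```python
def conserved_columns(alignment, gap="-"):
    """Return (1-based column, residue) for columns where every sequence
    has the same residue and no sequence has a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {row[i] for row in alignment}
        if len(column) == 1 and gap not in column:
            conserved.append((i + 1, next(iter(column))))
    return conserved


if __name__ == "__main__":
    # Toy alignment, not real sequence data.
    toy = ["MK-AR",
           "MKTGR",
           "LKT-R"]
    print(conserved_columns(toy))
```

A pairwise comparison of any two of these rows would also show other matching positions; only the full set of rows reveals which positions are invariant.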
Fig: Multiple sequence alignment for 60S acidic ribosomal protein P0 from different organisms
(eukaryota and archaea). There are two amino acids indicated by red arrows, lysine (K) and
arginine (R), that are conserved in all sequences.
Multiple sequence alignment methods are important for identifying highly conserved residues
that are essential for stability or function of the protein.
Signature types
Different approaches can be used to generate signatures. These include:
• Patterns
• Profiles
• Fingerprints
• Hidden Markov models (HMMs)
Each approach starts with a protein multiple sequence alignment, and can
focus on a single conserved sequence region (known as a motif), multiple
conserved motifs, or the full alignment of the entire protein or a particular
domain.
[Figure: single motif methods and multiple motif methods.]
1- PATTERNS
[Figure: a motif is identified in the sequence alignment and a pattern is extracted from it.]
• Many important sequence features, such as binding sites or the
active sites of enzymes, consist of only a few amino acids that are
essential for protein function.
• Patterns are very good at recognising such features. They are built by
identifying these regions in multiple sequence alignments.
• The pattern of conservation within the sequence feature is then modelled
as a regular expression, as is indicated in Figure 13. An example of a
database that uses patterns is PROSITE.
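To make the "pattern as a regular expression" idea concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression. It handles only a subset of the pattern syntax (residues, x, [..], {..}, (n) and (n,m) repeats, and the < and > anchors at the ends of the pattern), and the function name is an illustrative assumption rather than part of any PROSITE tool. The example pattern is the one commonly used to illustrate the syntax.

```python
import re


def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:                                # x(4) or x(2,4) style repetition
            element = m.group(1)
            upper = "," + m.group(3) if m.group(3) else ""
            repeat = "{" + m.group(2) + upper + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except these
        else:
            core = element                   # single residue or [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
    print(regex)
    # Invented test sequence, for illustration only.
    print(bool(re.search(regex, "MACVAAAAK")))
```

Because a pattern either matches or does not, this kind of signature is strict: a single substitution at a conserved position is enough to miss a true family member.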
2- PROFILES
[Figure: profile built from a multiple sequence alignment.]
3- FINGERPRINTS
[Figure: fingerprint signature (PR00000), with the motifs matched in the correct order.]
4- HMMs
Figure: Representation of a Hidden Markov model based on a multiple
sequence alignment. Amino acids are given a score at each position in the
sequence alignment according to the frequency with which they occur.
Transition probabilities (i.e., the likelihood that one particular amino
acid follows another particular amino acid) and insertion and deletion states
are also modelled.
I = insert state
M = match state
D = delete state
• Hidden Markov models (HMMs) are used by many databases.
• Like profiles, they can be used to convert multiple sequence
alignments into position-specific scoring systems.
• HMMs are adept at representing amino acid insertions and deletions,
meaning that they can model entire alignments, including divergent
regions.
• They are sophisticated and powerful statistical models, very well suited
to searching databases for homologous sequences.
• HMMs have wide utility, as is clear from the numerous databases that
use this method for protein classification, including Pfam,
SMART, TIGRFAMs, PIRSF, PANTHER, SFLD, SUPERFAMILY and Gene3D.
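A position-specific scoring system of the kind mentioned above can be sketched as a simple profile: per-column residue frequencies turned into log-odds scores. This is deliberately not an HMM (there are no insert or delete states and no transition probabilities), and the uniform background frequency, the pseudocount of 1 and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a list of per-column log-odds dicts from equal-length sequences.
    Gap characters and non-standard letters are not counted."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # uniform background (assumption)
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_sequence(profile, sequence):
    """Sum the column scores for an ungapped sequence of the profile's length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, sequence))


if __name__ == "__main__":
    # Toy alignment, not real sequence data.
    profile = build_profile(["ACD", "ACE", "ACD"])
    for candidate in ("ACD", "ACE", "WWW"):
        print(candidate, round(score_sequence(profile, candidate), 2))
```

Unlike a pattern, a profile gives every candidate a graded score, so a sequence with one unusual residue can still score well. HMMs extend the same idea with explicit insertion and deletion states.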
Protein classification resources at the EBI: InterPro
• InterPro is the resource for protein classification at the
EBI (European Bioinformatics Institute).
• In InterPro, patterns, profiles, fingerprints and HMMs from a number
of different databases are brought together into a single searchable
resource, offering convenient access to their predictive capabilities
without the need to visit the member databases individually.
• By combining the different databases and signature types, InterPro
capitalises on their individual strengths, producing a powerful tool for
the prediction of protein function.
• InterPro aims to simplify and rationalise protein sequence analysis
for the user by combining and organising information in a
consistent manner, removing redundancy, and adding extensive
annotation and useful links about the signatures and the proteins they
match.
[Figure: InterPro integrates signatures from its member databases; column labels are signature method (e.g., profile HMMs, profiles), biological entity (domains, families) and member databases.]
SUMMARY
• Protein classification allows functional and structural properties to be inferred
for novel proteins that have not been experimentally characterised.
• Proteins can be classified according to the family to which they belong,
and/or the domains and features they contain:
- A protein family is a group of proteins that share a common evolutionary
origin reflected by their related functions and similarities in sequence
and/or structure.
- Domains are distinct functional and/or structural units in a protein that
can exist in a variety of biological contexts.
- Sequence features include active sites, binding sites, post-translational
modification sites and repeats.
• Signatures are mathematical models constructed from multiple sequence
alignments that can be used to classify proteins.
• Using protein signatures is often a more sensitive way of identifying protein
function than pairwise sequence similarity searches, such as BLAST.
• Different types of signatures use different methods, focusing on single motifs
(patterns), multiple motifs (fingerprints) or considering the whole alignment
(profiles and HMMs). They offer distinct advantages in terms of protein
sequence analysis and can be used to classify proteins into families, or to
identify domains or sequence features.
• The EBI offers a resource for protein family classification and domain and
site prediction using protein signatures: InterPro. InterPro combines signatures
from multiple, diverse source databases into a single searchable resource.