
Molecular Docking Overview and Applications

Molecular docking is a computational technique used to predict the interaction between small molecules and target macromolecules, primarily for drug discovery and understanding biomolecular interactions. It involves identifying binding modes and affinities, aiding in rational drug design by simulating protein-ligand interactions and providing insights into molecular structures and characteristics. Various docking types and software are utilized to assess binding interactions, visualize molecular complexes, and generate diagrams for analyzing protein-ligand and protein-protein interactions.

Molecular Docking

Molecular docking is a computational technique used to predict how a small molecule (ligand) interacts with a target macromolecule (usually a protein or nucleic acid) to form a stable complex. It is a critical tool in drug discovery, understanding biomolecular interactions, and virtual screening.

[Figure: conceptual frame used in molecular modelling and drug design: design of molecules conforming to the desired requirements.]

Molecular docking tries to predict the structure of the intermolecular complex formed between two or more constituent molecules. Molecular docking has become an increasingly important tool for drug discovery.

Docking is the identification of the low-energy binding modes of a small molecule, or ligand, within the active site of a macromolecule, or receptor, whose structure is known.

Docking is the computational determination of binding affinity between molecules (protein structure and ligand). Given a protein and a ligand, find out the binding free energy of the complex formed by docking them.
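
To make "binding free energy" concrete, the short sketch below uses the standard thermodynamic relation ΔG° = RT ln K_d between the binding free energy and the dissociation constant. It is only an illustration: the function names and the default temperature of 298.15 K are assumptions made here, and docking programs estimate this quantity with their own scoring functions rather than from a measured K_d.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses delta_G = R * T * ln(Kd); tighter binders (smaller Kd) give more
    negative values. Function name and default temperature are illustrative.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Inverse relation: Kd (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # A nanomolar binder corresponds to roughly -12.3 kcal/mol at 298 K.
    print(round(binding_free_energy(1e-9), 2))
```

A more negative value means tighter binding, which is why docking results are usually ranked from the most negative score upward.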

Why is docking important?

• It is of extreme relevance in cellular biology, where function is accomplished by proteins interacting with themselves and with other molecular components.
• It is the key to rational drug design: the results of docking can be used to find inhibitors for specific target proteins and thus to design new drugs. It is gaining importance as the number of proteins whose structure is known increases.
Molecular modeling provides insights into various molecular aspects:

1. 3D Structure of Molecules:
o Helps in understanding the spatial arrangement of atoms.
o Tools like PyMOL, Chimera, or MOE enable high-resolution 3D visualization.
o Useful for investigating stereoisomers or conformations.

2. Chemical and Physical Characteristics:
o Enables calculation of physicochemical properties like charge distribution, dipole moment, hydrophobicity, etc.
o Guides the design of molecules with desired solubility, permeability, and reactivity.

3. Comparison of Structures:
o Aligns and compares the structures of related molecules.
o Reveals conserved regions critical for binding or function.
o Applications in evolutionary studies, QSAR, and drug design.
4. Visualization of Molecular Complexes:
o Simulates protein-protein, protein-ligand, or protein-nucleic acid interactions.
o Provides visual and quantitative insights into binding modes and stability.
o Aids in understanding mechanisms like catalysis, inhibition, or signaling.

5. Prediction of Related Molecules:
o Generates analogs of existing molecules by modifying functional groups or structures.
o Helps in virtual screening of compound libraries for potential candidates in drug discovery.

Types of docking based on the system, flexibility, and focus:

Type of Docking | Description | Key Features | Use Cases
Rigid Docking | Both the ligand and receptor are treated as rigid, with no flexibility in their conformations. | Simplest form of docking; faster and less computationally intensive. | Initial screening and simple interactions.
Flexible Docking | Both the ligand and receptor are flexible, adapting their conformations. | Most realistic but highly resource-intensive. | Complex systems where both interact dynamically.
Protein-Ligand Docking | Small molecule ligand interacts with a protein receptor. | Focused on drug discovery and small molecule design. | Drug-target interactions; inhibitor design.
Protein-Protein Docking | Large-scale docking for protein-protein interactions. | Larger interfaces; models surface complementarity and hydrophobic patches. | Understanding signaling or metabolic pathways.
1- RIGID DOCKING / THE LOCK AND KEY THEORY

In rigid docking, the internal geometry of both the receptor and the ligand is kept fixed and docking is performed.

[Figure: lock-and-key model, with the substrate as the key and the enzyme as the lock.]

2- FLEXIBLE DOCKING / THE INDUCED-FIT THEORY

[Figure: induced-fit model showing the enzyme-substrate complex.]

In flexible docking, an enumeration on the rotations of one of the molecules is performed. For every rotation, the surface cell occupancy and energy are calculated; later, the most optimal pose is selected.
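
The enumerate-score-select loop described above can be sketched on a toy 2D grid. Everything in this sketch is an assumption made for illustration: the grid representation, the 90-degree rotation step, the contact-minus-overlap score, and all function names. Real docking programs work in 3D with much finer sampling and physically motivated energy terms.

```python
import math


def rotate(points, angle_deg):
    """Rotate 2D points about the origin by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def to_cells(points):
    """Map coordinates to the integer grid cells they occupy."""
    return {(round(x), round(y)) for x, y in points}


def score_pose(ligand_cells, receptor_cells):
    """Toy score: reward contacts with receptor cells, penalise overlaps."""
    overlaps = len(ligand_cells & receptor_cells)
    contacts = 0
    for x, y in ligand_cells - receptor_cells:
        neighbours = {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}
        contacts += len(neighbours & receptor_cells)
    return contacts - 10 * overlaps


def best_rotation(ligand, receptor_cells, step_deg=90):
    """Enumerate rotations of the ligand and keep the highest-scoring one."""
    best = None
    for angle in range(0, 360, step_deg):
        cells = to_cells(rotate(ligand, angle))
        score = score_pose(cells, receptor_cells)
        if best is None or score > best[1]:
            best = (angle, score)
    return best


if __name__ == "__main__":
    receptor = {(0, -1), (1, -1), (2, -1)}   # a flat receptor surface
    ligand = [(0, 0), (1, 0), (2, 0)]        # a three-atom rod
    print(best_rotation(ligand, receptor))   # rod lying along the surface wins
```

The overlap penalty plays the role of a steric clash term, while the contact count stands in for surface complementarity.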

Docking Assessment:

• Hydrophobic and hydrophilic interactions
• Hydrogen bonds donated
• Hydrogen bonds accepted
• Ligand orientation with the best complementarity score
• Binding affinities
• Ionic interactions
• Aromatic interactions
• Van der Waals forces
• Electrostatic forces
• Free energies

Protein docking software

• Affinity (Accelrys Inc.)
• AutoDock (The Scripps Research Institute)
• FlexX (BioSolveIT)
• GLIDE (Schrödinger)
• GOLD (CCDC)
• LIGPLOT (University College London)
• FlexiDock (Tripos)

(Smith and Michael, 2002)

DOCK works in 5 steps:

Step 1: Start with the crystal coordinates of the target receptor.
Step 2: Generate the molecular surface for the receptor.
Step 3: Generate spheres to fill the active site of the receptor. The spheres become potential locations for ligand atoms.
Step 4: Matching: sphere centers are then matched to the ligand atoms to determine possible orientations for the ligand.
Step 5: Scoring: find the top-scoring orientation.

[Figure: a ligand database and a target protein enter molecular docking, giving the ligand docked into the protein's active site.]
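
Step 4 (matching) can be illustrated with a small distance-compatibility search. This is a simplified sketch, not DOCK's actual implementation: the brute-force enumeration, the choice of three matched points, the 0.5 Å tolerance, and the function name are all assumptions made here. The idea is that a set of ligand atoms can sit on a set of sphere centers only if the distances inside both sets agree.

```python
import itertools
import math


def match_orientations(spheres, atoms, k=3, tol=0.5):
    """Find assignments of k ligand atoms to k sphere centers.

    An assignment is accepted when every pairwise distance between the chosen
    sphere centers agrees, within tol, with the distance between the atoms
    assigned to them. Returns a list of (sphere_indices, atom_indices).
    """
    matches = []
    for s_idx in itertools.combinations(range(len(spheres)), k):
        for a_idx in itertools.permutations(range(len(atoms)), k):
            compatible = all(
                abs(math.dist(spheres[s_idx[i]], spheres[s_idx[j]])
                    - math.dist(atoms[a_idx[i]], atoms[a_idx[j]])) <= tol
                for i in range(k) for j in range(i + 1, k)
            )
            if compatible:
                matches.append((s_idx, a_idx))
    return matches


if __name__ == "__main__":
    # Hypothetical coordinates: a 3-4-5 triangle of sphere centers and a
    # ligand whose atoms form the same triangle elsewhere in space.
    spheres = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
    atoms = [(10, 10, 10), (10, 13, 10), (14, 10, 10)]
    print(match_orientations(spheres, atoms))
```

Each accepted assignment defines a candidate orientation, which would then be passed to the scoring step (Step 5).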

Types of views

Stick view

• In this view the bonds are displayed as sticks. One can specify the
colour and width of the sticks. Different bonds are represented in
different colours.
Ball and stick model
• In this kind of display the atoms are displayed as small spheres or balls and bonds are displayed as sticks. One can specify the colour and width or radius of the sticks or balls. Different atoms are represented in different colours.

Spacefill view
• This view is used to represent all of the currently selected atoms as solid spheres. It can also display both union-of-spheres and ball-and-stick models of a molecule. One can provide the sphere radius in RasMol units (1/250th of an angstrom).

Wire frame
• This view represents each bond within the selected region of the molecule as a cylinder, a line, or a depth-cued vector. One can specify the cylinder radius in angstroms or RasMol units, colour, etc. By default, non-bonded atoms become invisible and can be marked by a special command.

Ribbon view
• This view displays the selected molecule of protein or nucleic acid as
a smooth solid 'ribbon' surface passing along the backbone of the
protein. The ribbon is drawn between each amino acid whose alpha
carbon is currently selected. One can specify the width and colour of
the ribbon.
Cartoon view
• Here the molecule is represented as thick, deep ribbons, the Richardson (MolScript) style protein 'cartoon'. By default, the C-termini of beta sheets are displayed as arrowheads, which can be turned on or off as one wishes. One can also set the depth of the ribbon.

Backbone view
• This representation displays the polypeptide backbone as a series of bonds connecting the adjacent alpha carbons of each amino acid in a chain. Like the wireframe view, one can specify the cylinder radius, colour, etc. One can also render a smoother backbone or a backbone displayed with dashed lines.

Molecular surface view
• The surface view of the molecule renders a Lee-Richards molecular surface resulting from rolling a probe atom on the selected atoms. One can specify the radius of the probe. If the radius is given in the first form, the surface of the probe is displayed, i.e., the solvent-excluded surface. If the radius is given in the second form, the envelope of the positions of the centre of the probe is displayed, i.e., the solvent-accessible surface.

Strands view
• This representation is similar to the ribbon view. The ribbon is composed of a number of strands that run parallel to one another along the peptide plane of each residue. Here also the width and colour of the strands can be specified.
Colours: Monochrome/CPK

• CPK: this option provides a colour scheme that was developed by Corey and Pauling and later improved by Koltun. It is based upon the colours of the popular plastic space-filling models, and colours 'atom' objects by the atom (element) type. This scheme is conventionally used by chemists.

[Table: CPK colours by element. Legible entries: nitrogen, sky blue; sulphur, yellow; bromine and zinc, brown.]

Structure

This option colours the molecule by protein secondary structure. The secondary structure is either read from the PDB file (HELIX, SHEET and TURN records) or determined by the DSSP algorithm.
• Alpha helices are coloured magenta.
• Beta sheets are coloured yellow.
• Turns are coloured pale blue.
• All other residues are coloured white.

LIGPLOT

LIGPLOT is a computer program designed to generate schematic 2D diagrams of protein-ligand interactions based on data from Protein Data Bank (PDB) files. It focuses on visualizing key molecular interactions such as hydrogen bonds and hydrophobic contacts.

• Hydrogen bonds are represented as dashed lines between interacting atoms.

• Hydrophobic contacts are shown using arcs with spokes pointing toward the ligand atoms they interact with. These arcs give a visual representation of contact points, making it easier to interpret binding sites and their environments.

LIGPLOT is often used for:

1. Analyzing and interpreting how drugs or small molecules bind to proteins.

2. Supporting the PDBsum database by providing standardized visualizations of molecular structures.

3. Assisting in understanding Structure-Activity Relationships (SAR) in drug discovery.

DIMPLOT

DIMPLOT is a related program that focuses on interactions across protein-protein or domain-domain interfaces rather than protein-ligand complexes. Like LIGPLOT, it creates schematic 2D diagrams but is specifically designed to highlight interactions at interface regions.

• It visualizes hydrogen bonds, salt bridges, and other critical contacts at the interaction interface.
• This tool is particularly valuable for studying large macromolecular complexes such as enzyme complexes, signaling protein interactions, or structural proteins with multi-domain architectures.

DIMPLOT is commonly used to:

1. Investigate how two proteins interact structurally.

2. Explore changes in interface interactions caused by mutations or external factors.

3. Provide insight into cooperative functions of protein domains.

While LIGPLOT is focused on protein-ligand interactions, DIMPLOT specializes in protein-protein or domain-domain interfaces.

• The program automatically generates schematic diagrams of protein-ligand interactions from the 3D coordinates in a PDB file (edited by PDB-Editor).
• Eyelashes represent hydrophobic interactions with the receptor, while pink eyelashes represent hydrophobic interactions with ligands.
• Green dotted lines show hydrogen bonding between receptor and ligand.
• The spoked arcs represent protein residues making nonbonded contacts with the ligand.

Applications

• Virtual screening (hit identification)
Docking with a scoring function can be used to quickly screen large databases of potential drugs in silico to identify molecules that are likely to bind to the protein target of interest.

• Drug discovery (lead optimization)
Docking can be used to predict where and in which relative orientation a ligand binds to a protein (binding mode or pose). This information may in turn be used to design more potent and selective analogs.

• Bioremediation
Protein-ligand docking can also be used to predict pollutants that can be degraded by enzymes.

The Antigenicity Plot is a computational tool used to predict antigenic regions of a protein or peptide sequence. These regions can trigger an immune response, making them crucial for vaccine or antibody research. Here's how it works, step by step:

Mode of Operation:

1. Input:
o The sequence of interest is provided in single-letter amino acid notation, and any non-IUPAC characters are ignored.
o The program requires the amino acid sequence, so headers or comments from sequence files should be removed.

2. Algorithm:
o The algorithm used is based on the Hopp and Woods (1981) method or, in some cases, the Kolaskar and Tongaonkar (1990) prediction model, which assesses each amino acid in the sequence based on its likelihood of being antigenic.
o These algorithms compute a numerical antigenicity index for each amino acid in the sequence. Higher values suggest regions more likely to be antigenic.

3. Visualization:
o After computation, the program generates a graphical plot of antigenicity indices across the sequence.
o Each point in the plot represents the antigenicity value of a corresponding amino acid residue, forming a line graph or bar chart that visualizes high and low antigenic sites.

4. Window Size:
o Since the program is designed for use on small screens, the sequence input window and the output plot window are often smaller by default. Users can adjust the view by using the scroll bars for long sequences.
o Suggestion: Use a text editor to prepare your sequence, and copy-paste it into the input window. This method works best if the sequence is long and requires scrolling.

5. Computation and Output:
o After pasting the sequence into the input area, the "COMPUTE" button is pressed to trigger the antigenicity prediction.
o The system will compute the antigenic index for each residue and then visually represent this data on the plot.
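
The input-cleaning and scoring steps above can be sketched with the published Hopp-Woods hydrophilicity scale. This is a minimal illustration, not the program's own code: the function names are hypothetical, the six-residue averaging window follows the original Hopp-Woods hexapeptide averaging (the tool's actual window is not stated in these notes), and only the 20 standard residues are kept.

```python
# Hopp-Woods (1981) hydrophilicity values per residue.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def clean_sequence(raw):
    """Keep only the 20 standard one-letter codes; ignore everything else."""
    return "".join(ch for ch in raw.upper() if ch in HOPP_WOODS)


def antigenicity_profile(raw, window=6):
    """Sliding-window average of hydrophilicity along the sequence.

    Returns one value per window position; higher values suggest regions
    more likely to be exposed and antigenic.
    """
    seq = clean_sequence(raw)
    if len(seq) < window:
        return []
    values = [HOPP_WOODS[ch] for ch in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    profile = antigenicity_profile("KKKKKKLLLLLL")  # hypothetical sequence
    peak = max(range(len(profile)), key=profile.__getitem__)
    print(peak, profile[peak])
```

Plotting the returned list against window position gives the kind of line graph described under Visualization; the highest peaks mark candidate antigenic regions.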

Applications

• Vaccine Design: Identifying potential antigenic epitopes for the development of vaccines.
• Epitope Mapping: Detecting regions of proteins that could bind with antibodies.
• Immunology Studies: Enhancing understanding of immune system recognition mechanisms.

Usage Strategy

• For optimal user experience, especially on devices with small screens, preparing the sequence in an external text editor and copying/pasting it into the input area is highly recommended.
• After computation, the antigenicity plot provides a visual overview of where the protein sequence might interact with the immune system, enabling researchers to highlight potential epitopes or antigenic regions.

Motifs (Supersecondary Structures)
A motif is a recognizable folding pattern involving two or more elements of secondary structure and the connection(s) between them.
Or
"The connectivity between secondary structure elements and the type of secondary structure elements involved define the level of structural organization called structural motifs."

• Motifs do not allow us to predict the biological functions: they are found in proteins and enzymes with dissimilar functions.
• In proteins, a structural motif describes the connectivity between secondary structural elements.

Types of Motifs
• A motif can be very simple, such as two elements of secondary structure folded against each other, and represent only a small part of a protein. An example is a β-α-β loop.
• A motif can also be a very complex structure involving scores of protein segments folded together, such as the β barrel.

(a) β-α-β Loop (b) β Barrel

FIGURE 4-18 Motifs. (a) A simple motif, the β-α-β loop. (b) A more elaborate motif, the β barrel. This β barrel is a single domain of α-hemolysin.

Motif-mediated protein-protein interactions as drug targets:
There are several diseases and syndromes related to the disruption of specific DMI (domain-motif interaction) motifs.
- For instance, Liddle's, Noonan's and Usher's hereditary syndromes can be caused by mutations in the recognition motif (for example, the PDZ recognition motif), leading to the deregulation of important signalling pathways.
- It has also been recognized that several viruses, e.g., Ebola and Rabies viruses, hijack the cell machinery using modified domain-motif interactions.
- In addition, numerous oncogenic proteins either contain a motif, or recognise motif interaction sequences for which inhibition is a potential cancer treatment.

Domains
• A protein domain is a conserved part of a given protein sequence and (tertiary) structure that can evolve, function, and exist independently of the rest of the protein chain.
• Each domain forms a compact three-dimensional structure and often can be independently stable and folded.
• A domain usually contains between 40 and 350 amino acids, and it is the modular unit from which many larger proteins are constructed.
• The different domains of a protein are often associated with different functions.

• Example: the Src protein kinase, which functions in signaling pathways inside vertebrate cells.
• This protein has four domains: the SH2 and SH3 domains have regulatory roles, while the two remaining domains are responsible for the kinase catalytic activity.
• They explain how proteins can form molecular switches that transmit information throughout cells.

• The central core of a domain can be constructed from alpha helices, from β-sheets, or from various combinations of these two fundamental folding elements. Each different combination is known as a protein fold.
Motif vs Domain

MOTIF | DOMAIN
Supersecondary structure | Tertiary structure
Formed by the connection of α-helices and β-sheets through loops | Formed by the formation of disulfide bridges, ionic bonds, and hydrogen bonds between amino acid side chains
Mainly have a structural function in the protein structure | Mainly have functional importance
Perform similar biological functions through a particular protein family | Have unique functions
Are not stable independently | Independently stable

Protein misfolding is the basis of numerous human diseases

[Figure: native, misfolded or partially unfolded, and denatured states; self-association; amyloid fibril core structure; (a) further assembly of protofilaments; (c) amyloid fibrils.]

Formation of disease-causing amyloid fibrils. (a) Protein molecules whose normal structure includes regions of β sheet undergo partial folding. In a small number of the molecules, before folding is complete, the β-sheet regions of one polypeptide associate with the same region in another polypeptide, forming the nucleus of an amyloid. Additional protein molecules slowly associate with the amyloid and extend it to form a fibril. (b) The amyloid-β peptide begins as two α-helical segments of a larger protein. Proteolytic cleavage of this larger protein leaves the relatively unstable amyloid-β peptide, which loses its α-helical structure. It can then assemble slowly into amyloid fibrils (c), which contribute to the characteristic plaques on the exterior of nervous tissue in people with Alzheimer disease. The aromatic side chains shown here play a significant role in stabilizing the amyloid structure. Amyloid is rich in β sheet, with the β strands arranged perpendicular to the axis of the amyloid fibril.

Protein classification
• Proteins are the macromolecules responsible for the biological processes in the cell. They consist at their most basic level of a chain of amino acids, determined by the sequence of nucleotides in a gene.
• Depending on the amino acid sequence (different amino acids have different biochemical properties) and interactions with their environment, proteins fold into a three-dimensional structure, which allows them to interact with other proteins and molecules and perform their function.
• Proteins that have diverged from a common ancestral gene are known as homologous.
• Proteins with similar sequences are assumed to be homologous and usually (within certain limits) have similar structures and functions.

Why classify proteins?

• Proteins can be classified into groups according to sequence or structural similarity. These groups often contain well characterised proteins whose function is known. Thus, when a novel protein is identified, its functional properties can be proposed based on the group to which it is predicted to belong.
• Proteins can be classified into different groups based on:
o the FAMILIES to which they belong
o the DOMAINS they contain
o the SEQUENCE FEATURES they possess

I- Protein Families
• A protein family is a group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure.
• Protein families are often arranged into hierarchies, with proteins that share a common ancestor subdivided into smaller, more closely related groups.
• The terms superfamily (describing a large group of distantly related proteins) and subfamily (describing a small group of closely related proteins) are sometimes used in this context.
• One set of proteins that comprise a superfamily are the G protein-coupled receptors (GPCRs).
• These are a large and diverse group of proteins that are involved in many biological processes, including photoreception, regulation of the immune system, and nervous system transmission.
• At the superfamily level, GPCRs share two common properties: they have seven transmembrane domains, and interact with specialised proteins (called G proteins) to influence intracellular pathways after binding extracellular signals.

[Figure: hierarchy of G protein-coupled receptors, subdivided into families such as secretin-like GPCRs and metabotropic glutamate receptors, etc.]


• As we group the GPCRs into smaller families, the individual groups have more properties in common.
• For example, the protein short-wave-sensitive opsin 1 belongs to a specialised family, known as the rhodopsin-like GPCRs.
• The rhodopsin-like GPCRs themselves can be further broken down into smaller families that respond to different signals.
• Short-wave-sensitive opsin 1 proteins belong to the opsin family (opsins being the photoreceptors of animal retinas), but more specifically, they are members of the blue-sensitive opsin subfamily, all of which are activated by a particular wavelength of light.
II- Family- and domain-based protein classification
Protein families and domain composition - an example
• Regulators of G-protein signalling (RGS) domains are protein structural units that activate GTPases.
• They are found in sequences that belong to the RGS protein family, which are multi-functional GTPase-accelerating proteins.
• All RGS protein family members contain an RGS domain, but while some (such as RGS1) consist of little more than the domain, others (such as RGS3 and RGS6) contain additional domains that confer further functions, such as DEP domains, which are involved in membrane targeting.
• RGS domains are also found in proteins belonging to other families, such as beta-adrenergic receptor kinases and some members of the sorting nexin family.

[Figure: domain composition of RGS1 and sorting nexin-13.]

III- Sequence Features
Sequence features are groups of amino acids that confer certain characteristics upon a protein, and may be important for its overall function. Such features include:
- Active sites, which contain amino acids involved in catalytic activity. For example, the enzyme lipase, which catalyses the formation and hydrolysis of fats, has two amino acid residues (a histidine followed by another residue) that are essential for its catalytic activity.
- Binding sites, containing amino acids that are directly involved in binding molecules or ions, like the iron-binding site of haemoglobin.
- Post-translational modification (PTM) sites, which contain residues known to be chemically modified (phosphorylated, palmitoylated, acetylated, etc.) after the process of protein translation.
- Repeats, which are typically short amino acid sequences that are repeated within a protein, and may confer binding or structural properties upon it.
• Sequence features differ from domains in that they are usually quite small (often only a few amino acids long), whereas domains represent entire structural or functional units of the protein.
• Sequence features are often nested within domains - a protein kinase domain, for example, usually contains a protein kinase active site.

Proteins can also be classified according to the sequence features they contain.
• For example, ferredoxins are iron-sulphur proteins that mediate electron transfer in a variety of biological redox reactions, including the photosynthetic process.
• They can be divided into several groups according to the nature of their iron-sulphur cluster.
• In the 2Fe-2S ferredoxins (which bind a cluster of two iron (Fe) and two sulphur (S) atoms), there are 4 cysteine residues involved in iron-sulphur binding.


What are protein signatures?

• In order to classify proteins into families and to predict the presence of important domains or sequence features, we require computational tools. One set of such tools are the predictive models known as protein signatures.
• There are different types of signatures, built using different computational approaches.
• However, their common starting point is a multiple sequence alignment of proteins sharing a set of characteristics (e.g. belonging to the same family or sharing a domain).

[Figure: building a protein signature. A protein family or domain is aligned (multiple sequence alignment), a model is built and refined by iterative database searching, and the mature model is used for protein analysis.]


• When building the initial model, the level of amino acid conservation at different positions in the alignment is taken into account.
• The model is then used to search a protein database in an iterative manner, refining the model as more distantly related sequences in the database are identified.
• Once the model is mature, the signature is ready and can be used for protein sequence analysis.

How do protein signatures compare to other ways of classifying proteins?
• Multiple sequence alignments can provide us with valuable information for protein classification since they allow us to identify the (often few) amino acid residues that are conserved in distantly related proteins.
• It is not possible to identify such important residues with pairwise alignment techniques, such as BLAST. As a consequence, protein signatures built from multiple sequence alignments are usually better at detecting divergent homologues than pairwise comparison methods.

[Figure: multiple sequence alignment of 60S acidic ribosomal protein P0 sequences.]

Fig: Multiple sequence alignment for 60S acidic ribosomal protein P0 from different organisms (eukaryota and archaea). There are two amino acids indicated by red arrows, lysine (K) and arginine (R), that are conserved in all sequences. Multiple sequence alignment methods are important for identifying highly conserved residues that are essential for stability or function of the protein.
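
Finding the conserved residues described in the caption above amounts to scanning the alignment column by column. The sketch below shows that idea under simple assumptions: the sequences are already aligned to equal length, "-" marks a gap, and a column counts as conserved only when every sequence has the identical residue. The function name and the toy alignment are hypothetical and are not taken from the P0 alignment in the figure.

```python
def conserved_columns(alignment, gap="-"):
    """Return (column_index, residue) for columns identical in every sequence.

    Columns containing a gap are never reported as conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and gap not in column:
            conserved.append((i, column.pop()))
    return conserved


if __name__ == "__main__":
    toy_alignment = ["MKA-R", "LKG-R", "MKSTR"]  # hypothetical sequences
    print(conserved_columns(toy_alignment))
```

A pairwise comparison of any two of these sequences would show several identical positions; only the full alignment shows which positions stay fixed across every sequence.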
Signature types
Different approaches can be used to generate signatures. These include:
• Patterns
• Profiles
• Fingerprints
• Hidden Markov models (HMMs)
Each approach starts with a protein multiple sequence alignment, and can focus on a single conserved sequence region (known as a motif), multiple conserved motifs, or the full alignment of the entire protein or a particular domain.

[Figure: single motif methods and multiple motif methods.]

1- PATTERNS

[Figure: a motif is located in the sequence alignment and a pattern is extracted from it.]

• Many important sequence features, such as binding sites or the active sites of enzymes, consist of only a few amino acids that are essential for protein function.
• Patterns are very good at recognising such features. They are built by identifying these regions in multiple sequence alignments.
• The pattern of conservation within the sequence feature is then modelled as a regular expression, as is indicated in Figure 13. An example of a database that uses patterns is PROSITE.
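
Because a pattern is essentially a regular expression, PROSITE-style notation can be translated into a Python regex almost mechanically. The sketch below covers only the core of the notation (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for the sequence ends); the function name and the example pattern and sequences are illustrative assumptions, and rarer constructs such as > inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {ED} -> [^ED]
        element = element.replace("(", "{").replace(")", "}")   # x(4) -> x{4}
        element = element.replace("x", ".")                     # x -> any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
    print(regex)
    hit = re.search(regex, "GGAQVLLLLK")  # hypothetical sequence
    print(hit.start() if hit else None)
```

The example pattern reads: alanine or cysteine, any residue, valine, any four residues, then any residue except glutamate or aspartate.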

2- PROFILES

- -----
...
-----
l It
r.
Ii,

•.
t. I I
0
C
,, . .,0
M
L

,.l ti •'• •
A C
[Link]
't • . •.
I 0

•,, tI .s •
[Link] , I C r;
:etudl I
It
•, z • r, A A
I C 0
[Link] tr:
••• I Ii,
" C

• Profiles are used to model protein families and domains.
• They are built by converting multiple sequence alignments into
position-specific scoring matrices (PSSMs).
• Amino acids at each position in the alignment are scored according to the
frequency with which they occur, as represented in the figure.
• Substitution matrices (such as BLOSUM matrices) can be used to add
evolutionary distance weighting to these scores.
• Examples of databases that use profiles to classify proteins include CDD,
HAMAP and PROSITE (which produces profiles as well as patterns).
The ProDom database also uses a related approach, using
PSI-BLAST to create its profiles.
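
A minimal sketch of the frequency-based scoring described above is shown below. It assumes a gap-free toy alignment, a uniform background frequency and a simple pseudocount in place of substitution-matrix weighting; none of this reflects how any particular database builds its profiles, and the names `build_pssm` and `score_sequence` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build log-odds scores per alignment column (one dict per position)."""
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_sequence(pssm, sequence):
    """Sum the position-specific scores for a sequence of profile length."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Hypothetical gap-free alignment of four short sequences.
toy_alignment = ["ACDK", "ACEK", "GCDK", "ACDR"]
pssm = build_pssm(toy_alignment)
family_like = score_sequence(pssm, "ACDK")
unrelated = score_sequence(pssm, "WWWW")
print(round(family_like, 2), round(unrelated, 2))
```

Residues seen often in a column get positive scores and unseen residues get negative ones, so a family-like sequence scores well above an unrelated one.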

3- FINGERPRINTS


• While single motif methods are good at identifying features in a protein,
most protein families are characterised not by one, but by several
conserved regions, which occur in a certain order. Identifying these
regions is the principle behind fingerprints.
• Fingerprints are composed of multiple short conserved motifs,
which are drawn from sequence alignments, as illustrated in the figure.
• Each motif is then converted into an individual profile (as described in the
previous section) to create a fingerprint signature.
Figure: From a sequence alignment, motifs are defined; their correct order and correct spacing make up the fingerprint signature (labelled PR00000 in the figure).

• Fingerprints are used by the PRINTS database.


• They are very good at modelling the often small differences between closely
related proteins.
• This means fingerprints can distinguish individual subfamilies within
protein families.
• This allows functional characterisation of sequences at a high level of
specificity (identifying individual cellular pathways in which a protein might be
involved, the ligand it might bind, the exact reaction it may catalyse, and so
on).
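
The "several motifs in a certain order" idea can be sketched as follows. This is a simplification: real fingerprints score each motif with its own profile and also consider spacing, whereas this example uses fixed-length regular expressions as stand-ins and checks order only. The function name, motifs and sequence are all hypothetical.

```python
import re


def match_fingerprint(sequence, motifs):
    """Return the start position of each motif if all motifs occur
    in the given order without overlapping, otherwise None."""
    positions = []
    cursor = 0
    for motif in motifs:
        found = re.compile(motif).search(sequence, cursor)
        if found is None:
            return None           # a motif is missing or out of order
        positions.append(found.start())
        cursor = found.end()      # the next motif must start after this one
    return positions


# Hypothetical sequence and three fixed-length motifs.
sequence = "MKGAVLCWDEHKRTT"
in_order = match_fingerprint(sequence, ["GAV", "C.D", "KR"])
wrong_order = match_fingerprint(sequence, ["KR", "C.D", "GAV"])
print(in_order, wrong_order)
```

The same three motifs fail to match when their order is reversed, which is the property that lets a fingerprint discriminate more finely than any single motif.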

4- HMMs

Figure: Representation of a hidden Markov model based on a multiple
sequence alignment. Amino acids are given a score at each position in the
sequence alignment according to the frequency with which they occur.
Transition probabilities (i.e., the likelihood that one particular amino
acid follows another particular amino acid) and insertion and deletion states
are also modelled.
I = insert state
M = match state
D = delete state

• Hidden Markov models (HMMs) are used by many databases.
• Like profiles, they can be used to convert multiple sequence
alignments into position-specific scoring systems.
• HMMs are adept at representing amino acid insertions and deletions,
meaning that they can model entire alignments, including divergent
regions.
• They are sophisticated and powerful statistical models, very well suited
to searching databases for homologous sequences.
• HMMs have wide utility, as is clear from the numerous databases that
use this method for protein classification, including Pfam,
SMART, TIGRFAMs, PIRSF, PANTHER, SFLD, SUPERFAMILY and Gene3D.
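
To make the match, insert and delete states concrete, here is a small Viterbi sketch for a toy profile HMM. Every number in it is an assumption chosen for illustration: the emission tables, the position-independent transition probabilities, the flat insert emission and the floor for unlisted residues. It also omits an explicit end state, so it is not how Pfam-style tools score sequences.

```python
import math

NEG_INF = float("-inf")

# Assumed transition probabilities, shared by all model positions.
TRANS = {k: math.log(v) for k, v in {
    "MM": 0.90, "MI": 0.05, "MD": 0.05,
    "IM": 0.50, "II": 0.40, "ID": 0.10,
    "DM": 0.80, "DI": 0.10, "DD": 0.10,
}.items()}


def viterbi_score(seq, match_emissions, insert_emission=0.05, floor=1e-6):
    """Log-probability of the best state path through a toy profile HMM."""
    n_pos, n_res = len(match_emissions), len(seq)
    # Tables indexed [model position][residues consumed so far].
    vm = [[NEG_INF] * (n_res + 1) for _ in range(n_pos + 1)]
    vi = [[NEG_INF] * (n_res + 1) for _ in range(n_pos + 1)]
    vd = [[NEG_INF] * (n_res + 1) for _ in range(n_pos + 1)]
    vm[0][0] = 0.0  # the begin state is treated as match position 0
    for j in range(n_pos + 1):
        for i in range(n_res + 1):
            if j > 0 and i > 0:   # match state emits residue i at position j
                emit = math.log(match_emissions[j - 1].get(seq[i - 1], floor))
                vm[j][i] = emit + max(vm[j - 1][i - 1] + TRANS["MM"],
                                      vi[j - 1][i - 1] + TRANS["IM"],
                                      vd[j - 1][i - 1] + TRANS["DM"])
            if i > 0:             # insert state emits an extra residue
                vi[j][i] = math.log(insert_emission) + max(
                    vm[j][i - 1] + TRANS["MI"],
                    vi[j][i - 1] + TRANS["II"],
                    vd[j][i - 1] + TRANS["DI"])
            if j > 0:             # delete state skips a model position
                vd[j][i] = max(vm[j - 1][i] + TRANS["MD"],
                               vi[j - 1][i] + TRANS["ID"],
                               vd[j - 1][i] + TRANS["DD"])
    return max(vm[n_pos][n_res], vi[n_pos][n_res], vd[n_pos][n_res])


# Hypothetical three-position model whose consensus is A-C-D.
model = [{"A": 0.9}, {"C": 0.9}, {"D": 0.9}]
full = viterbi_score("ACD", model)
with_deletion = viterbi_score("AD", model)
with_insertion = viterbi_score("AWCD", model)
print(round(full, 2), round(with_deletion, 2), round(with_insertion, 2))
```

Sequences with a missing or extra residue still receive a finite score through the delete or insert states, only lower than the consensus, which is what allows HMMs to model divergent regions of an alignment.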
Protein classification resources at the EBI: InterPro
• InterPro is the resource for protein classification at the
EBI (European Bioinformatics Institute).
• In InterPro, patterns, profiles, fingerprints and HMMs from a number
of different databases are brought together into a single searchable
resource, offering convenient access to their predictive capabilities
without the need to visit the member databases individually.
• By combining the different databases and signature types, InterPro
capitalises on their individual strengths, producing a powerful tool for
the prediction of protein function.
• InterPro aims to simplify and rationalise protein sequence analysis
for the user by combining and organising information in a
consistent manner, removing redundancy, and adding extensive
annotation and useful links about the signatures and the proteins they
match.

Figure: InterPro brings together signature methods (such as profiles and HMMs) from its member databases to describe biological entities (such as domains and families).

When to use InterPro

• You can use InterPro if you have an amino acid sequence or set of
sequences and you want to know what they are, what family they belong to,
what their function is and how it can be explained in structural
terms.
• You can also use InterPro for a variety of other purposes, such as examining
the structural predictions for any sequence already in UniProt.
• InterPro cannot help you if:
- you want to perform structural alignment of protein sequences
- you have a genomic DNA sequence and are interested in gene
annotation (intron/exon predictions, identification of promoter regions,
etc.).

SUMMARY
• Protein classification allows functional and structural properties to be inferred
for novel proteins that have not been experimentally characterised.
• Proteins can be classified according to the family to which they belong,
and/or the domains and features they contain:
- A protein family is a group of proteins that share a common evolutionary
origin reflected by their related functions and similarities in sequence
and/or structure.
- Domains are distinct functional and/or structural units in a protein that
can exist in a variety of biological contexts.
- Sequence features include active sites, binding sites, post-translational
modification sites and repeats.
• Signatures are mathematical models constructed from multiple sequence
alignments that can be used to classify proteins.
• Using protein signatures is often a more sensitive way of identifying protein
function than pairwise sequence similarity searches, such as BLAST.
• Different types of signatures use different methods, focusing on single motifs
(patterns), multiple motifs (fingerprints) or considering the whole alignment
(profiles and HMMs). They offer distinct advantages in terms of protein
sequence analysis and can be used to classify proteins into families, or to
identify domains or sequence features.
• The EBI offers a resource for protein family classification and domain and
site prediction using protein signatures: InterPro. InterPro combines signatures
from multiple, diverse source databases into a single searchable resource.
