ChEMBL Resources

Friday, 11 April 2014

Target Prediction IPython Notebook Tutorial

As promised in the previous post, the ChEMBL target prediction models are now available to download from here. Furthermore, here is an IPython Notebook that showcases how the models can be used in Python. As usual, your feedback is very welcome. 


Friday, 4 April 2014

Paper: Chemical, Target, and Bioactive Properties of Allosteric Modulation

We have just had a paper accepted in PLoS Computational Biology on the work we've done on allosteric modulators (first mentioned on the blog here).  The work is based on the mining of allosteric bioactivity points from ChEMBL_14. The data set of allosteric and non-allosteric interactions is available on our FTP site (here). This blogpost will just highlight some sections of the paper, but we would like to refer the interested reader to the full paper (here). 

The dataset contains ChEMBL-annotated and cleaned data divided into an 'allosteric' set and a 'non-allosteric' (or background) set. Abstracts and titles mentioning allosteric keywords were pulled, and from the resulting papers we extracted the primary target and all bioactivities on this primary target. From the remainder of the papers we retrieved the primary target and all bioactivities on this primary target in the same manner.

When we compared the target distribution in both sets, we saw differences (see below; also touched upon in the previous post). Targets that are known to be amenable to allosteric modulation are indeed well represented in our allosteric set (e.g. Class C GPCRs). However, there are also some interesting observations that we did not expect (please see the paper for further details).

Obviously, as we are the ChEMBL group, we are interested in potential chemical differences between the allosteric and background sets. Interestingly, the allosteric modulators appear to form a subset of the background set rather than being distinct from it. We have calculated a large number of descriptors and compared the sets (median values, but also histograms; all available on the FTP). We observe that allosteric modulator molecules tend to be smaller, more lipophilic and more rigid, although there is understandably a large variance over the diverse targets included in the set. Shown here is the rigidity index calculated over the full sets (L0); when the target selection becomes more concise, the differences become more distinct.

Likewise, we observe differences between our allosteric subset and the background set with regard to bioactivity. While 'allosteric modulation' is a very diverse concept, in which the specific way the protein is influenced by the small molecule differs per protein-ligand pair, we do observe some general differences. From our data it appears that allosteric modulators bind with a lower affinity (on average) but similar ligand efficiency (on average) when compared to our background set. In the paper we provide a more extensive discussion of this observation, and we again refer the reader to it given the limited space here.
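To make the difference between affinity and ligand efficiency concrete, here is a small sketch. It assumes one common definition of ligand efficiency (binding free energy per heavy atom, estimated as 2.303·RT·pActivity / heavy atoms at 298.15 K); the paper may use a different variant, and the two compounds below are invented purely for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)
TEMP_K = 298.15     # assumed temperature (an assumption, not from the paper)


def ligand_efficiency(p_activity, heavy_atoms):
    """Approximate ligand efficiency in kcal/mol per heavy atom."""
    # 2.303 * R * T * pActivity approximates the magnitude of the binding free energy
    delta_g = math.log(10) * R_KCAL * TEMP_K * p_activity
    return delta_g / heavy_atoms


# Hypothetical compounds: a small, weaker binder and a larger, more potent one
small_weak = ligand_efficiency(p_activity=6.0, heavy_atoms=24)
large_potent = ligand_efficiency(p_activity=8.0, heavy_atoms=32)

print(round(small_weak, 3), round(large_potent, 3))
```

In this toy case the smaller compound binds 100-fold more weakly, yet both compounds have the same efficiency per heavy atom, which is the kind of pattern described above for smaller molecules.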

Classification models
Building on the dataset, we have created allosteric classifier models that can predict whether an interaction is likely to be allosteric or not. We have tried this on the full dataset, but also on lower levels (restricting the data to e.g. Class A GPCRs). We find that we can train predictive models that gain in quality if we have a more concise dataset (eliminating some of the inter-target variation). In the paper we provide case studies on HIV Reverse Transcriptase, the adenosine receptors (family), and protein Kinase B. Here the model performance for class A GPCRs (the full L2 target class) is shown. Note that rigidity, number of sp3 carbons, Polar Solvent Accessible Surface (normalized), and rotatable bonds fraction are most important for model fit.

All data are ChEMBL data and hence can be freely downloaded and used. Please let us know if you find any errors or misclassifications, as we will correct them (crowd curation).

Anna, jpo, and Gerard

%T Chemical, Target, and Bioactive Properties of Allosteric Modulation
%A G.J.P. van Westen
%A A. Gaulton
%A J.P. Overington
%J PLoS. Comput. Biol.
%D 2014
%V 10
%O doi:10.1371/journal.pcbi.1003559

Thursday, 3 April 2014

Ligand-based target predictions in ChEMBL

In case you haven't noticed, ChEMBL_18 has arrived. As usual, it brings new additions, improvements and enhancements to both the data/annotation and the interface. One of the new features is the target predictions for small molecule drugs. If you go to the compound report card for such a drug, say imatinib or cabozantinib, and scroll down towards the bottom of the page, you'll see two tables with predicted single-protein targets, corresponding to the two models that we used for the predictions.

 - So what are these models and how were they generated? 

They belong to the family of so-called ligand-based target prediction methods. That means that the models are trained using ligand information only. Specifically, the model learns which substructural features (encoded as fingerprints) of ligands correlate with activity against a certain target and assigns a score to each of these features. Given a new molecule with a new set of features, the model sums the individual feature scores for all the targets and comes up with a sorted list of likely targets with the highest scores. Ligand-based target prediction methods have been quite popular over the last few years as they have proved useful for target deconvolution and mode-of-action prediction of phenotypic hits / orphan actives. See here for an example of such an approach and here for a comprehensive review.
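The scoring step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea of summing per-feature scores per target and sorting; the feature IDs, target names and score values are all made up, and the real models learn their scores from ChEMBL data.

```python
# Hypothetical per-target feature scores, e.g. contributions learned in training.
# Keys of the inner dicts stand in for fingerprint feature identifiers.
FEATURE_SCORES = {
    "kinase_A": {101: 1.2, 202: 0.8, 303: -0.5},
    "gpcr_B": {101: -0.3, 404: 1.5},
    "protease_C": {202: 0.1, 303: 0.9},
}


def rank_targets(molecule_features, feature_scores):
    """Sum the scores of the molecule's features per target, best target first."""
    totals = {}
    for target, scores in feature_scores.items():
        # Features the model has never seen for this target contribute nothing
        totals[target] = sum(scores.get(f, 0.0) for f in molecule_features)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A new (hypothetical) molecule that contains features 101 and 202
ranking = rank_targets({101, 202}, FEATURE_SCORES)
for target, score in ranking:
    print(target, round(score, 2))
```

Here the toy molecule ranks kinase_A first because both of its features carry positive scores for that target.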

 - OK, and how were they generated?

As usual, it all started with a carefully selected subset of ChEMBL_18 data containing pairs of compounds and single-protein targets. We used two activity cut-offs, namely 1uM and a more relaxed 10uM, which correspond to two models trained on bioactivity data against 1028 and 1244 targets respectively. KNIME and pandas were used for the data pre-processing. Morgan fingerprints (radius=2) were calculated using RDKit and then used to train a multinomial Naive Bayesian multi-category scikit-learn model. These models were then used to predict targets for the small molecule drugs as mentioned above.
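As a rough illustration of how the two cut-offs lead to two different training sets, here is a minimal sketch in plain Python. The records, identifiers and the nM unit are assumptions made for the example; the actual pre-processing was done with KNIME and pandas and involves more curation than shown here.

```python
# Hypothetical bioactivity records: (compound_id, target_id, activity in nM)
RECORDS = [
    ("cmpd1", "T1", 50.0),
    ("cmpd1", "T2", 4000.0),
    ("cmpd2", "T1", 12000.0),
    ("cmpd3", "T2", 900.0),
]


def active_pairs(records, cutoff_um):
    """Return the compound/target pairs with activity at or below the cut-off."""
    cutoff_nm = cutoff_um * 1000.0  # convert uM to nM
    return {(cmpd, tgt) for cmpd, tgt, nm in records if nm <= cutoff_nm}


pairs_1um = active_pairs(RECORDS, 1.0)
pairs_10um = active_pairs(RECORDS, 10.0)

print(len(pairs_1um), len(pairs_10um))
```

The relaxed cut-off keeps every pair of the stricter one and adds weaker actives, which is consistent with the 10uM model covering more targets.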

 - Any validation? 

Besides more trivial property predictions such as logP/logD, this is the first time ChEMBL hosts non-experimental/measured data, so this is a big deal and we wanted to try and do this right. First of all, we did a 5-fold stratified cross-validation. But how do you assess a model with a many-to-many relationship between items (compounds) and categories (targets)? For each compound in each of the five 20% test sets, we got the top 10 ranked predictions. We then checked whether these predictions agree with the known targets for that compound. Ideally, the known target should be correctly predicted at the 1st position of the ranked list, otherwise at the 2nd position, the 3rd and so on. By aggregating over all compounds of all test sets, you get this pie chart:

This means that a known target is correctly predicted by the model at the first attempt (Position 1 in the list of predicted targets) in ~69% of the cases. Actually, only 9% of compounds in the test sets had completely mis-predicted known targets within the top 10 predictions list (Found above 10). 
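The bookkeeping behind such a chart can be sketched as follows. This is a simplified illustration, not the code used for the validation: the ranked lists and known targets are invented, and compounds with no known target in the top 10 are counted under None (the "Found above 10" slice).

```python
from collections import Counter


def first_hit_position(ranked_targets, known_targets, top_n=10):
    """1-based position of the first known target in the top-N list, else None."""
    for position, target in enumerate(ranked_targets[:top_n], start=1):
        if target in known_targets:
            return position
    return None


# Hypothetical test-set entries: (ranked predictions, known targets)
TEST_SET = [
    (["T1", "T7", "T3"], {"T1"}),        # hit at position 1
    (["T5", "T2", "T9"], {"T2", "T9"}),  # first hit at position 2
    (["T4", "T6", "T8"], {"T1"}),        # no hit in the list
]

counts = Counter(first_hit_position(ranked, known) for ranked, known in TEST_SET)
for position, n in counts.items():
    print(position, n)
```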

This is related to precision, but what about recall of known targets? Here's another chart:

This means that, on average, by considering the top 10 most likely target predictions (<1% of the target pool), the model can correctly predict ~89% of a compound's known single-protein targets.
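A per-compound recall in the top 10, averaged over compounds, could be computed along these lines. Again this is a hedged sketch with made-up data, and the exact averaging used for the chart may differ.

```python
def recall_at_n(ranked_targets, known_targets, top_n=10):
    """Fraction of a compound's known targets found in its top-N predictions."""
    top = set(ranked_targets[:top_n])
    return len(top & known_targets) / len(known_targets)


def mean_recall(test_set, top_n=10):
    """Average the per-compound recall over all compounds in the test set."""
    recalls = [recall_at_n(ranked, known, top_n) for ranked, known in test_set]
    return sum(recalls) / len(recalls)


# Hypothetical test-set entries: (ranked predictions, known targets)
TEST_SET = [
    (["T1", "T5", "T9"], {"T1", "T2"}),  # one of two known targets recovered
    (["T3", "T4", "T6"], {"T3"}),        # the only known target recovered
]

print(mean_recall(TEST_SET))
```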

Finally, we compared the new open source approach (right) to an established one generated with a commercial workflow environment software (left) using the same data and very similar descriptors:

If you manage to ignore for a moment the slightly different colour coding, you'll see that their predictive performance is pretty much equivalent.

 - It all sounds good, but can I get predictions for my own compounds?

We plan to provide the models, together with IPython Notebook examples of how to use them, in another blog post that will follow soon. There are also plans for a publicly available target prediction web service, something like SMILES to predicted targets. If you would be interested in this, or if you have any feedback or suggestions for the target prediction functionality, let us know.


Wednesday, 2 April 2014

ChEMBL_18 Released

We are pleased to announce the release of ChEMBL_18. This version of the database was prepared on 12th March 2014 and contains:
  • 1,566,466 compound records
  • 1,359,508 compounds (of which 1,352,681 have mol files)
  • 12,419,715 activities
  • 1,042,374 assays
  • 9,414 targets
  • 53,298 documents

The web front end is now connected to the ChEMBL 18 data, but you can also download the data from the ChEMBL ftpsite. Please see ChEMBL_18 release notes for full details of all changes in this release.

Changes since the last release


New data sets


The ChEMBL_18 release includes the following new datasets:
  • University of Vienna P-glycoprotein (Pgp) screening data
  • UCSF MMV Malaria Box screening data
  • DNDi Trypanosoma cruzi screening data
  • DrugMatrix in vivo toxicology data
In addition, 43,335 new compound records from 2,015 publications in the primary literature have been added to this release. Approved drug and USAN data have also been updated, with 103 new structures added.


Updates to the protein family classification


A review and update of the ChEMBL protein family classification has been carried out. The main changes are listed below:

  • New ion channel/transporter classification, based on the BPS classification
  • New epigenetic protein classification, based on SGC/ChromoHub classification
  • Modification of kinase classification, to follow Human Kinome classification


Assay classification and ontology mapping


The following annotations and classifications have been added to the ChEMBL assay data:
  • Classification of assay format (e.g., biochemical, cell-based, organism-based) using BioAssay Ontology
  • Classification of endpoints (e.g., IC50, AUC, Ki) using BioAssay Ontology
  • Addition of Physicochemical and Toxicity assay type classification
  • Mapping of assay cell-lines to CLO, EFO and Cellosaurus
  • Mapping of standard units to Unit Ontology and QUDT



Capture of assay parameters


A new table in the database (assay_parameters) is used to capture additional properties of assays such as dose, administration route and time points. These additional parameters are displayed on the Assay Report Card.


Target predictions


Bioactivity data for single protein targets in ChEMBL have been used to train and validate two Naive Bayesian multi-label classifier models (at <= 1uM and <= 10uM bioactivity cutoffs respectively). These models have subsequently been employed to predict biological targets for a set of approved drugs. The predictions are displayed in the new Target Predictions section of the Compound Report Card, where applicable. Since some of the predictions correspond to compound/target pairs that were included in the training set for the models, these are shown in white, to distinguish them from genuine predictions (coloured light yellow). Only predictions scoring >= 0.2 are included in the result tables. The models were built with open source tools such as RDKit and scikit-learn and are available upon request.
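The display rules described above (a 0.2 score cutoff, and white versus light yellow rows) can be illustrated with a short sketch. The function name, data layout and example values are hypothetical and only mirror the rules stated in the text; they are not the report card implementation.

```python
SCORE_CUTOFF = 0.2  # only predictions scoring >= 0.2 are shown


def build_result_table(compound_id, predictions, training_pairs):
    """Filter predictions by score and flag pairs already seen in training."""
    rows = []
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    for target, score in ranked:
        if score < SCORE_CUTOFF:
            continue  # below the cutoff: not displayed
        in_training = (compound_id, target) in training_pairs
        colour = "white" if in_training else "light yellow"
        rows.append({"target": target, "score": score, "colour": colour})
    return rows


# Hypothetical drug, prediction scores and training-set membership
table = build_result_table(
    "drugX",
    {"T1": 0.95, "T2": 0.35, "T3": 0.10},
    training_pairs={("drugX", "T1")},
)
for row in table:
    print(row)
```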

We would appreciate any feedback on this feature, and any further ideas you may have on including predicted data on top of ChEMBL experimental data.


UniChem connectivity mapping


In addition to the standard UniChem cross-references shown on the report card (based on exact InChI Key matching), a new link is included to an expanded view of UniChem cross-references, generated based on InChI connectivity layer matching (e.g., 

This expanded view shows any compounds in UniChem that share the same connectivity as the query structure, even if they have stereochemical, isotopic or protonation state differences. The differences between the query and retrieved structures are shown by their position in the table: the first column shows compounds that match in all InChI layers, while the subsequent columns show those structures that differ in stereochemistry (s column), isotope (i column), protonation state (p column), or various combinations of these layers (final four columns). A button at the top of the table gives the additional option to retrieve compounds that match individual components of a mixture or salt. Where the query structure consists of multiple components, matches to each of these components will be coloured different colours (e.g., black, blue, red). 
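To illustrate what matching on connectivity while differing in other layers means, here is a simplified sketch that splits a standard InChI into its layers and compares them. It is not how UniChem is implemented: treating the formula, /c and /h layers as the connectivity is an assumption made for this example, and the parser ignores complications such as repeated layer prefixes in isotopic or fixed-H sections. The first InChI is the droxidopa InChI quoted elsewhere on this blog; the second is the same string with its stereo layers removed.

```python
def inchi_layers(inchi):
    """Split an InChI string into a dict of layers keyed by their prefix letter."""
    parts = inchi.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]  # e.g. "c10-7..." is stored under "c"
    return layers


def same_connectivity(inchi_a, inchi_b):
    """True if formula, /c and /h layers agree (assumed connectivity definition)."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return all(a.get(key) == b.get(key) for key in ("formula", "c", "h"))


def differing_layers(inchi_a, inchi_b):
    """Set of layer prefixes whose content differs between the two InChIs."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return {key for key in set(a) | set(b) if a.get(key) != b.get(key)}


DROXIDOPA = (
    "InChI=1S/C9H11NO5/c10-7(9(14)15)8(13)4-1-2-5(11)6(12)3-4"
    "/h1-3,7-8,11-13H,10H2,(H,14,15)/t7-,8+/m0/s1"
)
# Same structure without stereochemistry layers (built by hand for illustration)
NO_STEREO = DROXIDOPA.split("/t")[0]

print(same_connectivity(DROXIDOPA, NO_STEREO))
print(sorted(differing_layers(DROXIDOPA, NO_STEREO)))
```

In the expanded view, a pair like this would fall under the column for structures that differ in stereochemistry only.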




The ChEMBL RDF data model has been enhanced and now includes the following information:
  • Drug mechanism of action and binding site information
  • Molecule hierarchy
  • Target relationships
  • Assay format
  • Cell-line information
More information (documentation, SPARQL endpoint and example queries), about the RDF version of the ChEMBL database can be found on the EBI-RDF Platform and you can download the RDF files from the ChEMBL ftpsite.


Web Service Update


Three new Web Service calls focused on approved drugs, mechanism of action and compound forms are now available. Example calls to these methods can be seen below and also please visit the ChEMBL Web Service page for more details.

As always, we greatly appreciate the reporting of any omissions or errors.

The ChEMBL Team

Sunday, 30 March 2014

Catching up on all those GPCR structures that keep being published

I've missed quite a few rhodopsin-like GPCR structures, so I'm catching up on some of them today. Below are representative structures of all 22 sequence-distinct rhodopsin-like GPCRs that are in the public domain as of today. I've done an initial alignment; the regions where the alignment is ambiguous vary between structures, and also within structure sets (same sequence, different ligand, cell, etc). It is incredible that there are now 22 of these, and of course, representative structures of some of the other GPCR superfamilies too.
  1. 4dkl - mouse mu opioid receptor 
  2. 4ej4 - mouse delta opioid receptor
  3. 4djh - human kappa opioid receptor
  4. 4ea3 - human nociceptin receptor
  5. 3odu - human CXCR4 receptor 
  6. 2lnl - human CXCR1 receptor (NMR)
  7. 4mbs - human CCR5 receptor
  8. 3vw7 - human PAR1 receptor
  9. 4ntj - human P2Y12 receptor
  10. 4grv - rat neurotensin receptor
  11. 3uon - human muscarinic M2 receptor 
  12. 4daj - rat muscarinic M3 receptor 
  13. 3rze - human histamine H1 receptor
  14. 2rh1 - human beta-2 adrenergic receptor 
  15. 2vt4 - turkey beta-1 adrenergic receptor 
  16. 4iaq - human 5HT1B receptor
  17. 3pbl - human dopamine D3 receptor
  18. 4ib4 - human 5HT2B receptor
  19. 2ydv - human adenosine A2a receptor 
  20. 3v2w - human sphingosine-1-phosphate receptor
  21. 1u19 - bovine rhodopsin
  22. 2z73 - squid rhodopsin
                           10        20        30        40        50        60        70    
4dkl   (  65 )                                             mvtaitimalYsiVcvvGlfgNflvmyvIvrytk
4ej4   (  41 )                                        rsasslalaiaitalYsavcavGllgNvlvmfgIvrytk
4djhA  (  55 )                                            spaipviitavysvvfvvGlvgNslVmfVIirytk
4ea3A  (  47 )                                            plglkvtIvglYlavcvgGllgNclvmyVIlrhtk
3oduA  (  27 )            pçfre-------------------------enanfnkiflptiYsiIfltGivgNglvilvMgyqkk
2lnl   (  29 )            pÇmle--------------------------tetLnkYvviiayalvFllsllgNslvMlvilysrv
4mbsA  (  19 )            pçqki-------------------------nvkqiaarllpplYslvfifGfvgNmlViliLinykr
3vw7   (  91 )                                     dasgYLtsswLtlfVPsvYtgVfvvSlplNimaivvFilkmk
4ntj   (  16 )            lÇtr---------------------------dykitqvlfPllYtvLffvGlitNglAmriFfqir-
4grvA  (  52 )                                    nsdldVnTdiyskvlvtaiYlalfvvGtvgNsvtlftlark s
3uon   (  20 )                                             tfevvfivlvagslSlvTiigNilVmvSIkvnrh
4dajA  (  64 )                                             iwqvvfiafltgflAlvTiigNilVivAFkvnkq
3rze   (  28 )                                                 mplvvvlsticlvTvglNllVlyAvrserk
2rh1   (  29 )                                            devwvvgmgivmslivlaIvfgNvlVitAIakfer
2vt4A  (  40 )                                               weagmsllmalVvllIvagNvlViaAigstqr
4iaq   (  38 )                                     yiyQdsislpwkvllvmllalitlaTtlsNafViatVyrtrk
3pblA  (  32 )                                                   yalsYcalilaIvfgNglVcmAVlkera
4ib4   (  48 )                                          eeqgnklhwAallilmviipTigGNtlVilAVslekk
2ydv   (   3 )                                             imgssvYitvElaiavlAilgNvlVcwAvwlnsn
3v2w   (  17 )           sdyvnydIIvrHYnyTgklnisa                ltsvvfiliCcfIileNifvlltiwktkk
1u19A  (   1 )            mnGtegpnfyVPfsnktgvVrsPFeapQyyLaepwqFsmlAayMflLimlGfpiNflTlyVTvqHkk
2z73A  (   9 )         etwwyNpsIvVhpHWref--------------dqvpdavYyslGifIgiCgiiGcggNgiViyLFtktks

                      80        90        100       110       120       130       140       150 
4dkl   (  99 )    MktAtniYIfNLAlADalATsTLpfqsvnylmg---tWpfgnilÇkiviSidYyNMFTSIfTLctMSvdRyiAVC
4ej4   (  80 )    LktATniYIfNLAlADalATstLpfqsakylme---tWpfgellÇkaVlSidYyNMFTSIfTLtmMSvDRyiavc
4djhA  (  90 )    mktaTniYIfNLAlADalVTtTMpfqstvylmn---sWpfgdvlÇkiVlsiDyyNMfTSIfTLtmMSvdRyiaVc
4ea3A  (  82 )    mktatNiYIfNLAlADtlVLlTLpfQGtdillg---fWpfgnalÇktVIaiDyyNMFTSTfTLtaMSvdryvaic
3oduA  (  69 )    lrsmtdkYRlhLSvADllFVitLpfWavDAva----nWyfgnflÇkaVHviYTVNlYSSVwILAfISlDRylAiV
2lnl   (  70 )    GrsvTdvyLlnLalaDllfaltlpiwaaSkvn----gwifgtfLÇkvVslLkEvnfYsgilLlacIsvdrylaiv
4mbsA  (  61 )    lksMtdIYLlnLAiSdlfFLlTVpfWahyaaaq----WdfgntmÇqlLTglYFiGFFSgIfFIilLTiDRylaVv
3vw7   ( 133 )    vkkPAVVyMlhLAtADvlFVsvLpfkisYyfsg--SdWqfgselÇrfVtAaFYcnMYASIlLMtvISiDrflAVv
4ntj   (  55 )    sksnFiIFLknTViSDllMIltFpfkilsdakl      lrtfvcqvtsVifyfTMYISIsFLGlITidryqktt
4grvA  (  98 )    lqstvhyHlgsLalSDllILllAMpvElyNFIWvhhpWafgdagÇrgyYflRDactYATAlNVasLSvaRylAic
3uon   (  54 )    LqtvnnyflfSLAcADliiGvfSMnlytlytvi--gyWplgpvvÇdlWlalDYvVSNAsVmNLliiSfdryfcvt
4dajA  (  98 )    LktvnnyFllSLAcADliIGviSMnlFttyiim--nrWalgnlaÇdlwLSiDYvASNAsVmNLlvISfDryfsit
3rze   (  58 )    LhtvGnlYIvsLSvADliVGavVMpmnilyllm--skwsLgrplÇlfWLSmDYVASTASIfSVfiLCiDryrsvq
2rh1   (  64 )    LqtvtnyFItsLAcADlvMGlaVVpfgaahilm--kmWtfgnfwçefWTSiDVlCVTASIeTLcvIAvdryfAIt
2vt4A  (  72 )    LqtltnlFItsLAcADlvvGllVVpfgatlvvr--gtWlwgsflçelWTSlDVlCVTAsIeTLcvIAiDrylait
4iaq   (  80 )    LhtpanyLiasLAvTDllVSilVMpiStmytvt--grWtlgqvvÇdfWlssDItCCTASIwHLCviAldrywait
3pblA  (  60 )    LqtttnyLVvsLAvADllvAtlVMpwvvylevt-ggvWnfsricÇdvFVTlDVmMcTAsIwNLCaISidRytAVv
4ib4   (  85 )    LqyatnyFlmsLAvADllVGlfVMpiaLltimf-eamWplplvlÇpawLflDVlfSTASIwHLCaIsvdryiaIk
2ydv   (  37 )    LqnvtnyFVvsAAaADilVGvlAIpfaiaIst----GfçaaçhgÇLfiACfVLVLTASSIfSLlaIAiDryiair
3v2w   (  76 )    FhrpMYyFIgnLAlSDllaGvaYtaNlllsga---tTykLtPaqWFlREGsMFvALSASVfSLlaIAieryitml
1u19A  (  68 )    LrtplNyILlnLAvADlfMVfg-GFtTTlyTSl-hGyFvfgptGÇnlEGffATLGGEIaLWSLvvLaieRyvvVc
2z73A  (  65 )    LqtpanmFiinLAfSDftFSlvNGfplMtiSCf-lkkWifgfaaÇkvYGfiGGiFGFMsIMTMAMiSiDrynViG
                     aaaaaaaaaaaaaaaaaa aaaaaaaaa          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

                           160       170       180       190       200       210       220   
4dkl   ( 171 )    hpvkaldf---rtprnAkivnvcNwilSsaiGlpVmfmAttkyrqg--------------sidçtltfsh-ptwy
4ej4   ( 152 )    hpvkaldf---rtpakAklinicIwvlAsgvGvpimvmAvtqprdg--------------avvÇmlqfps-pswy
4djhA  ( 162 )    hpvkaldf---rtplkAkiinicIwllSssvGisAivlGGtkvred------------vdvieÇslqFpdddysw
4ea3A  ( 154 )    hp          tsskAqavnvaIwalAsvvGvpvaimGsAqvede--------------eieÇlveipt-pqdy
3oduA  ( 140 )    hatn---sqrprkllAekvVyvgVwipAlllT-ipDfif--Anvsead-----------dryiÇdrfyp---ndl
2lnl   ( 141 )    haTr----tltqkrhlvkfvclgcwglsmnlS-lpFflf--RQayhpN----------NsSPvÇyEVlg-ndtak
4mbsA  ( 132 )    havfAlka---rtvtfGvvtsvitwvvAvfaS-lpNiif--Trsqkeg-----------lhytÇsshfpysqyqf
3vw7   ( 206 )    ypm        rtlgrAsftClaiwalAiagV-vpLllkeQtiqvpg-----------lgitTçhdvlsetLleg
4ntj   ( 128 )    rpfkt      knllgAkilsvviwafMfllS-lpNmil                              ksefgl
4grvA  ( 173 )    hpfkaktl---msrsrtkkfisaIwlaSallAipMlftMGlqnrSadg--------thpgGlVÇTPiv----dta
3uon   ( 127 )    kpltypvk---rttkmAgmmiaaAwvlSfilwapaIlfwqfivg-----------vrtVedgeÇyIqff------
4dajA  ( 171 )    rpltyrak---rttkrAgvmiglAwviSfvlWApaIlfwqyfvg-----------krtVppgeÇfIqfl------
3rze   ( 131 )    qplrylky---rtktrAsatilgawflSfl-WvipIlgwnh                 rredkÇeTdfy------
2rh1   ( 137 )    spfkyqSl---ltknkArviilmvwivSgltSflpIqmhwyr-----athqeAinÇyae-etçÇdff--------
2vt4A  ( 145 )    spfryqsl---mtrarAkviictvwaiSalvSflpImmhwWr-----dedpqAlkçyqd-pgçÇdfv--------
4iaq   ( 153 )    daveysak---rtpkraavmialvwvfSisISl-pPffwrqa                   seÇvvntd------
3pblA  ( 134 )    mpvhyqhgtgqsscrrValmitavwvlAfaVSc-pLlfgfNtTg---------------dptvÇsIs--------
4ib4   ( 159 )    kpiqanqy---nsratAfikitvVwliSigiAi-pVpikgiet              npnnitÇvLtke------
2ydv   ( 108 )    iplryngl---vtgtrAkgiiaicwvlSfaIGltPmlgwnnÇgqp--kegkahsqgÇgegqvAÇlFedVV-----
3v2w   ( 148 )    k           nnfrlfllisacwviSlilGglPimgwn----------------ÇisalssÇSTVLP------
1u19A  ( 141 )    kpmsn----frfgenhaimgvafTwvmAlaCAapPlvgwSrYIPE-------------GMQCSÇGIDYYTpheet
2z73A  ( 139 )    rpmaas---kkMshrrAfimiifVwlwSvlwAigPifgwGaYtLE-------------GVLCNÇSFdYIsr--ds
                               aaaaaaaaaaaaaaaaaaa  aaa                                      

                      230       240       250       260       270       280       290       300 
4dkl   ( 228 )    wenllKicVfifAfimPvliItvcyglmilrlksvr                   ekdrnlrritrMVlvVvavF
4ej4   ( 209 )    wdtvtkicvflfAfvvPiliitvcyglMllrlrsvr                   ekdrslrriTrMVlvVvgaF
4djhA  ( 222 )    wdlfmkicVfifAfviPvliIivcytlMilrlksvrllsg              rekdrnlrritrLVlvVVavF
4ea3A  ( 211 )    wgpvfaiciflfSFivPvlvIsvcyslMirrlrgvrlls-------------gsrekdrnlrritrLVlvVvavF
3oduA  ( 195 )    wvvvfqfqhimvglilPgivIlsCyciIisklshs                     kghqkrkalktTviLilaF
2lnl   ( 198 )    wrmvLrilPHtfGfivplfvmlfcygftlrtlf---------------------kahmgqkhrAmrvIfaVvlif
4mbsA  ( 190 )    wknfQTlkIVilGlvlPllvmvicysgIlktllr                       ekkrhrdvrlIftIMivY
3vw7   ( 266 )    yyayyfsafSavfFfvpliiStvCyvsIirclsssa                   anrskksrAlfLSaaVfcIF
4ntj   ( 185 )    vwheiVnyiCqviFwinfliVivcYtlItkelyrsyvrt              rgvgkv rkkvnvkvfiIiaVF
4grvA  ( 233 )    tvkvvIqvNtfmSFlfPmlvIsilNtvIAnkLtvmv                     vqalrhGVlvAraVviaf
3uon   ( 182 )    snaavtfgtAiaaFylpviiMtvlywhisrasksri                   pppsrekkvtrtilaIllaF
4dajA  ( 226 )    septitfgtAiaaFymPvtiMtilywrIyketek                       like   aqTlsaIllaF
3rze   ( 186 )    dvtwfkvmtaiinFylPtllMlwfyakIykaVrqhc                   lhmnrerkaakQLgfIMaaF
2rh1   ( 195 )    TnqayaiasSivSFyvplviMvfvYsrVfqeakrql                   kfclkeHkaLktlgiIMgtF
2vt4A  ( 203 )    TnrayaiasSiiSFyipLliMifvalrvyreakeq                       irehkalktlgiImgvF
4iaq   ( 205 )    -hilytvySTvgAFyfPtllLialygrIyvearsri                   lmaarerkaTktLgiIlgaF
3pblA  ( 185 )    -npdFViySSvvSFylPfgvTvlvyarIyvvlkqrrrk-----------------gvplrekkatqMVaiVlgaF
4ib4   ( 213 )    rFgdfMlfgSlaAFftPlaiMivtyfltihALqkka                   qtisneqraskvlgivFflF
2ydv   ( 173 )    pmnYMVyfNffaCVlvPlllMlgvylrIflaarrqlkqmesq             stlqkevhaakSLaiIvglF
3v2w   ( 197 )    LYhkhYIlfCTtvFtllllsIvilYcriyslvrtr                   asrssenvaLlkTViiVLsvF
1u19A  ( 199 )    nNesFViyMfvvHfiiPlivIffcygqLvftvkeaaaq------------qqesattqkaekevTrMviiMviaF
2z73A  ( 196 )    ttrsNIlcMFilGffgPiliiffCyfnIvmsvsnhekemaamakrlnakelrkaqaganaemrlAkIsivIVsqF
                    aaaaaaaaaaa aaaaaaaaaaaaaaaaa                            aaaaaaaaaaaaaaaa

                           310       320       330       340       350       360       370   
4dkl   ( 290 )    ivcWtpIHiyViikaliti--------pettfqtvswhfcialGYtNSclNpvlYafldenFkrCfrefci    
4ej4   ( 271 )    vvCWapIHifVivwtlvdi-------nrrdplvvaalhlcialGYaNSslNpvlYaflDenfkrc          
4djhA  ( 284 )    vvcWtpIHifilvealgs             aalssyyfcIalGytNSslNPilYafldenFkrcfrdfcfp   
4ea3A  ( 273 )    vgcWtpVQvfvlaqglgvq--------pssetavailrfctAlGYvNSclNpilYafldenFkacfr        
3oduA  ( 249 )    facWlpyyigisidsfillei-ikqgçefentvhkwisitEAlAFfHCclNpilyaflgakfktsaqhalts   
2lnl   ( 252 )    llcwlpynlvlLadTlmrtq-viqeeRrNnIGraLdatEilGflhsclnpiiyafigqnfrhgflkilamhg 
4mbsA  ( 245 )    flfWapYNivLllnTfqeff--glnnçsSsnrldqamqvtetlGMtHCciNpiiYafvgeefrnyllvffq    
3vw7   ( 323 )    iiCFgpTNvlLiaHYsflsh------tstteaAYfaYLlcvCvSSiSCciDplIyyyAssec             
4ntj   ( 246 )    fiCFvpFHfaripytlsqtr--dvfdçtaentlfyvkestlwlTSlNAClNpfIYfflcksFrnslism      
4grvA  ( 318 )    vvcWlpYHvRRlmFCyisdeq---WttflFdfYHyfYmlTNalAYasSAinpilYnlvsanFrqv          
3uon   ( 397 )    iitWapYNvmVlintfçap--------ç---ipntvwtiGywlCYinstiNpacYalcnatFkktfkhllm    
4dajA  ( 500 )    iitWtpyNimVlvntfçds--------ç---ipktywnlgywlCYiNStvNPvcYalcnktFrttfkt       
3rze   ( 425 )    ilCWipYFiffmviafçkn--------ç---cnehlhmftiWlGYiNStlNPliYplCnenFkktfkrilhi   
2rh1   ( 283 )    tlcWlpFFiVNivhviqdn-----------lirkevyillNwiGYvNSgfNpliYc-rspdfriAfqellcl   
2vt4A  ( 300 )    tlCWlpFFlvnivnvfnrd-----------lvpdwlfvafnwlGYAnSAmnpiiYc-rspdfrkAfkrlla    
4iaq   ( 324 )    ivCWlpFFiiSlvmpi                hlaifdffTwlGYlNSliNPiiYtmsnedFkqafhklirfk  
3pblA  ( 339 )    ivCWlpFFltHvlnthçqt--------ç--hvspelysattwlGYvNsalNPviYttfnieFrkAflkilsc   
4ib4   ( 334 )    llmWcpFFitNitLvlçds--------çnqttlqmlLeifvWiGYvSSGvNPlvyTlfnktFrdAfgrYitcnyr
2ydv   ( 243 )    alCWlpLHiiNcftffçpd--------ç-shaplwlMylAivlSHtNSvvNPfiyAyrireFrqTFrkiirshvl
3v2w   ( 266 )    iacwapLFiLLllDvgçkvk------tç---diLfrAeyfLvlAvlNSgtNPiiytltNkemrrafiri      
1u19A  ( 262 )    liCWlpYAgvAfyIfthqgsd----------fgpifMTipAFfAKtSAvyNPviYimmnkqFrnCmvttlccgkn
2z73A  ( 271 )    llSWspYAvvAllAQfgplew----------VtpyaAQlpVMfAKaSaihNPmiYsvsHpkFreAIsqtfpwvLt
                  aaaaaaaaaaaaaaa                  aaaaaaaaaaaaa   aaaaaaaa  aaaaaaaaa       

                      380       390       400       410 
2ydv   ( 309 )    rqqepfkaa                             
1u19A  ( 327 )    plgddeasttVsktetsqvapa                
2z73A  ( 336 )    ccqfddketeddkdaeteipage               

Event: Seminar on Allosteric Drug Design, April 2014

On April 30th 2014 the University of Strathclyde will host a seminar on Allosteric Drug Design organised by the collaboration. The goal of the seminar is to bring together academic and industrial researchers with an interest in allosteric drug design and development with a view to identifying future collaborative and funding opportunities.

The seminar consists of a series of three talks by Dr. Gerard JP van Westen (EMBL-EBI; ChEMBL), Prof. Dr. Leonardo Scapozza (University of Geneva) and Dr. Laurent Galibert (Alpine Institute for Drug Discovery). The talks will cover various aspects of drug design in relation to allosteric drug targets. Talks are aimed at a broad audience.

To register and for further information please go to

Thursday, 20 March 2014

New Drug Approvals 2014 - Pt. III - Droxidopa (Northera ™)

ATC Code: Unavailable
Wikipedia: Droxidopa

On February 18th the FDA approved Droxidopa (trade name Northera™) for the treatment of neurogenic orthostatic hypotension (NOH). NOH is a rare, chronic and often debilitating drop in blood pressure upon standing, and is associated with Parkinson's disease, multiple-system atrophy, and pure autonomic failure. Symptoms of NOH include dizziness, light-headedness, blurred vision, fatigue and fainting when a person stands.

Droxidopa (also known as L-DOPS, L-threo-dihydroxyphenylserine, and SM-5688) is a prodrug which can be converted to norepinephrine (noradrenaline) by aromatic L-amino acid decarboxylase (Uniprot P20711). Norepinephrine in turn can be converted to epinephrine by phenylethanolamine N-methyltransferase (Uniprot P11086). Unlike epinephrine and norepinephrine, droxidopa can cross the blood-brain barrier. Patients with NOH suffer from depleted levels of epinephrine and norepinephrine. Droxidopa increases the levels of both in the peripheral nervous system, leading to an increased heart rate and blood pressure.

Droxidopa (CHEMBL2103827; PubChem: 92974) is a small molecule drug with a molecular weight of 213.2 Da, an AlogP of -2.92, 3 rotatable bonds, and no rule of 5 violations.

Canonical SMILES : N[C@@H]([C@H](O)c1ccc(O)c(O)c1)C(=O)O
InChI: InChI=1S/C9H11NO5/c10-7(9(14)15)8(13)4-1-2-5(11)6(12)3-4/h1-3,7-8,11-13H,10H2,(H,14,15)/t7-,8+/m0/s1
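As a quick sanity check, the quoted molecular weight can be reproduced from the molecular formula in the InChI above (C9H11NO5). The sketch below uses rounded standard atomic masses and a deliberately simple formula parser that only handles flat formulas like this one.

```python
import re

# Rounded standard atomic masses (g/mol) for the elements needed here
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_weight(formula):
    """Sum atomic masses for a flat formula such as 'C9H11NO5'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count (as for N in C9H11NO5) means one atom
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


mw = molecular_weight("C9H11NO5")
print(round(mw, 1))  # 213.2, matching the value quoted for droxidopa
```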

The starting dose of droxidopa is 100 mg three times daily (which can be titrated to a maximum of 600 mg three times daily). One dose should be taken in late afternoon at least 3 hours prior to bedtime to reduce the potential for supine hypertension during sleep.

Neuroleptic malignant syndrome (NMS) has been reported with Droxidopa use during post-marketing surveillance in Japan. NMS is an uncommon but life-threatening syndrome characterized by fever or hyperthermia, muscle rigidity, involuntary movements, altered consciousness, and mental status changes.

Ischemic Heart Disease, Arrhythmias, and Congestive Heart Failure
Droxidopa may exacerbate existing ischemic heart disease, arrhythmias and congestive heart failure.

Peak plasma concentrations (Cmax) of droxidopa were reached 1 to 4 hours post-dose in healthy volunteers. High-fat meals have a moderate impact on droxidopa exposure, decreasing Cmax and AUC by 35% and 20% respectively and delaying Cmax by approximately 2 hours.

Droxidopa exhibits plasma protein binding of 75% at 100 ng/mL and 26% at 10,000 ng/mL with an apparent volume of distribution of about 200 L.

The metabolism of droxidopa is mediated by the catecholamine pathway and not through the cytochrome P450 system. Plasma norepinephrine levels peak within 3 to 4 hours (generally < 1 ng/mL) and are variable, with no consistent relationship with dose. The contribution of the metabolites of droxidopa other than norepinephrine to its pharmacological effects is not well understood.

The mean elimination half-life of droxidopa is 2.5 hours. The major route of elimination of droxidopa and its metabolites is via the kidneys.
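To give a feel for what a 2.5 hour half-life implies, the sketch below computes the fraction of drug remaining over time. It assumes simple first-order elimination with a single half-life and ignores absorption and distribution, which is a simplification made for illustration and not something stated in the prescribing information.

```python
HALF_LIFE_H = 2.5  # mean elimination half-life of droxidopa, in hours


def fraction_remaining(hours, half_life=HALF_LIFE_H):
    """Fraction of drug left after a given time, assuming first-order elimination."""
    return 0.5 ** (hours / half_life)


for t in (2.5, 5.0, 10.0):
    print(t, fraction_remaining(t))
```

Under this assumption, a quarter of the drug remains after two half-lives (5 hours) and about 6% after four (10 hours).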

Drug Interactions 
No dedicated drug-drug interaction studies were performed for droxidopa. Carbidopa, a peripheral dopa-decarboxylase inhibitor, could prevent the conversion of droxidopa to norepinephrine outside of the central nervous system (CNS).

L-DOPA/dopa-decarboxylase inhibitor combination drugs decreased the clearance of droxidopa, increased the AUC of droxidopa by approximately 100%, and increased exposure to 3-OM-DOPS by approximately 50%. However, the decreased clearance was not associated with a significant need for a different treatment dose or with increases in associated adverse events.

Dopamine agonists, amantadine derivatives, and MAO-B inhibitors do not appear to affect droxidopa clearance, so no dose adjustments are required.

Droxidopa is classified as pregnancy category C. There are no adequate and well controlled trials in pregnant women.

The license holder is Chelsea Therapeutics; the prescribing information can be found here.

Wednesday, 19 March 2014

New Drug Approvals 2013 - Pt. XXX - Umeclidinium bromide and Vilanterol (Anoro Ellipta™)

ATC code: R03AL03
Wikipedia: Umeclidinium bromide (and vilanterol)

ChEMBL: CHEMBL1187833 (and CHEMBL1198857)

On December 18, 2013, the FDA approved Anoro Ellipta for the once-daily, long-term maintenance treatment of airflow obstruction in patients with chronic obstructive pulmonary disease (COPD). Anoro is a combination of umeclidinium (62.5 mcg - more details below) and vilanterol inhalation powder (25 mcg - already approved in a different formulation). Ellipta is the single inhaler device:

The majority of COPD cases are due to cigarette smoking, and this lung disease is a leading cause of death in the United States. Patients affected by COPD experience breathing difficulties that worsen over time, as well as chronic cough and chest tightness.

Umeclidinium (also known as umeclidinium bromide, GSK573719A and GSK573719) is a small molecule with a molecular weight of 428.6 Da, an AlogP of 3.34, 8 rotatable bonds and no violations of Lipinski's rule of five.

Molecular formula: C29H34NO2
Canonical SMILES: OC(c1ccccc1)(c2ccccc2)C34CC[N+](CCOCc5ccccc5)(CC3)CC4
Standard InChI: InChI=1S/C29H34NO2/c31-29(26-12-6-2-7-13-26,27-14-8-3-9-15-27)28-16-19-30(20-17-28,21-18-28)22-23-32-24-25-10-4-1-5-11-25/h1-15,31H,16-24H2/q+1
Alternate form of the molecule in ChEMBL: CHEMBL523299
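
As a quick sanity check, the quoted molecular weight can be reproduced from the molecular formula. The sketch below is a standard-library-only illustration, not how ChEMBL computes its properties: the atomic weights are rounded standard values, the parser is a hypothetical helper, and it only handles simple formulas without brackets or charges.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C29H34NO2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total

# Molecular formula of the umeclidinium cation as given above.
print(round(molecular_weight("C29H34NO2"), 1))
```

The result agrees with the 428.6 Da stated above, which corresponds to the cation without the bromide counter-ion.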

Mechanism of action

Anoro Ellipta relaxes the muscles located around the airways of the lung to increase airflow in patients. This mechanism of action is mediated by umeclidinium, an anticholinergic that prevents muscle tightening, in combination with vilanterol, a long-acting beta2-adrenergic agonist (LABA).

Safety information

The phase III trials for Anoro Ellipta included seven clinical studies, involving around 6,000 patients with COPD. The main reported side effects were narrowing and obstruction of the respiratory airway (paradoxical bronchospasm), cardiovascular effects, increased pressure in the eyes (acute narrow-angle glaucoma), and worsening of urinary retention.

Note that Anoro Ellipta is not indicated for the treatment of asthma, and its label carries a boxed warning relating to asthma.

Anoro Ellipta is manufactured by GlaxoSmithKline, Research Triangle Park, N.C.

Tuesday, 18 March 2014

New Drug Approvals 2014 - Pt. I - Elosulfase Alfa (Vimizim™)

ATC code: A16AB12

On February 14, 2014, the FDA approved elosulfase alfa for the treatment of Mucopolysaccharidosis Type IVA (Morquio A syndrome). Elosulfase alfa is intended to replace the missing GALNS enzyme involved in an important metabolic pathway. Absence of this enzyme leads to problems with bone development, growth and mobility.

Mucopolysaccharidoses comprise a group of lysosomal storage disorders caused by the deficiency of specific lysosomal enzymes required for the catabolism of glycosaminoglycans (GAG). Mucopolysaccharidosis IVA (MPS IVA, Morquio A Syndrome) is characterized by the absence or marked reduction in N-acetylgalactosamine-6-sulfatase activity. The sulfatase activity deficiency results in the accumulation of the GAG substrates, keratan sulfate (KS) and chondroitin-6-sulfate (C6S), in the lysosomal compartment of cells throughout the body. The accumulation leads to widespread cellular, tissue, and organ dysfunction. It is a rare autosomal recessive disease, affecting approximately 800 people in the US, and significantly shortens life expectancy, with most patients dying at an early age. Elosulfase alfa is the first approved treatment for Morquio A syndrome.

Elosulfase alfa is intended to provide the exogenous enzyme N-acetylgalactosamine-6-sulfatase that will be taken up into the lysosomes and increase the catabolism of the GAGs KS and C6S. Elosulfase alfa uptake by cells into lysosomes is mediated by the binding of mannose-6-phosphate-terminated oligosaccharide chains of elosulfase alfa to mannose-6-phosphate receptors.

N-acetylgalactosamine-6-sulfatase homodimer (from PDB 4FDI)
Elosulfase alfa is a soluble glycosylated dimeric protein with two oligosaccharide chains per monomer. Each monomeric peptide chain contains 496 amino acids and has an approximate molecular mass of 55 kDa (59 kDa including the oligosaccharides). One of the oligosaccharide chains contains bis-mannose-6-phosphate (bisM6P). bisM6P binds a receptor at the cell surface and the binding mediates cellular uptake of the protein to the lysosome.

Its sequence is the following:


The recommended dose is 2 mg per kg given intravenously over a minimum of 3.5 to 4.5 hours, based on infusion volume, once every week. Pre-treatment with antihistamines with or without antipyretics is recommended 30 to 60 minutes prior to the start of the infusion. The mean AUC0-t at first administration is 238 min x μg/mL, but increases to 577 min x μg/mL by week 22 of treatment, likely due to the development of neutralising antibodies. The mean elimination half-life likewise increased, from 7.52 min at the first dose to 35.9 min at week 22 of treatment.
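
The weight-based dose and the change in exposure between the first dose and week 22 can be worked through with a few lines of arithmetic. This is only an illustrative sketch: the 25 kg body weight is a hypothetical example and the function names are made up for this post.

```python
# Recommended dose from the text above (mg per kg body weight, once weekly).
DOSE_MG_PER_KG = 2.0

def weekly_dose_mg(body_weight_kg):
    """Total weekly dose in mg for a given body weight."""
    return DOSE_MG_PER_KG * body_weight_kg

def fold_change(initial, later):
    """Ratio of a later measurement to the initial one."""
    return later / initial

# Hypothetical 25 kg patient (example value, not from the label).
print(weekly_dose_mg(25.0))

# Mean AUC0-t (min x ug/mL) and half-life (min): first dose vs week 22.
print(round(fold_change(238, 577), 2))
print(round(fold_change(7.52, 35.9), 2))
```

On these numbers the AUC rises roughly 2.4-fold while the half-life rises almost 5-fold.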

Elosulfase alfa comes with a boxed warning for potentially life-threatening anaphylactic reactions in some patients.

The license holder for Vimizim™ is BioMarin, and the full prescribing information can be found here.

New Drug Approvals 2014 - Pt. II - Tasimelteon (Hetlioz™)

ATC Code: N05CH
Wikipedia: Tasimelteon

On January 31st 2014, the FDA approved Tasimelteon (Tradename: Hetlioz; Research Code(s): VEC-162, BMS-214778), a melatonin receptor agonist, for the treatment of Non-24-hour sleep-wake disorder (Non-24).

Non-24-hour sleep–wake disorder (Non-24) is a chronic circadian rhythm sleep disorder, mostly affecting blind people. It is characterised by insomnia or excessive sleepiness related to abnormal synchronization between the 24-hour light–dark cycle and the endogenous circadian cycle (slightly longer than 24 hours). This deviation can be corrected by exposure to sunlight, which resets the internal clock. However, the loss of photic input and the absence of light perception in the majority of patients prevent them from returning to normal alignment.

Tasimelteon is an agonist at melatonin MT1 and MT2 receptors, with a relatively greater affinity for MT2. These receptors are thought to be involved in the control of circadian rhythms; consequently, the binding of tasimelteon to these receptors, and the resulting induced somnolence, is believed to be the mechanism by which tasimelteon aids in the synchronisation of the internal circadian clock with the 24-hour light–dark cycle.

Melatonin receptors (UniProt accession: P48039 and P49286; ChEMBL ID: CHEMBL2094268) are members of the G-protein coupled receptor 1 family. There are no known 3D structures for these particular proteins; however, there are now several relevant homologous structures of other members of the family (see here for a current list of representative rhodopsin-like GPCR structures).

The -melteon USAN/INN stem covers selective melatonin receptor agonists. Tasimelteon is the second approved agent in this class, following the approval of Takeda's Ramelteon in 2005. Unlike its predecessor, tasimelteon is not currently indicated to treat insomnia, and has received orphan-product designation by the FDA. Agomelatine is another member of this class, but is only approved in Europe (PMID: 18673165).

Tasimelteon (IUPAC Name: N-[[(1R,2R)-2-(2,3-dihydro-1-benzofuran-4-yl)cyclopropyl]methyl]propanamide; Canonical SMILES: CCC(=O)NC[C@@H]1C[C@H]1c2cccc3OCCc23; ChEMBL: CHEMBL2103822; PubChem: 10220503; ChemSpider: 8395995; Standard InChI Key: PTOIAAWZLUQTIO-GXFFZTMASA-N) is a synthetic small molecule with a molecular weight of 245.3 Da, 2 hydrogen bond acceptors, 1 hydrogen bond donor, and an ALogP of 2.2. The compound is therefore fully compliant with the rule of five.
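
The rule-of-five statement can be checked directly against the property values quoted above. The following is a minimal sketch using the usual Lipinski thresholds; the function name is hypothetical, and ALogP is used here as a stand-in for logP.

```python
def rule_of_five_violations(mol_weight, logp, hbd, hba):
    """Count violations of Lipinski's rule of five for one compound."""
    checks = [
        mol_weight > 500,  # molecular weight above 500 Da
        logp > 5,          # logP above 5
        hbd > 5,           # more than 5 hydrogen bond donors
        hba > 10,          # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)

# Property values for tasimelteon as quoted in the text above.
tasimelteon = dict(mol_weight=245.3, logp=2.2, hbd=1, hba=2)
print(rule_of_five_violations(**tasimelteon))
```

All four values fall well within the thresholds, so no violations are counted.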

Tasimelteon is available as oral capsules and the recommended daily dose is one single capsule of 20 mg, taken before bedtime, at the same time every night. The peak concentration (Cmax) is reached at 0.5 to 3 hours after fasted oral administration, and at steady-state in young healthy subjects, the apparent oral volume of distribution (Vd/F) is approximately 56-126 L. Tasimelteon should not be administered with food, since food decreases its bioavailability, lowering the Cmax by 44%, and delaying the Tmax by approximately 1.75 hours. At therapeutic concentrations, tasimelteon is strongly bound to plasma proteins (90%).

The primary enzymatic systems involved in the biotransformation of tasimelteon in the liver are CYP1A2 and CYP3A4. Therefore, co-administration of tasimelteon with inhibitors of CYP1A2 and CYP3A4 or inducers of CYP3A4 may significantly alter the plasma concentration of tasimelteon. Metabolism of tasimelteon consists primarily of oxidation at multiple sites and oxidative dealkylation resulting in opening of the dihydrofuran ring followed by further oxidation to give a carboxylic acid. Phenolic glucuronidation is the major phase II metabolic route. Following oral administration of radiolabeled tasimelteon, 80% of total radioactivity is excreted in urine and approximately 4% in feces. The mean elimination half-life (t1/2) for tasimelteon is 1.3 ± 0.4 hours.

The license holder for Hetlioz™ is Vanda Pharmaceuticals, and the full prescribing information can be found here.