ChEMBL Resources

Thursday, 26 February 2015

Using the New ChEMBL Web Services



As promised in our earlier post, here are some more details on making the most of the new ChEMBL web services. The best place to get started is the documentation page: https://www.ebi.ac.uk/chembl/api/data/docs. There you will find the list of available resources (e.g. Molecule, Target and Assay) and their methods. More importantly, you can also execute each method with your own or default parameters, and view the URL, the response content and the response status code. This is definitely the quickest way to start familiarizing yourself with the new ChEMBL web services.

Looking at the resources in more detail, you will find that each resource has three basic methods:

1. https://www.ebi.ac.uk/chembl/api/data/RESOURCE - will return all available objects of type RESOURCE from ChEMBL. An example could be https://www.ebi.ac.uk/chembl/api/data/molecule which returns all molecules (remember that data is paginated - more on this later).

2. https://www.ebi.ac.uk/chembl/api/data/RESOURCE/ID - will return a single object of type RESOURCE, identified by ID. For some resources, there can be more than one type of ID, for example the Molecule resource will accept:

Note that all three examples above will return the same molecule. The same identifiers are valid for 'substructure' and 'similarity' resources.

Some resources can return more than one object for a given ID. For example, 'atc_class' will accept any ATC level as its ID. Only level 5 is unique across atc_class objects, so https://www.ebi.ac.uk/chembl/api/data/atc_class/D05AX05 will return a single resource, but https://www.ebi.ac.uk/chembl/api/data/atc_class/D will return a list of dermatologicals.

3. https://www.ebi.ac.uk/chembl/api/data/RESOURCE/set/ID_1;ID_2;...;ID_N - will return N objects of type RESOURCE with IDs in the set (ID_1, ID_2,...,ID_N), for example: https://www.ebi.ac.uk/chembl/api/data/molecule/set/CHEMBL900;CHEMBL1096643;CHEMBL1490 (note that the ID separator is a semicolon ';').
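As a rough illustration of the three URL patterns above, here is a minimal Python sketch that only builds the URLs and does not send any request. The helper names (list_url, single_url, set_url) are made up for this example, and percent-encoding the identifiers is an assumption rather than something the services are documented to require.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def list_url(resource):
    """URL for all objects of a resource (the response is paginated)."""
    return f"{BASE_URL}/{resource}"


def single_url(resource, identifier):
    """URL for the object(s) identified by a single ID."""
    # Assumption: identifiers are percent-encoded before being put in the path.
    return f"{BASE_URL}/{resource}/{quote(identifier, safe='')}"


def set_url(resource, identifiers):
    """URL for several objects at once; IDs are separated by ';'."""
    joined = ";".join(quote(i, safe="") for i in identifiers)
    return f"{BASE_URL}/{resource}/set/{joined}"


print(list_url("molecule"))
print(single_url("atc_class", "D05AX05"))
print(set_url("molecule", ["CHEMBL900", "CHEMBL1096643", "CHEMBL1490"]))
```

The last call reproduces the 'set' example URL given above.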


Format


If you do not specify a format, the default serialisation is XML. Other available formats include JSON and JSONP. There are three ways to provide format information:
  1. Extension, for example https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL1.json
  2. Format parameter, for example https://www.ebi.ac.uk/chembl/api/data/molecule?format=json
  3. Accept header, for example:
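The three ways of choosing a format can be sketched in Python as follows. This is only an illustration using the standard library: the function names are hypothetical, and the MIME type 'application/json' for the Accept header is an assumption about what the service expects. Nothing is sent over the network; the request object is only constructed.

```python
from urllib.request import Request

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def by_extension(resource, identifier, fmt="json"):
    """1. Format given as a file-like extension on the ID."""
    return f"{BASE_URL}/{resource}/{identifier}.{fmt}"


def by_parameter(resource, fmt="json"):
    """2. Format given as the 'format' query parameter."""
    return f"{BASE_URL}/{resource}?format={fmt}"


def by_accept_header(resource, identifier, mime="application/json"):
    """3. Format given in the Accept header (MIME type is an assumption)."""
    return Request(f"{BASE_URL}/{resource}/{identifier}", headers={"Accept": mime})


request = by_accept_header("molecule", "CHEMBL1")
print(by_extension("molecule", "CHEMBL1"))
print(by_parameter("molecule"))
print(request.full_url, request.get_header("Accept"))
# Passing `request` to urllib.request.urlopen() would perform the call.
```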


For images the available formats are:
  1. png (https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1.png)
  2. svg (https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1.svg)
  3. json (https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1.json)
For png images, please remember that the default chemical rendering uses RDKit but you can always switch to Indigo like this:

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1.png?engine=indigo

You can change the image size by adding the 'dimensions' parameter:

https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL1.png?engine=indigo&dimensions=100
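A small sketch of how the image URLs above could be assembled in Python. The helper name image_url is hypothetical; the 'engine' and 'dimensions' parameters are the ones shown in the examples above, and no other parameters are assumed.

```python
from urllib.parse import urlencode

IMAGE_URL = "https://www.ebi.ac.uk/chembl/api/data/image"


def image_url(chembl_id, fmt="png", engine=None, dimensions=None):
    """Build an image URL, adding only the parameters that were supplied."""
    params = {}
    if engine is not None:
        params["engine"] = engine
    if dimensions is not None:
        params["dimensions"] = dimensions
    url = f"{IMAGE_URL}/{chembl_id}.{fmt}"
    return f"{url}?{urlencode(params)}" if params else url


print(image_url("CHEMBL1", fmt="svg"))
print(image_url("CHEMBL1", engine="indigo", dimensions=100))
```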

The last thing worth noting about the image format is that when you add the X-Requested-With: XMLHttpRequest header to your request (i.e. you make an Ajax call), the resulting image will be base64-encoded. For example, to make an Ajax call using the jQuery library and render the result as an image, you can use this code:



Alternatively, if you don't want to explicitly include the header in your jQuery code, you can use the crossDomain: false parameter (yes, you are doing a cross-domain request, but we do support CORS, so this will work as discussed here):


 

Pagination


When you make the following request https://www.ebi.ac.uk/chembl/api/data/molecule only the first 20 molecules will be returned. This corresponds to the first page of the molecule result set being requested. Pagination has been introduced to help reduce server load and also protect us from inadvertent DDoS attacks. It also allows clients to quickly obtain a portion of data without having to wait for the full data set. The most important page parameters are limit and offset, which are illustrated in the image below:


The red border marks a page with limit=10 and offset=10. Limit is the maximum number of objects on a single page. Offset is the distance between the first element in the result set and the first element on the page. Please note that objects are indexed starting from 0. The default limit is 20 and the default offset is 0, which is why accessing https://www.ebi.ac.uk/chembl/api/data/molecule returns the first 20 elements. You can increase the page size by providing a bigger limit parameter; however, the maximum allowed limit value is 1000.
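To make the limit and offset arithmetic concrete, here is a minimal Python sketch. The function names are invented for this example and the total of 45 objects is an arbitrary illustrative number; only the 'limit' and 'offset' parameters and the maximum of 1000 come from the description above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"
MAX_LIMIT = 1000  # largest page size allowed by the services


def page_indices(limit, offset):
    """Zero-based positions in the result set covered by one page."""
    return range(offset, offset + limit)


def page_urls(resource, total_count, limit=20):
    """Yield one URL per page needed to cover total_count objects."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    for offset in range(0, total_count, limit):
        query = urlencode({"limit": limit, "offset": offset})
        yield f"{BASE_URL}/{resource}?{query}"


# The page with limit=10 and offset=10 covers elements 10 to 19.
indices = page_indices(limit=10, offset=10)
print(indices[0], indices[-1])

# A hypothetical result set of 45 objects needs three pages of 20.
for url in page_urls("molecule", total_count=45, limit=20):
    print(url)
```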

All paginated results come in an 'envelope', which contains a resource object section and a page metadata section. An example molecule page in JSON format looks like:


As can be seen, a 'page' of 20 molecule objects is stored in an array called 'molecules'. A 'page_meta' block provides information on page limit and offset. The 'page_meta' block also provides links (when available), to the previous and next pages and the total object count. In this example you can see that there are 1,463,270 compounds available in ChEMBL. The same information viewed in XML format:


Filtering

 

Filtering can be complex, so let's start with an example. In the Molecule resource, there is a numeric 'max_phase' field. This is the maximum phase of development reached by a molecule. 4 is the highest phase and means that the molecule has been approved by the relevant regulatory body, such as the FDA. So let's select all approved drugs by adding a filter on the max_phase field:

https://www.ebi.ac.uk/chembl/api/data/molecule.json?max_phase=4

OK, so now we know that filters can be passed as parameters and that the simplest form is <field_name>=<value>, which means that we expect to get only items with field_name exactly matching the specified value. Right, let's add another filter. Inside a Molecule resource, we can find a nested 'molecule_properties' object. One of its properties is the number of aromatic rings, so let's select compounds with at least two:

https://www.ebi.ac.uk/chembl/api/data/molecule.json?max_phase=4&molecule_properties__aromatic_rings__gte=2

Now we see that many filters can be joined together using the '&' sign. If the filter applies to a nested attribute, we have to provide the name of the nested object first, followed by the name of the attribute, using a double underscore '__' as a separator. Because we don't want an exact match, we have to explicitly specify the name of the relation, in our case 'greater than or equal' (gte). There are many other types of relations we can use in filters:

| Filter Types | Description | Example |
| --- | --- | --- |
| exact (iexact) | Exact match with query | |
| contains (icontains) | Wild card search with query | |
| startswith (istartswith) | Starts with query | |
| endswith (iendswith) | Ends with query | |
| regex (iregex) | Regular expression query | |
| gt (gte) | Greater than (or equal) | |
| lt (lte) | Less than (or equal) | |
| range | Within a range of values | |
| in | Appears within list of query values | |
| isnull | Field is null | |

Please note that you can't use every relation for every type. For example, regex matching is not allowed on numeric fields and ordering is forbidden on text. If you want to check which filters can be applied to which resource, you can take a look at the resource schema. For example, the schema for the 'molecule' resource is available here:

https://www.ebi.ac.uk/chembl/api/data/molecule/schema.json
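The filter naming rule described above (nested object names, then the attribute, then an optional relation, all joined with '__') can be sketched in Python like this. The helpers filter_key and filtered_url are hypothetical names for this illustration; the fields and relations used are the ones from the examples in this post.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def filter_key(*path, relation=None):
    """Join nested field names, and an optional relation, with '__'."""
    parts = list(path)
    if relation is not None:
        parts.append(relation)
    return "__".join(parts)


def filtered_url(resource, filters, fmt="json"):
    """Build a URL whose query string joins all filters with '&'."""
    return f"{BASE_URL}/{resource}.{fmt}?{urlencode(filters)}"


filters = [
    # Exact match: no relation suffix is needed.
    (filter_key("max_phase"), 4),
    # Nested attribute with an explicit 'gte' relation.
    (filter_key("molecule_properties", "aromatic_rings", relation="gte"), 2),
]
print(filtered_url("molecule", filters))
```

The printed URL is the same as the approved-drugs example with at least two aromatic rings shown earlier.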

Ordering


Ordering is similar to filtering. As an example let's sort molecules by their molecular weight. There is a field called 'full_mwt' inside molecule properties, so ordering will look like this:

https://www.ebi.ac.uk/chembl/api/data/molecule.json?order_by=molecule_properties__full_mwt

As we can see, helium is our first compound. In order to sort, we just have to add an 'order_by' parameter whose value is the name of the field, prefixed with all intermediate nested objects. By default we sort in ascending order. To reverse this order and get molecules sorted from the heaviest to the lightest, we have to add a minus sign '-' before the field name, like this:

https://www.ebi.ac.uk/chembl/api/data/molecule.json?molecule_properties__isnull=false&order_by=-molecule_properties__full_mwt

This time the first element is a very heavy compound. Note that we had to add a filter to eliminate compounds without a specified weight; otherwise, they would stick to the top of the results (this is because NULLS FIRST is the default for descending order in the Oracle DB that we use in production, as described here). We can have multiple 'order_by' parameters in the URL, like here:

https://www.ebi.ac.uk/chembl/api/data/molecule.json?order_by=molecule_properties__aromatic_rings&order_by=-molecule_properties__full_mwt

In this case, molecules will first be sorted by the number of aromatic rings in ascending order, and then by molecular weight in descending order.

Filtering and ordering can be mixed together, but we leave that as an exercise for the reader.
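One possible way to combine filters with one or more 'order_by' parameters is sketched below in Python. The function query_url is a hypothetical helper; it relies only on the 'order_by' parameter and the minus-sign convention described above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def query_url(resource, filters=(), order_by=(), fmt="json"):
    """Build a URL with filters followed by repeated 'order_by' parameters.

    A leading '-' on an order_by field requests descending order.
    """
    params = list(filters) + [("order_by", field) for field in order_by]
    return f"{BASE_URL}/{resource}.{fmt}?{urlencode(params)}"


# Approved drugs, fewest aromatic rings first, heaviest first within ties.
url = query_url(
    "molecule",
    filters=[("max_phase", 4)],
    order_by=[
        "molecule_properties__aromatic_rings",
        "-molecule_properties__full_mwt",
    ],
)
print(url)
```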

Equivalent Web Service Requests 


We will continue to support the 'old' web services until the end of the year. To help users with the upcoming migration process, the table below maps the example web service requests found in the old documentation to the equivalent calls in the new web services. It will be up to the end user to handle the different response format and the pagination of the returned data.


| Description | Old Web Service URL | New Web Service URL |
| --- | --- | --- |
| Check API status | | |
| Get compound by ChEMBLID | | |
| Get compound by Standard InChiKey | | |
| Get list of compounds matching Canonical SMILES | | |
| Get list of compounds containing the substructure represented by a given Canonical SMILES | | |
| Get list of compounds similar to the one represented by a given Canonical SMILES, at a given cutoff percentage | | |
| Get image of a ChEMBL compound by ChEMBLID | | |
| Get individual compound bioactivities | | |
| Get alternative compound forms (e.g. parent and salts) of a compound | | |
| Get mechanism of action details for compound (where compound is a drug) | | |
| Get all targets | | |
| Get target by ChEMBLID | | |
| Get target by UniProt Accession Identifier | | |
| Get individual target bioactivities | | |
| Get approved drugs for target | | |
| Get assay by ChEMBLID | | |
| Get individual assay bioactivities | | |

ChEMBL Targets and UniProt Accessions


It is also worth noting that requests for ChEMBL targets by UniProt accession are handled slightly differently in the new web services. It is now possible to return complexes, protein families and other target types that contain the UniProt accession, not just the single protein targets. For example, https://www.ebi.ac.uk/chembl/api/data/target?target_components__accession=Q13936&target_type=SINGLE%20PROTEIN will return the ChEMBL target for Q13936. Removing the target_type filter, as in https://www.ebi.ac.uk/chembl/api/data/target?target_components__accession=Q13936, will return multiple targets (CHEMBL2095229, CHEMBL2363032 and CHEMBL1940 in ChEMBL release 20).

Expect some more web service related blog posts over the coming weeks and if you have any questions please get in touch.
The ChEMBL Team

 

 




Thursday, 19 February 2015

New ChEMBL Web Services




Following on from our recent ChEMBL 20 release we are pleased to announce an updated version of the ChEMBL web services. First things first, some of the most important bits of information:
  • You can explore the new resources using the online documentation available here: https://www.ebi.ac.uk/chembl/api/data/docs
  • The code is Apache 2.0 licensed and available on GitHub: https://github.com/chembl/chembl_webservices_2
  • The base URL for accessing the web services is https://www.ebi.ac.uk/chembl/api/data/ so in order to retrieve some information about molecules, you would construct the following URL: https://www.ebi.ac.uk/chembl/api/data/molecule
  • The 'old' web services are still available, so any applications and code relying on these services will continue to work. We do encourage you to review and migrate any code which uses these services to the updated versions described in this blog post.

 

And now a short Q&A session:

 

Q: But you already have web services available here: https://www.ebi.ac.uk/chemblws/ and they are sooo great, so why release something new?

A: With each new ChEMBL release, more and more tables are being added to the underlying database schema, and these have not been reflected in the web services for a long time. Additionally, we had many suggestions from our users, which we have tried to address where possible. So, in order to expose more data and provide new functionality, it made sense to release the new and improved ChEMBL web services. The table below summarises the most important differences between the old and updated ChEMBL web services:

| Feature | Original ChEMBL Web Services | Updated ChEMBL Web Services |
| --- | --- | --- |
| Base URL | https://www.ebi.ac.uk/chemblws | https://www.ebi.ac.uk/chembl/api/data |
| Number of resources | 5 | 17 |
| Pagination | No | Yes |
| Filtering | No | Yes |
| Ordering | No | Yes |
| Raster Images | Yes | Yes |
| Vector Images | No | Yes |
| JSONP support | Yes | Yes |
| CORS support | Yes | Yes |
| Online Documentation | | |
| Python client library | Yes | Yes |
| Available REST verbs | GET, POST | GET, POST |
| Support for SMILES in GET | Partial | Full |

As you can see, the most important difference is the number of new resources; for example, we now include 'activity', 'cell_line', 'document' and many more. Existing resources (e.g. molecule, target, assay) have also been updated to include many important new attributes.

Another useful feature is filtering. For example, you can now retrieve all molecules with a molecular weight of at most 300 Da and a preferred name ending with 'nib' using the following URL:
https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib

You can apply filtering to every resource and filter by most of the fields belonging to a given resource. For example, targets that contain 'kinase' in pref_name would be:
https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase

Don't worry if you do not follow the filtering query language; there will be a separate blog post on this topic.

Another new (and much requested) feature is the ability to easily retrieve multiple records from a resource. There are currently many ways to do this (filtering is one of them), but a more intuitive way is to get multiple records in bulk via the list of IDs. This is now possible, for example to retrieve ChEMBL molecules CHEMBL900, CHEMBL1096643 and CHEMBL1490, you can use: https://www.ebi.ac.uk/chembl/api/data/molecule/set/CHEMBL900;CHEMBL1096643;CHEMBL1490.

Q: Are the updated ChEMBL web services compatible with the old ones?
A: No. The URL patterns and the responses returned by the web services have changed, so they are not compatible. We recommend that you review any code which currently consumes the old services and look to migrate it to the updated version described here.

Q: But I have a lot of code that depends on the old web services! Will it break?
A: No. The old web services are still accessible and have been updated to serve data from the ChEMBL 20 release. We will continue to support the old web services but, as part of our deprecation plan, we will probably stop supporting them by the end of the year. Please remember that the old code will always be available on GitHub and PyPI, so it is possible to create your own instance.

Q: How can I retrieve large data sets in the updated ChEMBL web services?
A: You can now retrieve all entities of a given resource; for example, https://www.ebi.ac.uk/chembl/api/data/molecule returns all molecules in ChEMBL. This is achieved by returning the data in 'pages'. In order to implement paging, data is now wrapped in an envelope (the <response> tag for XML output and the outermost dictionary for JSON). Inside the envelope you can find a collection of objects ('molecules' for the molecule resource) and meta information looking like this:



The metadata provides information about the current page: limit (how many objects can be on the page) and offset (the zero-based position of the first object), as well as links to the previous and next pages and the total object count. Using the metadata, it is now possible to loop through a large result set.
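Looping through a result set by following the link to the next page could look roughly like the Python sketch below. It is written under several assumptions: the metadata block is taken to be a dictionary called 'page_meta' with a 'next' entry that is empty on the last page, the link may be relative to the current URL, and the actual HTTP call is delegated to a caller-supplied fetch function. The pages used in the demonstration are made-up stand-ins hosted on a placeholder domain, not real ChEMBL responses.

```python
from urllib.parse import urljoin


def iterate_objects(fetch, first_url, collection_key):
    """Yield every object in a paginated result set.

    fetch(url) must return the decoded JSON envelope as a dict.
    Assumes the envelope holds the objects under collection_key and
    a 'page_meta' dict whose 'next' entry is None on the last page.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page[collection_key]
        next_link = page["page_meta"].get("next")
        # The link may be relative, so resolve it against the current URL.
        url = urljoin(url, next_link) if next_link else None


# Stand-in pages for demonstration only (not real service output).
FAKE_PAGES = {
    "https://example.org/api/molecule.json": {
        "molecules": [{"id": "A"}, {"id": "B"}],
        "page_meta": {"next": "/api/molecule.json?limit=2&offset=2"},
    },
    "https://example.org/api/molecule.json?limit=2&offset=2": {
        "molecules": [{"id": "C"}],
        "page_meta": {"next": None},
    },
}

collected = list(
    iterate_objects(FAKE_PAGES.__getitem__, "https://example.org/api/molecule.json", "molecules")
)
print([obj["id"] for obj in collected])
```

In real use, fetch would download the URL and decode the JSON body; keeping it as a parameter makes the loop easy to try out without network access.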

Q: Speaking of your Python client library: will it include the old or the new version? Will it be backwards compatible?
A: Yes, version 0.8.x of the chembl_webresource_client package includes support for the updated web services. In order to access them, use the following line of code:


If you list the attributes of the new_client object, you will find that it contains all resources. The usage will be covered in a separate blog post. We plan to drop support for the old services in version 0.9.x of the library, so new_client will become the default client.

Q: You claim that the new web services support the GET and POST methods, but the online documentation lists only GET endpoints. I actually tried POST and I'm getting a '405 Method Not Allowed' error.
A: In the REST context, GET, POST, PUT and DELETE are verbs meaning RETRIEVE, CREATE, UPDATE and DELETE respectively. Because our web services are read-only, we should only allow the GET method. But GET has one very big limitation: all parameters (like identifiers, filtering, paging and ordering info) are appended to the URL. Web servers (and some browsers) impose limits on URL length. In Apache, for example, the request line that carries the URL is limited to 8190 bytes by default. This means that if you want to get data about some (very) big molecule using SMILES as the identifier, the resulting URL may be too long.

On the other hand, POST doesn't have this limitation, as parameters are included in the request body, not in the URL. This is why we allow POST to retrieve data. But to be consistent with the REST standard, you have to let us know: 'OK, in this request I want to GET some data but I'm using POST just to pass more parameters'. To do this, just add the header X-HTTP-Method-Override: GET to your POST request and you won't see the 405 error anymore.
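A minimal Python sketch of such a request is shown below. It only constructs the request object with the standard library and does not send it. The helper name get_via_post is hypothetical, and sending the parameters as a form-encoded body is an assumption: the post does not state which body encoding the services expect.

```python
from urllib.parse import urlencode
from urllib.request import Request


def get_via_post(url, params):
    """Build a POST request that asks the server to treat it as a GET."""
    # Assumption: parameters are sent as a form-encoded request body.
    body = urlencode(params).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"X-HTTP-Method-Override": "GET"},
    )


request = get_via_post(
    "https://www.ebi.ac.uk/chembl/api/data/molecule.json",
    {"max_phase": 4, "molecule_properties__aromatic_rings__gte": 2},
)
print(request.get_method(), request.full_url)
print(request.data)
# Passing `request` to urllib.request.urlopen() would perform the call.
```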

We hope you find the update useful and if you experience any problems or have any questions please get in touch.

The ChEMBL Team