ChEMBL Resources


Wednesday, 5 April 2017

Technical internships at ChEMBL


We are looking for Computer Science (and related fields) students with strong programming skills to join our team for 3-6 month internships. This is not necessarily a summer internship program: once accepted, you can start whenever is convenient for you. Please take a look at the research ideas / candidate profiles below:

1. Java programmer - we are looking for a person with experience in Java to develop a prototype of new KNIME nodes for interacting with the ChEMBL API. Experience with REST and/or KNIME is a plus but not a requirement - you can learn it during your internship. It is very important that you are excited about UX and about creating user-friendly and pragmatic GUIs.

2. C++ programmer - we would like to invite a person passionate about C++ and pattern recognition / image processing to experiment with optimising the open-source OSRA code. OSRA is like OCR but for molecules. We want to make it faster and more accurate.

3. C++ programmer with graph theory knowledge. Chemical compounds are represented in silico as graphs. We want to be able to quickly generate random graphs that are also valid compounds. Experience with distributed computing, computing grids, network file systems and map-reduce is a plus but not required.

4. JavaScript programmer - "any application that can be written in JavaScript, will eventually be written in JavaScript". This is why we are looking for a person with JS experience to experiment with:
  • Creating prototypes of reusable chemical web widgets using Polymer.
  • Using Emscripten to cross-compile some core chemical software written in C++ to JS.
5. A person with data visualisation skills to explore the Kibana and Kibi tools and create beautiful and informative datavis widgets from ChEMBL data.

6. Someone with a Natural Language Processing background to:
  • Create a dictionary of common spelling mistakes in chemistry patents.
  • Create a network of patent relations using the TextRank algorithm.
  • Explore different approaches to the Named Entity Classification problem.

How to apply?


Just send your CV to kholmes @ ebi.ac.uk with the subject 'ChEMBL Tech Internships'.

When to apply?


You can apply anytime but we will only contact selected candidates.

Will all those internships start at the same time?


No. In fact, we plan to select at most the two most interesting candidates at any given time.

Will I get paid?


The internship is either paid at 800 GBP per month or funded by your alma mater, whichever is better for you.

Sunday, 19 March 2017

Finding Compounds in Databases using UniChem




Have you ever identified an interesting compound and wondered what else is known about it? For example, is there any bioactivity data on it in ChEMBL or PubChem? Is there any toxicity data on it (CompTox)? And, having found interesting data on a compound, have you then wondered whether it can be purchased or whether it has been patented? All this can be done using UniChem. Interested?

Come along to our webinar on 29th March at 2pm BST (3pm CEST, 9am EDT).
You will, however, need to register by emailing chembl-help. Places are limited, so please let us know as soon as possible if you register but are then unable to attend.

If you want to know more about UniChem please read on.

UniChem (https://www.ebi.ac.uk/unichem/) is a simple system we have developed to cross-reference compounds across databases, both internal to EMBL-EBI and external. Currently we have cross-references to 140 million compounds in 30 different databases. Information about the sources indexed in UniChem can be found here. UniChem is updated weekly with new compounds from these source databases.

So, for example, you can input a database identifier or an InChIKey into UniChem and see links to all the other indexed databases that have information about that compound.

If we take the drug paroxetine and search for it in UniChem, it is found in 22 databases and the UniChem webpage gives links to the paroxetine entries in those databases.

You don’t have to do this compound by compound using the web interface, though. UniChem has a comprehensive set of web services that you can use to retrieve data. Alternatively, all the database files and source-to-source mapping files are available for download.
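As a rough illustration of scripted access, the Python sketch below builds a lookup URL for an InChIKey and groups the returned records by source. It rests on assumptions rather than on anything stated in this post: the URL pattern follows the UniChem REST interface as we understand it, and the response is assumed to be a JSON list of objects with `src_id` and `src_compound_id` fields. The function names are hypothetical, and L-alanine is used only as a small stand-in compound. Please check the web services documentation before relying on any of this.

```python
import json
import re
import urllib.request
from collections import defaultdict

# ASSUMPTION: URL pattern of the UniChem REST lookup by InChIKey.
UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest/inchikey/{key}"

# A standard InChIKey has three upper-case blocks of 14, 10 and 1 letters.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def lookup_url(inchikey):
    """Return the (assumed) REST URL for an InChIKey, after a format check."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError("not a well-formed InChIKey: %r" % inchikey)
    return UNICHEM_REST.format(key=inchikey)


def group_by_source(records):
    """Group records by source id.

    ASSUMPTION: each record is a dict with 'src_id' and 'src_compound_id'.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[str(record["src_id"])].append(record["src_compound_id"])
    return dict(grouped)


def fetch_cross_references(inchikey, timeout=30):
    """Query the service and return {source id: [compound ids]}."""
    with urllib.request.urlopen(lookup_url(inchikey), timeout=timeout) as resp:
        return group_by_source(json.load(resp))


if __name__ == "__main__":
    # L-alanine, used here only as a small stand-in compound.
    print(lookup_url("QNAYBMKLOCPYGJ-REOHSZFJSA-N"))
```

Calling `fetch_cross_references` with the InChIKey of a compound of interest should then give one entry per source database, provided the assumptions above hold.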

UniChem relies on the InChIKey to do the mapping between databases, and this works fine if two databases have exactly the same structure for a compound. We all know, however, that this isn’t always the case. Sometimes a different salt or isotope was tested, or a mistake was made in the stereocentre assignment, meaning the InChIKeys no longer match.

However, don’t despair: UniChem connectivity searching can help (https://www.ebi.ac.uk/unichem/info/widesearchInfo). Because the InChI is built up in layers, it can be deconstructed, and compounds that differ only by stereochemistry, isotopes, protonation state, etc. can be identified and mapped to one another. You can do this on single components or mixtures.
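To make the layer idea concrete, here is a minimal Python sketch that splits a single-component InChI string into its layers and compares two structures on formula plus the connectivity (`/c`) layer only. This is a simplification for illustration, not UniChem's actual implementation: the function names are hypothetical, mixtures are not handled, and if a layer tag occurs more than once only the first occurrence is kept. L- and D-alanine are used as a small stand-in pair that differs only in stereochemistry.

```python
def split_layers(inchi):
    """Split a single-component InChI string into a dict of its layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    layers = {
        "version": parts[0],
        "formula": parts[1] if len(parts) > 1 else "",
    }
    for part in parts[2:]:
        if part:
            # Later layers start with a one-letter tag (c, h, t, m, s, ...);
            # only the first occurrence of each tag is kept in this sketch.
            layers.setdefault(part[0], part[1:])
    return layers


def connectivity_signature(inchi):
    """Formula plus /c layer: a simplified stand-in for 'same skeleton'."""
    layers = split_layers(inchi)
    return layers["formula"], layers.get("c", "")


def same_connectivity(inchi_a, inchi_b):
    """True if both structures share formula and connectivity layer."""
    return connectivity_signature(inchi_a) == connectivity_signature(inchi_b)


def differing_layers(inchi_a, inchi_b):
    """Sorted list of layer tags whose content differs between the two."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    return sorted(tag for tag in set(a) | set(b) if a.get(tag) != b.get(tag))


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

if __name__ == "__main__":
    print(same_connectivity(L_ALANINE, D_ALANINE))
    print(differing_layers(L_ALANINE, D_ALANINE))
```

The two enantiomers have different InChIKeys, yet they share formula and connectivity and differ only in the `/m` stereo layer, which is the kind of relationship a connectivity search is meant to surface.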

Taking our paroxetine example:

We have paroxetine and a number of related compounds in ChEMBL. For example:
Maybe someone genuinely wanted to test these related compounds, or maybe they are errors (or a mixture of both). Whatever the reason, by using the UniChem connectivity searching feature we can identify any compounds that match paroxetine on the InChI connectivity layer.
The matches identified from a connectivity search starting with paroxetine can be found here:

At the webinar on 29th March we will describe how this is done in more detail and discuss some use cases. If you are interested, don’t forget to register.

If you want to read more here are links to two papers about UniChem:
Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P.
UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System.
Journal of Cheminformatics 2013, 5:3 (January 2013).


Chambers, J., Davies, M., Gaulton, A., Papadatos, G., Hersey, A. and Overington, J.P.
UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers.
Journal of Cheminformatics 2014, 6:43 (September 2014).

Tuesday, 14 March 2017

Chemogenomics Analyst Wanted


We are looking to recruit a scientist to support our work for the Horizon 2020 project “Coordinated Research Infrastructures Building Enduring Life-science services” (CORBEL). The role is to support scientists in their use of chemogenomics resources by enabling database searching and evaluation of data. The responsibilities are:
  • To be responsible for liaising with scientists engaged in CORBEL and advising on the use of chemogenomics resources to progress their projects;
  • To help in the identification and analysis of bioactivity data from multiple database resources;
  • To construct and utilize appropriate workflows to facilitate the pharmacological profiling of molecules and chemotypes, the identification of potential off-target effects and the development of target prediction models;
  • To identify interoperability gaps between resources and help with developing solutions;
  • To organize and run appropriate training courses for scientists engaged in the CORBEL project.

For full details of the position, or to apply, see:
https://www.embl.de/jobs/searchjobs/index.php?ref=EBI_00897&newlang=1

The closing date is 9th April 2017.

Monday, 27 February 2017

Position to work on tractability in Open Targets



There is currently an opening for a Protein Computational Scientist to work on methods to assess and quantify the tractability (druggability) of potential new targets for drug discovery. This is a two-year position funded by the Open Targets initiative.

The appointee will work with scientists from the Open Targets partners to assess, validate and develop methods for quantifying target tractability, with the ultimate goal of incorporating such methodologies into the target validation platform (https://www.targetvalidation.org/). The initial focus will be on “small molecule” tractability, but we are also interested in other modalities in due course (e.g. antibody therapies). Many of the current methods to assess small molecule tractability are based on 3D protein structures, but such information is only available for a subset of potential targets. A key component of the project is therefore to determine robust methods and pipelines that can be applied to novel targets for which much less information is available.

For more details or to apply, click here

Closing date is 9th March


(the image above is taken from the Fpocket publication: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-168)

Thursday, 9 February 2017

ChEMBL Webinars



We will be running a new series of webinars over the next few months. These will cover a range of topics, including basic introductions to the Chemogenomics resources (ChEMBL, SureChEMBL, UniChem) as well as more detailed topics such as a schema walkthrough and ChEMBL web services.

The first webinar will be a basic introduction to ChEMBL and will be on 22nd February at 2pm GMT (3pm CET, 9am EST).

If you would like to attend the webinar, please email us to register.
Please note that spaces are limited, so let us know as soon as possible if you register but are then unable to attend.

We will post further details of upcoming webinars here, so watch this space!

The ChEMBL Team