Cluster mols
Latest revision as of 08:22, 28 April 2022
cluster_mols is a PyMOL plugin that allows the user to quickly select compounds from a virtual screen to be purchased or synthesized.
It helps the user by automatically clustering input compounds based on their molecular fingerprints (FP2, see http://openbabel.org/wiki/FP2) and loading them into the PyMOL window. cluster_mols also highlights both good and bad polar interactions between the ligands and a user-specified receptor. Additionally, there are a number of keyboard controls for selecting and extracting compounds, as well as functionality for searching online to see if there are vendors for a selected compound.
Description
The basic workflow of cluster_mols.py can be broken up into three parts:
- Computing a similarity matrix from the input compounds
- Performing hierarchical clustering on that similarity matrix
- Cutting the tree at a user-specified height, then creating and sorting the clusters
The results of the first two steps are saved to Python pickle files so that you do not have to recompute them in subsequent runs.
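The three steps can be illustrated with a small, dependency-free sketch. This is not the plugin's code: it assumes fingerprints are given as sets of on-bit indices, uses Tanimoto distance, and assumes single linkage, for which cutting the tree at a given height is equivalent to joining every pair of compounds whose distance is at or below that height. The plugin's actual linkage method is not stated here, and the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def distance_matrix(fps):
    """Step 1: pairwise distances (1 - Tanimoto similarity)."""
    n = len(fps)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 1.0 - tanimoto(fps[i], fps[j])
    return dist


def cut_single_linkage(dist, height):
    """Steps 2 and 3 combined, assuming single linkage.

    Cutting a single-linkage tree at `height` yields the connected
    components of the graph that links every pair with distance <= height.
    """
    parent = list(range(len(dist)))

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(dist)), 2):
        if dist[i][j] <= height:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(dist)):
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first; the plugin offers other sort orders.
    return sorted(clusters.values(), key=len, reverse=True)
```

With made-up fingerprints, raising the cut height merges compounds into fewer, larger clusters, which is the behaviour the cutoff option in the GUI controls.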
It also highlights both good and bad polar contacts between the ligand and a user-specified protein using the 'show_contacts' module described below.
This script also integrates keyboard controls that allow WASD movement through the clusters, as well as keyboard shortcuts for pulling out compounds. See below for usage.
Download
The most up to date version (recommended) of cluster_mols is available through BitBucket at: https://bitbucket.org/mpb21/cluster_mols_py/overview
Installation
This plugin has a number of required dependencies, and it is currently supported only on Linux and OSX.
Python packages (install using easy_install or pip)
- openbabel
- numpy
- scipy
- Tkinter
- fastcluster
- Pmw-py3 -- Important: Pmw 2.0.1 does not work; install the Pmw-py3 package instead of Pmw to get version 2.1
Command line tools (These must be accessible through your PATH environment variable):
- obabel -- from http://openbabel.org
Recent versions of cluster_mols do not require sdsorter, but it is still a very useful tool for dealing with sdf files.
- sdsorter -- https://sourceforge.net/projects/sdsorter/
Once you have the required dependencies, install it through PyMOL's Plugin menu.
PyMOL > Plugin > Install Plugin
Usage
The GUI is relatively straightforward if you follow it from top to bottom, and then from left to right through the tabs.
The program requires that the input be a '.sdf' or '.sdf.gz' file. If your compounds are not in that format, use the 'obabel' tool from OpenBabel (named 'babel' in older releases) to convert them.
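As a minimal sketch of such a conversion, the helper below calls the obabel command listed under Installation, which infers the input and output formats from the file extensions and takes the output file via -O. The function names and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess


def obabel_convert_command(src, dst):
    """Build the obabel command line; formats are inferred from extensions."""
    return ["obabel", src, "-O", dst]


def convert_to_sdf(src, dst):
    """Run the conversion, failing early if obabel is not on PATH."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel was not found on PATH")
    subprocess.run(obabel_convert_command(src, dst), check=True)
```

For example, convert_to_sdf("hits.mol2", "hits.sdf") would write an SD file that can then be loaded in the 'Compute Similarities' tab.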
GUI Options
In the 'Compute Similarities' tab, there are options for selecting a new ligand file and for specifying how many CPUs to use for the similarity calculation. Clicking the 'Compute Similarity' button starts the similarity calculations. If you check the 'Ignore saved results?' box, the plugin ignores any saved intermediate results files. This is useful if you change the contents of the original input file while keeping the file name the same.
Depending on how many compounds there are, the similarity calculations may take between 1 and 10 minutes. If you launched PyMOL from the command line, you will be able to see the progress printing out in the console. The similarity results are saved to a file so if you want to re-cluster the same input file, you do not need to wait to recompute the similarities.
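The caching behaviour described above can be sketched as follows. This is an illustration only: the cache file name (the input file name plus a suffix) and the function names are assumptions, chosen to show why a cache keyed on the file name goes stale when the file contents change and why an 'ignore saved results' switch is useful.

```python
import os
import pickle


def cache_path(input_file, suffix=".similarity.pkl"):
    """Hypothetical naming scheme: the cache is keyed on the input file name only."""
    return input_file + suffix


def load_or_compute(input_file, compute, ignore_saved=False):
    """Return cached results if present, otherwise compute and save them."""
    path = cache_path(input_file)
    if not ignore_saved and os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute(input_file)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result
```

Passing ignore_saved=True forces a recomputation and overwrites the stored results, mirroring the check box in the GUI.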
The first option on the Cluster Compounds tab defines how the clusters will be sorted. The default is to sort by the 'minimizedAffinity' tag, which is inserted into the output sdf file after minimization with 'smina' (an enhanced version of AutoDock Vina, available at http://www.smina.sf.net). You can also sort the clusters by any SD tag that exists in the input file, by the Title (alphabetically), or by the size of the cluster.
The second option is the height at which the hierarchical clustering tree is cut. The units are arbitrary, but a higher cutoff leads to a small number of large clusters of less similar compounds, and a lower cutoff leads to more, smaller clusters of more similar compounds. Play around with the cutoff until you get a clustering that you like. The third option is a check box for whether to group clusters with only one compound into one ‘singletons’ cluster. The fourth option enables the show_contacts tool that is described below. There is also a field for a PyMOL selection string that identifies the atoms against which the hydrogen bonds are computed. Finally, there is a button to create the clusters and load them into PyMOL.
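The sorting and singleton options can be pictured with the short sketch below. It assumes each compound is a dictionary holding its SD tags, that a lower (more negative) 'minimizedAffinity' is better, and that a cluster is ranked by its best member. These are assumptions for illustration; the plugin's own ranking rule is not described here, and the function names are hypothetical.

```python
def sort_clusters(clusters, tag="minimizedAffinity"):
    """Order members within each cluster, then order clusters by their best member."""
    ordered = [sorted(members, key=lambda m: m[tag]) for members in clusters]
    return sorted(ordered, key=lambda members: members[0][tag])


def group_singletons(clusters):
    """Collect all one-compound clusters into a single trailing 'singletons' cluster."""
    multi = [c for c in clusters if len(c) > 1]
    singles = [member for c in clusters if len(c) == 1 for member in c]
    return multi + ([singles] if singles else [])
```

Applying sort_clusters first and group_singletons second keeps the multi-compound clusters in ranked order and places the singletons cluster last.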
Keyboard Controls
Once you have finished the similarity calculations and clustering described above, you can navigate the clusters using the keyboard. Movement uses the WASD keys familiar to gamers (W for up, S for down, A for left, D for right). The one important caveat is that, due to limitations on which keys can be redefined in PyMOL, the WASD keys must be combined with the Control (or Alt) key; for example, Ctrl-W moves up. It seems odd at first, but you quickly get used to it.
Navigation Controls
Ctrl-W – Move up a cluster
Ctrl-S – Move down a cluster
Ctrl-A – Move to the previous compound in a cluster
Ctrl-D – Move to the next compound in the cluster
Ctrl-F – Check for vendors
If you acquired your compounds from ZINCPharmer (http://zincpharmer.csb.pitt.edu/), or your compounds have titles that start with a ZINC ID (http://www.docking.zinc.org) or a MolPort ID (http://www.molport.com), you can press 'Ctrl-F' to see if there are any vendors available.
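A vendor lookup of this kind depends on recognising the ID at the start of the compound title. The sketch below shows one way to do that. The regular expressions are assumptions about the usual ID layouts ("ZINC" followed by digits, and "MolPort-" followed by three dash-separated groups of three digits); the function name and the IDs in the usage note are made up for illustration.

```python
import re

# Assumed ID layouts; adjust if your titles use a different convention.
_PATTERNS = (
    ("ZINC", re.compile(r"^(ZINC\d+)")),
    ("MolPort", re.compile(r"^(MolPort-\d{3}-\d{3}-\d{3})")),
)


def vendor_id(title):
    """Return (source, id) if the title starts with a known ID, else None."""
    stripped = title.strip()
    for source, pattern in _PATTERNS:
        match = pattern.match(stripped)
        if match:
            return source, match.group(1)
    return None
```

For instance, vendor_id("ZINC00000123 pose 1") picks out the ZINC ID, while a title such as "ligand_42" yields None, in which case no vendor search is possible.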
Compound selection
In addition to moving through the clusters, you can also extract compounds that you like for later viewing using the following controls. Pressing F3 appends the currently visible compound to a new object with the suffix '_selected'.
F1 – Print title of currently selected molecule
F2 – Remove most recently added compound
F3 – Add currently visible compound to list (Most commonly used)
F4, F12 – Print List
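The controls above can be modelled as a cursor over a list of clusters plus a table of key bindings. The sketch below is an assumption-laden illustration, not the objectfocus.py implementation: the class and function names are hypothetical, "up" is taken to mean the previous cluster, and movement stops at the ends rather than wrapping. The bind_keys function takes the key-binding function as a parameter so that, inside PyMOL, cmd.set_key can be passed in.

```python
class ClusterCursor:
    """Tracks the current cluster, the current compound, and the picked list."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.cluster = 0
        self.compound = 0
        self.selected = []

    def current(self):
        return self.clusters[self.cluster][self.compound]

    def move_cluster(self, step):
        # Clamp at the first/last cluster and start at its first compound.
        self.cluster = min(max(self.cluster + step, 0), len(self.clusters) - 1)
        self.compound = 0

    def move_compound(self, step):
        last = len(self.clusters[self.cluster]) - 1
        self.compound = min(max(self.compound + step, 0), last)

    def add_current(self):
        self.selected.append(self.current())

    def remove_last(self):
        if self.selected:
            self.selected.pop()


def bind_keys(set_key, cursor):
    """Register the shortcuts; set_key(key, fn, arg) mirrors PyMOL's cmd.set_key."""
    set_key("CTRL-W", cursor.move_cluster, (-1,))
    set_key("CTRL-S", cursor.move_cluster, (1,))
    set_key("CTRL-A", cursor.move_compound, (-1,))
    set_key("CTRL-D", cursor.move_compound, (1,))
    set_key("F2", cursor.remove_last)
    set_key("F3", cursor.add_current)
```

Inside a PyMOL session this would be wired up with something like bind_keys(cmd.set_key, cursor) after importing cmd from pymol; outside PyMOL any function with the same three-argument shape can stand in for testing.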
show_contacts
show_contacts is an expanded version of list_hbonds (http://pldserver1.biochem.queensu.ca/~rlc/work/pymol/) that shows both favorable and unfavorable contacts between ligands and a protein receptor. show_contacts has been integrated into cluster_mols as a function and is executed automatically when clustering. It can also be run by itself, outside of cluster_mols. In the standalone case, the usage is as follows:
show_contacts(selection, selection2, result="contacts", cutoff=3.6, bigcutoff=4.0)
The arguments are as follows:
- selection -- pymol selection string for the protein
- selection2 -- pymol selection string for the ligands
- result -- prefix of the object that the distances should be shown in. (Default "contacts")
- cutoff -- Distance cutoff for what is considered an ideal hydrogen bond.
- bigcutoff -- Distance cutoff for a non-ideal hydrogen bond.
Output:
The output of show_contacts is a set of PyMOL distance objects. They are color-coded and size-coded to indicate different interactions between the ligand and protein. Each line style below is followed by the parameter that controls it.
- thin-purple lines -- All possible polar contacts (acc-acc, don-don, acc-don) -- bigcutoff
- thick-yellow lines -- All ideal hydrogen bonds -- cutoff
- thin-yellow lines -- Non-ideal hydrogen bonds -- bigcutoff
- thick-red lines -- Polar clashes, i.e. Donor-Donor, Acceptor-Acceptor -- cutoff
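The legend can be read as a small decision rule on the atom types and the distance. The sketch below encodes that reading under several simplifying assumptions that may differ from the real show_contacts: only distance is considered (no hydrogen-bond geometry), each atom is either a donor or an acceptor, the cutoffs are inclusive, and thin-yellow applies only to donor-acceptor pairs between cutoff and bigcutoff. The function name is hypothetical.

```python
def classify_contact(kind_a, kind_b, distance, cutoff=3.6, bigcutoff=4.0):
    """Return the line styles that would apply to one protein-ligand atom pair.

    kind_a and kind_b are 'don' or 'acc'; distance is in Angstroms.
    """
    styles = []
    if distance > bigcutoff:
        return styles  # too far apart to count as a polar contact
    styles.append("thin-purple")  # every polar pair within bigcutoff
    if kind_a != kind_b:
        # Donor-acceptor pair: ideal within cutoff, otherwise non-ideal.
        styles.append("thick-yellow" if distance <= cutoff else "thin-yellow")
    elif distance <= cutoff:
        # Donor-donor or acceptor-acceptor within cutoff: polar clash.
        styles.append("thick-red")
    return styles
```

Under these assumptions a donor-acceptor pair at 3.0 Angstroms is drawn as an ideal hydrogen bond, while two acceptors at the same distance are flagged as a clash.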
Citing ClusterMols
If you use ClusterMols in your work, please cite the following.
Baumgartner, Matthew (2016) IMPROVING RATIONAL DRUG DESIGN BY INCORPORATING NOVEL BIOPHYSICAL INSIGHT. Doctoral Dissertation, University of Pittsburgh.
Authors
The main cluster_mols.py script was conceived by Matthew P Baumgartner (mpb21 [at] pitt.edu) and Dr. David Koes while working in the lab of Dr. Carlos Camacho at the University of Pittsburgh. The cluster_mols.py script was implemented (and later rewritten) by MPB. The show_contacts functionality and the first version of the objectfocus.py keyboard controls were written by DK.
Please send questions/comments/bug reports to matthew.p.baumgartner [at] gmail.com.