Cealign plugin: Difference between revisions


Revision as of 17:10, 14 April 2007

Introduction

Go directly to DOWNLOAD

This page is the home page of the open-source CEAlign PyMOL plugin. The CE algorithm is a fast and accurate protein structure alignment algorithm, pioneered by Drs. Shindyalov and Bourne (See References). There are a few changes from the original CE publication (See Notes).

The source code is implemented in C, with the final rotations done by Numpy in Python. Because the computationally complex portion of the code is written in C, it's quick: on my machines (relatively fast 64-bit machines) I can align two 400+ amino acid structures in about 0.300 s with the C implementation.
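As a rough illustration of the rotation step, the sketch below superimposes one set of paired coordinates onto another using the Kabsch method (a singular value decomposition of the covariance matrix) in Numpy. It is not the plugin's own code. That the plugin's rotation works along these lines is only an assumption suggested by the script name qkabsch.py, and the function name kabsch_superpose and the sample coordinates are made up for this example.

```python
import numpy as np

def kabsch_superpose(target, master):
    """Rotate and translate `target` onto `master` (paired N x 3 arrays).

    Returns (rotation, moved_target); `master` itself is left untouched.
    Hypothetical helper for illustration, not part of cealign.
    """
    target = np.asarray(target, dtype=float)
    master = np.asarray(master, dtype=float)

    # Centre both point sets on their centroids.
    t_center = target.mean(axis=0)
    m_center = master.mean(axis=0)
    p = target - t_center
    q = master - m_center

    # SVD of the 3x3 covariance matrix between the two centred sets.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Apply the rotation, then move onto the master's centroid.
    moved = (rotation @ p.T).T + m_center
    return rotation, moved


# Made-up coordinates: the target is the master rotated 90 degrees
# about the z axis and then shifted.
master = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
turn = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
target = (turn @ master.T).T + np.array([5.0, -3.0, 2.0])

rotation, moved = kabsch_superpose(target, master)
```

Only the target coordinates are transformed here, which mirrors the convention described under Syntax that the MASTER protein stays where it is.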

This plugs into PyMOL very easily. See the code and examples for installation and usage.

Comparison to PyMOL

Why should you use this?

PyMOL's structure alignment algorithm is fast and robust. However, its first step is to perform a sequence alignment of the two selections. Thus, proteins in the twilight zone, or those with low sequence identity, may not align well. Because CE is a structure-based alignment, this is not a problem. Consider the following example. The image at LEFT was the result of CE-aligning two proteins (1C0M chain B to 1BCO). The result is 152 aligned residues (alpha carbons, not all atoms) at an RMSD of 4.96 Angstroms. The image on the RIGHT shows the results from PyMOL's align command: an alignment of 221 atoms (not residues) at an RMSD of 15.7 Angstroms.
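For readers unfamiliar with the RMSD figures quoted above, here is a minimal sketch of how a root-mean-square deviation over paired coordinates is computed. The function name rmsd and the coordinates are invented for the example. Note that the two numbers above are taken over different sets (alpha carbons in one case, atoms in the other), so they are not computed over identical pairs.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between the two paired positions.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

# Two made-up pairs of positions; the second pair is 2 Angstroms apart.
example = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```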

Examples

Usage

Syntax

CEAlign has the following syntax and semantics:

 cealign MASTER, TARGET

where a post-condition of the algorithm is that the coordinates of the MASTER protein are unchanged. This allows for easier multi-protein alignments. For example,

 cealign 1AUE, 1BZ4
 cealign 1AUE, 1B68
 cealign 1AUE, 1A7V
 cealign 1AUE, 1CPR

will superimpose all the TARGETS onto the MASTER.

Examples

 cealign 1cll and i. 42-55, 1ggz and c. A
 cealign 1kao, 1ctq
 cealign 1fao, 1eaz

Multiple Structure Alignments

Use the alignto command, now provided with cealign. Just type,

 alignto PROT

to align all your proteins in PyMOL to the one called PROT.
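Conceptually, aligning everything to one protein amounts to issuing one cealign command per remaining object with the same MASTER. The sketch below only builds those command strings; it is not the alignto implementation, and the helper name cealign_commands is hypothetical.

```python
def cealign_commands(master, object_names):
    """Build one 'cealign MASTER, TARGET' command per object other than master."""
    return ["cealign %s, %s" % (master, name)
            for name in object_names
            if name != master]

# Object names copied from the Syntax example; inside PyMOL they could
# instead come from cmd.get_names("objects").
commands = cealign_commands("1AUE", ["1AUE", "1BZ4", "1B68", "1A7V", "1CPR"])
for line in commands:
    print(line)
```

Because the MASTER never moves, the order in which the targets are processed does not matter.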


Results

See Changes for updates. But, overall, the results here are great.

Installation

note: Windows installer coming soon.

Requirements

  1. Numpy
  2. Python 2.4+ with distutils
  3. C compiler

Directions

  1. uncompress the distribution file cealign-VERSION.tgz
  2. cd cealign-VERSION
  3. sudo python setup.py install
  4. insert "run DIR_TO_CEALIGN/cealign.py" and "run DIR_TO_CEALIGN/qkabsch.py" into your .pymolrc file, or just run the two Python scripts by hand.
  5. load some molecules
  6. run cealign molecule1, molecule2
  7. enjoy

Pre-compiled Hackish Install

For those people that prefer to use the pre-compiled version of PyMOL, here are the basics for your install. This is a poor method of installing Cealign. I suggest users compile and install their own PyMOL. The final goal is to get

  1. ccealign.so module into PYMOL/ext/lib/python2.4/site-packages
  2. numpy installed (get the numpy directory into, or linked into, PYMOL/ext/lib/python2.4/site-packages)
  3. and be able to run cealign.py and qkabsch.py from PyMOL.

If you can do the above three steps, cealign should run from the pre-compiled PyMOL.
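One way to check the first two goals is to ask the interpreter which of the required modules it cannot import. This is a small sketch with a hypothetical helper name, missing_modules; it is only meaningful when run with the same Python that the pre-compiled PyMOL uses.

```python
def missing_modules(names):
    """Return the names from `names` that this interpreter fails to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            # Covers both "module not found" and failures while loading
            # a compiled extension.
            missing.append(name)
    return missing

# The two modules named in the list above.
print(missing_modules(["numpy", "ccealign"]))
```

An empty list means both modules were found; any name printed still needs to be installed or linked into site-packages.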

In more detail, here is a walk-through on a completely fictitious machine. I wrote the following commands for a fake machine and don't expect a copy/paste of them to work anywhere, but they should be helpful enough to those who need them:

# NOTES:
# This is fake code: don't copy/paste it.
#
# PYMOL='dir to precompiled PyMOL install'
# CEALIGN='dir where you will unpack cealign'
# replace lib with lib64 for x86-64

# install numpy (the package name varies by distribution)
sudo apt-get install python-numpy

# link numpy to PyMOL
ln -s /usr/local/lib/python2.4/site-packages/numpy PYMOL/ext/lib/python2.4/site-packages

# download and install Cealign
wget http://www.pymolwiki.org/images/e/ed/Cealign-0.6.tar.bz2
tar -jxvf Cealign-0.6.tar.bz2
cd cealign-0.6
sudo python setup.py build
cp build/lib-XYZ-linux/ccealign.so PYMOL/ext/lib/python2.4/site-packages

# run pymol and try it out
pymol

# then, at the PyMOL prompt:
run CEALIGN/cealign.py
run CEALIGN/qkabsch.py
fetch 1cew 1mol, async=0
cealign 1cew, 1mol

The Code

Please unpack and read the documentation. All comments/questions should be directed to Jason Vertrees (javertre _at_ utmb ...dot... edu).

LATEST IS v0.8-RBS. (Dedicated to Bryan Sutton for allowing me to use his computer for testing.)

Version 0.8-RBS

Version 0.7

Version 0.6

Coming Soon

  • Windows binary
  • Linux Binaries (32bit, x86-64)
  • Better instructions for precompiled distributions
  • Optimization

Updates

2007-04-14

v0.8-RBS source updated. Found the bug that had been plaguing 32-bit machines. This should be the last release for a little while.


2007-03-27

v0.6 source code updated. The important things about this release are:

  • a correction in the scoring algorithm (with improvements of up to, and in some cases over, 2 Angstroms in RMSD)
  • cleaned up some memory issues
  • provided the simple alignto command
  • added (poor) installation notes for those people with a pre-compiled version.

2007-03-07

This change was too small to make a whole new release. I just added a small script to do multiple structure alignments. See the Multiple Structure Alignments section on this page.

Also, I provide the option of aligning based solely upon RMSD or upon the better CE-Score. See the References for information on the CE Score.

Troubleshooting

Post your problems/solutions here.

Unicode Issues in Python/Numpy

Problem: Running/Installing cealign gives

Traceback (most recent call last):
  File "/home/byron/software/pymol_1.00b17/pymol/modules/pymol/parser.py", line 308, in parse
  File "/home/byron/software/pymol_1.00b17/pymol/modules/pymol/parsing.py", line 410, in run_file
  File "qkabsch.py", line 86, in ?
    import numpy
  File "/usr/lib/python2.4/site-packages/numpy/__init__.py", line 36, in ?
    import core
  File "/usr/lib/python2.4/site-packages/numpy/core/__init__.py", line 5, in ?
    import multiarray
ImportError: /home/byron/software/pymol/ext/lib/python2.4/site-packages/numpy/core/multiarray.so: undefined symbol: _PyUnicodeUCS4_IsWhitespace

where the important line is

undefined symbol: _PyUnicodeUCS4_IsWhitespace

This problem indicates that your Numpy was built against a Python that uses a different byte size for Unicode characters than the Python distribution your PyMOL runs from. Here the symbol name shows that Numpy expects 4-byte (UCS4) characters, which PyMOL's Python does not provide. For example, this can happen if you use the pre-built PyMOL and some other pre-built Numpy package.


Solution: Hand-install Numpy, building it with the same Python that PyMOL runs from.
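To see which Unicode width a given interpreter was built with, you can inspect sys.maxunicode. The sketch below is a small, hypothetical helper (unicode_build is not part of cealign or Numpy). On the Python 2 releases this page targets, narrow (UCS2) builds report 0xFFFF and wide (UCS4) builds report 0x10FFFF; Python 3.3 and later always report the wide value.

```python
import sys

def unicode_build(maxunicode=sys.maxunicode):
    """Classify a Python build by its largest supported Unicode code point."""
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"

# Run once with PyMOL's Python and once with the Python that Numpy was
# built for; differing answers point to the mismatch described above.
print(unicode_build())
```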


LinAlg Module Not Found

Problem: Running CE Align gives the following error message:

run qkabsch.py
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/pymol/parser.py", line 285, in parse
    parsing.run_file(exp_path(args[nest][0]),pymol_names,pymol_names)
  File "/usr/lib/python2.4/site-packages/pymol/parsing.py", line 407, in run_file
    execfile(file,global_ns,local_ns)
  File "qkabsch.py", line 86, in ?
    import numpy
  File "/usr/lib/python2.4/site-packages/numpy/__init__.py", line 40, in ?
    import linalg
ImportError: No module named linalg


Solution: You do not have the linear algebra module installed (or Python can't find it) on your machine. One workaround is to install SciPy (on Debian/Ubuntu: sudo apt-get install python-scipy). Another is to reinstall the Numpy package from source, ensuring that you have the necessary requirements for the linear algebra module (linpack, lapack, fft, etc.).

CCEAlign & NumPy Modules Not Found

Problem: Running CE Align gives the following error message:

PyMOL>run cealign.py
Traceback (most recent call last):
  File "/home/local/warren/MacPyMOL060530/build/Deployment/MacPyMOL.app/pymol/modules/pymol/parser.py", line 297, in parse
  File "/home/local/warren/MacPyMOL060530/build/Deployment/MacPyMOL.app/pymol/modules/pymol/parsing.py", line 408, in run_file
  File "/usr/local/pymol/scripts/cealign-0.1/cealign.py", line 59, in ?
    from ccealign import ccealign
ImportError: No module named ccealign
run qkabsch.py
Traceback (most recent call last):
  File "/home/local/warren/MacPyMOL060530/build/Deployment/MacPyMOL.app/pymol/modules/pymol/parser.py", line 297, in parse
  File "/home/local/warren/MacPyMOL060530/build/Deployment/MacPyMOL.app/pymol/modules/pymol/parsing.py", line 408, in run_file
  File "qkabsch.py", line 86, in ?
    import numpy
ImportError: No module named numpy


Solution: This problem occurs under Apple Mac OS X if (a) Apple's python executable on your machine (/usr/bin/python, currently version 2.3.5) is superseded by Fink's python executable (/sw/bin/python, currently version 2.5) and (b) you are using a precompiled version of PyMOL (MacPyMOL, PyMOLX11Hybrid or PyMOL for Mac OS X/X11). These executables ignore Fink's python and use Apple's instead, so to run CE Align you must install NumPy (as well as CE Align itself) using Apple's python. To do so:

  1. Download the Numpy source code archive (currently version 1.0.1), unpack it and change directory to numpy-1.0.1.
  2. Install it, specifying the full path to Apple's python executable: sudo /usr/bin/python setup.py install | tee install.log
  3. Download the CE Align source code archive (currently version 0.2), unpack it and change directory to cealign-0.2.
  4. Install CE Align the same way: sudo /usr/bin/python setup.py install | tee install.log

Luca Jovine 05:11, 25 January 2007 (CST).
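When it is unclear which python a given PyMOL build is actually using, a few lines of Python can report it. This is only a sketch: the helper name interpreter_report is hypothetical, and what sys.executable shows inside an embedded PyMOL interpreter may differ from a plain python binary, so treat the output as a hint rather than proof.

```python
import sys

def interpreter_report():
    """Collect basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "prefix": sys.prefix,
    }

# Print the facts in a stable order, one per line.
for key, value in sorted(interpreter_report().items()):
    print("%s: %s" % (key, value))
```

Running the same script from a terminal and from within PyMOL, then comparing the two reports, shows whether both are using the same python installation.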

References

Text taken from PubMed and formatted for the wiki. The first reference is the most important for this code.

  1. Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998 Sep;11(9):739-47. PMID: 9796821 [PubMed - indexed for MEDLINE]
  2. Jia Y, Dewey TG, Shindyalov IN, Bourne PE. A new scoring function and associated statistical significance for structure alignment by CE. J Comput Biol. 2004;11(5):787-99. PMID: 15700402 [PubMed - indexed for MEDLINE]
  3. Pekurovsky D, Shindyalov IN, Bourne PE. A case study of high-throughput biological data processing on parallel platforms. Bioinformatics. 2004 Aug 12;20(12):1940-7. Epub 2004 Mar 25. PMID: 15044237 [PubMed - indexed for MEDLINE]
  4. Shindyalov IN, Bourne PE. An alternative view of protein fold space. Proteins. 2000 Feb 15;38(3):247-60. PMID: 10713986 [PubMed - indexed for MEDLINE]

License

CEAlign and all of its subprograms that I wrote are released under the open-source Free BSD License (BSDL).