Findseq

From PyMOLWiki
Revision as of 18:15, 13 January 2012 by Tlinnet (talk | contribs)
Type Python Script
Download findseq.py
Author(s) Jason Vertrees
License BSD
This code has been put under version control in the project Pymol-script-repo

Overview & Motivation

Has anyone ever handed you a protein and asked you to find the sequence "FLVEW"? This script finds any string or regular expression in a given object and returns the matching residues as a selection. Here's an example:

reinitialize
import findseq
# fetch a sugar-binding protein from the PDB
fetch 1tvn, async=0
# now, find FATEW in 1tvn
findseq FATEW, 1tvn
# lower-case works, too
findseq fatew, 1tvn
# how about a regular expression?
findseq F.*W, 1tvn
 
# Find the regular expression
#  ..H[TA]LVWH
# in every loaded object, keeping only the first match in each.
# The matches were then shown as sticks and colored to highlight the matched residues.
for x in cmd.get_names(): findseq.findseq("..H[TA]LVWH", x, "sele_"+x, firstOnly=1)
Red residues were those matching the regular expression '..H[TA]LVWH'.
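
To make the matching concrete, here is a minimal sketch in plain Python of how a needle can be matched against a one-letter sequence string. It is not the code in findseq.py: the function name find_motif, the toy sequence, and the use of a case-insensitive flag are assumptions made for illustration, and no PyMOL calls are involved.

```python
import re


def find_motif(sequence, needle, first_only=False):
    """Return 1-based, inclusive (start, end) spans where needle matches.

    Hypothetical helper for illustration only; it works on a plain
    string of one-letter amino acid codes, not on a PyMOL object.
    """
    # Assumption: matching ignores case, so "fatew" behaves like "FATEW".
    pattern = re.compile(needle, re.IGNORECASE)
    spans = [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]
    # Mimic the idea behind firstOnly: keep at most the first match.
    return spans[:1] if first_only else spans


if __name__ == "__main__":
    toy_sequence = "MKFATEWLLHHTLVWHAAFGGW"  # made-up sequence, not 1tvn
    for needle in ("FATEW", "fatew", "F.*W", "..H[TA]LVWH"):
        print(needle, find_motif(toy_sequence, needle))
```

One thing the sketch makes visible: under Python's re module, F.*W is greedy, so on a sequence with several W residues it stretches from the first F to the last W. If the script hands the needle to the same regex engine, a non-greedy pattern such as F.*?W may be closer to what you want.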

Usage

I built this to be rather flexible. You call it as:

findseq needle, haystack[, selName[, het[, firstOnly ]]]

where the options are:

needle: the sequence of amino acids to find. Should be a string of one-letter amino acid abbreviations. Can also be a string-style regular expression (e.g. FW.*QQ).
haystack: the PyMOL object or selection in which to search.
selName: the name of the returned selection. If you leave this blank, it will be foundSeqXYZ, where XYZ is some random integer (e.g. foundSeq1435); if you supply sele, then the usual PyMOL (sele) is used; if it is anything else, that name is used verbatim. Defaults to foundSeqXYZ so as not to overwrite any selection you might have in sele.
het: 0 or 1. If 0, heteroatoms are not considered; if 1, they are. Defaults to 0.
firstOnly: 0 or 1. If 0, all matches are selected and returned; if 1, only the first match is returned.
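
The three cases for selName described above can be sketched as a small stand-alone Python function. This is only an illustration of the naming rule, not the script's own code: the function name resolve_selection_name and the upper bound on the random integer are assumptions.

```python
import random


def resolve_selection_name(sel_name=None):
    """Pick the name of the returned selection (hypothetical helper)."""
    if not sel_name:
        # Blank: generate foundSeqXYZ so an existing (sele) is left alone.
        # The range of the random integer is an assumption.
        return "foundSeq" + str(random.randint(0, 32000))
    # "sele" maps to PyMOL's usual (sele); any other name is used verbatim.
    return sel_name


if __name__ == "__main__":
    print(resolve_selection_name())          # e.g. a foundSeq name with a random suffix
    print(resolve_selection_name("sele"))
    print(resolve_selection_name("my_motif"))
```

Passing an explicit name, as in the "sele_"+x example earlier on this page, keeps one selection per object and avoids relying on the random suffix.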