Difference between revisions of "Advanced Scripting"

From PyMOLWiki
(link to python api was broken)
 
(5 intermediate revisions by 2 users not shown)

Latest revision as of 16:18, 20 October 2014

On this page, we discuss more complex scripting. Python is great, but it is much slower at mathematics than C/C++/Java/FORTRAN. For that reason, you may find it more useful to export your data to another language, operate on it there and then import the results back into PyMOL. We discuss the Python API and the general operating procedure for successfully writing your own scripts.

Advanced Scripting

Python, while incredibly useful, is much slower at math than statically typed, compiled languages such as C, and sometimes the library we need is written in another language. For complicated problems, it's faster to package your data, send it to C, do some math, and pass the results back to Python than to do everything in Python. The beauty of the Python API is that we can do just that.

This is more advanced scripting and requires some knowledge of the Python API and of some outside language. The example shown here is in C. The C++ extensions are very similar.


Python, PyMOL and C

Here, I will show you how to write a C-module that plugs into Python and talks nicely with PyMOL. The example actually shows how to make a generic C-function and use it in Python.

First, let's assume that we want to call a function; let's call it funName. Let's assume funName takes a Python list of lists and returns a list: for example, we pass the C program the XYZ coordinates of each atom and get back a list of the atoms with some property. I will also assume we have funName.h and funName.c as the C code files. I have provided this more complex example to show a real-world problem. If you were just sending an integer or float instead of packaged lists, the code would be simpler; if you understand unpacking the lists, then you'll certainly understand unpacking a simple scalar.

C++

If you tell Python that you're using C++ code (see the setup below) then it'll automatically call the C++ compiler instead of the C compiler. There are warnings you may want to be aware of though.

My experience with this has been pretty easy. I simply renamed my ".c" files to ".cpp", caught the few errors (darn it, I didn't typecast a few pointers from malloc) and the code compiled fine. My experience with this is also quite limited, YMMV.

Calling the External Function

So, to start, let's look at the Python code that will call the C-function:

#
# -- in someCode.py
#
# Call funName.  Pass it a tuple () of two lists.  (sel1 and sel2 are lists.)
# Get the return value into rValFromC.
#
rValFromC = funName( (sel1, sel2) )

where sel1 and sel2 could be any list of atom coordinates, say, from PyMOL. (See above.)
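
To make the shape of the arguments concrete, here is a minimal sketch in plain Python. The coordinate values and the `fun_name_stub` stand-in are invented for illustration; in a PyMOL session the coordinates would come from a selection rather than being typed in.

```python
# Two selections, each a list of [x, y, z] coordinate lists (made-up values).
sel1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
sel2 = [[0.5, 0.5, 0.5], [7.0, 8.0, 9.0], [1.5, 2.5, 3.5]]


def fun_name_stub(pair):
    """Hypothetical pure-Python stand-in for the C function funName.

    It takes ONE argument, a 2-item sequence of coordinate lists, and
    returns a list, just as the C version is expected to.
    """
    list_a, list_b = pair
    return [len(list_a), len(list_b)]


# Note the double parentheses: one argument that is a tuple of two lists.
r_val = fun_name_stub((sel1, sel2))
print(r_val)
```

The double parentheses matter: the function receives a single object that itself contains both lists, which is the shape the C code unpacks later on.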

Ok, this isn't hard. Now we need to see what the C code that receives this function call looks like. First, we need to let C know we're integrating with Python. So, in the header file funName.h we put:

// in funName.h
#include <Python.h>

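Next, by convention your C function's name is funName_funName, that is, the module name followed by the function name (the mapping from the Python name to the C function needs to be set up; I'll show how later). So, let's define funName: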

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
...more code...

This is the generic call. funName takes two pointers to PyObjects and returns a pointer to a PyObject. This is how you get Python data into and out of C. The data shows up in args, a tuple of packaged Python objects, and we then unpack it into C using some helper functions. Once unpacking is complete, we perform our C/C++ procedure on the data, package up the results using the Python API, and send the results back to Python/PyMOL.

Unpacking the Data

Let's unpack the data in args. Remember, args has a Python list of lists. So, to unpack that we do the following inside of funName:

static PyObject*
funName_funName(PyObject* self, PyObject* args)
{
        PyObject *listA, *listB;

        if ( ! PyArg_ParseTuple(args, "(OO)", &listA, &listB) ) {
                // PyArg_ParseTuple has already set a Python exception for us
                printf("Could not unparse objects\n");
                return NULL;
        }

        // hold our own reference to each list while we work with it
        Py_INCREF(listA);
        Py_INCREF(listB);
 ... more code ...

Line 4 creates the two C variables that we will unpack the lists into. They are pointers to PyObjects. Line 6 is where the magic happens. We call PyArg_ParseTuple, passing it the args we got from Python. The (OO) is Python's code for "I'm expecting two Objects inside a sequence ()", which here is the tuple built by the caller. Were it three objects, then (OOO). The first object will be stored through &listA into listA and the second through &listB into listB. The exact argument parsing specifications are very useful.
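
As a rough illustration of what the "(OO)" format asks for, the following sketch mimics the same check in plain Python. The helper name `parse_oo` is hypothetical and only mirrors the behaviour described above; the real parsing happens in C inside PyArg_ParseTuple.

```python
def parse_oo(*args):
    """Mimic PyArg_ParseTuple(args, "(OO)", &listA, &listB) in plain Python.

    `args` plays the role of the C-level args tuple: it must hold exactly
    one argument, and that argument must be a sequence of two objects.
    """
    if len(args) != 1:
        raise TypeError("expected exactly one argument")
    (pair,) = args
    if len(pair) != 2:
        raise TypeError("expected a sequence of exactly two objects")
    list_a, list_b = pair
    return list_a, list_b


# One argument that is a tuple of two lists: this parses.
list_a, list_b = parse_oo(([1.0, 2.0], [3.0]))

# Two separate arguments do not match "(OO)".
try:
    parse_oo([1.0, 2.0], [3.0])
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

Like the C call, the sketch reports a mismatch as a TypeError, so a caller who forgets the inner parentheses finds out immediately.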

Reference Counting

Next, we check for success. Unpacking could fail. If it does, complain and quit. Otherwise, listA and listB now point at the caller's data. To avoid memory leaks and dangling pointers, we need to manually keep track of the PyObjects we're tooling around with. Python frees an object when its reference count drops to zero, and the pointers that PyArg_ParseTuple hands us for "O" are borrowed: they don't count as references of our own. So, we tell Python that we are using each list with Py_INCREF(listA) and Py_INCREF(listB). Every Py_INCREF must later be matched by a Py_DECREF; if we forget, the count never reaches zero and the object is never cleaned up (making a leak). This is reference counting.
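
Reference counts can also be watched from the Python side, which may help build intuition. The sketch below assumes CPython, where sys.getrefcount is available; the variable names are invented, and only the differences between readings are meaningful.

```python
import sys

coords = [[0.0, 0.0, 0.0]]

# sys.getrefcount reports one extra reference for its own argument,
# so compare readings with each other instead of trusting the raw number.
before = sys.getrefcount(coords)

holders = [coords]          # take an extra reference (compare Py_INCREF)
during = sys.getrefcount(coords)

holders.clear()             # give it back (compare Py_DECREF)
after = sys.getrefcount(coords)

print(during - before, after - before)
```

In Python this bookkeeping is automatic; in C every extra reference we take has to be released by hand.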

Now, just for safety, let's check the lists to make sure they actually were passed something. A tricky user could have given us empty lists, looking to hose the program. So, we do:

     // handle empty selections (should probably do this in Python, it's easier)
     const Py_ssize_t lenA = PyList_Size(listA);
     if ( lenA < 1 ) {
             // returning NULL requires a Python exception to be set
             PyErr_SetString(PyExc_ValueError, "First selection didn't have any atoms.  Please check your selection.");
             // give our references to the lists back
             Py_DECREF(listA);
             Py_DECREF(listB);
             return NULL;
      }

We check the list size with PyList_Size, and if it's 0 we quit. But before quitting, we give up the references we took on the lists so Python can clean up after itself. We do that with Py_DECREF.
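
As the comment in the C code hints, the same check is easier on the Python side. Here is a minimal sketch of such a wrapper; `call_with_checks` and the `c_func` parameter are hypothetical names, and `c_func` stands for whatever callable wraps your C code.

```python
def call_with_checks(c_func, sel1, sel2):
    """Validate two selections, then hand them to c_func as one tuple.

    c_func is whatever callable wraps the C code (hypothetical here).
    """
    for label, sel in (("First", sel1), ("Second", sel2)):
        if len(sel) < 1:
            raise ValueError(label + " selection didn't have any atoms.")
    return c_func((sel1, sel2))


def counting_stub(pair):
    """Stand-in for the C function: returns the total number of atoms."""
    return len(pair[0]) + len(pair[1])


total = call_with_checks(counting_stub, [[0.0, 0.0, 0.0]],
                         [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
```

Checking in Python keeps the C code shorter, though keeping the C-side check as well protects against callers that bypass the wrapper.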

More Complex Unpacking

If you're dealing with simple scalars, then you might be able to skip this portion.

Now, we should have access to the data the user sent us, in listA and listB, and it should be there and be clean. But, not forgetting that listA and listB are lists of 3D coordinates, let's unpack them further into sets of coordinates. Because we know the length of the lists, we can do something like the following:

       // make space for the current coords; cePoint is a struct with x, y and z members, and pcePoint is a pointer to it
       pcePoint coords = (pcePoint) malloc(sizeof(cePoint)*length);

       // loop through the arguments, pulling out the
       // XYZ coordinates.
       int i;
       for ( i = 0; i < length; i++ ) {
               PyObject* curCoord = PyList_GetItem(listA,i);
               Py_INCREF(curCoord);
       
               PyObject* curVal = PyList_GetItem(curCoord,0);
               Py_INCREF(curVal);
               coords[i].x = PyFloat_AsDouble(curVal);
               Py_DECREF(curVal);

               curVal = PyList_GetItem(curCoord,1);
               Py_INCREF(curVal);
               coords[i].y = PyFloat_AsDouble(curVal);
               Py_DECREF(curVal);

               curVal = PyList_GetItem(curCoord,2);
               Py_INCREF(curVal);
               coords[i].z = PyFloat_AsDouble(curVal);
               Py_DECREF(curVal);

               Py_DECREF(curCoord);
        }

 ... more code ...

Here, cePoint is a struct holding the x, y and z values of one point, and pcePoint is a pointer to it. Line 2 just gets some memory ready for the list of length coordinates (3 x length values). Then, for each index from 0 to length-1, we pull the item out of the list using PyList_GetItem, into curCoord. This then gets further unpacked into the x, y and z members of coords[i].

We now have the data that the user passed from PyMOL in C/C++ data structures. Now, perform your task in C/C++ and then return the data to PyMOL.
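
For comparison, the same unpacking can be written in a few lines of Python, which is handy as a reference when testing the C version. This is only a sketch: `CePoint` and `unpack_coords` are illustrative names that mirror the C struct and loop, not part of any library.

```python
from collections import namedtuple

# Stand-in for the C struct: three named coordinates per point.
CePoint = namedtuple("CePoint", ["x", "y", "z"])


def unpack_coords(list_a):
    """Turn a list of [x, y, z] items into a list of CePoint values."""
    coords = []
    for cur_coord in list_a:
        if len(cur_coord) != 3:
            raise ValueError("each coordinate needs exactly three values")
        # float() plays the role of PyFloat_AsDouble in the C loop.
        coords.append(CePoint(float(cur_coord[0]),
                              float(cur_coord[1]),
                              float(cur_coord[2])))
    return coords


points = unpack_coords([[1, 2, 3], (4.5, 5.5, 6.5)])
```

Unlike the C loop, this version refuses items that do not have exactly three values, a check worth considering on the C side too.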

Sending the Results back to Python/PyMOL

Once you're done with your calculations and want to send your data back to PyMOL, you need to package it up into a Python object, using the Python API, and then return it. You should be aware of the expected return value and how you're packaging the results. If your user calls,

(results1,results2) = someCFunction(parameters1,parameters2)

then you need to package a sequence with two values. To build values for returning to PyMOL, use Py_BuildValue. Py_BuildValue takes a format string indicating the types, followed by the values themselves. Building values for return has been documented very well. Consider an example: if I want to package a list of two integers, the format string for Py_BuildValue is "[i,i]", so my call could be:

// Package the two ints into a Python list of two ints.
PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );

// Py_BuildValue returns a new reference that we already own,
// so no extra Py_INCREF is needed before returning it.

If you need to make a list of things to return, you iterate through a list and make a bunch of thePairs and add them to a Python list as follows:

// Make the Python list; PyList_New returns a new reference that we own.
PyObject* theList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  PyObject* thePair = Py_BuildValue( "[i,i]", int1, int2 );
  // PyList_Append takes its own reference, so release ours afterwards.
  PyList_Append(theList, thePair);
  Py_DECREF(thePair);
}

To add a list of lists, just make an outer list,

PyObject* outerList = PyList_New(0);

and iteratively add to it your inner lists:

PyObject* outerList = PyList_New(0);

for ( int i = 0; i < someLim; i++ ) {
  // make the inner list, called curList
  PyObject* curList = PyList_New(0);

  // fill the inner list, using PyList_Append with some data, shown above
  ...

  // outerList takes its own reference; release ours
  PyList_Append(outerList, curList);
  Py_DECREF(curList);
}

Great, now we can extract data from Python, use it in C/C++, and package it back up for returning to Python. Now, we need to learn about the minimal baggage needed for C to operate with Python. Keep reading; almost done.
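
Seen from the calling side, the packaged results are ordinary Python lists. The sketch below uses hypothetical stand-ins (`pair_stub`, `pairs_stub`) that return what the C snippets above are meant to build, so you can see how a caller would unpack them.

```python
def pair_stub(int1, int2):
    """What Py_BuildValue("[i,i]", int1, int2) hands back: a 2-item list."""
    return [int1, int2]


def pairs_stub(values):
    """A list of pairs, built the way the PyList_Append loop builds it."""
    the_list = []
    for a, b in values:
        the_list.append(pair_stub(a, b))
    return the_list


# A two-item result unpacks directly into two names.
results1, results2 = pair_stub(3, 7)

# A list of lists comes back as nested Python lists.
nested = pairs_stub([(1, 2), (3, 4)])
```

If the caller expects exactly two names on the left-hand side, the C code has to return a sequence of exactly two items, otherwise the unpacking fails with a ValueError.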

Initialization

We need to discuss how our functions will be called from Python. First, we need to create a method table.

static PyMethodDef CEMethods[] = {
        {"ccealign", ccealign_ccealign, METH_VARARGS, "Align two proteins using the CE Algorithm."},
        {NULL, NULL, 0, NULL}     /* Always use this as the last line in your table. */
};

METH_VARARGS can also be combined with METH_KEYWORDS (written METH_VARARGS | METH_KEYWORDS). The former tells Python to pass the arguments as a simple tuple, which we unpack with PyArg_ParseTuple; the latter tells Python to also pass the keyword arguments, which we unpack by name with PyArg_ParseTupleAndKeywords. When using METH_KEYWORDS your function needs to accept a third parameter, a PyObject* that is the dictionary of keyword arguments. For more information check out the Python method table docs.
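
The difference is visible to the caller. As a small illustration using two built-in functions that are implemented in C (they are chosen only because one accepts keywords and the other does not; this says nothing about the exact flags they are declared with):

```python
# sorted() is implemented in C and accepts a keyword argument.
descending = sorted([2, 3, 1], reverse=True)

# len() is also implemented in C but does not accept keywords,
# so calling it by name fails with a TypeError.
try:
    len(obj=[1, 2, 3])
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
```

A function registered with METH_VARARGS alone behaves like len here: passing arguments by name raises a TypeError.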

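Each module undergoes initialization. By default the module's initialization function is initNAME(); so, in our example above, initccealign(). During this initialization step, we need to call Py_InitModule. (This is the Python 2 API that this page targets; Python 3 names the function PyInit_NAME() and creates the module with PyModule_Create.) For our above example, we'd have,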

PyMODINIT_FUNC
initccealign(void)
{
    (void) Py_InitModule("ccealign", CEMethods);
}

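Finally, if you also build a standalone program that embeds the Python interpreter, its main function should look something like the following; a module that is only imported from Python/PyMOL doesn't need one: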

int
main(int argc, char* argv[])
{
        Py_SetProgramName(argv[0]);
        Py_Initialize();
        initccealign();
        return(EXIT_SUCCESS);
}

At this point, you should have a fully functioning program in C/C++ integrated with PyMOL/Python.
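
Once the module is built and installed, a script can check for it before use. The sketch below is one way to do that, assuming the module is named ccealign as in the method table above; `load_c_module` is a hypothetical helper name.

```python
import importlib


def load_c_module(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


ccealign = load_c_module("ccealign")
if ccealign is None:
    print("ccealign is not installed; falling back to pure Python")
```

This lets a script keep working, only slower, on machines where the C module has not been compiled.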

Installing Your Module

Overview

The Python distutils package is a great method for distributing your modules over various platforms. It handles platform-specific issues and simplifies the overall install process. For us module builders, that means creating the distutils setup.py script and, given the above, that's the last step.

More detailed information can be found on the Python documentation page for installing C/C++ modules. There is also information on how to build source and binary distribution packages.

As an example of how powerful distutils is, I have cealign set up to install as simply as:

python setup.py build cealign
python setup.py install cealign

PyMOL also uses distutils for its source install. If more people understood distutils, I think they would install PyMOL from source since you get all the latest features.

Setup.py

The setup file needs to know the following (at the very least): what source files comprise the project, what include directories to scan, the project name. You can also add more metadata such as version number, author, author_email, url, etc. For this example, let's assume we have the following directory structure,

.
|-- build
|-- dist
|-- doc
|   `-- funName
|-- src
|   |-- etc
|   |   `-- tnt
|   |       |-- doxygen
|   |       |   `-- html
|   |       `-- html
|   `-- tnt

and we want to include all the .cpp files from the src directory, and all the include files in tnt. We start setup.py as follows,

#
# -- setup.py -- your module's install file
#

# import distutils (removed in Python 3.12; setuptools provides the same names)
from distutils.core import setup, Extension
# for pasting together file lists
from glob import glob
# for handling path names in an OS-independent way
from os.path import join

# grab all of the .cpp files in src/
srcList = glob(join("src", "*.cpp"))
# set the include directories, where the compiler looks for .h files
incDirs = [ join( "src", "tnt") ]

Ok, now Python knows which files to include. Now we need to create a new Extension. We can simply call,

# create the extension given the module name, funName, the source list and the include directories.
ccealignMods = Extension( 'funName', sources=srcList, include_dirs=incDirs  )

Lastly, all we have to do is call the final setup function, with the extension we just created and some metadata (if we want):

setup( name="funName",
        version="0.1-alpha",
        description="funName: A simple example to show users how to make C/C++ modules for PyMOL",
        author="Your Name Here",
        author_email="Your Email Goes Here",
        url="The URL of your work",
        ext_modules=[ccealignMods]
         )

And voila -- we're done. The users should now be able to execute,

python setup.py build
# the brackets mark sudo as optional: type it (without the brackets) only if you need to be root to install; see "Installing PyMOL w/o Superuser access" (Linux_Install#Installing_a_Script_Without_Superuser_Access) for an example.
[sudo] python setup.py install
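
The file-gathering part of setup.py can be tried out on its own before running a build. The following is a minimal sketch assuming the directory layout shown earlier; `collect_build_inputs` is a hypothetical helper, not part of distutils.

```python
from glob import glob
from os.path import join


def collect_build_inputs(root="."):
    """Return (sources, include_dirs) for the layout shown above.

    glob() returns files in arbitrary order, so the list is sorted to
    keep the result the same from one run to the next.
    """
    sources = sorted(glob(join(root, "src", "*.cpp")))
    include_dirs = [join(root, "src", "tnt")]
    return sources, include_dirs


sources, include_dirs = collect_build_inputs()
print(sources, include_dirs)
```

The two lists can then be passed to Extension as sources and include_dirs, exactly as srcList and incDirs are above.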

Notes

  • discuss the pains of debugging

Conclusion

I hope you found this helpful and that it will spur you to actually write some PyMOL modules, or help you overcome the speed limitations inherent in Python's math (in comparison to compiled, statically typed languages).

I'm happy to hear any comments or questions you may have. Tree 09:14, 19 May 2008 (CDT)

Example

See the source code for cealign.


See Also

stored, iterate_state, identify.