Fetching scripts

From PyMOLWiki

Revision as of 14:26, 4 May 2011

Overview

I'm considering building in a mechanism for automatically fetching scripts from PyMOLWiki. The goal is to allow users to type:

fetch findSurfaceResidues, type=script
findSurfaceResidues doShow=True, cutoff=0.5

The convenience benefits are obvious, especially for new users, and I think that lowering the barrier to script usage will greatly increase both the number of people who use various scripts and the incentive to place scripts on the wiki (especially if the fetch mechanism makes it easy for script authors to provide a citation/DOI/etc.).
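As a rough illustration of what `fetch findSurfaceResidues, type=script` might do under the hood, the sketch below downloads a page's raw wikitext through MediaWiki's `action=raw` and pulls out the first Python block. The base URL, the assumption that script pages wrap their code in `<source lang="python">` or `<syntaxhighlight lang="python">` tags, and the helper names are all assumptions for illustration. This is not an existing PyMOL API.

```python
import re
from urllib.parse import quote
from urllib.request import urlopen

# Assumed location of PyMOLWiki's index.php; adjust if the wiki moves.
WIKI_INDEX = "https://pymolwiki.org/index.php"

# Assumption: script pages put their code inside <source lang="python">
# or <syntaxhighlight lang="python"> tags. This is a convention, not a rule.
_SOURCE_RE = re.compile(
    r'<(source|syntaxhighlight)\s+lang="python"[^>]*>(.*?)</\1>',
    re.DOTALL | re.IGNORECASE,
)


def raw_url(title):
    """Build a URL asking MediaWiki for the page's raw wikitext."""
    return f"{WIKI_INDEX}?title={quote(title)}&action=raw"


def extract_python(wikitext):
    """Return the first Python block in the wikitext, or None if there is none."""
    match = _SOURCE_RE.search(wikitext)
    return match.group(2).strip() + "\n" if match else None


def fetch_script_source(title, timeout=30):
    """Download a script page and return its Python source (without running it)."""
    with urlopen(raw_url(title), timeout=timeout) as response:
        wikitext = response.read().decode("utf-8")
    source = extract_python(wikitext)
    if source is None:
        raise ValueError(f"no Python block found on page {title!r}")
    return source
```

Executing the returned source (for example by saving it to a file and handing it to PyMOL's `run` command) is deliberately left out, because that step is where the security questions below apply.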

Issues

Security

Running untrusted code is dangerous. Some ideas:

  • MediaWiki allows us to protect pages so that only administrators can edit them. We could protect all approved scripts.
  • Alternatively, we could have a protected page that links to the approved scripts. I lean towards the first option from a security standpoint, although it's obviously less convenient for script authors, as it requires them to get an administrator involved when making changes. Maybe a hybrid system where scripts have a development version and a release version? I don't want to add too much overhead, though.
  • We should print a warning each time a new script is fetched anyway.
  • Can fetched scripts persist across saved sessions? Perhaps not.
  • Plugins? This is probably more worth considering for a future version, but it would be nice to be able to load plugins as well. Since plugins are (now) installed permanently, we have to think carefully about the security implications.
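One way to combine the page-protection idea with the per-fetch warning is to ask MediaWiki whether a page is edit-protected at the `sysop` (administrator) level before loading it. This sketch assumes PyMOLWiki exposes the standard `api.php` endpoint; `prop=info` with `inprop=protection` is a standard MediaWiki query, while the function names and the refuse-if-unprotected policy are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WIKI_API = "https://pymolwiki.org/api.php"  # assumed API endpoint


def protection_query_url(title):
    """Build a MediaWiki API query for a page's protection settings."""
    params = {"action": "query", "prop": "info", "inprop": "protection",
              "titles": title, "format": "json"}
    return f"{WIKI_API}?{urlencode(params)}"


def fetch_protection_info(title, timeout=30):
    """Download and decode the API response for one page."""
    with urlopen(protection_query_url(title), timeout=timeout) as response:
        return json.load(response)


def is_admin_protected(api_response):
    """True only if every returned page exists and is edit-protected at sysop level."""
    pages = api_response.get("query", {}).get("pages", {})
    if not pages:
        return False
    for page in pages.values():
        if "missing" in page:  # the wiki reports nonexistent pages this way
            return False
        edit_levels = {p.get("level") for p in page.get("protection", [])
                       if p.get("type") == "edit"}
        if "sysop" not in edit_levels:
            return False
    return True


def approve_fetch(title, api_response):
    """Print the per-fetch warning, then decide whether the script may be loaded."""
    print(f"WARNING: {title!r} is code downloaded from PyMOLWiki and will run "
          "with your user privileges.")
    if not is_admin_protected(api_response):
        print(f"Refusing {title!r}: the page is not protected by an administrator.")
        return False
    return True
```

A hybrid development/release scheme would fit the same check: only the protected release page would pass `approve_fetch`.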

Convenience options

The main benefit is to make things as convenient and easy as possible, especially for new users.

  • Local cache. This would make reloading scripts with each new session easier and faster. You could then stick a bunch of "fetch" lines in your pymolrc.
  • A command to list all available scripts?
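A local cache could be as simple as a directory of `.py` files plus a small JSON index recording which version of each script is on disk. The location `~/.pymol/fetched_scripts` and the `ScriptCache` class are hypothetical. `version` is whatever identifier the wiki page declares (see the metadata ideas under Format).

```python
import json
from pathlib import Path

# Hypothetical cache location; nothing in PyMOL defines this today.
DEFAULT_CACHE = Path.home() / ".pymol" / "fetched_scripts"


class ScriptCache:
    """Store fetched scripts on disk, keyed by page title, with a version tag."""

    def __init__(self, root=DEFAULT_CACHE):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.index_path = self.root / "index.json"
        self.index = (json.loads(self.index_path.read_text())
                      if self.index_path.exists() else {})

    def _file_for(self, title):
        # Reduce the title to a safe file name. Titles differing only in
        # punctuation could collide; a real version might hash the title.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in title)
        return self.root / f"{safe}.py"

    def get(self, title, version):
        """Return the cached file if the stored version matches, else None."""
        entry = self.index.get(title)
        if entry and entry["version"] == version:
            path = Path(entry["path"])
            if path.exists():
                return path
        return None

    def put(self, title, version, source):
        """Write the script to disk and record its version in the index."""
        path = self._file_for(title)
        path.write_text(source)
        self.index[title] = {"version": version, "path": str(path)}
        self.index_path.write_text(json.dumps(self.index, indent=2))
        return path
```

With something like this, a `fetch` line in pymolrc would only hit the network when the cached copy is missing or its version no longer matches.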

Validation

  • How will users know that a fetched script is doing the correct thing?
  • Perhaps we should have two classes of scripts: approved and validated.


Format

My guess is that we'll require fetchable scripts to follow a certain format on the wiki pages. That should include some metadata, such as:

  • Version number. This makes debugging easier and makes smart caching possible.
  • Citation. Script authors should be able to provide a preferred citation, DOI, etc. One of the benefits is to get script authors more credit.
  • Documentation. Or should this be handled in the doc string?
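If scripts declared their metadata as plain module-level assignments, PyMOL could read it with Python's `ast` module without executing anything, which also fits the security concerns above. The `__citation__` name and the overall convention are hypothetical (`__version__` is a common but informal Python convention), and the docstring stands in for documentation. `version_key` shows how a dotted version string could be compared for smart caching.

```python
import ast


def read_metadata(source):
    """Parse a script without running it and return its declared metadata."""
    tree = ast.parse(source)
    meta = {"doc": ast.get_docstring(tree)}
    # Hypothetical convention: map module-level names to metadata keys.
    wanted = {"__version__": "version", "__citation__": "citation"}
    for node in tree.body:
        # Only simple top-level assignments such as  __version__ = "1.2"
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name) and target.id in wanted:
                # literal_eval accepts only literals, so no code is executed.
                meta[wanted[target.id]] = ast.literal_eval(node.value)
    return meta


def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))
```

Comparing `version_key` values avoids the string-comparison trap where `"1.10"` sorts before `"1.9"`.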

Implementation

  • This will obviously be written in Python.
  • We'll probably make use of some screen scraping library. I don't know the state of the art here, but I've seen at least the following, and would love some comments: