Jay's journal: python

Showing posts with label python. Show all posts

Tuesday, April 8, 2014

List All Installed Python Packages

There are times that we need a list of all installed python packages. One way to accomplish it is using

help('modules')

in python shell. It shows all installed packages including stdlib. If we are not interested in stdlib, command pip works:

pip freeze

It generates installed packages in requirements format.

Monday, December 16, 2013

Learning PyLucene by Example

Apache Lucene is a a full text search framework built in Java. It has many appealing features such as high-performance indexing, powerful and efficient search algorithms, and cross-platform solution. PyLucene is a Python extension for using (Java) Lucene. Its goal is to allow users to use Lucene for text indexing and searching within Python. PyLucene is not a port but a Python wrapper around Java Lucene, which embeds Lucene running in a JVM into a Python process.

This is a quick guide on PyLucene. We show code snippets for full-text searching on bible versers, which are stored in a dictionary data structure. There are three steps in buliding the index: create an index, fill the index and close resources. In the first step, we choose StandardAnalyzer as the analyzer, SimpleFSDirectory as (file) storage scheme for our IndexWriter. In the second step, each verse (document) is labelled with five fields that serve as index for future search. The text of each verse is labelled as "Text." Our search will be primarily on this field. However, the text of each verse is for indexing only but not stored (Field.Store.NO) in the index, since all verses are already stored in our main data store (bible dictionary). Label "Testament" allows us to distinguish if the verse is in Old or New testament. The last three fields: book, chapter, verse are keys to the data store that allow us to retrieve the text of the specified verse. Once the index is built, we close all the resources in the last step.

The snippet for building the index is as follows:

def make_index():
    '''Make index from data source -- bible
    Some global variables used:
        bible: a dictionary that stores all bible verses
        OTbooks: a list of books in old testament
        NTbooks: a list of books in new testament
        chapsInBook: a list of number of chapters in each book
    '''
    lucene.initVM()
    path = raw_input("Path for index: ")
    # 1. create an index
    index_path = File(path)
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(index_path)
    config = IndexWriterConfig(Version.LUCENE_35, analyzer)
    writer = IndexWriter(index, config)

    # 2 construct documents and fill the index
    for book in bible.keys():
        if book in OTbooks:
            testament = "Old"
        else:
            testament = "New"
        for chapter in xrange(1, chapsInBook[book]+1):
            for verse in xrange(1, len(bible[book][chapter])+1):
                verse_text = bible[book][chapter][verse]
                doc = Document()
                doc.add(Field("Text", verse_text, Field.Store.NO, Field.Index.ANALYZED))
                doc.add(Field("Testament", testament, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Book", book, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Chapter", str(chapter), Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Verse", str(verse), Field.Store.YES, Field.Index.ANALYZED))
                writer.addDocument(doc)

    # 3. close resources
    writer.close()
    index.close()

There are five steps in our simple search: open the index, parse the query string, search the index, display results and close resources. In the first step, we use IndexReader to open the index built before for our simple search. The query string (kwds) is parsed by the QueryParser in the second step. The search job is done in step three by IndexSearcher, and the results are stored in the object hits. In step four, we get (getField) the book, chapter, and verse fields from the documents returned in the previous step, which allow us to retrieve the bible verses from the data store and display them. Finally, we close all resources in step five.

The snippet for searching and displaying results is as follows:

def search(indexDir, kwds):
    '''Simple Search
    Input paramenters:
        1. indexDir: directory name of the index
        2. kwds: query string for this simple search
    display_verse(): procedure to display the specified bible verse
    '''
    lucene.initVM()
    # 1. open the index
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(File(indexDir)
    reader = IndexReader.open(index)
    n_docs = reader.numDocs()

    # 2. parse the query string
    queryparser = QueryParser(Version.LUCENE_35, "Text", analyzer)
    query = queryparser.parse(kwds)

    # 3. search the index
    searcher = IndexSearcher(reader)
    hits = searcher.search(query, n_docs).scoreDocs

    # 4. display results
    for i, hit in enumerate(hits):
        doc = searcher.doc(hit.doc)
        book = doc.getField('Book').stringValue()
        chapter = doc.getField('Chapter').stringValue()
        verse = doc.getField('Verse').stringValue()
        display_verse(book, int(chapter), int(verse))

    # 5. close resources
    searcher.close()

Wednesday, December 11, 2013

Using Java Within Python Applications -- Part II

In the second part, we exam the question: how to add JAR files to the sys.path at runtime? One's first thought is that setting CLASSPATH (an environment variable) should resolve the issue. Why would one bother to add JAR files at runtime? This is the one of the topics in the Jython book entitled "working with CLASSPATH." It provides two good reasons along with a solution for this question.

The first reason is to make end users' life easier for not to know anything about environment variables. The second and more compelling reason is "when there is no normal user account to provide environment variables." The issue "add JAR files to the sys.path at runtime" in Jython is similar to "load classes (JAR files) at runtime" in the Java world. Fortunately, solution exists for the Java case. The classPathHacker (Listing B-11) presented in Jython book is a translation of that solution from the Java world.

In the second part of this blog we will go through a more practical example to finish our tour on using Java within Python applications. The Java example (SnS.java) selected here is modified from the APIExamples.java that comes with the JSword.

The only task provided by SnS class is defined in the searchAndShow method, which takes a string (kwds) as key words, searches (bible.find()) through all books in the bible, and returns (html formatted) findings in a string (result.toString()). There are over a dozen jar files in the distribution of JSword package. One may start JSword application with a shell script (BiblDesktop.sh) that sets environment variables (including CLASSPATH) properly then initiates the GUI application.

In this example, we use Jython to add JAR files at runtime, to call SnS, and to print search results in text form. It is obvious that we have full knowledge of which JAR files are needed before starting the application, the example here is for educational purpose.

Here is the SnS Java class, which requires a few JAR files:

import java.net.URL;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.crosswire.common.util.NetUtil;
import org.crosswire.common.util.ResourceUtil;
import org.crosswire.common.xml.SAXEventProvider;
import org.crosswire.common.xml.TransformingSAXEventProvider;
import org.crosswire.common.xml.XMLUtil;
import org.crosswire.jsword.book.Book;
import org.crosswire.jsword.book.BookData;
import org.crosswire.jsword.book.BookException;
import org.crosswire.jsword.book.Books;
import org.crosswire.jsword.passage.Key;
import org.crosswire.jsword.passage.Passage;
import org.crosswire.jsword.passage.RestrictionType;
import org.xml.sax.SAXException;

/**
* All the methods in this class highlight some are of the API and how to use it.
*
* @see gnu.lgpl.License for license details.
*      The copyright to this program is held by it's authors.
* @author Joe Walker [joe at eireneh dot com]
*/
public class SnS
{
    /**
     * The name of a Bible to find
     */
    private static final String BIBLE_NAME = "KJV"; //$NON-NLS-1$

    /**
     * An example of how to do a search and then get text for each range of verses.
     * @throws BookException
     * @throws SAXException
     *
     * @param kwds keyword to search                         JSL
     * @return search results is returned in String format   JSL
     *
     */
    public String searchAndShow(String kwds) throws BookException, SAXException
    {
        Book bible = Books.installed().getBook(BIBLE_NAME);

        Key key = bible.find(kwds); //$NON-NLS-1$

        // Here is an example of how to iterate over the ranges and get the text for each
        // The key's iterator would have iterated over verses.

        // The following shows how to use a stylesheet of your own choosing
        String path = "xsl/cswing/simple.xsl"; //$NON-NLS-1$
        URL xslurl = ResourceUtil.getResource(path);

        Iterator rangeIter = ((Passage) key).rangeIterator(RestrictionType.CHAPTER); // Make ranges break on chapter boundaries.
        //
        // prepare for result
        //      using a StringBuilder to hold all search results
        //              JSL
        //
        StringBuilder result = new StringBuilder();
        while (rangeIter.hasNext())
        {
            Key range = (Key) rangeIter.next();
            BookData data = new BookData(bible, range);
            SAXEventProvider osissep = data.getSAXEventProvider();
            SAXEventProvider htmlsep = new TransformingSAXEventProvider(NetUtil.toURI(xslurl), osissep);
            String text = XMLUtil.writeToString(htmlsep);
            result.append(text);
        }
        return result.toString();           // search results --- JSL
    }

    public static void main(String[] args) throws BookException, SAXException
    {
        SnS examples = new SnS();
        System.out.println(examples.searchAndShow("+what +wilt +thou"));
    }

}

As we mentioned before the classPathHacker is used for adding JAR files at runtime. Let cphacker.py be the file where classPathHacker is defined. Here is our Jython codes, where the file directory (dir) for storing all jar files is passed by the command line paremeter sys.argv. The main steps in this Jython codes are: add (jarLoad.addFile()) JAR files one by one at runtime, import the SnS class, create an SnS object (example), and let the object complete its task (searchAndShow).

import sys, glob
from cphacker import *

#
# preparation for Jar files (classpath) first
#
#   we need to do this prior to import the SnS class
#       this is because SnS import lots of classes in
#           those jar files
#       it does not work if you move this section to the main
#
jarLoad = classPathHacker()
dir = sys.argv[1]
jars = dir + "/*.jar"
jarfiles = glob.glob(jars)
for jar in jarfiles:
    jarLoad.addFile(jar)
jarLoad.addFile('.')

import SnS

if __name__ == "__main__":
    example = SnS()
    print example.searchAndShow("+what +wilt +thou").encode("utf-8")

There is one thing we need to address before closing this post: the
classPathHacker described in Jython book does not work for Jython 2.5.2. A slightly modified version can be obtained from glasblog, which works fine under

Using Java Within Python Applications -- Part I

There are two common approaches for using Java within Python applications. One is to construct a bridge between the Java virtual machine (JVM) and the Python interpreter, where Java and Python codes run separately. JPype, Py4J, JCC and Pyjnius are some examples in this category. The other, as Jython did, is to run Python within a JVM. Jython is described as:

an implementation of the Python programming language which is
designed to run on the Java(tm) Platform.

It is perceived that Jython is easier to install and to use compared with other options. The following examples give a brief tour of Jython. In our first example, we will see some Java objects (java.lang.Math) being used within Jython.

>>> from java.lang import Math
>>> Math.max(317, 220)
317L
>>> Math.pow(2, 4)
16.0

In our second example, we will create a Java object (Person) and use it within a Jython application.

Definition of the Person object: Person.java

public class Person {
    private String firstName;
    private String lastName;
    private int age;
    public Person(String firstName, String lastName, int age){
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
    public String getFirstName() {
        return firstName;
    }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public int getAge() {
        return age;
    }
    public void setAge(int age) {
        this.age = age;
    }
}

Using Person.java in Jython (we need to compile the java code first):

>>> import Person
>>> john = Person("john", "dole", 27)
>>> john.getFirstName()
u'john'
>>> john.getAge()
27
>>> john.setFirstName("alias")
>>> john.getFirstName()
u'alias'

Our first two examples demonstrate that there is no distinction in using Java or Python objects within Jython: there is no need to start up JVM for using any Java objects, since Jython did that for us.

It is possible to extend (subclass) Java classes via Jython classes. Our third example, taken from the Jython book, show this.

The Java code that defines two methods: Calculator.java

/**
* Java calculator class that contains two simple methods
*/
public class Calculator {
    public Calculator(){
    }
    public double calculateTip(double cost, double tipPercentage){
        return cost * tipPercentage;
    }
    public double calculateTax(double cost, double taxPercentage){
        return cost * taxPercentage;
    }
}

The Python code (with minor corrections) to extend the Java class: JythonCalc.py

import Calculator
from java.lang import Math
class JythonCalc(Calculator):
    def __init__(self):
        pass
    def calculateTotal(self, cost, tip, tax):
        return cost + self.calculateTip(cost, tip) + self.calculateTax(cost, tax)
if __name__ == "__main__":
    calc = JythonCalc()
    cost = 23.75
    tip = .15
    tax = .07
    print "Starting Cost: ", cost
    print "Tip Percentage: ", tip
    print "Tax Percentage: ", tax
    print Math.round(calc.calculateTotal(cost, tip, tax))

The result will be:

Starting Cost: 23.75
Tip Percentage: 0.15
Tax Percentage: 0.07
29

One question remains: Is Jython the same language as Python? A short answer is Yes. In fact, there are differences. The main one is that the current version of Jython (v2.5) cannot use CPython extension modules written in C. If one wants to use such a module, one should look for an equivalent written in pure Java or Python. However, it is claimed that future release will eliminate this difference.

Tuesday, October 8, 2013

An Introduction to Fabric

Fabric is described as a simple, Pythonic tool for application deployment or systems administration tasks. The fab tool simply imports your fabfile (fabfile.py, by default) and executes the function or functions at your disposal. There is no magic about fabric – anything we can do in a Python script can be done in a fabfile. Here is a small but complete fabfile that defines one task:

def hello(name="World"):
print("Hello %s!" % name)

Calling fab hello will display the familiar message "Hello World!" However, we can personalize it by issuing:

$fab hello:name=Jay

The message shown will be: "Hello Jay!"

Below is another small fabfile that allows us to find kernel release as well as machine hardware of a remote system by using SSH:

from fabric.api import run

def host_type():
run("uname -srm")

Since there is no remote host defined in the fabfile, we need to specify it (hostname) using option -H:

$fab -H hostname host_type

Fabric provides many command-execution functions, the following five are frequently used:

run(command) -- Run a shell command on a remote host.
sudo(comand) -- Run (with superuser privileges) a shell command on a remote host.
local(command) -- Run a command on the local host.
get(remote_path, local_path) -- Download one or more files from a remote host.
put(local_path, remote_path) -- Upload one or more files to a remote host.

Fabric also includes context managers for use with the with statement, the following three are commonly used:

settings(*args, **kwargs) -- Nest context managers and/or override env variables.
lcd(path) -- Update local current working directory.
cd(path) -- Update remote current working directory.

The third sample fabfile allows us to deploy a Django project to three production servers (serv1, serv2 and serv3). In fact, it can be extended to deploy various applications. The deployment consists of the following steps:

testing the project -- python manage.py test apps
packing the project -- tar czf /tmp/project.tgz .
moving the packed file to server -- put("/tmp/project.tgz", "/path/to/serv/tmp/project.tgz")
unpacking the file -- run("tar xzf /path/to/serv/tmp/project.tgz"), and
deploying the project on server -- run("touch manage.py")

To use it, we issue:
$fab install

from __future__ import with_statement
from fabric.api import *
from fabric.contrib.console import confirm

env.hosts = ['serv1', 'serv2', 'serv3']
env.user= "admin"

def test():
    """
    Run test; if it fails prompt the user for action.
    """
    src_dir = "/path/to/local/src/directory"
    with settings(warn_only=True), lcd(src_dir):
        result = local("python manage.py test apps", capture=True)
    if result.failed and not confirm("Tests failed. Continue anyway?"):
        abort("Aborting at user request.")

def pack_move():
    """
    Archive our current code and upload to servers.
    """
    src_dir = "/path/to/local/src/directory"
    with settings(warn_only=True), lcd(src_dir):
        local( "tar czf /tmp/project.tgz .")
    put( "/tmp/project.tgz", "/path/to/serv/tmp/project.tgz" )
    local( "rm /tmp/project.tgz" )

def install():
    """
    Deploy the project on servers.
    """
    test()
    pack_move()
    dst_dir = "/path/to/serv/dst/directory"
    with settings(hide("warnings"), warn_only=True):
        if run("test -d %s" % dst_dir).failed:
            run("mkdir -p %s" % dst_dir)
    with cd(dst_dir):
        run("tar xzf /path/to/serv/tmp/project.tgz")
        run("touch manage.py")
    run("rm -f /path/to/serv/tmp/project.tgz")

Wednesday, September 11, 2013

Python Decompiler

Need a decompiler for your python byte-code?

Here is one: uncompyle2, which works for python 2.

Sunday, August 25, 2013

Automated Remote Backups with Rdiff-backup

One subtle feature of rdiff-backup is that it allows users to make remote backups over Internet using SSH, which makes remote backups very secure since data transferred is encrypted.

One problem is that SSH requires a password for logging, which is not convenient if we want to run rdiff-backup as a cron job. Here we show how to initiate rdiff-backups from a central backup server, and pull data from a farm of hosts to be backed up. For security reasons, the central server uses a non-root user account (rdiffbk) to perform backups, whereas root account is used on each host being backed up. Though root accounts are used on hosts being backed up, they are protected by SSH public-key authentication mechanism with forced-command-only option.

For convenience, I'll call the central backup server canine and three hosts to be backed up beagle, shepherd and terrier. For short, only works on canine and beagle will be shown.

Here is the procedure for backup server canine:

generate one passphrase-free SSH key pair for each host being backed up,
move corresponding ssh key to each host,
create SSH configuration file, and
create a cron job file

Step 1: generate one passphrase-free SSH key pair for each host being backed up

To generate RSA type pair for host beagle, we issue

ssh-keygen -t rsa -f id_beagle-backup

where private key will be saved in file id_beagle-backup and public key id_beagle-backup.pub.

Step 2: move corresponding ssh key to each host

To move id_beagle-backup.pub to host beagle, we may choose to use any preferred method (for example, ftp, sftp, or ssh-copy-id), since public key is not sensitive. Other hosts can be done similarly.

Step 3: create SSH configuration file

To define how to connect to host beagle with backup key, we place the following lines into file ~rdiffbk/.ssh/config. Other hosts need to be configured similarly.

host beagle-backup
    hostname beagle
    user root
    identifyfile ~rdiffbk/.ssh/id_beagle-backup
    protocol 2

Step 4: create a cron job file

The following cron job file automates the remote backups daily at 200am, 210am, and 220am, respectively.

0 2 * * * rdiff-backup beagle-backup::/remote_dir beagle/remote_dir
10 2 * * * rdiff-backup shepherd-backup::/remote_dir shepherd/remote_dir
20 2 * * * rdiff-backup terrier-backup::/remote_dir terrier/remote_dir

By default setting, rdiff-backup uses SSH to pipe remote data. Therefore, both SSH server and rdiff-backup are required in hosts to be backed up.
What left on host beagle and others (shepherd, terrier) is simply to give permission to canine to access it (through SSH) and run rdiff-backup. This can be done in the following two steps:

Step I: create an authorized-keys file for root account

To enable SSH public key authentication for root account, we need to create the file /root/.ssh/authorized_keys, which consists public key for user rdiffbk@canine, forced command and other options. The public key (id_beagle-backup.pub) should be available for beagle once we have done Step 2. A sample authorized_keys file is as follows:

command="rdiff-backup --server --restrict-read-only /",from="canine",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAAB3.... rdiffbk@canine

Here, for security reason, rdiff-backup server is restricted to real only, and
we disable port-forward, X11-forward and pty options. See here for more details.

Step II: configure SSH server for root access

As we saw here, this can be done by put the following line in the SSH server configuration file (sshd_config):

PermitRootLogin forced-commands-only

Sunday, July 7, 2013

Load Python Module Dynamically

There are times that we need to import some python module and, for whatever reasons, we will not know the name of the module until run-time. This can be achieved by built-in function __import__().

Dynamic loading python module is useful, for example, if we plan to read configuration from a file. Here is an example that allows us to load the module specified in sys.argv[1]:

config_file = sys.argv[1]
config = __import__(config_file)

For newer versions (2.7 and up), there are convenience wrappers for __import__(). We may use module importlib:

import importlib
config_file = sys.argv[1]
config = importlib.import_module(config_file)

Both __import__() and importlib.import_module() search modules in certain locations (e.g., sys.modules, sys.meta_path, sys.path). There are times that modules being loaded are not in those default locations. This can be achieved by using imp.load_source(), which allows us to load and initialize modules implemented as Python source files:

import imp
pathname = sys.argv[1]
config = imp.load_source(name, pathname)

where pathname is the path name of the configuration file.

Monday, April 22, 2013

poor men's PGP

I just finished the pmPGP, a CLI for sending/receiving openPGP mime messages.

The pmPGP is based on python and gnupg; it supports sending emails in the following formats:

plain -- regular email
sign -- RFC3156
encrypt -- RFC3156
sign-encrypt -- RFC3156
Sencrypt -- Symmetric encryption (for fun and personal usage)
sign-Sencrypt -- (for fun and personal usage)




Poor man may use pmPGP to store/backup files on email servers.

Sounds interesting? Get it from:

https://github.com/liujay/pmPGP

Wednesday, April 3, 2013

How to test Django development server remotely?

Django development server is intended only for use while developing. There are times that we need to test Django applications remotely through development server. Thought we may bind the development server to some public IPs to achieve this, it is not a good practice in general for various reasons. Ssh port forwarding provides a much secure way for remote testing of Django development server. Let serv be the server where Django development server resides, test the remote host where you want to conduct the tests. Here is a how-to.

1. On host serv, start your development server:
python manage.py runserver 8080
where 8080 is any port number of your choice.

2. On host test, start ssh port forwarding:
ssh -L 9090:localhost:8080 user@serv
where 9090 is the port number providing remote service.

3. On host test, point your browser to localhost:9090 to perform tests.

If you want to test the development server through other hosts, say third, you should enable gateway forwarding on host test by issuing:

ssh -g -L 9090:localhost:8080 user@serv

Then you may perform tests through host third by pointing your browser to test:9090.

Tuesday, January 1, 2013

deploying django with gunicorn and nginx

Gunicorn is a wsgi HTTP server. As its document says:

it is best to use Gunicorn behind HTTP proxy server.

Nginx is strongly suggested as the proxy server. Here is a quick note for deploying django with nginx and gunicorn.

The nginx server can be configured as follows:

server {
    listen   80; ## listen for ipv4;
    listen   [::]:80 default ipv6only=on; ## listen for ipv6

    server_name your.app.domain.name;

    location /static {
        root /path/to/your/staticfiles;
    }

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Start gunicorn to serve your django application. Be sure to bind gunicorn to the port number (8080) specified in the previous configuration file:

gunicorn -b localhost:8080 django_application.wsgi:application

Tuesday, December 25, 2012

hooray

It works!

[django + mezzanine + apache2 + wsgi] on fedora 17 and ubuntu 12.04

how to:

collect static files for mezzanine
python manage.py collectstatic
put static and template files under the mezzaine application directory
configure mod_wsgi (conf.d/wsgi.conf) for apache2
there are examples on site of djangoproject:
https://docs.djangoproject.com/en/1.4/howto/deployment/wsgi/modwsgi/