Tuesday, April 8, 2014

List All Installed Python Packages

Sometimes we need a list of all installed Python packages. One way to get it is to run

help('modules')

in the Python shell. It lists every available module, including the standard library. If we are not interested in the standard library, pip does the job:

pip freeze

It prints the installed packages in requirements format.
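
The same list can also be produced from inside Python. The following is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is part of the standard library); the function name installed_packages is ours, not part of pip.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages():
    """Return sorted 'name==version' strings for installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name:  # skip entries with broken or missing metadata
            lines.add("%s==%s" % (name, version))
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(installed_packages()))
```

Like pip freeze, this reports installed distributions only, so standard library modules do not appear.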

Monday, December 16, 2013

Learning PyLucene by Example

Apache Lucene is a full-text search framework built in Java. It has many appealing features, such as high-performance indexing, powerful and efficient search algorithms, and cross-platform support. PyLucene is a Python extension for using (Java) Lucene. Its goal is to allow users to use Lucene for text indexing and searching within Python. PyLucene is not a port but a Python wrapper around Java Lucene, which embeds Lucene running in a JVM into a Python process.

This is a quick guide to PyLucene. We show code snippets for full-text searching on Bible verses, which are stored in a dictionary data structure. There are three steps in building the index: create an index, fill the index, and close resources. In the first step, we choose StandardAnalyzer as the analyzer and SimpleFSDirectory as the (file) storage scheme for our IndexWriter. In the second step, each verse (document) is labelled with five fields that serve as the index for future searches. The text of each verse is labelled as "Text." Our search will be primarily on this field. However, the text of each verse is indexed only and not stored (Field.Store.NO) in the index, since all verses are already stored in our main data store (the bible dictionary). The label "Testament" allows us to distinguish whether the verse is in the Old or the New Testament. The last three fields (book, chapter, verse) are keys to the data store that allow us to retrieve the text of the specified verse. Once the index is built, we close all the resources in the last step.
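
Before the PyLucene code, the idea of "index the text, store only the keys" can be sketched in plain Python. This is an illustration under our own assumptions, not how Lucene works internally: the two-verse bible dictionary, the tokenize helper, and build_index are all hypothetical stand-ins.

```python
import re
from collections import defaultdict

# Hypothetical two-verse stand-in for the post's `bible` dictionary,
# keyed as bible[book][chapter][verse].
bible = {
    "Genesis": {1: {1: "In the beginning God created the heaven and the earth."}},
    "John": {1: {1: "In the beginning was the Word, and the Word was with God."}},
}
OTbooks = ["Genesis"]


def tokenize(text):
    """Lowercase the text and split it into words (a crude 'analyzer')."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(bible):
    """Map each word to the set of (testament, book, chapter, verse) keys."""
    index = defaultdict(set)
    for book, chapters in bible.items():
        testament = "Old" if book in OTbooks else "New"
        for chapter, verses in chapters.items():
            for verse, text in verses.items():
                for word in tokenize(text):
                    # Only the key is stored; the text stays in `bible`.
                    index[word].add((testament, book, chapter, verse))
    return index
```

The index holds words and keys only, which mirrors the choice of Field.Store.NO for "Text" and Field.Store.YES for the other four fields.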

The snippet for building the index is as follows:

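import lucene
# PyLucene 3.x exposes the wrapped Java classes in the flat `lucene` module
from lucene import (File, StandardAnalyzer, Version, SimpleFSDirectory,
                    IndexWriterConfig, IndexWriter, Document, Field)

def make_index():
    '''Make index from data source -- bible
    Some global variables used:
        bible: a dictionary that stores all bible verses
        OTbooks: a list of books in the Old Testament
        NTbooks: a list of books in the New Testament
        chapsInBook: the number of chapters in each book, indexed by book
    '''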
    lucene.initVM()
    path = raw_input("Path for index: ")
    # 1. create an index
    index_path = File(path)
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(index_path)
    config = IndexWriterConfig(Version.LUCENE_35, analyzer)
    writer = IndexWriter(index, config)

    # 2. construct documents and fill the index
    for book in bible.keys():
        if book in OTbooks:
            testament = "Old"
        else:
            testament = "New"
        for chapter in xrange(1, chapsInBook[book]+1):
            for verse in xrange(1, len(bible[book][chapter])+1):
                verse_text = bible[book][chapter][verse]
                doc = Document()
                doc.add(Field("Text", verse_text, Field.Store.NO, Field.Index.ANALYZED))
                doc.add(Field("Testament", testament, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Book", book, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Chapter", str(chapter), Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Verse", str(verse), Field.Store.YES, Field.Index.ANALYZED))
                writer.addDocument(doc)

    # 3. close resources
    writer.close()
    index.close()


There are five steps in our simple search: open the index, parse the query string, search the index, display results and close resources. In the first step, we use IndexReader to open the index built before for our simple search. The query string (kwds) is parsed by the QueryParser in the second step. The search job is done in step three by IndexSearcher, and the results are stored in the object hits. In step four, we get (getField) the book, chapter, and verse fields from the documents returned in the previous step, which allow us to retrieve the bible verses from the data store and display them. Finally, we close all resources in step five.
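
The middle three steps can be sketched without Lucene. This is a simplified illustration under our own assumptions: the index dictionary is a hypothetical prebuilt word-to-keys mapping, and the sketch requires every query word to match, whereas Lucene's QueryParser treats plain terms as optional by default.

```python
# Hypothetical data store and a prebuilt word -> keys index.
bible = {
    "Genesis": {1: {1: "In the beginning God created the heaven and the earth."}},
    "John": {1: {1: "In the beginning was the Word, and the Word was with God."}},
}
index = {
    "beginning": {("Genesis", 1, 1), ("John", 1, 1)},
    "god": {("Genesis", 1, 1), ("John", 1, 1)},
    "word": {("John", 1, 1)},
    "created": {("Genesis", 1, 1)},
}


def search(index, kwds):
    """Return the sorted keys of verses that contain every word in kwds."""
    terms = kwds.lower().split()             # parse the query string
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))   # search the index
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


def display(hits):
    """Look up each hit in the data store and format it for display."""
    return ["%s %d:%d %s" % (b, c, v, bible[b][c][v]) for b, c, v in hits]
```

As in the PyLucene version, the hits carry only the book, chapter, and verse keys; the verse text comes from the data store.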

The snippet for searching and displaying results is as follows:

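import lucene
# PyLucene 3.x exposes the wrapped Java classes in the flat `lucene` module
from lucene import (File, StandardAnalyzer, Version, SimpleFSDirectory,
                    IndexReader, IndexSearcher, QueryParser)

def search(indexDir, kwds):
    '''Simple Search
    Input parameters:
        1. indexDir: directory name of the index
        2. kwds: query string for this simple search
    display_verse(): procedure to display the specified bible verse
    '''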
    lucene.initVM()
    # 1. open the index
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(File(indexDir))
    reader = IndexReader.open(index)
    n_docs = reader.numDocs()

    # 2. parse the query string
    queryparser = QueryParser(Version.LUCENE_35, "Text", analyzer)
    query = queryparser.parse(kwds)

    # 3. search the index
    searcher = IndexSearcher(reader)
    hits = searcher.search(query, n_docs).scoreDocs

    # 4. display results
    for i, hit in enumerate(hits):
        doc = searcher.doc(hit.doc)
        book = doc.getField('Book').stringValue()
        chapter = doc.getField('Chapter').stringValue()
        verse = doc.getField('Verse').stringValue()
        display_verse(book, int(chapter), int(verse))


    # 5. close resources
    searcher.close()
    reader.close()   # the searcher does not close a reader it was handed
    index.close()



Wednesday, December 11, 2013

Using Java Within Python Applications -- Part II

In this second part, we examine the question: how do we add JAR files to sys.path at runtime? One's first thought is that setting CLASSPATH (an environment variable) should resolve the issue, so why bother adding JAR files at runtime? This is one of the topics in the Jython book, in the section entitled "Working with CLASSPATH," which gives two good reasons along with a solution.

The first reason is to make end users' lives easier: they need not know anything about environment variables. The second and more compelling reason is "when there is no normal user account to provide environment variables." Adding JAR files to sys.path at runtime in Jython is similar to loading classes (JAR files) at runtime in the Java world. Fortunately, a solution exists for the Java case. The classPathHacker (Listing B-11) presented in the Jython book is a translation of that solution.

In the rest of this post we go through a more practical example to finish our tour of using Java within Python applications. The Java example (SnS.java) is modified from the APIExamples.java that comes with JSword.

The only task provided by the SnS class is defined in the searchAndShow method, which takes a string of keywords (kwds), searches all books in the Bible (bible.find()), and returns the HTML-formatted findings as a string (result.toString()). The JSword distribution contains over a dozen JAR files. One may start the JSword application with a shell script (BibleDesktop.sh) that sets the environment variables (including CLASSPATH) properly and then launches the GUI application.

In this example, we use Jython to add JAR files at runtime, call SnS, and print the search results in text form. Of course, here we know in advance exactly which JAR files are needed; the example is for educational purposes.
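
For pure-Python modules, the analogous move in CPython is simply to extend sys.path before importing. The sketch below shows only that analogue, assuming Python 3; the module name runtime_plugin_demo and the helper names are hypothetical. Java classes inside JAR files additionally involve the JVM class loader, which is the part classPathHacker deals with.

```python
import importlib
import os
import sys
import tempfile


def add_to_path(directory):
    """Append directory to sys.path at runtime unless it is already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.append(directory)
    return directory


def demo():
    """Create a throwaway module, add its directory, then import it."""
    plugin_dir = tempfile.mkdtemp()
    # `runtime_plugin_demo` is a hypothetical module name.
    with open(os.path.join(plugin_dir, "runtime_plugin_demo.py"), "w") as f:
        f.write("GREETING = 'loaded at runtime'\n")
    add_to_path(plugin_dir)
    importlib.invalidate_caches()  # make the new file visible to the importer
    module = importlib.import_module("runtime_plugin_demo")
    return module.GREETING
```

No environment variable is touched: the search path is changed by the running program itself, which is the point of both reasons given above.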

Here is the SnS Java class, which requires a few JAR files:

import java.net.URL;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.crosswire.common.util.NetUtil;
import org.crosswire.common.util.ResourceUtil;
import org.crosswire.common.xml.SAXEventProvider;
import org.crosswire.common.xml.TransformingSAXEventProvider;
import org.crosswire.common.xml.XMLUtil;
import org.crosswire.jsword.book.Book;
import org.crosswire.jsword.book.BookData;
import org.crosswire.jsword.book.BookException;
import org.crosswire.jsword.book.Books;
import org.crosswire.jsword.passage.Key;
import org.crosswire.jsword.passage.Passage;
import org.crosswire.jsword.passage.RestrictionType;
import org.xml.sax.SAXException;

/**
 * All the methods in this class highlight some area of the API and how to use it.
 *
 * @see gnu.lgpl.License for license details.
 *      The copyright to this program is held by its authors.
 * @author Joe Walker [joe at eireneh dot com]
 */
public class SnS
{
    /**
     * The name of a Bible to find
     */
    private static final String BIBLE_NAME = "KJV"; //$NON-NLS-1$

    /**
     * An example of how to do a search and then get text for each range of verses.
     * @throws BookException
     * @throws SAXException
     *
     * @param kwds keyword to search                         JSL
     * @return search results is returned in String format   JSL
     *
     */
    public String searchAndShow(String kwds) throws BookException, SAXException
    {
        Book bible = Books.installed().getBook(BIBLE_NAME);

        Key key = bible.find(kwds); //$NON-NLS-1$

        // Here is an example of how to iterate over the ranges and get the text for each
        // The key's iterator would have iterated over verses.

        // The following shows how to use a stylesheet of your own choosing
        String path = "xsl/cswing/simple.xsl"; //$NON-NLS-1$
        URL xslurl = ResourceUtil.getResource(path);

        Iterator rangeIter = ((Passage) key).rangeIterator(RestrictionType.CHAPTER); // Make ranges break on chapter boundaries.
        //
        // prepare for result
        //      using a StringBuilder to hold all search results
        //              JSL
        //
        StringBuilder result = new StringBuilder();
        while (rangeIter.hasNext())
        {
            Key range = (Key) rangeIter.next();
            BookData data = new BookData(bible, range);
            SAXEventProvider osissep = data.getSAXEventProvider();
            SAXEventProvider htmlsep = new TransformingSAXEventProvider(NetUtil.toURI(xslurl), osissep);
            String text = XMLUtil.writeToString(htmlsep);
            result.append(text);
        }
        return result.toString();           // search results --- JSL
    }

    public static void main(String[] args) throws BookException, SAXException
    {
        SnS examples = new SnS();
        System.out.println(examples.searchAndShow("+what +wilt +thou"));
    }

}


As mentioned above, classPathHacker is used to add JAR files at runtime. Let cphacker.py be the file where classPathHacker is defined. Here is our Jython code, where the directory (dir) holding all the JAR files is passed as the command-line parameter sys.argv[1]. The main steps are: add the JAR files one by one at runtime (jarLoad.addFile()), import the SnS class, create an SnS object (example), and let the object complete its task (searchAndShow).

import sys, glob
from cphacker import *

#
# prepare the JAR files (classpath) first
#
#   this must happen before importing the SnS class,
#       because SnS imports many classes from
#           those jar files
#       it does not work if this section is moved into the main block
#
jarLoad = classPathHacker()
dir = sys.argv[1]
jars = dir + "/*.jar"
jarfiles = glob.glob(jars)
for jar in jarfiles:
    jarLoad.addFile(jar)
jarLoad.addFile('.')

import SnS

if __name__ == "__main__":
    example = SnS()
    print example.searchAndShow("+what +wilt +thou").encode("utf-8")



There is one thing we need to address before closing this post: the classPathHacker described in the Jython book does not work for Jython 2.5.2. A slightly modified version can be obtained from glasblog, which works fine under

Using Java Within Python Applications -- Part I

There are two common approaches to using Java within Python applications. One is to construct a bridge between the Java virtual machine (JVM) and the Python interpreter, so that Java and Python code run separately. JPype, Py4J, JCC and Pyjnius are examples in this category. The other, which Jython takes, is to run Python within a JVM. Jython is described as:
an implementation of the Python programming language which is
designed to run on the Java(tm) Platform.
Jython is generally considered easier to install and use than the other options. The following examples give a brief tour of Jython. In our first example, we see a Java class (java.lang.Math) being used within Jython.

>>> from java.lang import Math
>>> Math.max(317, 220)
317L
>>> Math.pow(2, 4)
16.0

In our second example, we will create a Java object (Person) and use it within a Jython application.

Definition of the Person object: Person.java

public class Person {
    private String firstName;
    private String lastName;
    private int age;
    public Person(String firstName, String lastName, int age){
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
    public String getFirstName() {
        return firstName;
    }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public int getAge() {
        return age;
    }
    public void setAge(int age) {
        this.age = age;
    }
}

Using Person.java in Jython (we need to compile the java code first):

>>> import Person
>>> john = Person("john", "dole", 27)
>>> john.getFirstName()
u'john'
>>> john.getAge()
27
>>> john.setFirstName("alias")
>>> john.getFirstName()
u'alias'

Our first two examples demonstrate that there is no distinction between using Java and Python objects within Jython: there is no need to start up a JVM before using Java objects, since Jython does that for us.

It is possible to extend (subclass) Java classes with Jython classes. Our third example, taken from the Jython book, shows this.

The Java code that defines two methods: Calculator.java

/**
* Java calculator class that contains two simple methods
*/
public class Calculator {
    public Calculator(){
    }
    public double calculateTip(double cost, double tipPercentage){
        return cost * tipPercentage;
    }
    public double calculateTax(double cost, double taxPercentage){
        return cost * taxPercentage;
    }
}


The Python code (with minor corrections) to extend the Java class: JythonCalc.py

import Calculator
from java.lang import Math
class JythonCalc(Calculator):
    def __init__(self):
        pass
    def calculateTotal(self, cost, tip, tax):
        return cost + self.calculateTip(cost, tip) + self.calculateTax(cost, tax)
if __name__ == "__main__":
    calc = JythonCalc()
    cost = 23.75
    tip = .15
    tax = .07
    print "Starting Cost: ", cost
    print "Tip Percentage: ", tip
    print "Tax Percentage: ", tax
    print Math.round(calc.calculateTotal(cost, tip, tax))


The result will be:

Starting Cost:  23.75
Tip Percentage:  0.15
Tax Percentage:  0.07
29
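
The arithmetic behind that last line can be checked in plain Python. This is a small sketch that mirrors calculateTotal without any Java class; the function name calculate_total is ours.

```python
def calculate_total(cost, tip, tax):
    """Mirror JythonCalc.calculateTotal: cost plus tip plus tax."""
    return cost + cost * tip + cost * tax


total = calculate_total(23.75, 0.15, 0.07)
# 23.75 + 3.5625 + 1.6625 = 28.975, which rounds to 29
print(round(total))
```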


One question remains: is Jython the same language as Python? The short answer is yes, but there are differences. The main one is that the current version of Jython (v2.5) cannot use CPython extension modules written in C. If one wants to use such a module, one should look for an equivalent written in pure Java or Python. However, it is claimed that a future release will eliminate this difference.
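
Code that should run on both implementations can check which one it is on before touching Java classes. A minimal sketch, relying on the fact that sys.platform starts with "java" under Jython; the helper names are ours.

```python
import sys


def running_on_jython():
    """Return True when the interpreter is Jython."""
    return sys.platform.startswith("java")


def java_max(a, b):
    """Use java.lang.Math under Jython, fall back to the built-in elsewhere."""
    if running_on_jython():
        from java.lang import Math  # only importable inside a JVM
        return Math.max(a, b)
    return max(a, b)
```

The same guard can wrap an import of a C extension module on the CPython side, with a pure Python or Java equivalent as the Jython branch.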

Tuesday, October 8, 2013

An Introduction to Fabric

Fabric is described as a simple, Pythonic tool for application deployment and systems administration tasks. The fab tool simply imports your fabfile (fabfile.py, by default) and executes the function or functions you name on the command line. There is no magic in Fabric: anything we can do in a Python script can be done in a fabfile. Here is a small but complete fabfile that defines one task:

def hello(name="World"):
    print("Hello %s!" % name)



Calling fab hello will display the familiar message "Hello World!" However, we can personalize it by issuing:

$ fab hello:name=Jay

The message shown will be: "Hello Jay!"
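
To make the task:argument syntax concrete, here is a simplified illustration of how such a specification can be turned into a function call. It is not Fabric's actual parser; parse_task_spec, TASKS, and run_spec are hypothetical names, and hello returns its message instead of printing so the result is easy to check.

```python
def parse_task_spec(spec):
    """Split 'task:arg,key=value' into (name, args, kwargs)."""
    name, _, arg_string = spec.partition(":")
    args, kwargs = [], {}
    for item in filter(None, arg_string.split(",")):
        key, sep, value = item.partition("=")
        if sep:
            kwargs[key] = value
        else:
            args.append(item)
    return name, args, kwargs


def hello(name="World"):
    return "Hello %s!" % name


TASKS = {"hello": hello}  # hypothetical registry standing in for a fabfile


def run_spec(spec):
    """Look up the task by name and call it with the parsed arguments."""
    name, args, kwargs = parse_task_spec(spec)
    return TASKS[name](*args, **kwargs)
```

Every argument arrives as a string, so a task that needs a number has to convert it itself.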

Below is another small fabfile that reports the kernel release and machine hardware of a remote system over SSH:

from fabric.api import run

def host_type():
    run("uname -srm")


Since there is no remote host defined in the fabfile, we need to specify it (hostname) using option -H:

$ fab -H hostname host_type

Fabric provides many command-execution functions. The following five are frequently used:
  • run(command) -- Run a shell command on a remote host.
  • sudo(command) -- Run (with superuser privileges) a shell command on a remote host.
  • local(command) -- Run a command on the local host.
  • get(remote_path, local_path) -- Download one or more files from a remote host.
  • put(local_path, remote_path) -- Upload one or more files to a remote host.
Fabric also includes context managers for use with the with statement. The following three are commonly used:
  • settings(*args, **kwargs) -- Nest context managers and/or override env variables.
  • lcd(path) -- Update local current working directory.
  • cd(path) -- Update remote current working directory. 
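
The with-statement pattern behind lcd and cd can be illustrated with the standard library alone. The sketch below is an analogue, not Fabric's implementation: working_directory is a hypothetical helper that changes the directory of the current process and restores it when the block ends.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the process working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def demo():
    """Return (were we inside the target?, was the original restored?)."""
    start = os.getcwd()
    target = os.path.realpath(tempfile.mkdtemp())
    with working_directory(target):
        inside = os.path.realpath(os.getcwd())
    return inside == target, os.getcwd() == start
```

The try/finally guarantees that the original directory comes back even if the body raises, which is the main reason to express such settings as context managers.
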
The third sample fabfile allows us to deploy a Django project to three production servers (serv1, serv2 and serv3). In fact, it can be extended to deploy various applications. The deployment consists of the following steps:
  1. testing the project -- python manage.py test apps
  2. packing the project -- tar czf /tmp/project.tgz .
  3. moving the packed file to the server -- put("/tmp/project.tgz", "/path/to/serv/tmp/project.tgz")
  4. unpacking the file -- run("tar xzf /path/to/serv/tmp/project.tgz")
  5. deploying the project on the server -- run("touch manage.py")
To use it, we issue:
$ fab install



from __future__ import with_statement
from fabric.api import *
from fabric.contrib.console import confirm

env.hosts = ['serv1', 'serv2', 'serv3']
env.user= "admin"

def test():
    """
    Run test;  if it fails prompt the user for action.
    """
    src_dir = "/path/to/local/src/directory"
    with settings(warn_only=True), lcd(src_dir):
        result = local("python manage.py test apps", capture=True)
    if result.failed and not confirm("Tests failed. Continue anyway?"):
        abort("Aborting at user request.")

def pack_move():
    """
    Archive our current code and upload to servers.
    """
    src_dir = "/path/to/local/src/directory"   
    with settings(warn_only=True), lcd(src_dir):
        local( "tar czf /tmp/project.tgz .")
    put( "/tmp/project.tgz", "/path/to/serv/tmp/project.tgz" )
    local( "rm /tmp/project.tgz" )

def install():
    """
    Deploy the project on servers.
    """
    test()
    pack_move()
    dst_dir = "/path/to/serv/dst/directory"
    with settings(hide("warnings"), warn_only=True):
        if run("test -d %s" % dst_dir).failed:
            run("mkdir -p %s" % dst_dir)   
    with cd(dst_dir):
        run("tar xzf /path/to/serv/tmp/project.tgz")
        run("touch manage.py")
    run("rm -f /path/to/serv/tmp/project.tgz")
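
The pack and unpack steps can be tried locally without any server. This is a minimal sketch using the standard tarfile module in place of the tar commands, assuming Python 3; pack, unpack, and the temporary paths are hypothetical and nothing is uploaded.

```python
import os
import tarfile
import tempfile


def pack(src_dir, archive_path):
    """Archive the contents of src_dir, like `tar czf archive .`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=".")


def unpack(archive_path, dst_dir):
    """Create dst_dir if needed and extract into it, like `tar xzf`."""
    os.makedirs(dst_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dst_dir)


def demo():
    """Round-trip one placeholder file through pack() and unpack()."""
    src = tempfile.mkdtemp()
    dst = os.path.join(tempfile.mkdtemp(), "deploy")
    with open(os.path.join(src, "manage.py"), "w") as f:
        f.write("# placeholder\n")
    archive = os.path.join(tempfile.mkdtemp(), "project.tgz")
    pack(src, archive)
    unpack(archive, dst)
    return sorted(os.listdir(dst))
```

Using arcname="." keeps paths inside the archive relative, so the files land directly in the destination directory, as they do with the cd(dst_dir) block in install().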

      


Wednesday, October 2, 2013

Git -- Cheat Sheet

Here is a cheat sheet for git.

1. Create
git init                    # create a local repository
git clone <url>             # clone a repository from url

2. Commit
git commit -m "commit message"

3. Browse
git log                     # history of changes
git status                  # files changed in working directory
git diff                    # diff between working directory and the index
git diff HEAD               # diff between working directory and the most recent commit
git diff --cached           # diff between the index and the most recent commit
git show <object>           # show object
gitk                        # git repository (GUI) browser

4. Stage
git add <file>              # add file to the index
git reset HEAD <file>       # unstage the staged file

5. Undo
git commit -a --amend       # fix the last commit
git reset --hard <commit>   # discard any changes and reset to the commit
git revert HEAD             # revert the last commit
git revert <commit>         # revert the specific commit
git checkout -- <file>      # unmodify the modified file

6. Branch
git branch <new_branch>     # create a branch named new_branch based on HEAD
git branch -d <old_branch>  # delete the branch named old_branch
git checkout <branch>       # switch to the branch
git checkout -b <branch>    # create a new branch then switch to it
git merge <branch>          # merge the specified branch into HEAD

7. Update
git fetch                   # download latest changes from origin
git pull                    # fetch from and integrate (merge) with origin

8. Publish
git push                    # update origin

Monday, September 30, 2013

Git -- basics

Git is a distributed version control system (DVCS) designed to handle everything from small to very large projects. A DVCS has many advantages over a traditional centralized VCS because users have the entire history of the project on their local disks (repository). Two of them are:
  • It allows users to work productively without a network connection, and
  • It makes most operations much faster, since most operations are local.
Git tracks files with snapshots instead of file diffs. Every time we commit, it basically takes a picture of what our files look like at that moment and stores that snapshot. In addition to the working directory, Git has two main data structures:
  • index (also called staging area or cache) -- a mutable file that caches information about the working directory, and 
  • object database -- an immutable, append-only storage that stores files and meta-data for our project.
The object database contains four types of objects:
  • blob (binary large object) -- each file that we add to the repository is turned into a blob object.
  • tree -- each directory is turned into a tree object.
  • commit -- a commit records a snapshot of the project (a top-level tree) at a point in time, together with its author, message, and parent commit(s).
  • tag -- a container that holds a reference to another object.
Each object is identified by a SHA-1 hash of its contents. In general, git stores each object in a directory named after the first two characters of its hash; the file name is the rest of the hash. The commands git cat-file and git show allow us to view the content of objects.
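
The naming scheme can be reproduced in a few lines of Python. For a blob, Git hashes a short header ("blob", the content length, and a NUL byte) followed by the file content; the helper names below are ours, and the path applies to loose (unpacked) objects only.

```python
import hashlib


def git_blob_id(content):
    """Return the SHA-1 object id Git assigns to a blob with these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id):
    """Return the path of a loose object relative to the .git directory."""
    return "objects/%s/%s" % (object_id[:2], object_id[2:])


blob_id = git_blob_id(b"test content\n")
print(blob_id)
print(loose_object_path(blob_id))
```

The printed id should match what git hash-object reports for a file containing the same bytes.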

The index (staging area) serves as a bridge between the working directory and the object database. Information stored in the index goes into the object database at our next commit. The following figure shows the normal workflow among them.

 Working           Index            Object
directory                          database

    <-------------checkout-------------|
    |------add------->
                     |-----commit------>


Each file in our working directory can be in one of two states: tracked or untracked. Tracked files are those that are in the last snapshot or in the index (staging area); they can be further classified as unmodified, modified, or staged. A staged file is one that has been modified/created and added to the index. Untracked files are everything else. The lifecycle of files in working directory is illustrated in the following figure:

 untracked      unmodified      modified        staged

     |--------------------add--------------------->
     <--------------------reset-------------------|

                    |-----edit----->
                                   |-----add------>
                    <-----------commit------------|


The command git ls-files lists tracked files, whereas git ls-files -o (option o) lists untracked files.
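
The lifecycle figure above can also be written down as a small transition table. This sketch encodes only the arrows drawn in the figure; real Git has more moves (for example, unstaging an edited file returns it to modified), and the names TRANSITIONS and apply_actions are ours.

```python
# Transitions read off the lifecycle figure: (state, action) -> new state.
TRANSITIONS = {
    ("untracked", "add"): "staged",
    ("staged", "reset"): "untracked",
    ("unmodified", "edit"): "modified",
    ("modified", "add"): "staged",
    ("staged", "commit"): "unmodified",
}


def apply_actions(state, actions):
    """Follow actions from state; raise ValueError on a move the figure lacks."""
    for action in actions:
        try:
            state = TRANSITIONS[(state, action)]
        except KeyError:
            raise ValueError("no '%s' transition from '%s'" % (action, state))
    return state
```

For example, a new file that is added, committed, edited, and added again ends up staged: apply_actions("untracked", ["add", "commit", "edit", "add"]).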