Monday, December 16, 2013

Learning PyLucene by Example

Apache Lucene is a full-text search framework built in Java. It has many appealing features, such as high-performance indexing, powerful and efficient search algorithms, and cross-platform availability. PyLucene is a Python extension for using (Java) Lucene: its goal is to let users index and search text with Lucene from within Python. PyLucene is not a port but a Python wrapper around Java Lucene; it embeds Lucene, running in a JVM, into a Python process.

This is a quick guide to PyLucene. We show code snippets for full-text search over bible verses, which are stored in a dictionary data structure. There are three steps in building the index: create the index, fill the index, and close resources. In the first step, we choose StandardAnalyzer as the analyzer and SimpleFSDirectory as the (file-based) storage scheme for our IndexWriter. In the second step, each verse (document) is labelled with five fields that serve as the index for future searches. The text of each verse is labelled "Text"; our searches will primarily be on this field. However, the text of each verse is indexed only, not stored (Field.Store.NO) in the index, since all verses are already kept in our main data store (the bible dictionary). The "Testament" field lets us distinguish whether a verse is in the Old or New Testament. The last three fields, Book, Chapter, and Verse, are keys into the data store that allow us to retrieve the text of a specified verse. Once the index is built, we close all resources in the last step.

The snippet for building the index is as follows:

import lucene
# PyLucene 3.x exposes the Lucene classes (and java.io.File) at the
# top level of the lucene module
from lucene import (SimpleFSDirectory, File, Document, Field,
                    StandardAnalyzer, IndexWriter, IndexWriterConfig, Version)

def make_index():
    '''Make index from data source -- bible
    Some global variables used:
        bible: a dictionary that stores all bible verses
        OTbooks: a list of books in old testament
        NTbooks: a list of books in new testament
        chapsInBook: a list of number of chapters in each book
    '''
    lucene.initVM()
    path = raw_input("Path for index: ")
    # 1. create an index
    index_path = File(path)
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(index_path)
    config = IndexWriterConfig(Version.LUCENE_35, analyzer)
    writer = IndexWriter(index, config)

    # 2 construct documents and fill the index
    for book in bible.keys():
        if book in OTbooks:
            testament = "Old"
        else:
            testament = "New"
        for chapter in xrange(1, chapsInBook[book]+1):
            for verse in xrange(1, len(bible[book][chapter])+1):
                verse_text = bible[book][chapter][verse]
                doc = Document()
                doc.add(Field("Text", verse_text, Field.Store.NO, Field.Index.ANALYZED))
                doc.add(Field("Testament", testament, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Book", book, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Chapter", str(chapter), Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Verse", str(verse), Field.Store.YES, Field.Index.ANALYZED))
                writer.addDocument(doc)

    # 3. close resources
    writer.close()
    index.close()


There are five steps in our simple search: open the index, parse the query string, search the index, display results, and close resources. In the first step, we use IndexReader to open the index built earlier. The query string (kwds) is parsed by QueryParser in the second step. The search is done in step three by IndexSearcher, and the results are stored in the object hits. In step four, we get (getField) the Book, Chapter, and Verse fields from each document returned in the previous step, which allows us to retrieve the bible verses from the data store and display them. Finally, we close all resources in step five.

The snippet for searching and displaying results is as follows:

import lucene
from lucene import (SimpleFSDirectory, File, StandardAnalyzer,
                    IndexReader, IndexSearcher, QueryParser, Version)

def search(indexDir, kwds):
    '''Simple Search
    Input parameters:
        1. indexDir: directory name of the index
        2. kwds: query string for this simple search
    display_verse(): procedure to display the specified bible verse
    '''
    lucene.initVM()
    # 1. open the index
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(File(indexDir))
    reader = IndexReader.open(index)
    n_docs = reader.numDocs()

    # 2. parse the query string
    queryparser = QueryParser(Version.LUCENE_35, "Text", analyzer)
    query = queryparser.parse(kwds)

    # 3. search the index
    searcher = IndexSearcher(reader)
    hits = searcher.search(query, n_docs).scoreDocs

    # 4. display results
    for i, hit in enumerate(hits):
        doc = searcher.doc(hit.doc)
        book = doc.getField('Book').stringValue()
        chapter = doc.getField('Chapter').stringValue()
        verse = doc.getField('Verse').stringValue()
        display_verse(book, int(chapter), int(verse))


    # 5. close resources
    searcher.close()
    reader.close()
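
A minimal usage sketch (the index path and keywords here are hypothetical): build the index once, then query it. Note that both functions call lucene.initVM() themselves, so run them in separate processes or factor that call out:

>>> make_index()                                # prompts for the index path
>>> search("/tmp/bible_index", "+faith +hope")

QueryParser accepts the full Lucene query syntax, so boolean operators (+faith +hope), quoted phrases, and field prefixes such as Testament:Old can all be passed as the kwds argument.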



Wednesday, December 11, 2013

Using Java Within Python Applications -- Part II

In this second part, we examine the question: how can JAR files be added to sys.path at runtime? One's first thought is that setting CLASSPATH (an environment variable) should resolve the issue, so why would one bother to add JAR files at runtime? This is one of the topics in the Jython book, in the section entitled "Working with CLASSPATH," which provides two good reasons along with a solution.

The first reason is to make end users' lives easier: they should not need to know anything about environment variables. The second and more compelling reason is "when there is no normal user account to provide environment variables." The issue of adding JAR files to sys.path at runtime in Jython is similar to loading classes (JAR files) at runtime in the Java world. Fortunately, a solution exists for the Java case. The classPathHacker (Listing B-11) presented in the Jython book is a translation of that solution from the Java world.
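
For reference, here is a minimal Jython sketch of that idea, modeled on the Jython book's Listing B-11 and the Jython 2.5.2 fix mentioned at the end of this post; treat it as illustrative rather than a drop-in copy of either version. The trick is to fetch the system class loader (a URLClassLoader) and call its protected addURL() method via reflection:

# cphacker.py -- an illustrative sketch of classPathHacker
import jarray
from java.lang import Class, ClassLoader, Object
from java.io import File
from java.net import URL, URLClassLoader

class classPathHacker(object):
    def addFile(self, path):
        '''Add the JAR file (or directory) at path to the classpath.'''
        return self.addURL(File(path).toURL())

    def addURL(self, url):
        '''Call the protected URLClassLoader.addURL() via reflection.'''
        sysloader = ClassLoader.getSystemClassLoader()
        paramtypes = jarray.array([URL], Class)
        method = URLClassLoader.getDeclaredMethod("addURL", paramtypes)
        method.setAccessible(1)
        method.invoke(sysloader, jarray.array([url], Object))
        return url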

In this second part of the blog, we will go through a more practical example to finish our tour of using Java within Python applications. The Java example (SnS.java) selected here is modified from the APIExamples.java that comes with JSword.

The only task provided by the SnS class is defined in the searchAndShow method, which takes a string (kwds) as keywords, searches (bible.find()) through all books in the bible, and returns the (HTML-formatted) findings in a string (result.toString()). There are over a dozen JAR files in the distribution of the JSword package. One may start the JSword application with a shell script (BibleDesktop.sh) that sets environment variables (including CLASSPATH) properly and then launches the GUI application.

In this example, we use Jython to add the JAR files at runtime, to call SnS, and to print the search results in text form. Since we obviously have full knowledge of which JAR files are needed before starting the application, the example here is for educational purposes.

Here is the SnS Java class, which requires a few JAR files:

import java.net.URL;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.crosswire.common.util.NetUtil;
import org.crosswire.common.util.ResourceUtil;
import org.crosswire.common.xml.SAXEventProvider;
import org.crosswire.common.xml.TransformingSAXEventProvider;
import org.crosswire.common.xml.XMLUtil;
import org.crosswire.jsword.book.Book;
import org.crosswire.jsword.book.BookData;
import org.crosswire.jsword.book.BookException;
import org.crosswire.jsword.book.Books;
import org.crosswire.jsword.passage.Key;
import org.crosswire.jsword.passage.Passage;
import org.crosswire.jsword.passage.RestrictionType;
import org.xml.sax.SAXException;

/**
 * All the methods in this class highlight some area of the API and how to use it.
 *
 * @see gnu.lgpl.License for license details.
 *      The copyright to this program is held by its authors.
 * @author Joe Walker [joe at eireneh dot com]
 */
public class SnS
{
    /**
     * The name of a Bible to find
     */
    private static final String BIBLE_NAME = "KJV"; //$NON-NLS-1$

    /**
     * An example of how to do a search and then get text for each range of verses.
     * @throws BookException
     * @throws SAXException
     *
     * @param kwds keyword to search                         JSL
     * @return search results is returned in String format   JSL
     *
     */
    public String searchAndShow(String kwds) throws BookException, SAXException
    {
        Book bible = Books.installed().getBook(BIBLE_NAME);

        Key key = bible.find(kwds); //$NON-NLS-1$

        // Here is an example of how to iterate over the ranges and get the text for each
        // The key's iterator would have iterated over verses.

        // The following shows how to use a stylesheet of your own choosing
        String path = "xsl/cswing/simple.xsl"; //$NON-NLS-1$
        URL xslurl = ResourceUtil.getResource(path);

        Iterator rangeIter = ((Passage) key).rangeIterator(RestrictionType.CHAPTER); // Make ranges break on chapter boundaries.
        //
        // prepare for result
        //      using a StringBuilder to hold all search results
        //              JSL
        //
        StringBuilder result = new StringBuilder();
        while (rangeIter.hasNext())
        {
            Key range = (Key) rangeIter.next();
            BookData data = new BookData(bible, range);
            SAXEventProvider osissep = data.getSAXEventProvider();
            SAXEventProvider htmlsep = new TransformingSAXEventProvider(NetUtil.toURI(xslurl), osissep);
            String text = XMLUtil.writeToString(htmlsep);
            result.append(text);
        }
        return result.toString();           // search results --- JSL
    }

    public static void main(String[] args) throws BookException, SAXException
    {
        SnS examples = new SnS();
        System.out.println(examples.searchAndShow("+what +wilt +thou"));
    }

}


As mentioned before, classPathHacker is used for adding JAR files at runtime. Let cphacker.py be the file where classPathHacker is defined. Here is our Jython code, where the directory (dir) holding all the JAR files is passed on the command line (sys.argv). The main steps in this Jython code are: add (jarLoad.addFile()) the JAR files one by one at runtime, import the SnS class, create an SnS object (example), and let the object do its job (searchAndShow).

import sys, glob
from cphacker import *

#
# prepare the JAR files (classpath) first
#
#   we need to do this prior to importing the SnS class,
#       because SnS imports lots of classes from
#           those JAR files
#       it does not work if you move this section into the main block
#
jarLoad = classPathHacker()
dir = sys.argv[1]
jars = dir + "/*.jar"
jarfiles = glob.glob(jars)
for jar in jarfiles:
    jarLoad.addFile(jar)
jarLoad.addFile('.')

import SnS

if __name__ == "__main__":
    example = SnS()
    print example.searchAndShow("+what +wilt +thou").encode("utf-8")
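
Assuming the Jython script above is saved as sns.py (a file name chosen here for illustration) and all the JSword JAR files sit in ./lib, it can be run as:

jython sns.py ./lib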



There is one thing we need to address before closing this post: the classPathHacker described in the Jython book does not work for Jython 2.5.2. A slightly modified version can be obtained from glasblog, and that version works fine under Jython 2.5.2.

Using Java Within Python Applications -- Part I

There are two common approaches for using Java within Python applications. One is to construct a bridge between the Java virtual machine (JVM) and the Python interpreter, where the Java and Python code run separately. JPype, Py4J, JCC and Pyjnius are some examples in this category. The other, as Jython does, is to run Python within a JVM. Jython is described as:
an implementation of the Python programming language which is
designed to run on the Java(tm) Platform.
Jython is generally perceived as easier to install and use than the other options. The following examples give a brief tour of Jython. In our first example, we use some Java objects (java.lang.Math) within Jython.

>>> from java.lang import Math
>>> Math.max(317, 220)
317L
>>> Math.pow(2, 4)
16.0

In our second example, we will create a Java object (Person) and use it within a Jython application.

Definition of the Person object: Person.java

public class Person {
    private String firstName;
    private String lastName;
    private int age;
    public Person(String firstName, String lastName, int age){
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
    public String getFirstName() {
        return firstName;
    }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public int getAge() {
        return age;
    }
    public void setAge(int age) {
        this.age = age;
    }
}

Using Person.java in Jython (we need to compile the java code first):

>>> import Person
>>> john = Person("john", "dole", 27)
>>> john.getFirstName()
u'john'
>>> john.getAge()
27
>>> john.setFirstName("alias")
>>> john.getFirstName()
u'alias'

Our first two examples demonstrate that there is no distinction between using Java and Python objects within Jython: there is no need to start up a JVM before using Java objects, since Jython does that for us.

It is possible to extend (subclass) Java classes via Jython classes. Our third example, taken from the Jython book, shows this.

The Java code that defines two methods: Calculator.java

/**
* Java calculator class that contains two simple methods
*/
public class Calculator {
    public Calculator(){
    }
    public double calculateTip(double cost, double tipPercentage){
        return cost * tipPercentage;
    }
    public double calculateTax(double cost, double taxPercentage){
        return cost * taxPercentage;
    }
}


The Python code (with minor corrections) to extend the Java class: JythonCalc.py

import Calculator
from java.lang import Math
class JythonCalc(Calculator):
    def __init__(self):
        pass
    def calculateTotal(self, cost, tip, tax):
        return cost + self.calculateTip(cost, tip) + self.calculateTax(cost, tax)
if __name__ == "__main__":
    calc = JythonCalc()
    cost = 23.75
    tip = .15
    tax = .07
    print "Starting Cost: ", cost
    print "Tip Percentage: ", tip
    print "Tax Percentage: ", tax
    print Math.round(calc.calculateTotal(cost, tip, tax))


The result will be:

Starting Cost:  23.75
Tip Percentage:  0.15
Tax Percentage:  0.07
29


One question remains: is Jython the same language as Python? The short answer is yes, though there are differences. The main one is that the current version of Jython (v2.5) cannot use CPython extension modules written in C. If one wants to use such a module, one should look for an equivalent written in pure Java or Python. However, it is claimed that future releases will eliminate this difference.

Tuesday, October 8, 2013

An Introduction to Fabric

Fabric is described as a simple, Pythonic tool for application deployment and systems administration tasks. The fab tool simply imports your fabfile (fabfile.py, by default) and executes the function or functions you request. There is no magic about Fabric: anything we can do in a Python script can be done in a fabfile. Here is a small but complete fabfile that defines one task:

def hello(name="World"):
    print("Hello %s!" % name)



Calling fab hello will display the familiar message "Hello World!" However, we can personalize it by issuing:

$fab hello:name=Jay

The message shown will be: "Hello Jay!"

Below is another small fabfile that allows us to find kernel release as well as machine hardware of a remote system by using SSH:

from fabric.api import run

def host_type():
    run("uname -srm")


Since there is no remote host defined in the fabfile, we need to specify it (hostname) using option -H:

$fab -H hostname host_type

Fabric provides many command-execution functions; the following five are frequently used:
  • run(command) -- Run a shell command on a remote host.
  • sudo(command) -- Run (with superuser privileges) a shell command on a remote host.
  • local(command) -- Run a command on the local host.
  • get(remote_path, local_path) -- Download one or more files from a remote host.
  • put(local_path, remote_path) -- Upload one or more files to a remote host.
Fabric also includes context managers for use with the with statement; the following three are commonly used:
  • settings(*args, **kwargs) -- Nest context managers and/or override env variables.
  • lcd(path) -- Update local current working directory.
  • cd(path) -- Update remote current working directory. 
The third sample fabfile, listed after these steps, allows us to deploy a Django project to three production servers (serv1, serv2 and serv3); in fact, it can be extended to deploy various applications. The deployment consists of the following steps:
  1. testing the project -- python manage.py test apps
  2. packing the project  -- tar czf /tmp/project.tgz .
  3. moving the packed file to server -- put("/tmp/project.tgz", "/path/to/serv/tmp/project.tgz") 
  4. unpacking the file -- run("tar xzf /path/to/serv/tmp/project.tgz"), and 
  5. deploying the project on server -- run("touch manage.py") 
To use it, we issue:
$fab install



from __future__ import with_statement
from fabric.api import *
from fabric.contrib.console import confirm

env.hosts = ['serv1', 'serv2', 'serv3']
env.user = "admin"

def test():
    """
    Run test;  if it fails prompt the user for action.
    """
    src_dir = "/path/to/local/src/directory"
    with settings(warn_only=True), lcd(src_dir):
        result = local("python manage.py test apps", capture=True)
    if result.failed and not confirm("Tests failed. Continue anyway?"):
        abort("Aborting at user request.")

def pack_move():
    """
    Archive our current code and upload to servers.
    """
    src_dir = "/path/to/local/src/directory"   
    with settings(warn_only=True), lcd(src_dir):
        local( "tar czf /tmp/project.tgz .")
    put( "/tmp/project.tgz", "/path/to/serv/tmp/project.tgz" )
    local( "rm /tmp/project.tgz" )

def install():
    """
    Deploy the project on servers.
    """
    test()
    pack_move()
    dst_dir = "/path/to/serv/dst/directory"
    with settings(hide("warnings"), warn_only=True):
        if run("test -d %s" % dst_dir).failed:
            run("mkdir -p %s" % dst_dir)   
    with cd(dst_dir):
        run("tar xzf /path/to/serv/tmp/project.tgz")
        run("touch manage.py")
    run("rm -f /path/to/serv/tmp/project.tgz")

      


Wednesday, October 2, 2013

Git -- Cheat Sheet

Here is a cheat sheet for git.

1. Create
git init              # create a local repository
git clone <url>       # clone a repository from url

2. Commit
git commit -m "commit message"

3. Browse
git log               # history of changes
git status            # files changed in working directory
git diff              # diff between working directory and the index
git diff HEAD         # diff between working directory and the most recent commit
git diff --cached     # diff between the index and the most recent commit
git show <object>     # show object
gitk                  # git repository (GUI) browser

4. Stage
git add <file>          # add file to the index
git reset HEAD <file>   # unstage the staged file

5. Undo
git commit -a --amend      # fix the last commit
git reset --hard <commit>  # discard any changes and reset to the commit
git revert HEAD            # revert the last commit
git revert <commit>        # revert the specific commit
git checkout -- <file>     # unmodify the modified file

6. Branch
git branch <new_branch>     # create a branch named new_branch based on HEAD
git branch -d <old_branch>  # delete the branch named old_branch
git checkout <branch>       # switch to the branch
git checkout -b <branch>    # create a new branch then switch to it
git merge <branch>          # merge the specified branch into HEAD

7. Update
git fetch             # download latest changes from origin
git pull              # fetch from and integrate (merge) with origin

8. Publish
git push                 # update origin

Monday, September 30, 2013

Git -- basics

Git is a distributed version control system (DVCS) designed to handle everything from small to very large projects. A DVCS has many advantages over a traditional centralized VCS, because users have the entire history of the project on their local disks (repositories). Two of them are:
  • It allows users to work productively without network connection, and
  • it makes most operations much faster, since most operations are local.
Git employs snapshots instead of file diffs to track files. Every time we make a commit, it basically takes a picture of the working directory at that moment and stores that snapshot. In addition to the working directory, Git has two main data structures:
  • index (also called staging area or cache) -- a mutable file that caches information about the working directory, and 
  • object database -- an immutable, append-only storage that stores files and meta-data for our project.
The object database contains four types of objects:
  • blob (binary large object) -- each file that we add to the repository is turned into a blob object.
  • tree -- each directory is turned into a tree object.
  • commit -- a commit is a snapshot of the working directory at a point in time.
  • tag -- a container that contains reference to another object.
Each object is identified by the SHA-1 hash of its contents. In general, git stores each object in a directory matching the first two characters of its hash; the file name used is the rest of the hash for that object. The commands git cat-file and git show allow us to view the contents of objects.
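
For example, given an abbreviated object hash taken from git log (the hash below is hypothetical):

$git cat-file -t 557db03      # print the type of the object (e.g. commit)
$git cat-file -p 557db03      # pretty-print the contents of the object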

The index (staging area) serves as a bridge between the working directory and the object database. Information stored in the index will go into the object database at our next commit. The following figure shows the normal work flow among them.

 Working            Index            Object
directory                           database

    <-------------checkout--------------|
    |------add------>
                     |------commit----->

Each file in our working directory can be in one of two states: tracked or untracked. Tracked files are those that are in the last snapshot or in the index (staging area); they can be further classified as unmodified, modified, or staged. A staged file is one that has been modified or created and then added to the index. Untracked files are everything else. The lifecycle of files in the working directory is illustrated in the following figure:

untracked       unmodified       modified       staged

    |--------------------add--------------------->
    <-------------------reset--------------------|
                    |-----edit----->
                                    |-----add---->
                    <----------commit------------|


The command git ls-files lists tracked files, whereas git ls-files -o (option -o) lists untracked files.

Saturday, September 21, 2013

SELINUX -- Bits and Pieces

Here are some bits and pieces on SELINUX:

How to view the current SELinux status?
$sestatus

Where is main configuration file?
/etc/selinux/config

How to set booleans?
$setsebool -P httpd_read_user_content 1
or,
$semanage boolean -m --on httpd_read_user_content

How to list booleans?
$getsebool httpd_read_user_content
or,
$semanage boolean -l |grep httpd_read_user_content

How to allow the Apache HTTP server to provide service on port 9876?
$semanage port -a -t http_port_t -p tcp 9876

How to allow the Apache HTTP server to connect to your database server?
$semanage boolean -m --on httpd_can_network_connect_db

How to allow the Apache HTTP server to send mail?
$semanage boolean -m --on httpd_can_sendmail

How to execute multiple commands within a single transaction?
$semanage -i command-file     

How to change the security context (temporarily) on a file/directory?
$chcon -t my_type_t /path/to/file                  # on single file
$chcon -R -t my_type_t /path/to/directory  # recursively on directory

How to change the security context (persistently) on a file/directory?
$semanage fcontext -a -t my_type_t /path/to/file
# this will add the specified rule to the local context file, then label it
$restorecon -v /path/to/myfile

How to check/correct the security context on filesystems?
$fixfiles -v check  /path/to/file_or_directory       # check only
$fixfiles -v restore  /path/to/file_or_directory   # restore/correct

How to restore default security contexts of a directory tree?
$restorecon -Rv /path/to/the/directory

How to relabel complete filesystem?
$touch /.autorelabel                                    # using init
$reboot
or,
$fixfiles restore                                          # using fixfiles

How to preserve file security contexts when copying?
$cp --preserve=context /path/to/src /path/to/dst

How to change file security contexts when copying?
$install --context=new_context /path/to/src /path/to/dst


How to create archives that retain security contexts?
$tar --selinux -cvzf archive.tgz /path/to/directory       # create archive
$tar --selinux -xvzf archive.tgz                            # extract files from archive
# use star instead if the --selinux option is not supported by your tar

How to mount a device with a specific security context?
$mount -o context=SELinux_user:role:type:level device dir

How to start SELINUX troubleshooting tool?
$sealert -b

Where is log file?
/var/log/audit/audit.log            #audit on
or,
/var/log/messages                          #audit off

How to add new rules regarding xxxx to policy?
$grep xxxx /var/log/audit/audit.log | audit2allow -M xxxxlocal
$semodule -i xxxxlocal.pp

How to start the SELinux management GUI tool?
$system-config-selinux
# we need to install package policycoreutils-gui first

Saturday, September 14, 2013

SELINUX -- Concepts

Security-Enhanced Linux (SELinux) is an implementation of a mandatory access control (MAC) mechanism in the Linux kernel, which further enforces MAC after traditional discretionary access controls (DAC) are checked.

Processes and files are labeled with an SELinux context, which includes an SELinux user, role, type, and level. Within SELinux, all of this information is used to make access control decisions. For performance reasons, SELinux decisions are cached, and the cache is named the Access Vector Cache (AVC). In Fedora, SELinux provides a combination of Role-Based Access Control (RBAC), Type Enforcement (TE), and Multi-Level Security (MLS).


The command sestatus allows us to get the status of a system running SELinux. Here is example output from sestatus:

$sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

The output shows that SELinux is enabled and is currently running in enforcing mode, which matches the mode in the configuration file. It also tells us that the configuration root directory is /etc/selinux and that the targeted policy is used for MAC. The running mode can be enforcing, permissive, or disabled, where permissive means that SELinux prints warnings instead of enforcing the policy. The command setenforce allows us to modify the running mode of SELinux; changes should be made in the configuration file /etc/selinux/config if we want them to be persistent.
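
For example (run as root; the change lasts only until reboot unless the configuration file is edited):

$setenforce 0                              # switch the running system to permissive mode
$setenforce 1                              # switch back to enforcing mode
$grep ^SELINUX= /etc/selinux/config        # the persistent setting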


All processes and files are labeled with a type (part of the SELinux context). The option -Z allows us to find the type (security context) of a file or process. For example:

$ls -Z /etc/shadow
----------. root root system_u:object_r:shadow_t:s0    /etc/shadow



It shows that the security context (user, role, type, level) of the file /etc/shadow is system_u:object_r:shadow_t:s0.

$ps -eZ|grep passwd
unconfined_u:unconfined_r:passwd_t:s0-s0:c0.c1023 2630 pts/0 00:00:00 passwd


It tells us that the security context of the process passwd, run by a regular user, is unconfined_u:unconfined_r:passwd_t:s0-s0:c0.c1023.

The type of a process defines its domain. Processes are separated from each other by running in their own domains. The SELinux policy defines rules that determine how processes interact with files, and how processes interact with each other. Only what is specifically allowed by the rules is permitted. By default, every operation is denied and audited. The audit log is saved in the file /var/log/audit/audit.log or /var/log/messages, depending on whether the audit daemon (auditd) is running or not.

The command sesearch allows us to search the rules in an SELinux policy. The rules are displayed in the following format:

allow <src_domain> <dst_type> : <class> { permission [ permission [ ... ] ] } ;

To verify that the process passwd is allowed to access the shadow password file /etc/shadow, we may issue:

$sesearch -s passwd_t  -t shadow_t -c file -p write -A

and its output will be something similar to:

allow passwd_t shadow_t : file { ioctl read write create ... } ;



Wednesday, September 11, 2013

Python Decompiler

Need a decompiler for your Python byte-code?

Here is one: uncompyle2, which works for Python 2.


Tuesday, September 10, 2013

Untangle Links

How do we find the terminal pathname of a file or directory? For example, the file /usr/lib64/mozilla/plugins/libjavaplugin.so is installed by the IcedTea-Web Java plugin; it is a symbolic link to another link. The command namei enables us to untangle links by following each pathname until the endpoint is found.

Here is an example usage and its output:

%namei /usr/lib64/mozilla/plugins/libjavaplugin.so
f: /usr/lib64/mozilla/plugins/libjavaplugin.so
 d /
 d usr
 d lib64
 d mozilla
 d plugins
 l libjavaplugin.so -> /etc/alternatives/libjavaplugin.so.x86_64
   d /
   d etc
   d alternatives
   l libjavaplugin.so.x86_64 -> /usr/lib64/IcedTeaPlugin.so
     d /
     d usr
     d lib64
     - IcedTeaPlugin.so


which untangles /usr/lib64/mozilla/plugins/libjavaplugin.so by following links and gets its terminal pathname:
/usr/lib64/IcedTeaPlugin.so.

Monday, September 9, 2013

Practical Examples on RPM package management

Here are some practical examples of using rpm or yum for RPM-based package management. A package name should be supplied as a parameter in most examples; it is omitted here. A '?????' indicates that I have no clue how to complete the task using rpm/yum. All comments are welcome if you know a proper solution.

1. Install an RPM Package
rpm -ivh
yum install


2. Remove an RPM Package
rpm -evh
yum remove


3. Upgrade an RPM Package
rpm -Uvh
yum upgrade


4. List all installed RPM Packages
rpm -qa
yum list installed


5. List all (installed + available) RPM Packages
?????
yum list


6. Check dependencies of an RPM Package
rpm -qR
yum deplist


7. Query the information of an installed RPM Package
rpm -qi
yum info                    # both installed + available

8. List all files of an installed RPM package
rpm -ql
repoquery -ql                 # need to install yum-utils

9. Find out which RPM Package a file belongs to
rpm -qf /path/to/file
yum whatprovides /path/to/file


10. Query the information of an RPM Package before installing
rpm -qip foo.rpm            # the rpm package needs to be downloaded first
yum info

11. Search for a Package
?????
yum search


12. Find out which package provides some feature or file
rpm -qf /path/to/file       # for file only
yum whatprovides



13. List the package-specific scriptlets
rpm -q --scripts
?????

Saturday, September 7, 2013

Manage Autostart Applications in GNOME

After logging in to GNOME, a lot of applications can be started automatically to make life easier. System-wide autostart applications can be found in /etc/xdg/autostart and in /usr/share/gnome/autostart. Users can edit an autostart application by disabling it or changing its name, command, or description; additionally, users may add their own autostart applications.

Gnome provides the tool gnome-session-properties, which allows us to add, modify and remove autostart applications. Once we are done with gnome-session-properties, new entries (files) are generated and saved in directory ~/.config/autostart/.

Here is an example file skype.desktop created and saved after adding skype as a new autostart application:

[Desktop Entry]
Type=Application
Exec=/usr/bin/skype
X-GNOME-Autostart-enabled=true
Name=Skype



A new file named gnome-keyring-ssh.desktop is created (and saved in directory ~/.config/autostart) after removing the default autostart application SSH key agent. Its contents are the same as those of the file /etc/xdg/autostart/gnome-keyring-ssh.desktop, except for one line:

X-GNOME-Autostart-enabled=false

which specifies that the autostart is disabled.

Sunday, August 25, 2013

Automated Remote Backups with Rdiff-backup

One subtle feature of rdiff-backup is that it allows users to make remote backups over the Internet using SSH, which makes remote backups very secure since the transferred data is encrypted.

One problem is that SSH requires a password for logging in, which is not convenient if we want to run rdiff-backup as a cron job. Here we show how to initiate rdiff-backups from a central backup server and pull data from a farm of hosts to be backed up. For security reasons, the central server uses a non-root user account (rdiffbk) to perform the backups, whereas the root account is used on each host being backed up. Though root accounts are used on the hosts being backed up, they are protected by the SSH public-key authentication mechanism with the forced-commands-only option.

For convenience, I'll call the central backup server canine and the three hosts to be backed up beagle, shepherd and terrier. For brevity, only the work on canine and beagle will be shown.

Here is the procedure for backup server canine:
  1. generate one passphrase-free SSH key pair for each host being backed up,
  2. move corresponding ssh key to each host,
  3. create SSH configuration file, and
  4. create a cron job file
Step 1:  generate one passphrase-free SSH key pair for each host being backed up

To generate RSA type pair for host beagle, we issue

ssh-keygen -t rsa -f id_beagle-backup


where private key will be saved in file id_beagle-backup and public key id_beagle-backup.pub.

Step 2: move corresponding ssh key to each host

To move id_beagle-backup.pub to host beagle, we may choose to use any preferred method (for example, ftp, sftp, or ssh-copy-id), since public key is not sensitive. Other hosts can be done similarly.

Step 3: create SSH configuration file

To define how to connect to host beagle with the backup key, we place the following lines into the file ~rdiffbk/.ssh/config. Other hosts need to be configured similarly.

host beagle-backup
    hostname beagle
    user root
    identityfile ~rdiffbk/.ssh/id_beagle-backup
    protocol 2


Step 4: create a cron job file

The following cron job file automates the remote backups daily at 2:00am, 2:10am, and 2:20am, respectively. A quick way to test each connection by hand is shown after the cron entries.

0 2 * * * rdiff-backup beagle-backup::/remote_dir beagle/remote_dir
10 2 * * * rdiff-backup shepherd-backup::/remote_dir shepherd/remote_dir
20 2 * * * rdiff-backup terrier-backup::/remote_dir terrier/remote_dir
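
Before relying on cron, it is worth testing each connection by hand as user rdiffbk on canine. rdiff-backup provides a server test mode for exactly this purpose; using the host alias configured in Step 3:

rdiff-backup --test-server beagle-backup::/

A successful test confirms that the SSH key, the forced command, and the remote rdiff-backup installation work together.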




By default, rdiff-backup uses SSH to pipe the remote data. Therefore, both an SSH server and rdiff-backup are required on the hosts to be backed up.
What is left on host beagle and the others (shepherd, terrier) is simply to give canine permission to access them (through SSH) and run rdiff-backup. This can be done in the following two steps:


Step I: create an authorized-keys file for root account

To enable SSH public key authentication for the root account, we need to create the file /root/.ssh/authorized_keys, which contains the public key for user rdiffbk@canine, the forced command, and other options. The public key (id_beagle-backup.pub) should already be available on beagle once we have done Step 2. A sample authorized_keys file is as follows:

command="rdiff-backup --server --restrict-read-only /",from="canine",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAAB3.... rdiffbk@canine

Here, for security reasons, the rdiff-backup server is restricted to read-only, and we disable the port-forwarding, X11-forwarding and pty options. See here for more details.

Step II: configure SSH server for root access

As we saw here, this can be done by putting the following line in the SSH server configuration file (sshd_config):

PermitRootLogin forced-commands-only

Thursday, August 22, 2013

Tips on Forced Command for SSH

In general, an SSH connection invokes a remote command chosen by the client. There are times, however, when the server should decide which command the client will run. Forced commands enable us to achieve this goal.

There are two ways to force a command. One is through the public-key authentication configuration in the file authorized_keys, as we saw here. The other is through the keyword ForceCommand in sshd_config. To restrict users to running nothing but the alpine command, we put the following line in sshd_config:

ForceCommand /usr/bin/alpine

The major difference between the two is this: configuration through public-key authentication applies to one user, and each user may have her/his own options; configuration through the ForceCommand keyword may be system-wide, so the keyword Match should be used in combination with it if ForceCommand is to apply only to certain user(s).

What if we want the user to be able to execute not just a single command, but a few fixed commands of the user's choice, such as:
  • show process list (ps aux),
  • print system information (uname -a),
  • show who is logged on (who), or
  • start rdiff-backup server (rdiff-backup --server --restrict-read-only /)
A simple wrapper script combined with ForceCommand will suffice. Here is an example that allows user backup to invoke four different commands at the user's choice.


Using the environment variable SSH_ORIGINAL_COMMAND, the following script (wrapper.sh) wraps all permitted commands:


#!/bin/sh
# Script: /usr/local/bin/wrapper.sh

case "$SSH_ORIGINAL_COMMAND" in
    "ps")
        ps aux
        ;;
    "uname")
        uname -a
        ;;
    "who")
        who
        ;;
    "rdiff")
        rdiff-backup --server --restrict-read-only /
        ;;
    *)
        echo "Only the following commands are available to you:"
        echo "ps, uname, who and rdiff"
        exit 1
        ;;
esac


The configuration (sshd_config) of ForceCommand with Match (user backup) is as follows:

Match User backup
    ForceCommand /usr/local/bin/wrapper.sh



To show the process list on the SSH server, one issues:

ssh backup@server ps

where the original command "ps" is passed to the wrapper script via the environment variable SSH_ORIGINAL_COMMAND.

To back up the directory tree /path_to_src on the server to the local directory /path_to_dst, one issues:

rdiff-backup --remote-schema "ssh -C %s rdiff" backup@server::/path_to_src /path_to_dst


Wednesday, August 21, 2013

Root Access Control for SSH

Sshd has a separate access control mechanism for the root (superuser). The keyword PermitRootLogin specifies its usage.

The argument (option) for PermitRootLogin must be "no", "yes", "without-password", or "forced-commands-only". If this option is set to "no", root is not allowed to log in.

If this option is set to "without-password'', password authentication is disabled for root. However, root may login in with GSSAPIAuthentication, HostbasedAuthentication or PubkeyAuthentication, if they are set properly.



If this option is set to "forced-commands-only'', root login with public key authentication is allowed, but only if the command option is specified (which may be useful for remote backup as we saw in the example of public-key-based configuration).  All other authentication methods are disabled in this setting.

Public-key-based Configuration for SSH server

Public-key authentication is one of the most frequently used authentication methods in SSH. To set up public-key authentication for one's account on an SSH server, one creates an authentication file named authorized_keys (for OpenSSH) and lists the keys and options that provide access to the account.

Each line (SSH protocol 2) in authorized_keys may contain:
  1. An (optional) set of authorization options for the key.
  2. A (required) key type string: ssh-dss for a DSA key, or ssh-rsa for an RSA key.
  3. The (required) base64-encoded public key.
  4. An (optional) descriptive comment.
The options, when present, consist of comma-separated option specifications, where no space is allowed except within double quotes. Some common option specifications are:

command="command": Specifies that the command to be executed
from="pattern-list": Specifies the permitted client name or IP address
no-port-forwarding: Forbids TCP forwarding
no-X11-forwarding: Forbids X11 forwarding
no-pty: Prevents tty allocation

The following example file specifies that the command "rdiff-backup --server --restrict-read-only /" is to be executed if the client connects from the machine named "beagle", with port forwarding and X11 forwarding disabled. Notice that all settings are on one line.

command="rdiff-backup --server --restrict-read-only /",from="beagle",no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3.... root@beagle

Saturday, August 17, 2013

Tips on SSH Client Configuration

The OpenSSH client ssh obtains configuration data from the following sources, in this order:
  1.  command-line options,
  2.  user's configuration file (~/.ssh/config),
  3.  system-wide configuration file (/etc/ssh/ssh_config)
Since each parameter may be defined in more than one source, the order of parameter definition is important. The ssh manual page (ssh_config) says that:
For each parameter, the first obtained value will be used.

Each configuration file contains sections separated by Host specifications; a section applies to all hosts matching the pattern given with the Host parameter, where the host is the hostname argument given on the command line. Here are some frequently used parameters:

Hostname: Specifies the real host name to log into.

IdentityFile: Specifies a file from which the user's public key authentication identity is read.

Port: Specifies the port number to connect on the remote host (if it is not 22).


User:  Specifies the user to log in as.

Here is an example configuration file (~/.ssh/config) for remote machine robert.some.net:

host bob
        hostname robert.some.net
        identityfile /somepath/.ssh/id_rsa_bob

        port 2222
        user root


With the above configuration file in place, we can issue:

ssh bob

which is equivalent to

ssh -i /somepath/.ssh/id_rsa_bob  -p 2222 root@robert.some.net





Sunday, July 28, 2013

Rdiff-backup by Examples

Rdiff-backup is a Python script that backs up one directory to another. Some features of rdiff-backup, as claimed on its official site, are: ease of use, mirror creation, increment keeping, and preservation of all information. Here are some examples of its usage.

1. simple backing up (backup local directory foo to local directory bar):

rdiff-backup foo bar

2. simple remote backing up (backup local directory /some/local_dir to directory /whatever/remote_dir on machine hostname.net):

rdiff-backup /some/local_dir hostname.net::/whatever/remote_dir

SSH will be used to open the necessary pipe for the remote backup.

3. simple restoring from previous backup (restore from bar/dir to foo/dir):

cp -a bar/dir foo/dir

4. simple restoring from the latest remote backup (restore from hostname.net::/whatever/remote_dir to local directory /some/local_dir):

rdiff-backup -r now hostname.net::/whatever/remote_dir /some/local_dir

5. restoring from a certain version of a remote backup (restore from backup done 15 days ago):

rdiff-backup -r 15D hostname.net::/whatever/remote_dir /some/local_dir

6. restoring from an increment file (restore file pg.py to its version dated 2011-11-30T00:28:38+08:00)

rdiff-backup hostname.net::/remote-dir/rdiff-backup-data/increments/pg.py.2011-11-30T00:28:38+08:00.diff.gz  /local_dir/pg.py
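
Before restoring from an increment, it helps to see which increments exist; rdiff-backup can list them (using the same hypothetical host and directory as above):

rdiff-backup --list-increments hostname.net::/whatever/remote_dir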


Sunday, July 14, 2013

Tips on Some Stat Commands for FreeBSD

FreeBSD provides some handy commands for getting various kinds of information: netstat shows network status, sockstat lists open sockets, fstat identifies active files, and procstat gets detailed process information. Here are some tips on their usage.

1. Show Internet routing table:

netstat -rn

2. Show all active Internet (IPv4) connections (including servers):
netstat -f inet -a -n
or,
sockstat -4
which also gives us the command name and process identifier (PID) for each connection.

3. Show all Active TCP connections (including servers):
netstat -f inet -p tcp -a -n
or,
sockstat -4 -P tcp

4. Identify processes using a file or directory:
fstat -v /path/to/the/fileORdirectory

5. Identify all files opened by the specified process:
fstat -v -p PID
or,
procstat -f PID

6. Identify all files opened by the specified user:
fstat -v -u user

7. Get detailed process information on all processes:
procstat -a

8. Get environment variables on the specified process:
procstat -e PID


Saturday, July 13, 2013

Some Usages of fuser on Linux Systems

The command fuser allows us to identify processes using files or sockets. Here are some example uses of fuser.

1. Identify processes using a file or directory:

fuser -v /path/to/the/fileORdirectory

2. Identify processes using a particular TCP/UDP port#:

fuser -v -n tcp port#

where the option -n enables us to specify an object in a different name space, a TCP port number here.

3. Kill processes that are executing a particular command:

fuser -v -k -i /path/to/the/command

where the option -i asks the user for confirmation before killing a process.

4. Kill processes using a particular TCP/UDP port#:

fuser -v -k -i -n tcp port#
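
For instance, to identify and then kill whatever is using TCP port 8080 (an arbitrary port chosen for illustration):

fuser -v -n tcp 8080              # identify the processes
fuser -v -k -i -n tcp 8080        # kill them, asking for confirmation first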

Monday, July 8, 2013

How to Mount LVM partitions/disks

The logical volume manager (LVM) is suitable for many occasions, e.g., managing large disk farms or easily re-sizing disk partitions on small systems. The following quote from the LVM wiki page best describes its common uses:

One can think of LVM as a thin software layer on top of the hard disks and partitions, which creates an illusion of continuity and ease-of-use for managing hard-drive replacement, repartitioning, and backup.
Here are steps on how to mount LVM partitions/disks:

1. scan all disks for volume groups:

vgscan


2. scan all disks for logical volumes:

lvscan

The output consists of one line for each logical volume, indicating whether it is active and giving its size.

3. change the availability of the logical volume (if it is inactive):


lvchange -a y /dev/vg_name/lv_name

where vg_name is the name of the volume group found by vgscan and lv_name is the name of the logical volume found by lvscan. You may use vgchange to change the availability of all logical volumes in a specified volume group.

4. mount the logical volume:

mount /dev/vg_name/lv_name /mount/point

where /mount/point is the mount point for the logical volume.
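
Putting the steps together for a hypothetical volume group vg_data containing the logical volume lv_home:

vgscan                                    # 1. find volume groups
lvscan                                    # 2. find logical volumes and their status
lvchange -a y /dev/vg_data/lv_home        # 3. activate the volume if it is inactive
mount /dev/vg_data/lv_home /mnt/home      # 4. mount it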

Sunday, July 7, 2013

Load Python Module Dynamically

There are times when we need to import some Python module but, for whatever reason, will not know the name of the module until run-time. This can be achieved with the built-in function __import__().

Dynamically loading a Python module is useful, for example, if we plan to read configuration from a file. Here is an example that allows us to load the module specified in sys.argv[1]:

import sys

config_file = sys.argv[1]
config = __import__(config_file)

For newer versions of Python (2.7 and up), there are convenience wrappers for __import__(). We may use the module importlib:

import sys
import importlib

config_file = sys.argv[1]
config = importlib.import_module(config_file)

Both __import__() and importlib.import_module() search for modules in certain locations (e.g., sys.modules, sys.meta_path, sys.path). There are times when the module being loaded is not in those default locations. This case can be handled with imp.load_source(), which allows us to load and initialize a module implemented as a Python source file:

import sys
import imp

pathname = sys.argv[1]
config = imp.load_source("config", pathname)

where pathname is the path name of the configuration file and "config" is the name under which the loaded module is registered.
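
As a concrete sketch (the path and setting name are hypothetical), suppose /etc/myapp/config.py contains simple assignments such as DEBUG = True. It can be loaded from that explicit path and used like any other module:

import imp

# load /etc/myapp/config.py (hypothetical path) under the module name "config"
config = imp.load_source("config", "/etc/myapp/config.py")
print config.DEBUG        # -> True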