Showing posts with label tip. Show all posts

Tuesday, April 8, 2014

List All Installed Python Packages

There are times when we need a list of all installed Python packages. One way to accomplish this is to run

help('modules')

in the Python shell. It shows all installed packages, including the standard library. If we are not interested in the standard library, the command pip works:

pip freeze

It lists the installed packages in requirements format.
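For a programmatic listing, the standard library alone is enough. Here is a minimal sketch (Python 3.6+) using pkgutil, which walks every directory on sys.path:

```python
import pkgutil

# iter_modules() yields one entry per top-level module or package
# found on sys.path (standard-library directories included).
names = sorted(m.name for m in pkgutil.iter_modules())
print(len(names), names[:5])
```

Unlike pip freeze, this reports importable module names rather than distribution names and versions.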

Monday, December 16, 2013

Learning PyLucene by Example

Apache Lucene is a full-text search framework written in Java. It has many appealing features, such as high-performance indexing, powerful and efficient search algorithms, and a cross-platform solution. PyLucene is a Python extension for using (Java) Lucene. Its goal is to allow users to use Lucene for text indexing and searching from within Python. PyLucene is not a port but a Python wrapper around Java Lucene, which embeds Lucene running in a JVM into a Python process.

This is a quick guide to PyLucene. We show code snippets for full-text searching on bible verses, which are stored in a dictionary data structure. There are three steps in building the index: create the index, fill it, and close resources. In the first step, we choose StandardAnalyzer as the analyzer and SimpleFSDirectory as the (file) storage scheme for our IndexWriter. In the second step, each verse (document) is labelled with five fields that serve as the index for future searches. The text of each verse is labelled "Text"; our searches will be primarily on this field. However, the text of each verse is indexed but not stored (Field.Store.NO) in the index, since all verses are already stored in our main data store (the bible dictionary). The "Testament" label allows us to distinguish whether a verse is in the Old or New Testament. The last three fields (book, chapter, verse) are keys into the data store that allow us to retrieve the text of a specified verse. Once the index is built, we close all the resources in the last step.

The snippet for building the index is as follows:

def make_index():
    '''Make index from data source -- bible
    Some global variables used:
        bible: a dictionary that stores all bible verses
        OTbooks: a list of books in old testament
        NTbooks: a list of books in new testament
        chapsInBook: a list of number of chapters in each book
    '''
    lucene.initVM()
    path = raw_input("Path for index: ")
    # 1. create an index
    index_path = File(path)
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(index_path)
    config = IndexWriterConfig(Version.LUCENE_35, analyzer)
    writer = IndexWriter(index, config)

    # 2 construct documents and fill the index
    for book in bible.keys():
        if book in OTbooks:
            testament = "Old"
        else:
            testament = "New"
        for chapter in xrange(1, chapsInBook[book]+1):
            for verse in xrange(1, len(bible[book][chapter])+1):
                verse_text = bible[book][chapter][verse]
                doc = Document()
                doc.add(Field("Text", verse_text, Field.Store.NO, Field.Index.ANALYZED))
                doc.add(Field("Testament", testament, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Book", book, Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Chapter", str(chapter), Field.Store.YES, Field.Index.ANALYZED))
                doc.add(Field("Verse", str(verse), Field.Store.YES, Field.Index.ANALYZED))
                writer.addDocument(doc)

    # 3. close resources
    writer.close()
    index.close()


There are five steps in our simple search: open the index, parse the query string, search the index, display results, and close resources. In the first step, we use IndexReader to open the index built earlier. The query string (kwds) is parsed by the QueryParser in the second step. The search is done in step three by IndexSearcher, and the results are stored in the object hits. In step four, we get (getField) the book, chapter, and verse fields from the documents returned in the previous step, which allow us to retrieve the bible verses from the data store and display them. Finally, we close all resources in step five.

The snippet for searching and displaying results is as follows:

def search(indexDir, kwds):
    '''Simple Search
    Input parameters:
        1. indexDir: directory name of the index
        2. kwds: query string for this simple search
    display_verse(): procedure to display the specified bible verse
    '''
    lucene.initVM()
    # 1. open the index
    analyzer = StandardAnalyzer(Version.LUCENE_35)
    index = SimpleFSDirectory(File(indexDir))
    reader = IndexReader.open(index)
    n_docs = reader.numDocs()

    # 2. parse the query string
    queryparser = QueryParser(Version.LUCENE_35, "Text", analyzer)
    query = queryparser.parse(kwds)

    # 3. search the index
    searcher = IndexSearcher(reader)
    hits = searcher.search(query, n_docs).scoreDocs

    # 4. display results
    for i, hit in enumerate(hits):
        doc = searcher.doc(hit.doc)
        book = doc.getField('Book').stringValue()
        chapter = doc.getField('Chapter').stringValue()
        verse = doc.getField('Verse').stringValue()
        display_verse(book, int(chapter), int(verse))


    # 5. close resources
    searcher.close()



Tuesday, October 8, 2013

An Introduction to Fabric

Fabric is described as a simple, Pythonic tool for application deployment or systems-administration tasks. The fab tool simply imports your fabfile (fabfile.py, by default) and executes the function or functions you request. There is no magic about Fabric: anything we can do in a Python script can be done in a fabfile. Here is a small but complete fabfile that defines one task:

def hello(name="World"):
    print("Hello %s!" % name)



Calling fab hello will display the familiar message "Hello World!" However, we can personalize it by issuing:

$fab hello:name=Jay

The message shown will be: "Hello Jay!"

Below is another small fabfile that allows us to find kernel release as well as machine hardware of a remote system by using SSH:

from fabric.api import run

def host_type():
    run("uname -srm")


Since there is no remote host defined in the fabfile, we need to specify it (hostname) using option -H:

$fab -H hostname host_type

Fabric provides many command-execution functions; the following five are frequently used:
  • run(command) -- Run a shell command on a remote host.
  • sudo(command) -- Run a shell command on a remote host with superuser privileges.
  • local(command) -- Run a command on the local host.
  • get(remote_path, local_path) -- Download one or more files from a remote host.
  • put(local_path, remote_path) -- Upload one or more files to a remote host.
Fabric also includes context managers for use with the with statement; the following three are commonly used:
  • settings(*args, **kwargs) -- Nest context managers and/or override env variables.
  • lcd(path) -- Update the local current working directory.
  • cd(path) -- Update the remote current working directory.
The third sample fabfile allows us to deploy a Django project to three production servers (serv1, serv2 and serv3). In fact, it can be extended to deploy various applications. The deployment consists of the following steps:
  1. testing the project -- python manage.py test apps
  2. packing the project  -- tar czf /tmp/project.tgz .
  3. moving the packed file to server -- put("/tmp/project.tgz", "/path/to/serv/tmp/project.tgz") 
  4. unpacking the file -- run("tar xzf /path/to/serv/tmp/project.tgz"), and 
  5. deploying the project on server -- run("touch manage.py") 
To use it, we issue:
$fab install



from __future__ import with_statement
from fabric.api import *
from fabric.contrib.console import confirm

env.hosts = ['serv1', 'serv2', 'serv3']
env.user = "admin"

def test():
    """
    Run test;  if it fails prompt the user for action.
    """
    src_dir = "/path/to/local/src/directory"
    with settings(warn_only=True), lcd(src_dir):
        result = local("python manage.py test apps", capture=True)
    if result.failed and not confirm("Tests failed. Continue anyway?"):
        abort("Aborting at user request.")

def pack_move():
    """
    Archive our current code and upload to servers.
    """
    src_dir = "/path/to/local/src/directory"   
    with settings(warn_only=True), lcd(src_dir):
        local("tar czf /tmp/project.tgz .")
    put("/tmp/project.tgz", "/path/to/serv/tmp/project.tgz")
    local("rm /tmp/project.tgz")

def install():
    """
    Deploy the project on servers.
    """
    test()
    pack_move()
    dst_dir = "/path/to/serv/dst/directory"
    with settings(hide("warnings"), warn_only=True):
        if run("test -d %s" % dst_dir).failed:
            run("mkdir -p %s" % dst_dir)   
    with cd(dst_dir):
        run("tar xzf /path/to/serv/tmp/project.tgz")
        run("touch manage.py")
    run("rm -f /path/to/serv/tmp/project.tgz")

      


Wednesday, October 2, 2013

Git -- Cheat Sheet

Here is a cheat sheet for git.

1. Create
git init                    # create a local repository
git clone <url>             # clone a repository from <url>

2. Commit
git commit -m "commit message"

3. Browse
git log                     # history of changes
git status                  # files changed in the working directory
git diff                    # diff between the working directory and the index
git diff HEAD               # diff between the working directory and the most recent commit
git diff --cached           # diff between the index and the most recent commit
git show <object>           # show the object
gitk                        # git repository (GUI) browser

4. Stage
git add <file>              # add the file to the index
git reset HEAD <file>       # unstage the staged file

5. Undo
git commit -a --amend       # fix the last commit
git reset --hard <commit>   # discard any changes and reset to the commit
git revert HEAD             # revert the last commit
git revert <commit>         # revert the specified commit
git checkout -- <file>      # discard changes to the modified file

6. Branch
git branch <new_branch>     # create a branch named new_branch based on HEAD
git branch -d <old_branch>  # delete the branch named old_branch
git checkout <branch>       # switch to the branch
git checkout -b <branch>    # create a new branch, then switch to it
git merge <branch>          # merge the specified branch into HEAD

7. Update
git fetch                   # download the latest changes from origin
git pull                    # fetch from and integrate (merge) with origin

8. Publish
git push                    # update origin

Wednesday, September 11, 2013

Python Decompiler

Need a decompiler for your Python byte-code?

Here is one: uncompyle2, which works for Python 2.


Tuesday, September 10, 2013

Untangle Links

How do we find the terminal pathname of a file or directory? For example, the file /usr/lib64/mozilla/plugins/libjavaplugin.so is installed by the IcedTea-Web Java plugin. It is a symbolic link to another link. The command namei enables us to untangle links by following each pathname until the endpoint is found.

Here is an example usage and its output:

%namei /usr/lib64/mozilla/plugins/libjavaplugin.so
f: /usr/lib64/mozilla/plugins/libjavaplugin.so
 d /
 d usr
 d lib64
 d mozilla
 d plugins
 l libjavaplugin.so -> /etc/alternatives/libjavaplugin.so.x86_64
   d /
   d etc
   d alternatives
   l libjavaplugin.so.x86_64 -> /usr/lib64/IcedTeaPlugin.so
     d /
     d usr
     d lib64
     - IcedTeaPlugin.so


which untangles /usr/lib64/mozilla/plugins/libjavaplugin.so by following links and gets its terminal pathname:
/usr/lib64/IcedTeaPlugin.so.
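The same link-chasing can be sketched in a few lines of Python with os.readlink. This is a simplified, hypothetical analogue of namei, demonstrated on a throw-away two-link chain rather than the plugin file above:

```python
import os
import tempfile

def untangle(path):
    """Follow symbolic links until the terminal pathname is reached,
    returning every step along the way (a simplified namei)."""
    steps = [path]
    while os.path.islink(path):
        target = os.readlink(path)
        # A link target may be relative; resolve it against the
        # directory that contains the link itself.
        path = os.path.join(os.path.dirname(path), target)
        steps.append(path)
    return steps

# Demo: front.so -> mid.so -> real.so
d = tempfile.mkdtemp()
real = os.path.join(d, 'real.so')
open(real, 'w').close()
os.symlink(real, os.path.join(d, 'mid.so'))
os.symlink(os.path.join(d, 'mid.so'), os.path.join(d, 'front.so'))
print(untangle(os.path.join(d, 'front.so'))[-1])
```

Note that os.path.realpath() gives the terminal pathname directly; the loop above is only useful when the intermediate steps matter, as in namei's output.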

Monday, September 9, 2013

Practical Examples on RPM package management

Here are some practical examples of using rpm or yum for RPM-based package management. Most examples take a package name as a parameter, which is omitted here. A '?????' indicates that I have no clue how to complete the task with rpm/yum. All comments are welcome if you know a proper solution.

1. Install an RPM Package
rpm -ivh
yum install


2. Remove an RPM Package
rpm -evh
yum remove


3. Upgrade an RPM Package
rpm -Uvh
yum upgrade


4. List all installed RPM Packages
rpm -qa
yum list installed


5. List all (installed + available) RPM Packages
?????
yum list


6. Check dependencies of an RPM Package
rpm -qR
yum deplist


7. Query the information of an installed RPM package
rpm -qi
yum info                    # both installed + available

8. List all files of an installed RPM package
rpm -ql
repoquery -ql                 # need to install yum-utils

9. Query which RPM package a file belongs to
rpm -qf /path/to/file
yum whatprovides /path/to/file


10. Query the information of an RPM package before installing
rpm -qip foo.rpm            # rpm package needs to be downloaded first
yum info

11. Search for a Package
?????
yum search


12. Find out which package provides some feature or file
rpm -qf /path/to/file       # for file only
yum whatprovides



13. List the package-specific scriptlets
rpm -q --scripts
?????

Saturday, September 7, 2013

Manage Autostart Applications in GNOME

After logging in to GNOME, a lot of applications can be started automatically to make life easier. System-wide autostart applications can be found in /etc/xdg/autostart and in /usr/share/gnome/autostart. Users can edit an autostart application by disabling it or changing its name, command, or description; additionally, users may add their own autostart applications.

GNOME provides the tool gnome-session-properties, which allows us to add, modify, and remove autostart applications. Once we are done with gnome-session-properties, new entries (files) are generated and saved in the directory ~/.config/autostart/.

Here is an example file skype.desktop created and saved after adding skype as a new autostart application:

[Desktop Entry]
Type=Application
Exec=/usr/bin/skype

X-GNOME-Autostart-enabled=true
Name=Skype



A new file named gnome-keyring-ssh.desktop is created (and saved in the directory ~/.config/autostart) after removing the default autostart application SSH key agent. Its contents are the same as those of the file /etc/xdg/autostart/gnome-keyring-ssh.desktop, except for one line:

X-GNOME-Autostart-enabled=false

which specifies that the autostart is disabled.
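Since .desktop entries are INI-style files, toggling that line can be scripted. Here is a minimal stdlib sketch using configparser (the file name and its contents are made up for the demo); note that .desktop keys are case-sensitive, so optionxform must be overridden:

```python
import configparser
import os
import tempfile

def set_autostart(desktop_file, enabled):
    """Set X-GNOME-Autostart-enabled in a .desktop file."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve key case
    parser.read(desktop_file)
    parser['Desktop Entry']['X-GNOME-Autostart-enabled'] = (
        'true' if enabled else 'false')
    with open(desktop_file, 'w') as f:
        # .desktop files use key=value with no surrounding spaces
        parser.write(f, space_around_delimiters=False)

# Demo on a throw-away copy of the Skype entry above:
path = os.path.join(tempfile.mkdtemp(), 'skype.desktop')
with open(path, 'w') as f:
    f.write('[Desktop Entry]\nType=Application\n'
            'Exec=/usr/bin/skype\nName=Skype\n')
set_autostart(path, False)
print(open(path).read())
```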

Sunday, August 25, 2013

Automated Remote Backups with Rdiff-backup

One subtle feature of rdiff-backup is that it allows users to make remote backups over the Internet using SSH, which makes remote backups very secure, since the data transferred is encrypted.

One problem is that SSH requires a password for logging in, which is not convenient if we want to run rdiff-backup as a cron job. Here we show how to initiate rdiff-backups from a central backup server and pull data from a farm of hosts to be backed up. For security reasons, the central server uses a non-root user account (rdiffbk) to perform backups, whereas the root account is used on each host being backed up. Though root accounts are used on the hosts being backed up, they are protected by SSH's public-key authentication mechanism with the forced-commands-only option.

For convenience, I'll call the central backup server canine and the three hosts to be backed up beagle, shepherd, and terrier. For brevity, only the work on canine and beagle will be shown.

Here is the procedure for backup server canine:
  1. generate one passphrase-free SSH key pair for each host being backed up,
  2. move corresponding ssh key to each host,
  3. create SSH configuration file, and
  4. create a cron job file
Step 1:  generate one passphrase-free SSH key pair for each host being backed up

To generate an RSA-type key pair for host beagle, we issue

ssh-keygen -t rsa -f id_beagle-backup


where the private key is saved in the file id_beagle-backup and the public key in id_beagle-backup.pub.

Step 2: move corresponding ssh key to each host

To move id_beagle-backup.pub to host beagle, we may use any preferred method (for example, ftp, sftp, or ssh-copy-id), since the public key is not sensitive. Other hosts can be handled similarly.

Step 3: create SSH configuration file

To define how to connect to host beagle with the backup key, we place the following lines in the file ~rdiffbk/.ssh/config. Other hosts need to be configured similarly.

host beagle-backup
    hostname beagle
    user root
    identityfile ~rdiffbk/.ssh/id_beagle-backup
    protocol 2


Step 4: create a cron job file

The following cron job file automates the remote backups daily at 2:00 am, 2:10 am, and 2:20 am, respectively.

0 2 * * * rdiff-backup beagle-backup::/remote_dir beagle/remote_dir
10 2 * * * rdiff-backup shepherd-backup::/remote_dir shepherd/remote_dir
20 2 * * * rdiff-backup terrier-backup::/remote_dir terrier/remote_dir




By default, rdiff-backup uses SSH to pipe remote data, so both an SSH server and rdiff-backup are required on the hosts to be backed up.
What is left on host beagle and the others (shepherd, terrier) is simply to grant canine permission to access them (through SSH) and run rdiff-backup. This can be done in the following two steps:


Step I: create an authorized-keys file for root account

To enable SSH public-key authentication for the root account, we need to create the file /root/.ssh/authorized_keys, which consists of the public key for user rdiffbk@canine, the forced command, and other options. The public key (id_beagle-backup.pub) should be available on beagle once we have done Step 2. A sample authorized_keys file is as follows:

command="rdiff-backup --server --restrict-read-only /",from="canine",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAAB3.... rdiffbk@canine

Here, for security reasons, the rdiff-backup server is restricted to read-only, and we disable the port-forwarding, X11-forwarding, and pty options. See here for more details.

Step II: configure SSH server for root access

As we saw here, this can be done by putting the following line in the SSH server configuration file (sshd_config):

PermitRootLogin forced-commands-only

Thursday, August 22, 2013

Tips on Forced Command for SSH

In general, an SSH connection invokes a remote command chosen by the client. There are times when the server should decide which command the client will run. A forced command enables us to achieve this goal.

There are two ways to force a command. One is through the public-key authentication configuration in the file authorized_keys, as we saw here. The other is through the keyword ForceCommand in sshd_config. To restrict users to running nothing but the alpine command, we put the following line in sshd_config:

ForceCommand /usr/bin/alpine

The major difference between the two is scope: configuration through public-key authentication applies to one user, and each user may have his or her own options; configuration through the ForceCommand keyword may be system-wide, and the keyword Match should be used in combination with it if ForceCommand should apply only to certain user(s).

What if we want the user to be able to execute not just a single command, but a few fixed commands of the user's choice, such as:
  • show process list (ps aux),
  • print system information (uname -a),
  • show who is logged on (who), or
  • start rdiff-backup server (rdiff-backup --server --restrict-read-only /)
A simple wrapper script combined with ForceCommand will suffice. Here is an example that allows the user backup to invoke four different commands of his or her choice.


With the environment variable SSH_ORIGINAL_COMMAND, the following script (wrapper.sh) wraps all permitted commands:


#!/bin/sh
# Script: /usr/local/bin/wrapper.sh

case "$SSH_ORIGINAL_COMMAND" in
    "ps")
        ps aux
        ;;
    "uname")
        uname -a
        ;;
    "who")
        who
        ;;
    "rdiff")
        rdiff-backup --server --restrict-read-only /
        ;;
    *)
        echo "Only the following commands are available to you:"
        echo "ps, uname, who and rdiff"
        exit 1
        ;;
esac


The configuration (sshd_config) of ForceCommand with Match (user backup) is as follows:

Match User backup
    ForceCommand /usr/local/bin/wrapper.sh



To show the process list on the ssh server, one issues:

ssh backup@server ps

where the original command "ps" is passed to the wrapper script via the environment variable SSH_ORIGINAL_COMMAND.

To back up the directory tree /path_to_src on the server to the local directory /path_to_dst, one issues:

rdiff-backup --remote-schema "ssh -C %s rdiff" backup@server::/path_to_src /path_to_dst


Wednesday, August 21, 2013

Root Access Control for SSH

Sshd has a separate access control mechanism for the root account (superuser), specified by the keyword PermitRootLogin.

The argument (option) for PermitRootLogin must be "no", "yes", "without-password", or "forced-commands-only". If this option is set to "no", root is not allowed to log in.

If this option is set to "without-password", password authentication is disabled for root. However, root may log in with GSSAPIAuthentication, HostbasedAuthentication, or PubkeyAuthentication, if they are set up properly.

If this option is set to "forced-commands-only", root login with public-key authentication is allowed, but only if the command option is specified (which may be useful for remote backups, as we saw in the example of public-key-based configuration). All other authentication methods are disabled in this setting.

Public-key-based Configuration for SSH server

Public key is one of the frequently used authentication methods in SSH. To set up public-key authentication for one's account on an SSH server, one creates an authentication file named authorized_keys (for OpenSSH) and lists the keys and options that provide access to the account.

Each line (SSH protocol 2) in authorized_keys may contain:
  1. An (optional) set of authorization options for the key.
  2. A (required) key type string: ssh-dss for a DSA key, or ssh-rsa for an RSA key.
  3. The (required) base64-encoded public key.
  4. An (optional) descriptive comment.
The options consist of comma-separated option specifications, where no space is allowed except within double quotes. Some common option specifications are:

command="command": Specifies the command to be executed
from="pattern-list": Specifies the permitted client name or IP address
no-port-forwarding: Forbids TCP forwarding
no-X11-forwarding: Forbids X11 forwarding
no-pty: Prevents tty allocation

The following example file specifies that the command "rdiff-backup --server --restrict-read-only /" is to be executed if the client is from the machine named "beagle", with port and X11 forwarding disabled. Notice that all settings are on one line.

command="rdiff-backup --server --restrict-read-only /",from="beagle",no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3.... root@beagle

Saturday, August 17, 2013

Tips on SSH Client Configuration

OpenSSH client ssh obtains configuration data from sources in the following order:
  1.  command-line options,
  2.  user's configuration file (~/.ssh/config),
  3.  system-wide configuration file (/etc/ssh/ssh_config)
Since each parameter may be defined in more than one source, the order of definition is important. The ssh manual page (ssh_config) says:
For each parameter, the first obtained value will be used.

Each configuration file contains sections separated by Host specifications; each section applies to all hosts matching the pattern given in its Host parameter, where the host is the hostname argument given on the command line. Here are some frequently used parameters:

Hostname: Specifies the real host name to log into.

IdentityFile: Specifies a file from which the user's public key authentication identity is read.

Port: Specifies the port number to connect to on the remote host (if it is not 22).


User:  Specifies the user to log in as.

Here is an example configuration file (~/.ssh/config) for remote machine robert.some.net:

host bob
        hostname robert.some.net
        identityfile /somepath/.ssh/id_rsa_bob

        port 2222
        user root


With the above configuration file, we can issue:

ssh bob

which is equivalent to

ssh -i /somepath/.ssh/id_rsa_bob  -p 2222 root@robert.some.net





Sunday, July 28, 2013

Rdiff-backup by Examples

Rdiff-backup is a Python script that backs up one directory to another. Some features of rdiff-backup, as claimed on its official site, are: easy to use, creates a mirror, keeps increments, and preserves all information. Here are some examples of its usage.

1. simple backing up (backup local directory foo to local directory bar):

rdiff-backup foo bar

2. simple remote backing up (backup local directory /some/local_dir to directory /whatever/remote_dir on machine hostname.net):

rdiff-backup /some/local_dir  hostname.net::/whatever/remote_dir

SSH will be used to open the necessary pipe for remote backing up.

3. simple restoring from previous backup (restore from bar/dir to foo/dir):

cp -a bar/dir foo/dir

4. simple restoring from the latest remote backup (restore from hostname.net::/whatever/remote_dir to the local directory /some/local_dir):

rdiff-backup -r now hostname.net::/whatever/remote_dir /some/local_dir

5. restoring from a certain version of a remote backup (restore from backup done 15 days ago):

rdiff-backup -r 15D hostname.net::/whatever/remote_dir /some/local_dir

6. restoring from an increment file (restore file pg.py to its version dated 2011-11-30T00:28:38+08:00)

rdiff-backup hostname.net::/remote-dir/rdiff-backup-data/increments/pg.py.2011-11-30T00:28:38+08:00.diff.gz  /local_dir/pg.py


Sunday, July 14, 2013

Tips on Some Stat Commands for FreeBSD

FreeBSD provides some handy commands for getting various information: netstat shows network status, sockstat lists open sockets, fstat identifies active files, and procstat gets detailed process information. Here are some tips on their usage.

1. Show Internet routing table:

netstat -rn

2. Show all active Internet (IPv4) connections (including servers):
netstat -f inet -a -n
or,
sockstat -4
which gives us more information on command name and process identifier (PID) for each connection.

3. Show all Active TCP connections (including servers):
netstat -f inet -p tcp -a -n
or,
sockstat -4 -P tcp

4. Identify processes using a file or directory:
fstat -v /path/to/the/fileORdirectory

5. Identify all files opened by the specified process:
fstat -v -p PID
or,
procstat -f PID

6. Identify all files opened by the specified user:
fstat -v -u user

7. Get detailed process information on all processes:
procstat -a

8. Get environment variables on the specified process:
procstat -e PID


Saturday, July 13, 2013

Some Usages of fuser on Linux Systems

The command fuser allows us to identify processes using files or sockets. Here are some examples of its usage.

1. Identify processes using a file or directory:

fuser -v /path/to/the/fileORdirectory

2. Identify processes using a particular TCP/UDP port#:

fuser -v -n tcp port#

where option -n enables us to specify an object in a different name space, tcp port# here.

3. Kill processes that are executing a particular command:

fuser -v -k -i /path/to/the/command

where option -i specifies asking the user for confirmation before killing a process.

4. Kill processes using a particular TCP/UDP port#:

fuser -v -k -i -n tcp port#

Monday, July 8, 2013

How to Mount LVM partitions/disks

The logical volume manager (LVM) is suitable for many occasions, e.g., managing large disk farms or easily re-sizing disk partitions on small systems. The following quote from the Wikipedia LVM page best describes its common uses:

One can think of LVM as a thin software layer on top of the hard disks and partitions, which creates an illusion of continuity and ease-of-use for managing hard-drive replacement, repartitioning, and backup.

Here are the steps for mounting LVM partitions/disks:

1. scan all disks for volume groups:

vgscan


2. scan all disks for logical volumes:

lvscan

The output consists of one line for each logical volume, indicating whether it is active and giving its size.

3. change the availability of the logical volume (if it is inactive):


lvchange -a y /dev/vg_name/lv_name

where vg_name is the name of the volume group found by vgscan and lv_name is the name of the logical volume found by lvscan. You may use vgchange to change the availability of all logical volumes in a specified volume group.

4. mount the logical volume:

mount /dev/vg_name/lv_name /mount/point

where /mount/point is the mount point for the logical volume.

Sunday, July 7, 2013

Load Python Module Dynamically

There are times when we need to import some Python module but, for whatever reason, will not know the name of the module until run time. This can be achieved with the built-in function __import__().

Dynamically loading a Python module is useful, for example, when we plan to read configuration from a file. Here is an example that allows us to load the module specified in sys.argv[1]:

config_file = sys.argv[1]
config = __import__(config_file)

For newer versions of Python (2.7 and up), there are convenience wrappers for __import__(). We may use the module importlib:

import importlib
config_file = sys.argv[1]
config = importlib.import_module(config_file)

Both __import__() and importlib.import_module() search for modules in certain locations (e.g., sys.modules, sys.meta_path, sys.path). There are times when the module being loaded is not in those default locations. In that case we can use imp.load_source(), which allows us to load and initialize a module implemented as a Python source file:

import imp
pathname = sys.argv[1]
config = imp.load_source('config', pathname)

where pathname is the path name of the configuration file.
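In Python 3 the imp module is deprecated, and the same load-from-path trick is done with importlib.util. Here is a minimal sketch (the file name myconfig.py and the DEBUG/PORT settings are made up for the demo):

```python
import importlib.util
import os
import tempfile

def load_from_path(name, pathname):
    """Load a Python source file that need not be on sys.path
    (the modern replacement for imp.load_source)."""
    spec = importlib.util.spec_from_file_location(name, pathname)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Demo: write a throw-away config file, then load it.
pathname = os.path.join(tempfile.mkdtemp(), 'myconfig.py')
with open(pathname, 'w') as f:
    f.write('DEBUG = True\nPORT = 8080\n')
config = load_from_path('myconfig', pathname)
print(config.PORT)  # -> 8080
```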







Thursday, July 4, 2013

A quick and simple way to wipe a hard drive

For various reasons, it is necessary to wipe hard drives before disposing of them. Writing the disk with all zeros should suffice for most occasions. This can be done as:

dd if=/dev/zero of=/dev/sdXY

where /dev/sdXY is the device name of the hard drive to work on. To get better performance, we may need to set the bs option, e.g.,

dd if=/dev/zero of=/dev/sdb1 bs=10M

to make dd read and write 10 MB at a time.

Tuesday, July 2, 2013

Some Usages of cpio

The command cpio allows us to copy files between archives and directories. There are three operation modes: copy-out (-o), copy-in (-i), and copy-pass (-p).

To create an archive for directory tree src, we issue:


find src -print0 | cpio -ov0 > src.cpio


To extract files from the archive just created, we issue:

cpio -ivd < src.cpio


To copy files from directory tree src to directory dest, we issue:

find src -print0 | cpio -pmd0 dest


To copy files from src to dest that are less than 2 days old and whose names contain 'txt', we issue:

find src -mtime -2 -print0 | grep -z txt | cpio -pmd0 dest


To copy files from src to dest that are less than 2 days old and which contain the word 'txt', we issue:

find src -mtime -2 -print0 | xargs -0 grep -lZ txt | cpio -pmd0 dest




Combined with ssh (or netcat, if you are not concerned with security), cpio allows us to access remote hosts. Here are some examples:

To back up the directory tree src to a remote host, we may issue:

find src -print0 | cpio -oaV0 -H tar -O user@remotehost:src.tar

which uses ssh to copy the archive file from the local host to the remote host.


To copy files from src to a remote host, we issue:

find src -print0 | cpio -oaV0 -H tar | ssh user@remotehost "cpio -imd"