Here is a cheat sheet for Git.
1. Create
git init # create a local repository
git clone <url> # clone a repository from url
2. Commit
git commit -m "commit message"
3. Browse
git log # history of changes
git status # files changed in working directory
git diff # diff between working directory and the index
git diff HEAD # diff between working directory and the most recent commit
git diff --cached # diff between the index and the most recent commit
git show <object> # show object
gitk # git repository (GUI) browser
4. Stage
git add <file> # add file to the index
git reset HEAD <file> # unstage the staged file
5. Undo
git commit -a --amend # fix the last commit
git reset --hard <commit> # discard any changes and reset to the commit
git revert HEAD # revert the last commit
git revert <commit> # revert the specific commit
git checkout -- <file> # discard changes to the modified file
6. Branch
git branch <new_branch> # create a branch named new_branch based on HEAD
git branch -d <old_branch> # delete the branch named old_branch
git checkout <branch> # switch to the branch
git checkout -b <branch> # create a new branch then switch to it
git merge <branch> # merge the specified branch into HEAD
7. Update
git fetch # download latest changes from origin
git pull # fetch from and integrate (merge) with origin
8. Publish
git push # update origin
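As a quick illustration of how these commands fit together, here is a minimal session (the repository and file names are hypothetical):
git init demo # create a new repository in directory demo
cd demo
echo "hello" > readme.txt
git add readme.txt # stage the new file
git status # shows readme.txt as staged
git commit -m "first commit" # record the snapshot
git log # shows the new commit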
Monday, September 30, 2013
Git -- basics
Git is a distributed version control system (DVCS) designed to handle projects from small to very large. A DVCS has many advantages over a traditional centralized VCS, because each user has the entire history of the project on their local disk (repository). Two of them are:
- It allows users to work productively without a network connection, and
- it makes most operations much faster, since they are performed locally.
A local Git repository contains two main data structures:
- the index (also called the staging area or cache) -- a mutable file that caches information about the working directory, and
- the object database -- an immutable, append-only store that holds the files and metadata for our project.
The object database stores four types of objects:
- blob (binary large object) -- each file that we add to the repository is turned into a blob object,
- tree -- each directory is turned into a tree object,
- commit -- a commit is a snapshot of the working directory at a point in time, and
- tag -- a container that holds a reference to another object (usually a commit).
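These objects can be inspected directly with git cat-file; a minimal sketch, run inside any repository with at least one commit:
git cat-file -t HEAD # prints the object type: commit
git cat-file -p HEAD # prints the commit: tree, parent, author, message
git cat-file -p 'HEAD^{tree}' # lists the blobs and subtrees of the top-level tree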
The index (staging area) serves as a bridge between the working directory and the object database. Information stored in the index goes into the object database at the next commit. The following figure shows the normal workflow among them.
Working directory          Index          Object database
        <--------------------checkout--------------------|
        |--------add-------->
                             |-------commit------->
Each file in our working directory can be in one of two states: tracked or untracked. Tracked files are those that are in the last snapshot or in the index (staging area); they can be further classified as unmodified, modified, or staged. A staged file is one that has been modified or created and then added to the index. Untracked files are everything else. The lifecycle of files in the working directory is illustrated in the following figure:
untracked       unmodified       modified       staged
    |------------------------add------------------------>
    <-----------------------reset------------------------|
                    |-----edit----->
                                     |-----add----->
                    <------------commit-------------|
The command git ls-files lists tracked files, whereas git ls-files -o (option -o) lists untracked files.
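A minimal sketch of this lifecycle (the file name is hypothetical):
echo hello > demo.txt # demo.txt is untracked
git ls-files -o # lists demo.txt
git add demo.txt # demo.txt is now staged
git commit -m "add demo" # demo.txt is now tracked and unmodified
git ls-files # lists demo.txt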
Sunday, August 25, 2013
Automated Remote Backups with Rdiff-backup
One subtle feature of rdiff-backup is that it allows users to make remote backups over the Internet using SSH, which makes remote backups very secure since the transferred data is encrypted.
One problem is that SSH requires a password for logging in, which is not convenient if we want to run rdiff-backup as a cron job. Here we show how to initiate rdiff-backup from a central backup server and pull data from a farm of hosts to be backed up. For security reasons, the central server uses a non-root user account (rdiffbk) to perform backups, whereas the root account is used on each host being backed up. Though root accounts are used on the hosts being backed up, they are protected by the SSH public-key authentication mechanism with the forced-commands-only option.
For convenience, I'll call the central backup server canine and the three hosts to be backed up beagle, shepherd, and terrier. For brevity, only the work on canine and beagle will be shown.
Here is the procedure for backup server canine:
- generate one passphrase-free SSH key pair for each host being backed up,
- move corresponding ssh key to each host,
- create SSH configuration file, and
- create a cron job file
Step 1: generate one passphrase-free SSH key pair for each host
To generate an RSA key pair for host beagle, we issue
ssh-keygen -t rsa -N "" -f id_beagle-backup # -N "" requests an empty passphrase
where the private key will be saved in the file id_beagle-backup and the public key in id_beagle-backup.pub.
Step 2: move corresponding ssh key to each host
To move id_beagle-backup.pub to host beagle, we may use any preferred method (for example, ftp, sftp, or ssh-copy-id), since the public key is not sensitive. The other hosts can be handled similarly.
Step 3: create SSH configuration file
To define how to connect to host beagle with the backup key, we place the following lines into the file ~rdiffbk/.ssh/config. The other hosts need to be configured similarly.
Host beagle-backup
    HostName beagle
    User root
    IdentityFile ~rdiffbk/.ssh/id_beagle-backup
    Protocol 2
Step 4: create a cron job file
The following cron job file automates the remote backups daily at 2:00 a.m., 2:10 a.m., and 2:20 a.m., respectively.
0 2 * * * rdiff-backup beagle-backup::/remote_dir beagle/remote_dir
10 2 * * * rdiff-backup shepherd-backup::/remote_dir shepherd/remote_dir
20 2 * * * rdiff-backup terrier-backup::/remote_dir terrier/remote_dir
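Assuming the three lines above are saved in a file named rdiffbk.cron (a hypothetical name), the jobs can be installed for user rdiffbk and then verified:
crontab rdiffbk.cron # install the job list (replaces any existing crontab)
crontab -l # list the installed jobs to verify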
By default, rdiff-backup uses SSH to pipe remote data. Therefore, both an SSH server and rdiff-backup are required on the hosts to be backed up.
What is left on host beagle and the others (shepherd, terrier) is simply to give canine permission to access them (through SSH) and run rdiff-backup. This can be done in the following two steps:
Step I: create an authorized-keys file for root account
To enable SSH public-key authentication for the root account, we need to create the file /root/.ssh/authorized_keys, which contains the public key for user rdiffbk@canine, the forced command, and other options. The public key (id_beagle-backup.pub) should already be available on beagle once we have done Step 2. A sample authorized_keys file is as follows:
command="rdiff-backup --server --restrict-read-only /",from="canine",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAAB3.... rdiffbk@canine
Here, for security reasons, the rdiff-backup server is restricted to read-only, and we disable port forwarding, X11 forwarding, and pty allocation.
Step II: configure SSH server for root access
This can be done by putting the following line in the SSH server configuration file (sshd_config):
PermitRootLogin forced-commands-only
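As a quick sanity check from canine (run as user rdiffbk), we can confirm that the forced command is in effect; a minimal sketch:
ssh beagle-backup # should start rdiff-backup --server instead of a login shell
rdiff-backup --test-server beagle-backup::/ # verifies that the remote server session can be established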
Sunday, July 28, 2013
Rdiff-backup by Examples
Rdiff-backup is a Python script that backs up one directory to another. Some features of rdiff-backup, as claimed on its official site, are: ease of use, creating mirrors, keeping increments, and preserving all information. Here are some examples of its usage.
1. simple backing up (backup local directory foo to local directory bar):
rdiff-backup foo bar
2. simple remote backing up (back up local directory /some/local_dir to directory /whatever/remote_dir on machine hostname.net):
rdiff-backup /some/local_dir hostname.net::/whatever/remote_dir
SSH will be used to open the necessary pipe for the remote backup.
3. simple restoring from previous backup (restore from bar/dir to foo/dir):
cp -a bar/dir foo/dir
4. simple restoring from the latest remote backup (restore from hostname.net::/whatever/remote_dir to local directory /some/local_dir):
rdiff-backup -r now hostname.net::/whatever/remote_dir /some/local_dir
5. restoring from a certain version of a remote backup (restore from the backup done 15 days ago):
rdiff-backup -r 15D hostname.net::/whatever/remote_dir /some/local_dir
6. restoring from an increment file (restore file pg.py to its version dated 2011-11-30T00:28:38+08:00):
rdiff-backup hostname.net::/remote-dir/rdiff-backup-data/increments/pg.py.2011-11-30T00:28:38+08:00.diff.gz /local_dir/pg.py
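Two related commands that often come in handy (same hypothetical host and paths as above): listing the increments available in a backup, and pruning old increments:
rdiff-backup --list-increments hostname.net::/whatever/remote_dir
rdiff-backup --remove-older-than 1M hostname.net::/whatever/remote_dir # delete increments older than one month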
Monday, July 8, 2013
How to Mount LVM partitions/disks
Logical volume manager (LVM) is suitable for many occasions, e.g., managing large disk farms or easily re-sizing disk partitions on small systems. The following quote from the wiki LVM page best describes its common uses:
One can think of LVM as a thin software layer on top of the hard disks and partitions, which creates an illusion of continuity and ease-of-use for managing hard-drive replacement, repartitioning, and backup.
Here are the steps to mount LVM partitions/disks:
1. scan all disks for volume groups:
vgscan
2. scan all disks for logical volumes:
lvscan
The output consists of one line for each logical volume, indicating whether it is active, along with its size.
3. change the availability of the logical volume (if it is inactive):
lvchange -a y /dev/vg_name/lv_name
where vg_name is the name of the volume group found by vgscan and lv_name is the name of the logical volume found by lvscan. You may use vgchange to change the availability of all logical volumes in a specified volume group.
4. mount the logical volume:
mount /dev/vg_name/lv_name /mount/point
where /mount/point is the mount point for the logical volume.
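Putting the four steps together, a session might look as follows (the volume group vg0 and logical volume home are hypothetical names reported by the scans):
vgscan # reports: Found volume group "vg0"
lvscan # reports: inactive '/dev/vg0/home' [20.00 GiB]
lvchange -a y /dev/vg0/home # activate the logical volume
mkdir -p /mnt/lvm-home
mount /dev/vg0/home /mnt/lvm-home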
Monday, July 1, 2013
Access UFS File System under Linux
The Unix file system (UFS) is widely used in many Unix systems, for example, FreeBSD, OpenBSD, and HP-UX. There are times when we need to access UFS under Linux. The following command allows us to mount a UFS2 file system read-only (ro) under Linux:
mount -t ufs -o ufstype=ufs2,ro /dev/sdXY /mnt/path
Write support for UFS is not compiled into Linux kernels by default. One needs to properly configure and compile kernels for write support.
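The ufstype option must match the variant that created the file system; besides ufs2, the Linux kernel understands values such as 44bsd (UFS1 from FreeBSD/NetBSD/OpenBSD) and sun (SunOS/Solaris). The device names below are hypothetical:
mount -t ufs -o ufstype=44bsd,ro /dev/sdb1 /mnt/bsd # FreeBSD UFS1
mount -t ufs -o ufstype=sun,ro /dev/sdc1 /mnt/sun # SunOS/Solaris UFS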
Wednesday, May 1, 2013
Hard and Symbolic links
A hard link is an entry in a directory file that associates a name with an (existing) file on a file system, which allows a file to appear in multiple paths.
Unix/Linux systems do not allow hard links on directories, since they could create endless cycles. Hard links are limited to files on the same volume, because the name-to-file association in each hard link goes through an inode. Most file systems that support hard links use a link count to keep track of the total number of links pointing to the inode (file). To find all the files that refer to the same file as NAME, we may use the command find with the option '-samefile NAME' or '-inum INODE', where INODE is the inode number of NAME. The command ls with option '-il' gives you the link count and inode number for files.
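For example (the target file is just an illustration):
ls -il /usr/bin/bash # the first column is the inode number
find /usr/bin -samefile /usr/bin/bash # all names hard-linked to the same file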
A symbolic link is a special type of file that contains a text string which is interpreted by the operating system as a path to another file/directory. The other file/directory is usually called the "target". A symbolic link is another file that exists independently of its target, i.e., they are two files/directories indexed by two different inodes, as opposed to hard links. Symbolic links are different from hard links in that:
- a symbolic link may point to a directory, and
- a symbolic link may point to a directory/file on a different volume
There is one issue with symbolic links. If a symbolic link is removed, its target remains unaffected. However, there is no automatic update for a symbolic link if its target is moved, renamed, or deleted. The symbolic link continues to exist and point to the original target, which no longer exists. This is called a broken link.
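A short demo of the difference, and of how a broken link arises (file names are hypothetical):
echo data > target.txt
ln target.txt hard.txt # hard link: shares the inode of target.txt
ln -s target.txt soft.txt # symbolic link: a new inode that stores the path
ls -il target.txt hard.txt soft.txt # target.txt and hard.txt show the same inode
rm target.txt
cat hard.txt # still prints "data"
cat soft.txt # fails: soft.txt is now a broken link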
Tuesday, April 30, 2013
Inode
An inode is a data structure in Unix/Linux file systems that stores information (metadata) about a file system object, except the file content and file name. The command stat allows us to retrieve most of the information stored in an inode. Here is an example for the system file bash on a Linux box:
File: `/usr/bin/bash'
Size: 902036 Blocks: 1768 IO Block: 4096 regular file
Device: 808h/2056d Inode: 279026 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Context: system_u:object_r:shell_exec_t:s0
Access: 2013-05-01 13:59:45.680608666 +0800
Modify: 2013-01-31 22:47:10.000000000 +0800
Change: 2013-03-21 10:29:27.135450340 +0800
Birth: -
We can see that regular files have the following attributes:
- Size in bytes
- Device ID where the file is stored
- Inode number
- Link count
- User ID of the file owner
- Group ID of the file
- Permissions (access rights) of the file
- Timestamps on last access (atime), modify (mtime), and change (ctime)
In addition, an inode stores the locations of the file's data blocks through four kinds of pointers:
- direct pointer,
- singly indirect pointer,
- doubly indirect pointer, and
- triply indirect pointer
One question remains: where is the file name stored? The answer is that it is stored in the content of the directory that contains the file. Unix/Linux directories are lists of association structures, each of which consists of one file name and one inode number for that file. That is why we need to specify (implicitly or explicitly) the path whenever we want to access a file in the file systems.
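To see the name-to-inode association directly (the target file is just an example):
ls -i /usr/bin/bash # prints the inode number next to the name
stat -c '%i %h %n' /usr/bin/bash # inode number, hard link count, file name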
Monday, April 22, 2013
poor men's PGP
I just finished pmPGP, a CLI for sending/receiving OpenPGP MIME messages.
pmPGP is based on Python and GnuPG; it supports sending emails in the following formats:
- plain -- regular email
- sign -- RFC3156
- encrypt -- RFC3156
- sign-encrypt -- RFC3156
- Sencrypt -- Symmetric encryption (for fun and personal usage)
- sign-Sencrypt -- (for fun and personal usage)
A poor man may use pmPGP to store/back up files on email servers. Sounds interesting? Get it from:
Saturday, January 5, 2013
find all file descriptors used by a process
A file descriptor (FD) is an abstract indicator used to access a file. In Unix-like systems, file descriptors can refer to many objects besides files, such as pipes, Unix domain sockets, and Internet sockets.
lsof (list open files) is an open-source command that reports a list of open files and the processes that opened them. To find all file descriptors used by the process with a given pid, we may issue the command:
lsof -p pid
To find all internet sockets used by the process with pid, we may issue:
lsof -i -n -P | grep pid
where -i specifies listing IP sockets only, -n suppresses the translation of hostnames, and -P suppresses the translation of port names.
What if lsof is not available on your system?
If your system implements procfs (the proc filesystem, /proc), all file descriptors used by the process with pid can be found in the directory /proc/pid/fd. Therefore, on Linux systems, you may issue:
ls -l /proc/pid/fd
to get the job done. However, another approach is needed on FreeBSD systems, since procfs is being gradually phased out there. Both fstat (identify active files) and procstat (get detailed process information) allow us to achieve our goal. You may issue:
fstat -p pid
or,
procstat -f pid
where pid is the process id of interest.
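As a concrete sketch, to inspect the file descriptors of the current shell on Linux ($$ expands to the shell's own pid):
ls -l /proc/$$/fd
lsof -p $$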