Monday, September 30, 2013

Git -- basics

Git is a distributed version control system (DVCS) designed to handle things from small to very large projects. DVCS features many advantages over the traditional centralized VCS, because users have the entire history of the project on their local disks (repository). Two of them are:
  • It allows users to work productively without network connection, and
  • it makes most operations much faster, since most operations are local.
Git employs snapshots instead of file diffs to track files. Every time we do a commit, it basically takes a picture of working directory at that moment and stores that snapshot. In addition to the working directory, Git has two main data structures:
  • index (also called staging area or cache) -- a mutable file that caches information about the working directory, and 
  • object database -- an immutable, append-only storage that stores files and meta-data for our project.
The object database contains four types of objects:
  • blob (binary large object) -- each file that we add to the repository is turned into a blob object.
  • tree -- each directory is turned into a tree object.
  • commit -- a commit is a snapshot of the working directory at a point in time.
  • tag -- a container that contains reference to another object.
Each object is identified by a SHA-1 hash of its contents. In general, git stores each object in a directory matching the first two characters of its hash; the file name used is the rest of the hash for that object. The command git cat-file or git show allows us to view content of objects.

The index (staging area) serves as a bridge between the working directory and the object database. Information stored in the index will go into the object database at our next commit. The following figure shows the normal work flow among them.

 Working        Index             Object
directory                             database      

     <--------checkout-----------|
     |-----add---->
                           |---commit--->


Each file in our working directory can be in one of two states: tracked or untracked. Tracked files are those that are in the last snapshot or in the index (staging area); they can be further classified as unmodified, modified, or staged. A staged file is one that has been modified/created and added to the index. Untracked files are everything else. The lifecycle of files in working directory is illustrated in the following figure:

 untracked         unmodified        modified        staged
                      
     |----------------------add------------------------->                  
     <---------------------reset------------------------|

                                  |-----edit------> 
                                                           |------add----->                                  
                                 <----------commit-------------|


The command git ls-files lists tracked files, whereas git ls-file -o (option o) lists untracked files.

No comments:

Post a Comment