How Git Internally Stores Objects: Blobs, Trees, Commits

Summary: Deep dive into Git’s object database.


When developers use Git, most of their interaction stays at the surface: git add, git commit, git push, and so on. Beneath this familiar interface lies a content-addressed object database that underpins Git's speed, integrity, and flexibility. Let’s look inside Git’s internals and see how it stores your project using blobs, trees, and commits.


What is Git’s Object Database?

At the heart of Git lies its object database, a key-value store that holds all your data, history, and metadata. This data lives inside the .git/objects directory, and each stored "object" is keyed by a hash of its contents (SHA-1 by default; newer Git versions can also create SHA-256 repositories).

This article covers three object types (Git also has a fourth, the annotated tag):

  • Blobs: File contents
  • Trees: Directories
  • Commits: Snapshots and history

Let’s decode each, starting from the bottom up.


1. Blobs: The Building Blocks

A blob (binary large object) represents the contents of a single file. Importantly, a blob stores only the content; filenames and directory structure are recorded elsewhere, in trees.

How a Blob is Stored

When you git add a file, Git:

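  1. Reads the file content.
  2. Creates a blob object: a short header ("blob", the content size in bytes, and a NUL byte) followed by that content.
  3. Computes the SHA-1 hash of the header plus content; this hash is the object's ID.
  4. Compresses the object with zlib and stores it under .git/objects/, using the first two hex digits of the hash as a subdirectory name and the remaining 38 as the filename.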
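
The steps above can be sketched in Python. This is a minimal illustration, not Git's implementation: the function names blob_object_id, compressed_blob, and loose_object_path are made up for this example, it assumes a SHA-1 repository, and it ignores content filters such as line-ending conversion. Note that the hash covers a short header plus the content, not the content alone.

```python
import hashlib
import zlib


def blob_object_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header ("blob", a space, the size, a NUL byte) followed by the raw bytes.
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    return hashlib.sha1(store).hexdigest()


def compressed_blob(content: bytes) -> bytes:
    """Return the zlib-compressed bytes that end up in the loose-object file."""
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    return zlib.compress(store)


def loose_object_path(object_id: str) -> str:
    """Map an object ID to its loose-object path under .git/objects/."""
    # The first two hex digits name the directory, the remaining 38 the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# Same bytes that `echo "Hello, Git!" > hello.txt` writes, including the newline.
oid = blob_object_id(b"Hello, Git!\n")
print(oid)
print(loose_object_path(oid))
```

The printed ID should match the output of git hash-object hello.txt for the file created in the example below, provided no filters alter the content on the way in.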

Example

echo "Hello, Git!" > hello.txt
git add hello.txt

After adding, the blob exists in the object database. Find its hash with git hash-object hello.txt (or git ls-files -s), then print the object:

git cat-file -p <blob-hash>
# Outputs: Hello, Git!

Blobs are content-addressable: identical files across different commits or branches will use the same blob object, saving space.
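
A toy content-addressed store makes the deduplication concrete. The dictionary-backed ObjectStore class below is hypothetical and only mimics how blob IDs are derived; real Git writes files under .git/objects/ rather than keeping a dictionary in memory.

```python
import hashlib


class ObjectStore:
    """Toy in-memory store keyed by Git-style blob IDs (illustration only)."""

    def __init__(self):
        self.objects = {}

    def put_blob(self, content: bytes) -> str:
        payload = b"blob " + str(len(content)).encode() + b"\x00" + content
        object_id = hashlib.sha1(payload).hexdigest()
        # Identical content maps to the same key, so a second write stores nothing new.
        self.objects.setdefault(object_id, payload)
        return object_id


store = ObjectStore()
first = store.put_blob(b"same bytes\n")   # e.g. a file committed on one branch
second = store.put_blob(b"same bytes\n")  # e.g. a copy under another name or branch
print(first == second, len(store.objects))  # prints: True 1
```

The filename never enters the hash, which is why renamed or copied files with unchanged content cost no extra blob storage.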


2. Trees: The Directory Structure

A tree object represents a directory. Unlike blobs, trees store:

  • References to blobs (files) and other trees (subdirectories)
  • File and directory names
  • File mode (executable, symlink, etc.)

Trees knit blobs together, giving structure to your project.

How a Tree is Stored

When you run git commit, Git:

  1. Scans the staging area (index) for the current state of files and directories.
  2. For each directory, creates a tree object listing:
    • File modes
    • Filenames
    • Hashes of blobs/trees for files and subdirectories

Example

Suppose your project looks like:

project/
└── hello.txt

Pretty-printed with git cat-file -p <tree-hash>, the tree object looks like:

100644 blob <blob-hash>    hello.txt

If you have a subdirectory:

project/
├── hello.txt
└── src/
    └── main.py

The root tree object will refer to both a blob (for hello.txt) and another tree (for src):

100644 blob <blob-hash>    hello.txt
040000 tree <tree-hash>    src
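
The binary layout behind such a listing can be sketched in Python. This is a simplified model under stated assumptions: the helper names object_id and tree_body are invented for this example, the contents of src/main.py are hypothetical, and only SHA-1 repositories are considered. In the raw object each entry is the mode, a space, the name, a NUL byte, and the 20 raw bytes of the referenced ID; directory modes are stored as 40000, which git cat-file -p displays as 040000.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """Hash the header (kind, space, size, NUL) plus body, as Git does with SHA-1."""
    header = kind + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list) -> bytes:
    """Serialize (mode, name, hex_id) tuples into a raw tree body."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


hello_id = object_id(b"blob", b"Hello, Git!\n")
main_id = object_id(b"blob", b"print('hi')\n")  # hypothetical contents of src/main.py

# Build bottom-up: the subdirectory's tree must exist before the root can reference it.
src_tree = object_id(b"tree", tree_body([("100644", "main.py", main_id)]))
root_tree = object_id(
    b"tree",
    tree_body([("100644", "hello.txt", hello_id), ("40000", "src", src_tree)]),
)
print(root_tree)
```

Because the root tree embeds the ID of the src tree, which in turn embeds the ID of main.py, a change anywhere below a directory changes that directory's tree ID as well.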


3. Commits: The Snapshots

A commit ties it all together. Each commit object includes:

  • A pointer to a tree object (the root directory snapshot)
  • Zero or more parent commits (linking history)
  • Metadata (author, committer, timestamps, message)

Commits are the snapshot mechanism—each commit records a complete tree of the project at a point in time.

How a Commit is Stored

On git commit, Git creates a new commit object that:

  • References the root tree object (the snapshot of the project directory).
  • Points to its parent commit(s), if any.
  • Stores the author, committer, timestamps, and your commit message.

Example

tree <tree-hash>
parent <parent-commit-hash>
author Alice <alice@email.com> 1677109123 +0200
committer Alice <alice@email.com> 1677109123 +0200

Second commit

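Because every commit points at a complete tree, Git can reconstruct any previous state by reading that commit's tree rather than replaying a chain of diffs, and it can walk the history by following parent pointers.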
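
As a rough sketch of how such an object is assembled and named, the Python below builds commit text from its fields and hashes it. The helpers commit_body and commit_id are hypothetical names for this illustration, the tree ID used is the well-known ID of the empty tree (a placeholder, not a real project snapshot), and optional headers such as GPG signatures or encoding are left out.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # placeholder tree ID


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Assemble the text of a commit object from its fields."""
    lines = [f"tree {tree}"]
    # A root commit has no parent line; a merge commit has two or more.
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def commit_id(body: bytes) -> str:
    """Hash the header (commit, space, size, NUL) plus body with SHA-1."""
    header = b"commit " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


who = "Alice <alice@email.com> 1677109123 +0200"
root = commit_body(EMPTY_TREE, [], who, who, "Initial commit")
child = commit_body(EMPTY_TREE, [commit_id(root)], who, who, "Second commit")
print(child.decode())
```

Since the parent's ID is part of the child's hashed content, rewriting an old commit changes the ID of every commit that descends from it.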


Why This Model is Powerful

  • Deduplication: Identical content is stored once, regardless of history or filename.
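  • Integrity: Every object is named by a cryptographic hash of its content, and trees and commits embed the hashes of what they reference, so altering any file or past commit changes every hash downstream of it.
  • Efficiency: Each commit is a full snapshot, but unchanged files and directories are reused by hash. Changing one file adds a new blob plus new trees for the directories on its path up to the root; nothing else is copied.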
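
These properties can be observed in a small Python experiment. It is a simplified model, assuming a flat directory of regular files with ASCII names and SHA-1 hashing; git_hash and snapshot are names made up for this sketch.

```python
import hashlib


def git_hash(kind: bytes, body: bytes) -> str:
    """Hash the header (kind, space, size, NUL) plus body with SHA-1."""
    return hashlib.sha1(kind + b" " + str(len(body)).encode() + b"\x00" + body).hexdigest()


def snapshot(files: dict) -> tuple:
    """Return (root_tree_id, {name: blob_id}) for a flat directory of files."""
    blob_ids = {name: git_hash(b"blob", data) for name, data in files.items()}
    body = b"".join(
        b"100644 " + name.encode() + b"\x00" + bytes.fromhex(blob_ids[name])
        for name in sorted(blob_ids)
    )
    return git_hash(b"tree", body), blob_ids


tree_v1, blobs_v1 = snapshot({"a.txt": b"one\n", "b.txt": b"two\n"})
tree_v2, blobs_v2 = snapshot({"a.txt": b"one\n", "b.txt": b"two, edited\n"})

print(blobs_v1["a.txt"] == blobs_v2["a.txt"])  # True: unchanged file, same blob reused
print(blobs_v1["b.txt"] == blobs_v2["b.txt"])  # False: new content, new blob
print(tree_v1 == tree_v2)                      # False: the change propagates to the tree ID
```

The second snapshot needs only one new blob and one new tree; the blob for a.txt is shared between both versions.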

Visualizing the Relationships

[Commit] ----> [Tree(root)] ----> [Blob(file1)]
                      |
                      '---> [Blob(file2)]
                      |
                      '---> [Tree(subdir)] ----> [Blob(subfile)]

Inspecting Objects Yourself

You can explore these objects in any Git repository:

  • List all objects reachable from any ref: git rev-list --objects --all
  • Inspect an object: git cat-file -p <object-hash>
  • Show object type: git cat-file -t <object-hash>
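
To see what these commands read from disk, a loose object can also be decoded by hand. The Python function below is a sketch under assumptions: read_loose_object is a hypothetical helper, and it only handles loose objects, not objects that Git has moved into packfiles under .git/objects/pack/.

```python
import os
import zlib


def read_loose_object(git_dir: str, object_id: str):
    """Return (type, content) for a loose object; raises FileNotFoundError if absent."""
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as handle:
        raw = zlib.decompress(handle.read())
    # The decompressed data is the type, a space, the size, a NUL byte, then the content.
    header, _, content = raw.partition(b"\x00")
    kind, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind.decode(), content
```

Calling read_loose_object(".git", "<blob-hash>") from the root of a repository should return the same type that git cat-file -t reports and, for blobs and commits, the same bytes that git cat-file -p prints. Tree contents come back in their raw binary form.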

Conclusion

Git’s object database may seem arcane, but it is a small, robust design underpinning modern software development. By storing data as blobs, trees, and commits, each with its own well-defined role, Git combines speed, integrity, deduplication, and history tracking in one simple model.

Next time you add or commit a file, remember: it’s not just a simple save. You’re adding another object to a finely crafted database—a foundation for your project’s story.

