How Git Internally Stores Objects: Blobs, Trees, Commits
Summary: Deep dive into Git’s object database.
When developers use Git, most of their interaction is confined to the surface: git add, git commit, git push, and so on. But beneath this familiar interface lies an efficient object database that underpins Git's speed, integrity, and flexibility. Let’s take a journey into Git’s internals and explore how it stores your project using blobs, trees, and commits.
What is Git’s Object Database?
At the heart of Git lies its object database, a key-value store that holds all your data, history, and metadata. This data lives inside the .git/objects directory, with each stored "object" indexed by a SHA-1 hash.
Git’s fundamental data units are:
- Blobs: File contents
- Trees: Directories
- Commits: Snapshots and history
Let’s decode each, starting from the bottom up.
1. Blobs: The Building Blocks
A blob (binary large object) represents the contents of a single file. Importantly, blobs only store the content and no information about filenames or directories.
How a Blob is Stored
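When you git add a file, Git:
- Reads the file content.
- Creates a blob object: a short header (the word "blob", the content size in bytes, and a NUL byte) followed by that content.
- Computes the SHA-1 hash of the header plus content; this hash is the object's ID.
- Compresses the object using zlib.
- Stores it under .git/objects/, using the first two hex characters of the hash as a subdirectory name and the remaining 38 as the filename.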
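The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Git's own code: the helper names blob_object and loose_object_path are made up for this example, and it assumes a repository using the default SHA-1 object format.

```python
import hashlib
import zlib
from typing import Tuple


def blob_object(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, compressed bytes) for a blob holding `content`."""
    # Header: object type, a space, the size in bytes, then a NUL byte.
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    # The id is the SHA-1 of header + content, computed before compression.
    object_id = hashlib.sha1(store).hexdigest()
    # A loose object file holds the zlib-compressed header + content.
    return object_id, zlib.compress(store)


def loose_object_path(object_id: str) -> str:
    """First two hex digits name the directory, the remaining 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


object_id, compressed = blob_object(b"hello world\n")
print(object_id)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(object_id))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the filename never enters the hash, two files with the same bytes map to the same object ID regardless of what they are called.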
Example
echo "Hello, Git!" > hello.txt
git add hello.txt
After adding, Git stores a blob object. You can print its hash with git hash-object hello.txt and then view the content with:
git cat-file -p <blob-hash>
# Outputs: Hello, Git!
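To make the read path less of a black box, here is a rough Python sketch of what has to happen to a loose object before its content can be printed. The function name parse_loose_object is hypothetical, and the raw bytes are built in memory as a stand-in for a real file under .git/objects/.

```python
import zlib
from typing import Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, int, bytes]:
    """Decompress a loose object and split it into (type, size, body)."""
    data = zlib.decompress(raw)
    # The header ends at the first NUL byte; everything after it is the body.
    header, _, body = data.partition(b"\x00")
    obj_type, size_text = header.split(b" ")
    size = int(size_text)
    if size != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), size, body


# Stand-in for the bytes of a loose object file (assumption: built in memory).
raw = zlib.compress(b"blob 12\x00Hello, Git!\n")
print(parse_loose_object(raw))  # ('blob', 12, b'Hello, Git!\n')
```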
Blobs are content-addressable: identical files across different commits or branches will use the same blob object, saving space.
2. Trees: The Directory Structure
A tree object represents a directory. Unlike blobs, trees store:
- References to blobs (files) and other trees (subdirectories)
- File and directory names
- File mode (executable, symlink, etc.)
Trees knit blobs together, giving structure to your project.
How a Tree is Stored
When you run git commit, Git:
- Scans the staging area (index) for the current state of files and directories.
- For each directory, creates a tree object listing:
  - File modes
  - Filenames
  - Hashes of blobs/trees for files and subdirectories
Example
Suppose your project looks like:
project/
└── hello.txt
The tree object might look like:
100644 blob <blob-hash> hello.txt
If you have a subdirectory:
project/
├── hello.txt
└── src/
    └── main.py
The root tree object will refer to both a blob (for hello.txt) and another tree (for src).
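The listing shown earlier is the human-readable rendering; on disk a tree is a compact binary record. The sketch below serializes one under some stated assumptions: SHA-1 object IDs, entries passed in as (mode, name, hex id) tuples, and a helper name, tree_object, invented for this example.

```python
import hashlib
from typing import List, Tuple

# (mode, name, hex object id). "100644" is a regular file, "40000" a subdirectory.
Entry = Tuple[str, str, str]


def tree_object(entries: List[Entry]) -> Tuple[str, bytes]:
    """Return (object id, uncompressed bytes) for a tree with these entries."""
    def sort_key(entry: Entry) -> bytes:
        mode, name, _ = entry
        # Entries are ordered by name; directories compare as if they ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: mode, space, name, NUL, then the 20 raw bytes of the id.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    store = b"tree " + str(len(body)).encode() + b"\x00" + body
    return hashlib.sha1(store).hexdigest(), store


empty_id, _ = tree_object([])
print(empty_id)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Since a tree refers to its children by hash, any change to a file's content changes the hash of every tree above it.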
3. Commits: The Snapshots
A commit ties it all together. Each commit object includes:
- A pointer to a tree object (the root directory snapshot)
- Zero or more parent commits (linking history)
- Metadata (author, date, message)
Commits are the snapshot mechanism—each commit records a complete tree of the project at a point in time.
How a Commit is Stored
On git commit, Git creates a new commit object that:
- References the tree object (snapshot of the root directory).
- Points to its parent commit(s), if any.
- Stores the author, committer, timestamps, and your commit message.
Example
tree <tree-hash>
parent <parent-commit-hash>
author Alice <alice@email.com> 1677109123 +0200
committer Alice <alice@email.com> 1677109123 +0200

Initial commit
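This structure allows Git to track the entire history efficiently: starting from any commit, it can follow the tree pointer to reconstruct that snapshot and follow parent pointers to walk back through earlier ones.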
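A commit object is plain text behind the same kind of header that blobs and trees use. The following sketch assembles one; commit_object and object_id are names chosen for this example, the identity string reuses the placeholder from the example above, and SHA-1 is assumed.

```python
import hashlib
from typing import List


def commit_object(tree: str, parents: List[str], author: str,
                  committer: str, message: str) -> bytes:
    """Return the uncompressed bytes of a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # a root commit has no parent line
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    return b"commit " + str(len(body)).encode() + b"\x00" + body


def object_id(store: bytes) -> str:
    return hashlib.sha1(store).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of a tree with no entries
who = "Alice <alice@email.com> 1677109123 +0200"
root = commit_object(EMPTY_TREE, [], who, who, "Initial commit")
child = commit_object(EMPTY_TREE, [object_id(root)], who, who, "Second commit")
print(child.partition(b"\x00")[2].decode())
```

The child's ID depends on its parent's ID, so rewriting any earlier commit changes the ID of every commit after it.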
Why This Model is Powerful
- Deduplication: Identical content is stored once, regardless of history or filename.
- Integrity: Everything is referenced by cryptographic hashes, making tampering obvious.
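- Efficiency: Each commit records a full snapshot, but unchanged files and directories are reused by hash. Changing one file adds a new blob plus new tree objects for the directories on the path from that file up to the root, rather than a copy of the entire project.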
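A toy in-memory store is enough to see deduplication and incremental snapshots at work. The ObjectStore class below is hypothetical and handles only flat directories of regular files; it mimics Git's hashing scheme but is not how Git itself is implemented.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Toy content-addressed store keyed by SHA-1 of header + body."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, obj_type: str, body: bytes) -> str:
        data = obj_type.encode() + b" " + str(len(body)).encode() + b"\x00" + body
        object_id = hashlib.sha1(data).hexdigest()
        self.objects[object_id] = data  # same content, same key: stored once
        return object_id

    def put_tree(self, files: Dict[str, str]) -> str:
        """Write a flat tree mapping file names to blob ids."""
        body = b"".join(
            b"100644 " + name.encode() + b"\x00" + bytes.fromhex(blob_id)
            for name, blob_id in sorted(files.items())
        )
        return self.put("tree", body)


store = ObjectStore()
a = store.put("blob", b"shared content\n")
b = store.put("blob", b"shared content\n")  # duplicate content under another name
readme_v1 = store.put("blob", b"version 1\n")
tree_v1 = store.put_tree({"a.txt": a, "copy.txt": b, "README": readme_v1})
count_before = len(store.objects)

# Second snapshot: only README changes.
readme_v2 = store.put("blob", b"version 2\n")
tree_v2 = store.put_tree({"a.txt": a, "copy.txt": b, "README": readme_v2})
new_objects = len(store.objects) - count_before
print(a == b, new_objects)  # True 2
```

The second snapshot adds two objects (one blob, one tree) while the unchanged files are shared with the first.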
Visualizing the Relationships
[Commit] ----> [Tree(root)] ----> [Blob(file1)]
                    |
                    '---> [Blob(file2)]
                    |
                    '---> [Tree(subdir)] ----> [Blob(subfile)]
Inspecting Objects Yourself
You can explore these objects in any Git repository:
- List all objects reachable from any ref:
git rev-list --objects --all
- Inspect an object:
git cat-file -p <object-hash>
- Show object type:
git cat-file -t <object-hash>
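If you want to peek at the storage directory directly, a short script can enumerate loose objects by walking .git/objects. This is only a sketch: loose_object_ids is a name invented here, it assumes 40-character SHA-1 IDs, and it does not see objects that Git has already packed into packfiles.

```python
import re
from pathlib import Path
from typing import Iterator

HEX_DIR = re.compile(r"^[0-9a-f]{2}$")
HEX_FILE = re.compile(r"^[0-9a-f]{38}$")


def loose_object_ids(objects_dir: Path) -> Iterator[str]:
    """Yield ids of loose objects found under a .git/objects directory."""
    for subdir in sorted(objects_dir.iterdir()):
        # Skip "pack" and "info", which hold packfiles and metadata.
        if not (subdir.is_dir() and HEX_DIR.match(subdir.name)):
            continue
        for entry in sorted(subdir.iterdir()):
            if entry.is_file() and HEX_FILE.match(entry.name):
                # Directory name + file name together form the full id.
                yield subdir.name + entry.name
```

Calling loose_object_ids(Path(".git/objects")) from the root of a repository yields IDs that can be passed to git cat-file -t or git cat-file -p.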
Conclusion
Git’s object database may seem arcane, but it’s an elegant, robust system underpinning modern software development. By storing data as blobs, trees, and commits, each with its own well-defined role, Git combines speed, integrity, deduplication, and history tracking in a single simple model.
Next time you add or commit a file, remember: it’s not just a simple save. You’re adding another object to a finely crafted database—a foundation for your project’s story.