Understanding the .git Folder and How Git Works Internally

Summary: Dive into what happens behind the scenes inside .git.

Version control is at the heart of modern software development, and Git stands as the de facto standard. While most developers are comfortable using git commands like commit, push, and pull, fewer understand the inner workings that make Git both powerful and efficient. At the core of every Git repository lies the mysterious .git folder, which quietly houses all the magic. This article explores the anatomy of the .git directory and how Git functions internally to manage your codebase.

What is the `.git` Folder?

Whenever you run git init in a directory, Git creates a hidden folder named .git. This directory contains all the metadata and object data Git needs to manage your repository's history. Everything—from commit history to branch pointers and configuration—is stored here. Without the .git folder, your project is just another collection of files and directories; it’s this folder that turns it into a Git repository.

Key Components of the `.git` Folder

The contents of the .git folder may initially seem arcane, but every file and subdirectory serves a specific purpose.

1. HEAD

The HEAD file is a small text file that normally holds a symbolic reference to the branch you have checked out.
For example:
```
ref: refs/heads/main
```
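This is how Git tracks which branch you're currently on. If you check out a specific commit instead of a branch (a "detached HEAD"), the file contains that commit's hash instead.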
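Resolving HEAD by hand shows this indirection at work. The sketch below assumes a standard layout where the branch is stored as a loose ref file. `resolve_head` is a hypothetical helper name, and it does not read refs that Git has moved into .git/packed-refs.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, commit_hash) for HEAD in the given .git directory.

    ref_name is None when HEAD is detached. commit_hash is None when no
    loose ref file exists: either the branch is "unborn" (no commits yet,
    as right after `git init`) or the ref lives in packed-refs, which this
    sketch does not read.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit hash directly.
        return None, head
    ref_name = head[len("ref: "):]  # e.g. "refs/heads/main"
    ref_file = git_dir / ref_name
    if not ref_file.is_file():
        return ref_name, None
    return ref_name, ref_file.read_text().strip()
```

Run from a repository's top level, `resolve_head(".git")` should report the same hash as `git rev-parse HEAD`, provided the branch ref is stored as a loose file.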

2. config

Contains repository-specific configuration settings, such as remotes, branch tracking, user info, and aliases. Values here override the user-level (~/.gitconfig) and system-wide settings.

Example:

```
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
[remote "origin"]
    url = git@github.com:user/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
```
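The file uses an INI-like syntax, so Python's standard configparser can read simple cases like the example config. This is only an approximation. Git's own parser also handles includes, escape sequences, section-name case rules, and repeated keys, whereas configparser keeps only the last value of a repeated key. For real scripts, `git config --get remote.origin.url` is the reliable route. The helper name `read_git_config` is hypothetical.

```python
import configparser

# The example .git/config from this section, as a string.
SAMPLE_CONFIG = """\
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
[remote "origin"]
    url = git@github.com:user/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
"""


def read_git_config(text):
    """Parse simple Git config text with configparser (an approximation)."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # Git separates keys and values with "=" only
        interpolation=None,  # treat "%" literally instead of as interpolation
        strict=False,        # don't fail on repeated keys; the last one wins
    )
    parser.read_string(text)
    return parser


config = read_git_config(SAMPLE_CONFIG)
origin = config['remote "origin"']
print(origin["url"])                      # git@github.com:user/repo.git
print(config.getboolean("core", "bare"))  # False
```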

3. description

A short free-text description of the repository, used mainly by GitWeb and similar web front ends that serve (typically bare) repositories. Everyday local workflows ignore it.

4. hooks/

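Contains scripts that Git runs at key events. Examples include pre-commit, which runs before a commit is created, and post-receive, which runs on the server side after a push has been received.
A new repository ships example scripts ending in .sample, which are inactive; removing the suffix and making the script executable enables it. Hooks are a common place for code-quality checks or CI/CD triggers. Because they live only in your local .git folder, clone and push do not share them.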
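A hook is simply an executable file named after its event, so installing one means writing a script into .git/hooks/ and marking it executable. The sketch below does that from Python. `install_hook` is a hypothetical helper. The shell script it installs is an illustrative example, not a standard hook: it aborts a commit when the staged changes add a line beginning with a conflict marker.

```python
import stat
from pathlib import Path

# Illustrative pre-commit hook: a non-zero exit status aborts the commit.
PRE_COMMIT = """#!/bin/sh
# Reject the commit if the staged diff adds a line starting with a conflict marker.
if git diff --cached | grep -q '^+<<<<<<< '; then
    echo "pre-commit: conflict marker found in staged changes" >&2
    exit 1
fi
"""


def install_hook(git_dir, name, script):
    """Write script to <git_dir>/hooks/<name> and make it executable."""
    hook = Path(git_dir) / "hooks" / name
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(script)
    # On Unix-like systems Git skips hooks that are not executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

For example, `install_hook(".git", "pre-commit", PRE_COMMIT)` installs the check for the current clone only.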

5. info/

Houses the exclude file. It uses the same pattern syntax as .gitignore but is never committed, which makes it useful for ignoring files only in your own clone.

6. objects/

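The heart of Git’s content storage.
Git stores file contents, directory listings, commits, and annotated tags as objects in a content-addressed store.
Each object is named by the SHA-1 hash (or SHA-256, in repositories created with that object format) of a short header giving the object’s type and size, followed by its content. Loose objects are stored zlib-compressed.
- blobs: store file contents (but not file names)
- trees: represent directories, listing each entry’s name, mode, and the hash of its blob or subtree
- commits: point to a tree (a full snapshot of the project) and record parent commits, author, committer, and message
- tags: annotated tag objects, which point to another object (usually a commit) and add a tagger, date, and message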
Structure example:
```
objects/ab/cdef1234567890...
```
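The first two hex characters of the hash become the subdirectory name, and the remaining characters become the file name. Identical content always yields the same hash, so each distinct file version is stored only once. Corruption is also detectable, because damaged content no longer matches its name.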
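Git's object naming can be reproduced in a few lines of Python, assuming a repository that uses the default SHA-1 object format. The helper names are hypothetical. The resulting hash should match what `git hash-object` prints for the same file content.

```python
import hashlib


def object_id(obj_type, content):
    """Return the SHA-1 object name Git assigns to content of the given type.

    Git hashes a header "<type> <size in bytes>\\0" followed by the raw content,
    so the type is part of an object's identity.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid):
    """Map an object name to its loose-object path inside .git."""
    return f"objects/{oid[:2]}/{oid[2:]}"


oid = object_id("blob", b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```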

7. refs/

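Contains references: named pointers to objects, almost always commits. Git may also consolidate many refs into a single .git/packed-refs file.
- refs/heads/: Local branches
- refs/tags/: Tags
- refs/remotes/: Remote-tracking branches, recording where each remote’s branches were as of your last fetch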
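Refs are not always loose files. Git can consolidate them into a single .git/packed-refs file, so listing them means reading both sources. The sketch below, with hypothetical helper names, reads packed-refs first and then the loose files under refs/. When a ref appears in both places, the loose file wins, which matches how Git treats a ref updated after packing.

```python
from pathlib import Path


def parse_packed_refs(text):
    """Map ref name -> hash from the contents of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip the "# pack-refs with: ..." header and "^<hash>" lines, which
        # record what the annotated tag on the previous line points to.
        if not line or line.startswith(("#", "^")):
            continue
        oid, _, name = line.partition(" ")
        refs[name] = oid
    return refs


def list_refs(git_dir):
    """Return {ref name: hash} for all refs, loose files overriding packed ones.

    Symbolic refs (e.g. refs/remotes/origin/HEAD) are returned as their raw
    "ref: ..." text rather than being resolved.
    """
    git_dir = Path(git_dir)
    refs = {}
    packed = git_dir / "packed-refs"
    if packed.is_file():
        refs.update(parse_packed_refs(packed.read_text()))
    refs_dir = git_dir / "refs"
    if refs_dir.is_dir():
        for ref_file in refs_dir.rglob("*"):
            if ref_file.is_file():
                name = ref_file.relative_to(git_dir).as_posix()
                refs[name] = ref_file.read_text().strip()
    return dict(sorted(refs.items()))
```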

8. logs/

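Records each update to a reference (commits, checkouts, resets, rebases, and so on); git reflog reads these logs.
This makes it possible to recover "lost" commits or see where HEAD pointed earlier. Reflogs are local to your repository and expire over time: by default after 90 days, or 30 days for entries that are no longer reachable.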
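Each line of a reflog file such as .git/logs/HEAD records the old hash, the new hash, who made the change, when, and a short message. New entries are appended, so the newest is last. The parser below is a sketch of that line format with a hypothetical function name, and it assumes well-formed input.

```python
import re
from datetime import datetime, timedelta, timezone

# "<old> <new> <name> <<email>> <unix time> <tz offset>\t<message>"
REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) (?P<name>.*?) <(?P<email>[^>]*)> "
    r"(?P<seconds>\d+) (?P<tz>[+-]\d{4})(?:\t(?P<message>.*))?"
)


def parse_reflog_line(line):
    """Split one reflog line into its fields, with a timezone-aware timestamp."""
    match = REFLOG_LINE.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized reflog line: {line!r}")
    tz = match["tz"]  # e.g. "+0100"
    offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    if tz[0] == "-":
        offset = -offset
    return {
        "old": match["old"],  # all zeros when the ref did not exist before
        "new": match["new"],
        "name": match["name"],
        "email": match["email"],
        "when": datetime.fromtimestamp(int(match["seconds"]), timezone(offset)),
        "message": match["message"] or "",
    }
```

Parsing the lines of .git/logs/HEAD in reverse order lists entries newest first, the same order git reflog uses.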

9. index

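Also called the staging area. It is a binary file that lists every path going into the next commit. Each entry carries the path's blob hash, its file mode, and cached file-system metadata (such as size and modification time) that lets Git detect modified files quickly.
It describes the complete snapshot for the next commit, not just the changes you have staged.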
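The index layout is documented in Git's index-format specification. The sketch below, a hypothetical helper, decodes only version 2, the format Git usually writes unless features such as extended flags require a newer one. It ignores the optional extensions and the trailing checksum, and it does not handle path names of 4095 bytes or more.

```python
import struct


def read_index(data):
    """Parse a version-2 Git index: return (version, [(mode, oid, path), ...])."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version != 2:
        raise ValueError(f"index version {version} not handled by this sketch")
    entries, pos = [], 12
    for _ in range(count):
        # Ten 32-bit fields: ctime s/ns, mtime s/ns, dev, ino, mode, uid, gid, size.
        fields = struct.unpack(">10I", data[pos:pos + 40])
        mode = fields[6]
        oid = data[pos + 40:pos + 60].hex()  # 20-byte binary SHA-1
        (flags,) = struct.unpack(">H", data[pos + 60:pos + 62])
        name_len = flags & 0xFFF             # low 12 bits hold the path length
        path = data[pos + 62:pos + 62 + name_len].decode()
        entries.append((mode, oid, path))
        # Each entry is NUL-padded (1 to 8 bytes) to a multiple of 8 bytes.
        pos += (62 + name_len + 8) // 8 * 8
    return version, entries
```

Calling it on `Path(".git/index").read_bytes()` should list the same modes, hashes, and paths that `git ls-files --stage` prints.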

10. Packed files

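To save space, Git packs objects into packfiles, stored in objects/pack/ as .pack files with matching .idx index files.
A packfile combines many objects into one compressed file and can store similar objects as deltas against one another. git gc creates packfiles, and fetch, push, and clone transfer objects in pack format.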
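Each packfile starts with a 12-byte big-endian header: the bytes PACK, a version number, and the number of objects in the pack. The hypothetical helpers below read only that header for every pack in a repository. Decoding the objects themselves, including deltas, takes considerably more code.

```python
import struct
from pathlib import Path


def pack_object_count(header):
    """Return (version, object count) from the first 12 bytes of a .pack file."""
    signature, version, count = struct.unpack(">4sII", header[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


def summarize_packs(git_dir):
    """Map each objects/pack/*.pack file name to (version, object count)."""
    pack_dir = Path(git_dir, "objects", "pack")
    if not pack_dir.is_dir():
        return {}
    summary = {}
    for pack in sorted(pack_dir.glob("*.pack")):
        with pack.open("rb") as f:
            summary[pack.name] = pack_object_count(f.read(12))  # header only
    return summary
```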

How Git Works Internally

Now that you know what’s inside .git, let’s look at what happens when you use Git.

1. Adding and Staging Files

When you run git add <file>:

- Git reads the file’s current contents.
- It creates a blob object from them and writes it to objects/, unless an object with the same hash already exists.
- It updates the index entry for that path to reference this blob.
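The storage half of git add can be sketched in a few lines, assuming the default SHA-1 object format. `write_blob` is a hypothetical helper. It skips the index update, and it also skips the write-to-a-temporary-file-then-rename step Git uses to avoid leaving partial objects behind.

```python
import hashlib
import zlib
from pathlib import Path


def write_blob(git_dir, content):
    """Store content as a loose blob object and return its hash."""
    data = f"blob {len(content)}\0".encode() + content  # header + raw bytes
    oid = hashlib.sha1(data).hexdigest()
    path = Path(git_dir) / "objects" / oid[:2] / oid[2:]
    if not path.exists():  # identical content is stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(data))  # loose objects are zlib-compressed
    return oid
```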

2. Committing

Running git commit:

- Git reads the index and writes a tree object for each directory, reusing any tree that already exists.
- It creates a commit object that points to the top-level tree and records the parent commit(s), the author and committer (each with a timestamp), and the message.
- The commit is written to objects/.
- The current branch reference (in refs/heads/) is updated to point to the new commit, and the update is recorded in logs/.
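Tree and commit objects are plain byte formats that can be built by hand. The sketch below assumes the default SHA-1 object format, uses hypothetical helper names, and simplifies by giving the author and committer the same identity and timestamp. As a sanity check, hashing an empty tree body yields 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the well-known empty-tree hash.

```python
import hashlib


def hash_object(obj_type, body):
    """SHA-1 over "<type> <size>\\0" + body, which is how Git names objects."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tree_body(entries):
    """Build a tree from (mode, name, hex_oid) tuples.

    Use mode "100644" for regular files and "40000" for subdirectories.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts by name bytes, comparing a directory as if its name ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    return b"".join(
        f"{mode} {name}\0".encode() + bytes.fromhex(oid)  # hash stored as 20 raw bytes
        for mode, name, oid in sorted(entries, key=sort_key)
    )


def commit_body(tree_oid, parent_oids, identity, timestamp, message, tz="+0000"):
    """Build a commit; identity is "Name <email>", timestamp is Unix seconds."""
    lines = [f"tree {tree_oid}"]
    # No parent lines for a root commit; two or more for a merge.
    lines += [f"parent {oid}" for oid in parent_oids]
    lines.append(f"author {identity} {timestamp} {tz}")
    lines.append(f"committer {identity} {timestamp} {tz}")
    return ("\n".join(lines) + "\n\n" + message).encode()
```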

3. Branching

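Branches are just pointers. A branch is a small text file in refs/heads/ (e.g., refs/heads/feature-x) or an entry in packed-refs.
Each one holds the hash of the branch’s latest commit. Creating a branch therefore only writes one small file, and committing on a branch simply updates that hash.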
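Creating a branch by hand takes little more than writing that file. The sketch below mimics Git's locking approach: it writes "<ref>.lock" first and then renames it into place, so readers never see a half-written ref. `create_branch` is a hypothetical helper. It writes no reflog entry, does not validate the ref name, and does not check packed-refs.

```python
import os
from pathlib import Path


def create_branch(git_dir, name, commit_oid):
    """Create refs/heads/<name> pointing at commit_oid; return the ref path."""
    ref = Path(git_dir) / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)  # "topic/x" needs a subdirectory
    lock = ref.with_name(ref.name + ".lock")
    # O_EXCL makes creation fail if another process already holds the lock.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "w") as f:
            if ref.exists():
                raise FileExistsError(f"branch {name!r} already exists")
            f.write(commit_oid + "\n")
        os.replace(lock, ref)  # atomically move the finished file into place
    except BaseException:
        lock.unlink(missing_ok=True)  # release the lock on failure
        raise
    return ref
```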

4. Merging & Rebasing

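A merge that combines diverging histories is recorded as a commit with more than one parent. If the current branch has no commits of its own since the branches split, Git can instead fast-forward by simply moving the branch pointer.
Rebasing replays commits onto a new base. Each replayed commit has a different parent, so it gets a new hash. The branch pointer then moves to the new commits, while the originals stay reachable through the reflog until they expire.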
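Parents are just header lines in the commit object, so telling a merge commit from an ordinary one comes down to counting them. The sketch below parses a raw commit body, as printed by `git cat-file commit <hash>`, and uses hypothetical helper names.

```python
def parse_commit(body):
    """Split a raw commit object body into (headers, message).

    headers maps each header name to a list of values, since "parent" can repeat.
    """
    text = body.decode()
    header_text, _, message = text.partition("\n\n")  # blank line ends the headers
    headers = {}
    for line in header_text.splitlines():
        if line.startswith(" "):
            continue  # continuation of a multi-line header such as a signature
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


def is_merge(body):
    """A merge commit is simply a commit with two or more parent lines."""
    headers, _ = parse_commit(body)
    return len(headers.get("parent", [])) > 1
```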

5. Fetching, Pulling, Pushing

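Remotes: URLs and fetch rules live in .git/config; remote-tracking branches live in refs/remotes/.
Fetch: Downloads new objects from a remote and updates the remote-tracking refs (e.g., refs/remotes/origin/main), leaving your own branches untouched.
Pull: Runs a fetch, then merges (or rebases) the fetched branch into your current branch.
Push: Sends new objects to a remote and asks it to update its refs. By default, an update that is not a fast-forward is rejected unless you force it.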
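The fetch line in the example config is a refspec: it tells git fetch how remote branch names map onto local remote-tracking refs. The sketch below, a hypothetical helper, applies the common single-wildcard form. Real refspecs also support cases it ignores, such as negative patterns.

```python
def map_refspec(refspec, remote_ref):
    """Return the local ref that remote_ref is stored under, or None if no match.

    A leading "+" (force) allows updates that are not fast-forwards; it does
    not affect the name mapping, so it is simply dropped here.
    """
    if refspec.startswith("+"):
        refspec = refspec[1:]
    src, _, dst = refspec.partition(":")
    if "*" not in src:
        return dst if remote_ref == src else None
    prefix, _, suffix = src.partition("*")
    if (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)
            and len(remote_ref) >= len(prefix) + len(suffix)):
        matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
        return dst.replace("*", matched, 1)  # substitute the wildcard match
    return None
```

With the default `+refs/heads/*:refs/remotes/origin/*`, the remote's refs/heads/main is stored locally as refs/remotes/origin/main.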

Visual Summary: Anatomy of a Commit

```
HEAD
 ↓
refs/heads/main → <commit hash>
                    ↓
                 commit (author, date, message, parent(s))
                    ↓
                 tree (snapshot of directory)
                 /   \
             blob   tree
            (file) (subdir)
```

Why Understanding `.git` Matters

Troubleshooting: If you know where and how Git stores information, you can recover lost commits, fix damaged repos, or reset references.
Advanced Features: Enables use of tools like git reflog, custom hooks, and low-level commands (git cat-file, git fsck).
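Security: Every object is named by the hash of its content, and each commit’s content includes the hashes of its tree and parents. Changing any past file or commit therefore changes the hash of every later commit, so history cannot be altered silently, and corruption shows up as a hash mismatch (for example in git fsck).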

Conclusion

The .git folder is much more than a hidden subdirectory: it’s the central nervous system of your repository. By understanding what’s inside and how Git builds its history through objects, trees, commits, and references, you gain a powerful perspective on your codebase. Next time you run a Git command, you’ll know what’s happening behind the scenes, inside the humble .git directory.

Explore further: