Implement parsing for tree objects #40

Merged
merged 3 commits into from
Apr 12, 2025
2 changes: 1 addition & 1 deletion doc/commits.md
@@ -1,7 +1,7 @@
# Commits

## Structure of a commit
-tree - tree commit
+[tree](tree.md) - tree commit
parent - reference to its parents
author - author name
committer - committer name
36 changes: 36 additions & 0 deletions doc/tree.md
@@ -0,0 +1,36 @@
# tree

## What's in a tree?

A tree describes the contents of a directory by associating blobs with paths.
It's a table with 3 columns: file mode, file path, SHA-1.
Each subfolder will be represented by its own tree object.

## Tree format

A tree is a concatenation of records of the format:
```
[mode] space [path] 0x00 [sha-1]
```
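
As a rough illustration of the record layout above, the sketch below builds a single record from a mode, a path, and a 40-character hex SHA-1. `make_tree_record` and `hex_to_raw` are hypothetical helper names, not functions from this project, and the sketch assumes the SHA-1 is written as 20 raw bytes rather than as hex text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a hex string into raw bytes.
// Only the length is checked here; digit validation is left out for brevity.
std::string hex_to_raw(const std::string &hex) {
  if (hex.size() % 2 != 0) {
    throw std::invalid_argument("odd-length hex string");
  }
  std::string raw;
  raw.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    raw.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  }
  return raw;
}

// Hypothetical helper: build "[mode] space [path] 0x00 [sha-1]".
std::string make_tree_record(const std::string &mode, const std::string &path,
                             const std::string &hexSha) {
  std::string record = mode + " " + path;
  record.push_back('\0');        // NUL terminates the path
  record += hex_to_raw(hexSha);  // 20 raw bytes for a 40-character hex SHA-1
  return record;
}
```

A full tree payload would then be the concatenation of one such record per entry.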

## How does Git store version history in the worktree?

1. Each branch points to a commit, and each commit references one root tree object.
2. The current version of each file in the worktree is stored as a blob object.

## How to parse a tree object?

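1. The tree payload has no format header of its own; the generic object header (`tree <size>` followed by a NUL byte) comes before it in the stored object.
2. Follow the tree format.
3. It can be parsed into:
   - a mapping from each path to its file mode and SHA-1 hash
   - the object behind a SHA-1 hash can then be read when needed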
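
The parsing steps above can be sketched as a free function that walks the payload record by record. This is a minimal sketch, not the project's `GitTree::deserialise`: `TreeEntry`, `raw_to_hex`, and `parse_tree` are hypothetical names, the mode is kept as text, and the input is assumed to be the tree payload with any object header already stripped.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical entry type: mode as text, SHA-1 as 40 hex characters.
struct TreeEntry {
  std::string mode;
  std::string sha;
};

// Hypothetical helper: render raw bytes as lowercase hex.
std::string raw_to_hex(const std::string &raw) {
  static const char digits[] = "0123456789abcdef";
  std::string hex;
  hex.reserve(raw.size() * 2);
  for (unsigned char byte : raw) {
    hex.push_back(digits[byte >> 4]);
    hex.push_back(digits[byte & 0x0f]);
  }
  return hex;
}

// Parse a tree payload into a path -> entry mapping.
std::unordered_map<std::string, TreeEntry> parse_tree(const std::string &data) {
  std::unordered_map<std::string, TreeEntry> entries;
  std::size_t pos = 0;
  while (pos < data.size()) {
    // The mode runs up to the first space.
    const std::size_t space = data.find(' ', pos);
    if (space == std::string::npos) {
      throw std::runtime_error("missing space after mode");
    }
    // The path runs up to the NUL byte.
    const std::size_t nul = data.find('\0', space + 1);
    if (nul == std::string::npos) {
      throw std::runtime_error("missing NUL after path");
    }
    // Exactly 20 raw SHA-1 bytes must follow the NUL.
    if (data.size() - nul - 1 < 20) {
      throw std::runtime_error("truncated SHA-1");
    }
    const std::string path = data.substr(space + 1, nul - space - 1);
    entries[path] = TreeEntry{data.substr(pos, space - pos),
                              raw_to_hex(data.substr(nul + 1, 20))};
    pos = nul + 1 + 20;
  }
  return entries;
}
```

Throwing on malformed input is a design choice for the sketch; a caller could equally return an error value.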

## What do we do with trees?
1. Every commit object stores a reference to a tree, which represents the working tree at that commit.
2. Modify paths in the working tree.
3. Add entries to the working tree.
4. Delete entries from the working tree.
5. Show the mode, SHA-1, and object type for a given file path.

Since `git add` and `git rm` operate on file paths, I will use the file path as the key and store the entries in a `std::unordered_map`.
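
To make the path-keyed design concrete, here is a small sketch of add and remove operations on such a map. `TreeIndex` and its members are hypothetical and are not part of this pull request. Because `std::unordered_map` has no stable iteration order, the sketch sorts the keys before producing output.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory index: path -> (mode, hex SHA-1).
class TreeIndex {
public:
  // Insert a new path or overwrite an existing one (the `git add` case).
  void add(const std::string &path, int mode, const std::string &sha) {
    entries_[path] = {mode, sha};
  }

  // Remove a path (the `git rm` case); returns false if it was not present.
  bool remove(const std::string &path) { return entries_.erase(path) > 0; }

  // Return the stored paths in sorted order so output is deterministic.
  std::vector<std::string> sorted_paths() const {
    std::vector<std::string> paths;
    paths.reserve(entries_.size());
    for (const auto &entry : entries_) {
      paths.push_back(entry.first);
    }
    std::sort(paths.begin(), paths.end());
    return paths;
  }

private:
  std::unordered_map<std::string, std::pair<int, std::string>> entries_;
};
```

A `std::map` would keep the keys ordered without the extra sort, at the cost of logarithmic lookups; which trade-off suits this project is left open.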
27 changes: 27 additions & 0 deletions include/tree.h
@@ -0,0 +1,27 @@
#ifndef TREE_H
#define TREE_H

#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

#include "object.h"

class GitTree : public GitObject {
public:
  GitTree(const std::string &data = std::string(""));

  void deserialise(
      const std::string &data) override; // convert string format to data object
  std::string
  serialise(GitRepository &repo) override; // convert this to a string format
  std::string print_matching_files(
      const std::string &filePathPattern); // print tree entries that
                                           // match the given file path
  void init();

protected:
  // entry paths, kept sorted so serialisation is deterministic
  std::vector<std::string> pathNames;
  // path -> (mode, hex SHA-1)
  std::unordered_map<std::string, std::tuple<int, std::string>> fileEntries;
};

#endif // TREE_H
3 changes: 2 additions & 1 deletion include/util.h
@@ -8,5 +8,6 @@ namespace fs = boost::filesystem;
std::string read_file(const fs::path &filePath);
bool create_file(const fs::path &filePath, const std::string &content = "");
std::string sha1_hexdigest(const std::string &data);
-
+std::string binaryToHex(const std::string &binary);
+std::string hexToBinary(const std::string &hexString);
#endif // UTIL_H
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -15,7 +15,7 @@ target_link_libraries(repository PRIVATE inih util boost_libraries)
target_include_directories(repository PUBLIC ../include)
target_compile_features(repository PUBLIC cxx_std_17)

-add_library(object object.cpp blob.cpp commit.cpp)
+add_library(object object.cpp blob.cpp commit.cpp tree.cpp)
target_link_libraries(object PRIVATE repository boost_libraries)
target_include_directories(object PUBLIC ../include)
target_compile_features(object PUBLIC cxx_std_17)
2 changes: 1 addition & 1 deletion src/cat-file.cpp
@@ -27,7 +27,7 @@ void catfile(std::vector<std::string> &args) {
if (repo) {
GitObject *obj =
GitObject::read(*repo, GitObject::find(*repo, hash, type));
-std::cout << obj->serialise(*repo) << "\n";
+std::cout << obj->serialise(*repo);
}
} catch (std::runtime_error &err) {
std::cerr << err.what() << "\n";
49 changes: 49 additions & 0 deletions src/tree.cpp
@@ -0,0 +1,49 @@
#include "tree.h"
#include "repository.h"
#include "util.h"
#include <algorithm>
#include <iostream>
#include <sstream>
#include <stdexcept>

GitTree::GitTree(const std::string &data) : GitObject() {
  this->deserialise(data);
}

std::string GitTree::serialise(GitRepository &repo) {
  std::stringstream ss;
  for (const auto &path : this->pathNames) {
    auto &[mode, sha] = this->fileEntries[path];
    // record layout: [mode] space [path] 0x00 [20 raw SHA-1 bytes]
    ss << mode << " " << path << '\0' << hexToBinary(sha);
  }
  return ss.str();
}

void GitTree::deserialise(const std::string &data) {
  std::size_t curr = 0;
  while (curr < data.size()) {
    // the mode runs up to the first space
    std::size_t space = data.find(' ', curr);
    if (space == std::string::npos) {
      throw std::runtime_error("Malformed tree: missing space after mode");
    }
    int mode = std::stoi(data.substr(curr, space - curr));
    curr = space + 1;
    // the path runs up to the NUL byte, followed by 20 raw SHA-1 bytes
    std::size_t nul = data.find('\0', curr);
    if (nul == std::string::npos || data.size() - nul - 1 < 20) {
      throw std::runtime_error("Malformed tree: truncated entry");
    }
    std::string path = data.substr(curr, nul - curr);
    curr = nul + 1;
    std::string sha = binaryToHex(data.substr(curr, 20));
    curr += 20;
    this->fileEntries[path] = {mode, sha};
    pathNames.push_back(path);
  }
  // maintain sort order via path names to ensure
  // consistent tree objects are generated each time
  std::sort(pathNames.begin(), pathNames.end());
}

std::string GitTree::print_matching_files(const std::string &filePath) {
  std::stringstream ss;
  for (const auto &path : this->pathNames) {
    // TODO: implement better file path matching
    if (filePath.empty() || path == filePath) {
      auto &[mode, sha] = this->fileEntries[path];
      ss << mode << " " << sha << " " << path << "\n";
    }
  }
  return ss.str();
}
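
One possible direction for the TODO in `print_matching_files` is shell-style wildcard matching. The sketch below is only an assumption about what "better file path matching" could mean; `wildcard_match` is a hypothetical helper that supports `*` (any run of characters, including none) and `?` (exactly one character), and it is not part of this pull request.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: match `text` against a pattern containing '*' and '?'.
// Uses the usual backtracking scheme: remember the last '*' and retry from it.
bool wildcard_match(const std::string &pattern, const std::string &text) {
  std::size_t p = 0;                        // position in pattern
  std::size_t t = 0;                        // position in text
  std::size_t star = std::string::npos;     // index of the last '*' seen
  std::size_t mark = 0;                     // text position paired with it
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;  // let '*' match nothing for now
      mark = t;
    } else if (p < pattern.size() &&
               (pattern[p] == '?' || pattern[p] == text[t])) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // backtrack: let the last '*' absorb one more character
      t = ++mark;
    } else {
      return false;
    }
  }
  // Any remaining pattern characters must all be '*'.
  while (p < pattern.size() && pattern[p] == '*') {
    ++p;
  }
  return p == pattern.size();
}
```

To keep the existing behaviour where an empty pattern lists every entry, the condition in the loop could become `filePath.empty() || wildcard_match(filePath, path)`.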
126 changes: 75 additions & 51 deletions src/util.cpp
@@ -1,69 +1,93 @@
#include <boost/filesystem.hpp>
#include <boost/uuid/detail/sha1.hpp>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

namespace fs = boost::filesystem;

std::string read_file(const fs::path &filePath) {
  try {
    // Open the file
    std::ifstream fileStream(filePath.string());

    // Check if the file is successfully opened
    if (fileStream.is_open()) {
      // Read the content of the file
      std::stringstream buffer;
      buffer << fileStream.rdbuf();
      return buffer.str();
    } else {
      std::cerr << "Error opening the file for reading: " << filePath
                << std::endl;
      return "";
    }
  } catch (const std::exception &e) {
    std::cerr << "Exception: " << e.what() << std::endl;
    return "";
  }
}

bool create_file(const fs::path &filePath, const std::string &content = "") {
  try {
    // Create the file
    std::ofstream fileStream(filePath.string());

    // Check if the file is successfully opened
    if (fileStream.is_open()) {
      // Write content to the file if provided
      if (!content.empty()) {
        fileStream << content;
      }

      std::cout << "File created successfully: " << filePath << std::endl;
      return true;
    } else {
      std::cerr << "Error opening the file for writing: " << filePath
                << std::endl;
      return false;
    }
  } catch (const std::exception &e) {
    std::cerr << "Exception: " << e.what() << std::endl;
    return false;
  }
}

std::string sha1_hexdigest(const std::string &data) {
  boost::uuids::detail::sha1 sha1;
  sha1.process_bytes(data.data(), data.size());

  unsigned int digest[5];
  sha1.get_digest(digest);

  std::stringstream ss;
  ss << std::hex << std::setfill('0');
  for (unsigned int i : digest) {
    ss << std::setw(8) << i;
  }
  return ss.str();
}

std::string binaryToHex(const std::string &binaryData) {
  std::ostringstream hexStream;
  for (unsigned char byte : binaryData) {
    hexStream << std::setw(2) << std::setfill('0') << std::hex
              << static_cast<int>(byte);
  }
  return hexStream.str();
}

std::string hexToBinary(const std::string &hexString) {
  std::string binaryData;
  for (size_t i = 0; i < hexString.size(); i += 2) {
    // Take two hex characters at a time
    std::string byteStr = hexString.substr(i, 2);

    // Convert hex pair to a single byte (char)
    char byte = static_cast<char>(std::stoi(byteStr, nullptr, 16));
    binaryData.push_back(byte);
  }
  return binaryData;
}
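
`std::stoi` skips leading whitespace, accepts a sign, and stops at the first character it cannot parse, so `hexToBinary` above quietly accepts some malformed input, and an odd-length string has its last digit parsed on its own. If stricter behaviour is wanted, a variant along the following lines could be used. `hexToBinaryChecked` and `hexDigitValue` are hypothetical names and are not part of this pull request.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of one hex digit, or throw on anything else.
static int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  throw std::invalid_argument("not a hex digit");
}

// Hypothetical stricter variant of hexToBinary: rejects odd lengths and
// any character that is not a hex digit.
std::string hexToBinaryChecked(const std::string &hexString) {
  if (hexString.size() % 2 != 0) {
    throw std::invalid_argument("odd-length hex string");
  }
  std::string binaryData;
  binaryData.reserve(hexString.size() / 2);
  for (std::size_t i = 0; i < hexString.size(); i += 2) {
    // Combine the high and low nibble into one byte.
    const int value =
        hexDigitValue(hexString[i]) * 16 + hexDigitValue(hexString[i + 1]);
    binaryData.push_back(static_cast<char>(value));
  }
  return binaryData;
}
```

Since SHA-1 values read from a tree always come from `binaryToHex`, the extra checks matter mainly for hashes supplied by the user.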