CLI tool to produce MD context files from many sources, to help interact with LLMs (ChatGPT, Llama3, Claude, etc.).

Tanq16/ai-context

AI Context

License: MIT

Generate AI-friendly markdown files from GitHub repositories, local source code, YouTube videos, or webpages.


A multi-architecture, multi-OS command-line tool with concurrency support. It produces markdown context files from various sources so they are easy to feed to LLM apps (like ChatGPT, Claude, etc.).

Quickstart

ai-context -u "https://github.com/tanq16/ai-context" # single URL
ai-context -f urllist.file                           # URL file

Features

  • Local Directory Processing
    • intended for locally available code bases (directories or already cloned git repos)
    • the context file includes the directory structure and the contents of every file that is not ignored
  • GitHub Repository Processing
    • clones the given GitHub repository and processes it the same way as Local Directory Processing
    • the clone is temporary, so no cleanup is needed
    • private GitHub repositories are supported through the GH_TOKEN environment variable
  • YouTube Transcript Processing
    • downloads the transcript for the given YouTube video link and stores it as markdown
    • the transcript preserves time segments
  • WebPage Processing
    • converts an HTML webpage to markdown text, stripping out JS and CSS
    • downloads all images from the page and stores them locally with UUID filenames
    • the markdown text links to the downloaded images via their local paths

Installation

  • Binary
    • Download the latest release for your OS and architecture from the releases page
    • Binaries are built via GitHub Actions for macOS, Linux, and Windows on both AMD64 (x86_64) and ARM64 (like Apple Silicon) architectures
    • You can also download specific versions if needed; however, the latest version is recommended
  • Go Install
    • Run the following command (requires Go v1.22+):
    go install github.com/tanq16/ai-context@latest
    • For specific versions, use the release binaries or build a specific commit, as I have not implemented and will not implement Go-native binary versioning
  • Local Build
    git clone https://github.com/tanq16/ai-context.git && \
    cd ai-context && \
    go build .

Usage

# Process a single path (local directory) with additional ignore patterns
ai-context -u /path/to/directory -i "tests,docs,*doc.*"

# Process one URL (GitHub repo, YouTube video, or webpage); quote it so the shell does not interpret ? or &
ai-context -u "https://www.youtube.com/watch?v=video_id"

# Make a list of paths
cat << EOF > listfile
../notif
/working/cybernest
https://github.com/assetnote/h2csmuggler
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html
EOF

# Process URL list concurrently
ai-context -f listfile

# Process private GitHub repository
GH_TOKEN=$(cat /secrets/GH.PAT) ai-context -u https://github.com/ORG/REPO

Warning

A directory path (whether passed with -u or listed in a file for -f) must start with / (absolute) or with ./ or ../ (relative). For the current directory, always use ./ so that the regex matching works correctly.
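
To illustrate the rule in the warning, here is a minimal Go sketch of how an input could be sorted into "directory" or "url" by its prefix. The function classifyInput is hypothetical and is not taken from the tool's source; the tool itself uses regex matching whose exact patterns are not shown here, and treating http:// and https:// as the URL prefixes is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyInput is a hypothetical helper (not the tool's real code).
// It applies the documented prefix rule: directory paths start with
// "/", "./" or "../". The http(s) check for URLs is an assumption.
func classifyInput(s string) string {
	switch {
	case strings.HasPrefix(s, "/"),
		strings.HasPrefix(s, "./"),
		strings.HasPrefix(s, "../"):
		return "directory"
	case strings.HasPrefix(s, "http://"),
		strings.HasPrefix(s, "https://"):
		return "url"
	default:
		// A bare name such as "notif" carries no recognizable prefix.
		return "unrecognized"
	}
}

func main() {
	inputs := []string{
		"./",
		"../notif",
		"/working/cybernest",
		"notif",
		"https://github.com/assetnote/h2csmuggler",
	}
	for _, in := range inputs {
		fmt.Printf("%-45s %s\n", in, classifyInput(in))
	}
}
```

In this sketch a bare name like notif falls through to "unrecognized", which is why writing ./notif (or ./ for the current directory) is the safer habit.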

Output

  • The tool creates a local folder called context and writes all converted content there as .md files
  • Filenames follow the pattern TYPE-PATHNAME.md (for example, gh-ffuf_ffuf.md)
  • In listfile mode, every path produces its own context file
  • All images (downloaded only for webpages) are named with UUIDs and stored in the context/images directory (images are downloaded as a convenience and do not take away from text-first context creation)
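
As a rough illustration of the TYPE-PATHNAME.md pattern, the Go sketch below derives a name for a GitHub repository. It assumes that gh-ffuf_ffuf.md comes from the URL https://github.com/ffuf/ffuf, with the owner and repository joined by an underscore; that mapping is inferred from the single example above. The helper githubContextName is hypothetical, and the tool's real naming code (including prefixes for other source types and any case handling) may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// githubContextName is a hypothetical helper (not the tool's real code).
// Assumption: a GitHub URL maps to "gh-<owner>_<repo>.md".
func githubContextName(repoURL string) (string, error) {
	u, err := url.Parse(repoURL)
	if err != nil {
		return "", err
	}
	// Split "/owner/repo" into its two leading segments.
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if u.Host != "github.com" || len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("not a GitHub repository URL: %q", repoURL)
	}
	repo := strings.TrimSuffix(parts[1], ".git")
	return "gh-" + parts[0] + "_" + repo + ".md", nil
}

func main() {
	name, err := githubContextName("https://github.com/ffuf/ffuf")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // prints "gh-ffuf_ffuf.md"
}
```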

Command Line Options

  • -u, --url: provide a path (GitHub repo, YouTube video, WebPage link, or relative/absolute directory path) to process
  • -f, --file: provide a file with a list of paths (URLs or directory paths) to process
  • -i, --ignore: add additional patterns to ignore during processing (comma-separated)
  • -t, --threads: (optional) number of workers for concurrent file processing when passing list file (default = 5)
  • --debug: verbose logging (helpful if something isn't working as expected or you want to see individual steps)
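
To make the comma-separated -i value concrete, here is a minimal Go sketch that splits such a value and tests path segments against each pattern with the standard library's path.Match. This is an assumption about the matching semantics, not the tool's actual logic (see aicontext/ignores.go for that); parseIgnores and ignored are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parseIgnores splits a comma-separated flag value into trimmed patterns.
// Hypothetical helper, not the tool's real code.
func parseIgnores(flag string) []string {
	var out []string
	for _, p := range strings.Split(flag, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// ignored reports whether any segment of a slash-separated relative path
// matches any pattern. Glob matching via path.Match is an assumption.
func ignored(relPath string, patterns []string) bool {
	for _, seg := range strings.Split(relPath, "/") {
		for _, p := range patterns {
			if ok, err := path.Match(p, seg); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := parseIgnores("tests,docs,*doc.*")
	for _, f := range []string{"src/main.go", "src/tests/a_test.go", "api/godoc.md"} {
		fmt.Printf("%-22s ignored=%v\n", f, ignored(f, patterns))
	}
}
```

Under these assumed semantics, tests and docs exclude whole directories, while *doc.* only excludes names that contain "doc." followed by an extension.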

Tip

  • Run head -n 200 context/FILE.md (or 500 lines) to view the content tree of the processed code base or directory and see what has been included. Then refine your -i flag arguments to ignore additional patterns.
  • When processing a large number of items, the tool can look stalled due to thread limits and image download times; use --debug to enable verbose logs and see what is running.
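
The apparent stalls mentioned above are typical of a bounded worker pool: with the default of 5 workers, a sixth item waits until one worker is free. The Go sketch below shows the general pattern under that assumption. It is not the tool's implementation, and processAll and its process callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs process over items using a fixed number of workers.
// Hypothetical sketch of a bounded worker pool, not the tool's real code.
func processAll(items []string, workers int, process func(string) string) []string {
	if workers < 1 {
		workers = 5 // mirrors the documented default of -t
	}
	jobs := make(chan int)
	results := make([]string, len(items))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker writes only to its own index, so no lock is needed.
			for i := range jobs {
				results[i] = process(items[i])
			}
		}()
	}

	// Sending blocks while all workers are busy, which is where a large
	// list appears to stall.
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	items := []string{"../notif", "/working/cybernest", "https://github.com/assetnote/h2csmuggler"}
	out := processAll(items, 5, func(s string) string { return "done: " + s })
	for _, line := range out {
		fmt.Println(line)
	}
}
```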

Default Ignores

The tool includes pre-defined and sensible ignore patterns, including common files and directories that typically don't add value to the context. These are:

  • Version control files (.git, .gitignore)
  • Dependencies (node_modules, vendor)
  • Compiled files (*.exe, *.dll)
  • Media files (images, videos, audio)
  • Documentation files
  • Lock files (package-lock.json, yarn.lock)
  • Build artifacts and caches

For a full list, see aicontext/ignores.go.

Acknowledgments

This project takes inspiration from, uses, or references:

  • repomix: inspiration for turning code into context
  • innertube: inspiration for code to get transcript from YouTube video
  • html-to-markdown: used to convert HTML to MD
  • go-git: git operations in Go