A CLI tool to generate LLM prompts from your codebase.
- 🔍 Generate structured prompts of the codebase
- 🎯 Smart file filtering and gitignore support
- 📊 Token counting for different LLM models
- 📁 Flexible file and directory exclusion
The primary purpose of the generated Markdown file is to serve as input to an LLM, such as those provided by OpenAI (GPT-4, GPT-3.5 Turbo), Anthropic (Claude), or others. The file contains a structured representation of the codebase, ready to be fed into an LLM.
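For example, the generated file can be passed to a chat model with OpenAI's Python client. This is a minimal sketch, not part of taiko itself; it assumes the `openai` package is installed, `OPENAI_API_KEY` is set, and a `review.md` has already been generated:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# review.md is assumed to have been produced by `taiko . -o review.md`
prompt = open("review.md", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4",  # any chat model with a large enough context window
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```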
The token count is crucial. LLMs have a limited context window (e.g., GPT-4 has variants with 8k, 32k, and 128k token limits), and the precise token count tells you exactly how much of a model's context window this codebase representation will consume. This is far more accurate than naive character or word counting. Knowing the token count allows you to:
- Choose the right model: if the count exceeds a model's limit, you need a model with a larger context window, or you need to reduce the size of the input (more on this below).
- Avoid errors: attempting to send more tokens than a model can handle will result in an error.
- Estimate costs: many LLM APIs charge based on token usage, so an accurate count lets you estimate the cost of processing your codebase.
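As a quick illustration of this check, the sketch below uses the `tiktoken` Python library to measure a generated prompt against a chosen context window. This is an assumption-laden example (taiko counts tokens itself; the mapping of `cl100k` to tiktoken's `cl100k_base` is assumed), not part of the tool:

```python
import tiktoken

# "cl100k_base" is assumed to correspond to taiko's default "cl100k" encoding.
enc = tiktoken.get_encoding("cl100k_base")

prompt = open("review.md", encoding="utf-8").read()
n_tokens = len(enc.encode(prompt))

context_window = 128_000  # e.g., a 128k-token GPT-4 variant
print(f"{n_tokens} tokens, about {n_tokens / context_window:.0%} of a {context_window}-token window")
if n_tokens > context_window:
    print("Too large: use a bigger-context model or exclude more files (see below).")
```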
- Clone the repository:
git clone https://github.com/harshpratapsingh/taiko
cd taiko
- Build and install:
cargo install --path .
# Generate a code review prompt
taiko . -o review.md
# Generate a security audit prompt
taiko . -p security -o security.md
# Generate a documentation prompt
taiko . -p docs -o docs.md
# Exclude patterns
taiko . -e "Cargo.lock,target,.md" -o review.md
# This will exclude:
# - Specific file: "Cargo.lock"
# - Directory: "target"
# - File extension: ".md" (all markdown files)
# Choose specific tokenizer
taiko . -n p50k -o review.md
Options:
-i, --include <PATTERN> Include only specific file extensions (comma-separated)
-e, --exclude <PATTERN> Exclude patterns (comma-separated):
- file.ext: Exclude specific file
- .ext: Exclude all files with extension
- dir: Exclude directory
-n, --encoding <MODEL> Choose tokenizer model [default: cl100k]
-o, --output <FILE> Output file path
-p, --prompt-type <TYPE> Prompt type [default: code_review]
-h, --help Print help
-V, --version Print version
# Exclude specific files
taiko . -e "Cargo.lock,package-lock.json" -o review.md
# Exclude file extensions
taiko . -e ".md,.json,.lock" -o review.md
# Exclude directories
taiko . -e "target,node_modules,dist" -o review.md
# Mix different patterns
taiko . -e "Cargo.lock,.md,target,dist,.json" -o review.md
- cl100k: For GPT-4 and GPT-3.5 Turbo models
- p50k: For text-davinci and code-davinci models
- p50k_edit: For edit models
- r50k/gpt2: For GPT-3 models like davinci
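To see how the choice of encoding affects the count, the sketch below compares them with the `tiktoken` Python library. It assumes taiko's encoding names map onto the corresponding tiktoken encodings, which is an assumption rather than documented behavior:

```python
import tiktoken

text = open("review.md", encoding="utf-8").read()

# Assumed mapping from taiko's -n values to tiktoken encoding names.
encodings = {
    "cl100k": "cl100k_base",
    "p50k": "p50k_base",
    "p50k_edit": "p50k_edit",
    "r50k/gpt2": "r50k_base",
}

for name, tiktoken_name in encodings.items():
    n = len(tiktoken.get_encoding(tiktoken_name).encode(text))
    print(f"{name:>10}: {n} tokens")
```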
MIT License