`rust_tantivy` is a Rust project that uses the Tantivy search engine library to index and search the contents of files in a specified folder. It supports searches for single terms, phrases, and regular expressions, and it can update the index when files in the folder change.
To get started with `rust_tantivy`, make sure Rust is installed on your system. If it is not, install it by following the official Rust installation instructions.
- Clone the repository:
`git clone https://github.com/itsmesamster/rust_tantivy.git`
`cd rust_tantivy`
- Build the project:
`cargo build --release`
To use `rust_tantivy`, specify the folder you want to index and search. The following command runs the main program:
`cargo run --release -- <folder_path>`
Replace `<folder_path>` with the path to the folder you want to index, for example:
`cargo run --release -- /path/to/your/folder`
This command indexes the folder's contents and performs searches for predefined terms, phrases, and regex patterns.
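As an illustration of the argument handling this command relies on, the following is a minimal sketch that uses only the standard library. The function name `parse_folder_arg` and the usage message are hypothetical and are not taken from the project's `src/main.rs`.

```rust
use std::path::PathBuf;

/// Extracts the folder path from command-line arguments.
/// `args` is expected to start with the program name, as `std::env::args()` does.
/// Hypothetical helper: the project's actual argument handling may differ.
fn parse_folder_arg<I: Iterator<Item = String>>(mut args: I) -> Result<PathBuf, String> {
    // The first item is the program name; fall back to a default for the usage text.
    let program = args.next().unwrap_or_else(|| "rust_tantivy".to_string());
    // Accept exactly one positional argument: the folder to index.
    match (args.next(), args.next()) {
        (Some(folder), None) => Ok(PathBuf::from(folder)),
        _ => Err(format!("Usage: {} <folder_path>", program)),
    }
}
```

A `main` function could call `parse_folder_arg(std::env::args())` and print the error message before exiting when no folder is supplied.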
- Cargo.toml: The manifest file for Rust, containing metadata for the project and its dependencies.
- src/main.rs: The program's entry point, handling command-line arguments and invoking indexing and searching functions.
- src/lib.rs: Contains the core functionality for indexing, updating the index, searching, and generating reports.
The `create_index_from_folder` function indexes all files in the specified folder. It reads the content of each file and indexes it according to the schema defined within the function. The index is stored in the directory specified by `index_path`.
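To show conceptually what indexing file contents involves, here is a toy inverted index built with the standard library only. It is not Tantivy's implementation and does not reproduce the project's schema. The names `tokenize` and `build_toy_index` are hypothetical, and file contents are passed in as strings rather than read from disk.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Splits text into lowercase alphanumeric tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Builds a map from each term to the set of paths whose content contains it.
/// Toy illustration only: a real search index stores far more than this.
fn build_toy_index(files: &[(&str, &str)]) -> BTreeMap<String, BTreeSet<String>> {
    let mut index: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (path, content) in files {
        for term in tokenize(content) {
            index.entry(term).or_default().insert(path.to_string());
        }
    }
    index
}
```

Looking up a term in the resulting map returns the files that contain it, which is the basic operation that a full-text index makes fast.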
The `update_index_with_new_files` function updates the existing index by identifying files added or modified since the last update. It also removes from the index any files that are no longer present in the folder.
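The change detection described above can be sketched as a comparison of two folder snapshots. This is a standard-library illustration under stated assumptions, not the project's code: `FolderDiff` and `diff_snapshots` are hypothetical names, and each snapshot is assumed to map a file path to its modification time in seconds.

```rust
use std::collections::HashMap;

/// Paths that were added, modified, or removed between two snapshots.
#[derive(Debug, Default, PartialEq)]
struct FolderDiff {
    added: Vec<String>,
    modified: Vec<String>,
    removed: Vec<String>,
}

/// Compares path -> modification-time maps from the previous and current scan.
fn diff_snapshots(previous: &HashMap<String, u64>, current: &HashMap<String, u64>) -> FolderDiff {
    let mut diff = FolderDiff::default();
    for (path, mtime) in current {
        match previous.get(path) {
            // Not seen before: needs to be indexed.
            None => diff.added.push(path.clone()),
            // Seen before with a different timestamp: needs to be re-indexed.
            Some(old) if old != mtime => diff.modified.push(path.clone()),
            // Unchanged: nothing to do.
            Some(_) => {}
        }
    }
    for path in previous.keys() {
        // Present last time but gone now: needs to be removed from the index.
        if !current.contains_key(path) {
            diff.removed.push(path.clone());
        }
    }
    // Sort for deterministic output, since HashMap iteration order is unspecified.
    diff.added.sort();
    diff.modified.sort();
    diff.removed.sort();
    diff
}
```

Treating any timestamp difference as a modification, rather than only a newer timestamp, also catches files restored from an older copy.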
The project supports searching for:
- Single Terms: Individual words or tokens.
- Phrases: Sequences of words enclosed in quotes.
- Regular Expressions: Patterns that match specific character sequences.
These searches are handled by the `search_terms_in_index`, `search_phrases_in_index`, and `search_regex_in_index` functions, respectively.
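The difference between a term search and a phrase search can be illustrated on plain text with the standard library. This sketch does not use Tantivy's query types or the project's functions. The names `split_tokens`, `contains_term`, and `contains_phrase` are hypothetical, and regular-expression matching is left out because it would need an external crate such as `regex`.

```rust
/// Splits text into lowercase alphanumeric tokens.
fn split_tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// True if `term` occurs as a whole token in `text` (case-insensitive).
fn contains_term(text: &str, term: &str) -> bool {
    let term = term.to_lowercase();
    split_tokens(text).iter().any(|t| *t == term)
}

/// True if the tokens of `phrase` occur consecutively and in order in `text`.
fn contains_phrase(text: &str, phrase: &str) -> bool {
    let haystack = split_tokens(text);
    let needle = split_tokens(phrase);
    // An empty phrase never matches; this also avoids a zero-sized window.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle.as_slice())
}
```

A term match only requires the token to appear somewhere in the text, while a phrase match additionally requires the tokens to be adjacent and in the same order.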
The `create_report` function generates an HTML report of the search results, listing the files in which terms, phrases, or regex patterns were found.
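A report of this kind could be rendered roughly as follows. This is a minimal sketch, not the project's `create_report`: the function names, the input shape (a query paired with the files it matched), and the HTML layout are assumptions. It also escapes text with a small hand-written helper so the example needs no external crate, whereas the project lists `htmlescape` as a dependency.

```rust
/// Escapes the characters that are significant in HTML text and attributes.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Renders one section per query, listing the files in which it was found.
/// Hypothetical layout: the project's report format may differ.
fn render_report(results: &[(&str, Vec<&str>)]) -> String {
    let mut html = String::from("<html><body><h1>Search Report</h1>\n");
    for (query, files) in results {
        html.push_str(&format!("<h2>{}</h2>\n<ul>\n", escape_html(query)));
        for file in files {
            html.push_str(&format!("<li>{}</li>\n", escape_html(file)));
        }
        html.push_str("</ul>\n");
    }
    html.push_str("</body></html>\n");
    html
}
```

Escaping both the query and the file names matters because either can contain characters such as `<` or `&` that would otherwise be interpreted as markup.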
To search for single terms in the indexed files:
`let search_terms: Vec<&str> = vec!["Australia"];`
To search for specific phrases:
`let search_phrases: Vec<&str> = vec!["Cross Roads, Ripley County, Indiana"];`
To search using regular expressions:
`let search_regex: Vec<&str> = vec!["d[ai]{2}ry"];`
The project relies on the following Rust crates:
- tantivy - A full-text search engine library in Rust.
- regex - A regular expression library for Rust.
- log - A logging facade for Rust.
- walkdir - A Rust library for recursive directory traversal.
- htmlescape - A library for escaping HTML entities in Rust.
This project is licensed under the MIT License. See the LICENSE file for more details.