Code search in a large codebase
- Index: python cindex.py folder_to_index
- Search: python csearch.py regex
- Regex matching uses Python's re module for now
Broad classification of code search tools, based on how the user enters the search query:
- Search with strings
- Search with code
- Search with natural language
- Search with regex
  - Regex over the entire codebase: slow, because running a regular expression over the whole text takes time linear in its size (O(n) for a fixed regex with a Thompson NFA simulation), and every query has to read every byte of the codebase.
  - Regex over an index: build an index first, so that the regex only runs over the files that could possibly match, e.g. Google Code Search.
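A minimal sketch of the first approach (regex over the entire codebase), assuming the files under one root directory can be read as UTF-8 text with undecodable bytes replaced. The name brute_force_search is hypothetical and is not taken from csearch.py. Note that Python's re is a backtracking engine rather than a Thompson NFA, so pathological patterns can take far more than linear time; in the normal case the cost is still dominated by reading every file on every query.

```python
import os
import re


def brute_force_search(root, pattern):
    """Hypothetical helper: scan every file under root with one compiled regex.

    Yields (path, line_no, line) for each matching line. The cost grows with
    the total size of the codebase, because nothing is skipped.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary files from raising UnicodeDecodeError
                with open(path, encoding="utf-8", errors="replace") as f:
                    for line_no, line in enumerate(f, start=1):
                        if regex.search(line):
                            yield path, line_no, line.rstrip("\n")
            except OSError:
                continue  # unreadable file (permissions, broken symlink, ...)
```

For example, iterating over brute_force_search("folder_to_index", r"def \w+\(") prints nothing new about the index; it simply rereads every file for each query, which is what the indexed approach tries to avoid.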
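Google Code Search has the following components:
- Build a trigram index that maps every three-character substring to the files containing it
- Turn the user's regex into a query over trigrams that any match must contain
- Look up those trigrams in the index to get candidate files, then run the regex over only those files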
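A minimal in-memory sketch of these three steps, assuming plain text files under one root directory. The function names build_index, required_literals and indexed_search are hypothetical and are not taken from cindex.py or csearch.py. Step 2 is deliberately much simpler than Code Search's regex-to-trigram-query analysis: it only collects literal runs that every match must contain (an AND of their trigrams) and gives up on alternation and groups. When it gives up, it falls back to checking every file, so it can be slow but never skips a file that could match.

```python
import os
import re
from collections import defaultdict


def trigrams(text):
    """Set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(root):
    """Step 1: map each trigram to the set of files containing it."""
    index, paths = defaultdict(set), []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: leave it out of the index
            paths.append(path)
            for tri in trigrams(text):
                index[tri].add(path)
    return index, paths


def required_literals(pattern):
    """Step 2 (simplified): literal strings that every match must contain.

    Conservative: gives up ([]) on alternation and groups, drops characters made
    optional by ?, * or {m,n}, and skips escapes such as \\d or \\x41.
    """
    runs, i, n = [""], 0, len(pattern)  # runs[-1] is the run being extended
    while i < n:
        c = pattern[i]
        if c in "|(":
            return []  # alternation, groups, inline flags: no safe AND query
        if c == "\\":
            nxt = pattern[i + 1:i + 2]
            if nxt and not nxt.isalnum():
                runs[-1] += nxt  # escaped punctuation such as \. is a literal
                i += 2
            else:
                runs.append("")
                i += 1
                while i < n and pattern[i].isalnum():
                    i += 1  # skip \d, \w, \x41, \1 ... and anything glued on
        elif c == "[":  # character class: skip to its closing bracket
            runs.append("")
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1  # a leading ] belongs to the class
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1
        elif c in "?*{":  # the preceding character becomes optional
            runs[-1] = runs[-1][:-1]
            runs.append("")
            close = pattern.find("}", i) if c == "{" else -1
            i = (close if close != -1 else i) + 1  # skip a {m,n} body
        elif c in "+.^$":  # run ends; for +, the repeated char stays required
            runs.append("")
            i += 1
        else:
            runs[-1] += c
            i += 1
    return [r for r in runs if r]


def indexed_search(index, paths, pattern):
    """Step 3: intersect posting sets, then regex-check only the candidates."""
    regex = re.compile(pattern)  # raises re.error early on a bad pattern
    candidates = set(paths)
    for lit in required_literals(pattern):
        for tri in trigrams(lit):
            candidates &= index.get(tri, set())
    hits = []
    for path in sorted(candidates):
        try:
            with open(path, encoding="utf-8", errors="replace") as f:
                for line_no, line in enumerate(f, start=1):
                    if regex.search(line):
                        hits.append((path, line_no, line.rstrip("\n")))
        except OSError:
            continue  # file vanished or became unreadable since indexing
    return hits
```

Usage: index, paths = build_index("folder_to_index") once, then indexed_search(index, paths, r"def \w+\(") per query. For that pattern, required_literals yields "def " and "(", so only files containing the trigrams "def" and "ef " are opened. The sketch keeps the index in memory; a real tool would store it on disk so that repeated searches skip step 1.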