Commit
Addresses #9: improves test coverage and documentation, and fixes bugs
Signed-off-by: Tim Bray <[email protected]>
Showing 23 changed files with 50,820 additions and 408 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,22 @@ | ||
.PHONY: test | ||
|
||
all: test build linux | ||
all: test build | ||
|
||
test: */*.go | ||
go test -v ./... && go vet ./... | ||
|
||
build: */*.go | ||
go build -o bin/tf . | ||
build: bin/macos-arm/tf bin/macos-x86/tf bin/linux-x86/tf bin/linux-arm/tf | ||
|
||
bin/macos-arm/tf: */*.go | ||
GOOS=darwin GOARCH=arm64 go build -o bin/macos-arm/tf | ||
|
||
bin/macos-x86/tf: */*.go | ||
GOOS=darwin GOARCH=amd64 go build -o bin/macos-x86/tf | ||
|
||
bin/linux-x86/tf: */*.go | ||
GOOS=linux GOARCH=amd64 go build -o bin/linux-x86/tf | ||
|
||
bin/linux-arm/tf: */*.go | ||
GOOS=linux GOARCH=arm64 go build -o bin/linux-arm/tf | ||
|
||
linux: */*.go | ||
GOOS=linux go build -o bin/ltf . | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,111 +1,95 @@ | ||
# topfew | ||
A program that finds records in which a | ||
certain field or combination of fields occurs | ||
most frequently | ||
A program that finds and prints out the top few records in which a certain field or combination of fields occurs most frequently. | ||
|
||
## Usage | ||
|
||
```shell | ||
tf | ||
-n, --n [number of lines] | ||
-f, --fields [fieldlist] | ||
-h, -help, --help | ||
-g, --grep [regexp] | ||
-v, --vgrep [regexp] | ||
-s, --sed [regexp] [replacement] | ||
-w, --width [number of file segments] | ||
-sample | ||
[filename] | ||
-n, --number (output line count) [default is 10] | ||
-f, --fields (field list) [default is the whole record] | ||
-g, --grep (regexp) [may repeat, default is accept all] | ||
-v, --vgrep (regexp) [may repeat, default is reject none] | ||
-s, --sed (regexp) (replacement) [may repeat, default is no changes] | ||
-w, --width (segment count) [default is result of runtime.NumCPU()] | ||
--sample | ||
-h, -help, --help | ||
filename [default is stdin] | ||
|
||
All the arguments are optional; if none are provided, tf will read records | ||
from the standard input and list the 10 which occur most often. | ||
``` | ||
## Options | ||
`-n integer`, `--number integer` How many of the highest‐occurrence‐count lines to print out. The | ||
default value is 10. | ||
`-n integer`, `--number integer` How many of the highest‐occurrence‐count lines to print out. | ||
The default value is 10. | ||
`-f fieldlist, --fields fieldlist` Specifies which fields should be extracted from incoming records | ||
and used in computing occurrence counts. The fieldlist must be a | ||
comma‐separated list of integers identifying field numbers, | ||
which start at one, for example 3 and 2,5,6. The fields | ||
must be provided in order, so 3,1,7 is an error. | ||
`-f fieldlist, --fields fieldlist` Specifies which fields should be extracted from incoming records and used in computing occurrence counts. | ||
The fieldlist must be a comma‐separated list of integers identifying field numbers, which start at one, for example 3 and 2,5,6. | ||
The fields must be provided in order, so 3,1,7 is an error. | ||
If no fieldlist is provided, **tf** treats the whole input record as a single field. | ||
`-g, regexp`, `--grep regexp` | ||
`-g regexp`, `--grep regexp` | ||
The initial **g** suggests `grep`. These options apply the provided | ||
regular expression to, respectively, each record as it is read | ||
and each field‐set as it is extracted, and if the regexp does | ||
not match the record or field, cause tf to bypass the record. | ||
The initial **g** suggests `grep`. | ||
This option applies the provided regular expression to each record as it is read; if the regexp does not match the record, **tf** bypasses it. | ||
These options can be provided multiple times; the provided regu‐ | ||
lar expressions will be applied in the order they appear on the | ||
command line. | ||
This option can be provided multiple times; the provided regular expressions will be applied in the order they appear on the command line. | ||
`-v regexp`, `--vgrep regexp` | ||
The initial **v** suggests "grep ‐v". These operations are the in‐ | ||
verse of `‐grecord` and `‐gfield`, rejecting records and extracted | ||
fields that match the provided regular expression. As with | ||
those operations, these can be provided multiple times. | ||
The initial **v** suggests `grep -v`. This operation is the inverse of `-g` and `--grep`, rejecting records that match the provided regular expression. | ||
As with `grep`, it can be provided multiple times. | ||
`-s regexp replacement`, `--sed regexp replacement` | ||
As its name suggests, applies sed‐style editing by replacing any | ||
text that matches the provided regexp with the provided replace‐ | ||
ment. It works on the fields in the fieldlist after they have | ||
been extracted from the record. | ||
As its name suggests, applies sed‐style editing by replacing any text that matches the provided regexp with the provided replacement. | ||
It works on the fields in the fieldlist after they have been extracted from the record. | ||
If ()‐enclosed capturing groups appear in the regexp, they may | ||
be referred to as **$1**, **$2**, and so on in, the replacement. | ||
If ()‐enclosed capturing groups appear in the regexp, they may be referred to as **$1**, **$2**, and so on, in the replacement. | ||
This option can be provided many times, and the replacement op‐ | ||
erations are performed in the order they appear on the command | ||
line. | ||
This option can be provided many times, and the replacement operations are performed in the order they appear on the command line. | ||
`--sample` | ||
It can be tricky to get the regular expressions in the `−g`, | ||
`−v`, and `−s` options right. Specifying | ||
`-−sample` causes **tf** to print lines to the standard output that | ||
display the filtering and field‐editing logic. It can only be | ||
used when processing standard input, not a file. | ||
It can be tricky to get the regular expressions in the `-g`, `-v`, and `-s` options right. | ||
Specifying `--sample` causes **tf** to print lines to the standard output that display the filtering and field‐editing logic. | ||
It can only be used when processing standard input, not a file. | ||
`-w integer`, `--width integer` | ||
If a file name is specified then **tf**, rather than reading it from | ||
end to end, will divide it into segements and process it in multiple | ||
parallel threads. The optimal number of threads depends in a | ||
complicated way on how many cores your CPU has what kind of cores | ||
they are, and the storage architecture. | ||
If a file name is specified then **tf**, rather than reading it from end to end, will divide it into segments and process it in multiple parallel threads. | ||
The optimal number of threads depends in a complicated way on how many cores your CPU has, what kind of cores they are, and the storage architecture. | ||
The default is the result of the Go `runtime.NumCPU()` calls and | ||
often produces good results. | ||
The default is the result of the Go `runtime.NumCPU()` call and often produces good results. | ||
`-h`, `-help`, `--help` | ||
Describes the function and options of tf. | ||
Describes the function and options of **tf**. | ||
## Examples | ||
To find the IP address that most commonly hits your | ||
web site, given an Apache logfile named `access_log` | ||
To find the IP address that most commonly hits your web site, given an Apache logfile named `access_log`: | ||
`tf -fields 1 access_log` | ||
`tf --fields 1 access_log` | ||
The same effect could be achieved with | ||
`awk '{print $1}' access_log | sort | uniq -c | sort -rn | head` | ||
But tf is usualy much faster. | ||
But **tf** is usually much faster. | ||
Do the same, but exclude high-traffic bots (omiting `access_log`) | ||
Do the same, but exclude high-traffic bots (omitting the filename). | ||
`tf -fields 1 -vrecord googlebot -vrecord bingbot` | ||
`tf --fields 1 --vgrep googlebot --vgrep bingbot` | ||
Most popular IP addresses from May 2020. | ||
`tf -fields 1 -grecord '\[../May/2020' ` | ||
`tf --fields 1 --grep '\[../May/2020'` | ||
Most popular hour/minute of the day for retrievals | ||
Most popular hour/minute of the day for retrievals. | ||
`tf -fields 4 -sed "\\[" "" -sed '^[^:]*:' '' -sed ':..$' '' ` | ||
`tf --fields 4 --sed "\\[" "" --sed '^[^:]*:' '' --sed ':..$' ''` | ||
## Credits | ||
Tim Bray created version 0.1 of Topfew, and the path toward 1.0 was based chiefly on ideas stolen from Dirkjan Ochtman and contributed by Simon Fell. |