Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: cyclone-github/spider
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: v0.7.1
Choose a base ref
...
head repository: cyclone-github/spider
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: main
Choose a head ref
  • 11 commits
  • 4 files changed
  • 2 contributors

Commits on Jan 2, 2024

  1. Update README.md

    cyclone-github authored Jan 2, 2024

    Unverified

    This user has not yet uploaded their public signing key.
    Copy the full SHA
    2d3e916 View commit details
  2. Copy the full SHA
    a0dfe0d View commit details
  3. Bump golang.org/x/net from 0.7.0 to 0.17.0

    Bumps [golang.org/x/net](https://github.com/golang/net) from 0.7.0 to 0.17.0.
    - [Commits](golang/net@v0.7.0...v0.17.0)
    
    ---
    updated-dependencies:
    - dependency-name: golang.org/x/net
      dependency-type: indirect
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    dependabot[bot] authored Jan 2, 2024
    Copy the full SHA
    5540c21 View commit details
  4. Merge pull request #1 from cyclone-github/dependabot/go_modules/golan…

    …g.org/x/net-0.17.0
    
    Bump golang.org/x/net from 0.7.0 to 0.17.0
    cyclone-github authored Jan 2, 2024
    Copy the full SHA
    4994139 View commit details

Commits on Jan 10, 2024

  1. Update README.md

    cyclone-github authored Jan 10, 2024
    Copy the full SHA
    2fec49d View commit details

Commits on Apr 19, 2024

  1. Bump golang.org/x/net from 0.17.0 to 0.23.0

    Bumps [golang.org/x/net](https://github.com/golang/net) from 0.17.0 to 0.23.0.
    - [Commits](golang/net@v0.17.0...v0.23.0)
    
    ---
    updated-dependencies:
    - dependency-name: golang.org/x/net
      dependency-type: indirect
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    dependabot[bot] authored Apr 19, 2024
    Copy the full SHA
    28319f9 View commit details
  2. Merge pull request #2 from cyclone-github/dependabot/go_modules/golan…

    …g.org/x/net-0.23.0
    
    Bump golang.org/x/net from 0.17.0 to 0.23.0
    cyclone-github authored Apr 19, 2024
    Copy the full SHA
    58da087 View commit details

Commits on Apr 30, 2024

  1. Update README.md

    cyclone-github authored Apr 30, 2024
    Copy the full SHA
    7c6fca6 View commit details

Commits on Jul 1, 2024

  1. Update go.mod

    go 1.22.4
    goquery v1.9.2
    cascadia v1.3.2
    net v0.26.0
    cyclone-github authored Jul 1, 2024
    Copy the full SHA
    ba8e482 View commit details

Commits on Dec 22, 2024

  1. Update README.md

    cyclone-github authored Dec 22, 2024
    Copy the full SHA
    1c85bd9 View commit details

Commits on Dec 28, 2024

  1. Update README, go.mod, go.sum

    cyclone-github authored Dec 28, 2024
    Copy the full SHA
    edd82f3 View commit details
Showing with 162 additions and 11 deletions.
  1. +29 −0 CHANGELOG.md
  2. +53 −11 README.md
  3. +10 −0 go.mod
  4. +70 −0 go.sum
29 changes: 29 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
### version 0.7.1
```
added progress bars to word / ngrams processing & file writing operations
added RAM usage monitoring
optimized order of operations for faster processing with less RAM
TO-DO: refactor code
```
### version 0.7.0
```
added feature to allow crawling specific file extensions (html, htm, txt)
added check to keep crawler from crawling offsite URLs
added flag "-delay" to avoid rate limiting (-delay 100 == 100ms delay between URL requests)
added write buffer for better performance on large files
increased crawl depth from 5 to 100 (not recommended, but enabled for edge cases)
fixed out of bounds slice bug when crawling URLs with NIL characters
fixed bug when attempting to crawl deeper than available URLs to crawl
fixed crawl depth calculation
optimized code which runs 2.8x faster vs v0.6.x during bench testing
```
### version 0.6.2
```
fixed scraping logic & ngram creations bugs
switched from gocolly to goquery for web scraping
remove dups from word / ngrams output
```
### version 0.5.10
```
initial github release
```
64 changes: 53 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,61 @@
[![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=cyclone-github&repo=spider&theme=gruvbox)](https://github.com/cyclone-github/spider/)

[![Go Report Card](https://goreportcard.com/badge/github.com/cyclone-github/spider)](https://goreportcard.com/report/github.com/cyclone-github/spider)
[![GitHub issues](https://img.shields.io/github/issues/cyclone-github/spider.svg)](https://github.com/cyclone-github/spider/issues)
[![License](https://img.shields.io/github/license/cyclone-github/spider.svg)](LICENSE)
[![GitHub release](https://img.shields.io/github/release/cyclone-github/spider.svg)](https://github.com/cyclone-github/spider/releases)
[![Go Reference](https://pkg.go.dev/badge/github.com/cyclone-github/spider.svg)](https://pkg.go.dev/github.com/cyclone-github/spider)

# Cyclone's URL Spider
<!-- ![image](https://i.imgur.com/Z6RjlUv.png) -->
```
----------------------
| Cyclone's URL Spider |
----------------------
![image](https://i.imgur.com/Z6RjlUv.png)
Crawling URL: https://forum.hashpwn.net
Base domain: forum.hashpwn.net
Crawl depth: 2
ngram len: 1-3
Crawl delay: 0ms (increase this to avoid rate limiting, ex: -delay 100)
URLs crawled: 51
Processing... [====================] 100.00%
Unique words: 1983
Unique ngrams: 11030
Writing... [====================] 100.00%
Output file: forum.hashpwn.net_wordlist.txt
RAM used: 0.03 GB
Runtime: 4.949s
```

Wordlist & ngram creation tool to crawl a given url and create a wordlist / ngrams (depending on flags given).
Wordlist & ngram creation tool to crawl a given url and create wordlists and/or ngrams (depending on flags given).
### Usage Instructions:
- To create a simple wordlist from a specified url (will save deduplicated wordlist to url_wordlist.txt):
- ./spider.bin -url https://github.com/cyclone-github
- To create ngrams, use flag "-ngram" and set ngram level such as "-ngram 2" or a range "-ngram 1-3"
- ./spider.bin -url https://github.com/cyclone-github -ngram 1-3
- To set url crawl url depth of 2, use flag "-crawl 2"
- ./spider.bin -url https://github.com/cyclone-github -crawl 2
- `./spider.bin -url https://github.com/cyclone-github`
- To set url crawl url depth of 2 and create ngrams len 1-5, use flag "-crawl 2" and "-ngram 1-5"
- `./spider.bin -url https://github.com/cyclone-github -crawl 2 -ngram 1-5`
- To set a custom output file, use flag "-o filename"
- ./spider.bin -url https://github.com/cyclone-github -o wordlist.txt
- `./spider.bin -url https://github.com/cyclone-github -o wordlist.txt`
- To set a delay to keep from being rate-limited, use flag "-delay nth" where nth is time in milliseconds
- ./spider.bin -url https://github.com/cyclone-github -delay 100
- `./spider.bin -url https://github.com/cyclone-github -delay 100`
- Run `./spider.bin -help` to see a list of all options

### Compile from source:
- If you want the latest features, compiling from source is the best option since the release version may run several revisions behind the source code.
- This assumes you have Go and Git installed
- `git clone https://github.com/cyclone-github/spider.git`
- `cd spider`
- `go mod init spider`
- `go mod tidy`
- `go build -ldflags="-s -w" .`
- Compile from source code how-to:
- https://github.com/cyclone-github/scripts/blob/main/intro_to_go.txt
### Change Log:
- https://github.com/cyclone-github/spider/blob/main/CHANGELOG.md
### Mentions:
- Go Package Documentation: https://pkg.go.dev/github.com/cyclone-github/spider
- Softpedia: https://www.softpedia.com/get/Internet/Other-Internet-Related/Cyclone-s-URL-Spider.shtml

### Compile from source code info:
- https://github.com/cyclone-github/scripts/blob/main/intro_to_go.txt
### Antivirus False Positives:
- Several antivirus programs on VirusTotal incorrectly detect compiled Go binaries as a false positive. This issue primarily affects the Windows executable binary, but is not limited to it. If this concerns you, I recommend carefully reviewing the source code, then proceed to compile the binary yourself.
- Uploading your compiled binaries to https://virustotal.com and leaving an up-vote or a comment would be helpful as well.
10 changes: 10 additions & 0 deletions go.mod
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
module spider

go 1.23.4

require github.com/PuerkitoBio/goquery v1.10.1

require (
github.com/andybalholm/cascadia v1.3.3 // indirect
golang.org/x/net v0.33.0 // indirect
)
70 changes: 70 additions & 0 deletions go.sum
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
github.com/PuerkitoBio/goquery v1.10.1 h1:Y8JGYUkXWTGRB6Ars3+j3kN0xg1YqqlwvdTV8WTFQcU=
github.com/PuerkitoBio/goquery v1.10.1/go.mod h1:IYiHrOMps66ag56LEH7QYDDupKXyo5A8qrjIx3ZtujY=
github.com/andybalholm/cascadia v1.3.3 h1:AG2YHrzJIm4BZ19iwJ/DAua6Btl3IwJX+VI4kktS1LM=
github.com/andybalholm/cascadia v1.3.3/go.mod h1:xNd9bqTn98Ln4DwST8/nG+H0yuB8Hmgu1YHNnWw0GeA=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.13.0/go.mod h1:y6Z2r+Rw4iayiXXAIxJIDAJ1zMW4yaTpebo8fPOliYc=
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/mod v0.15.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.17.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I=
golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
golang.org/x/term v0.12.0/go.mod h1:owVbMEjm3cBLCHdkQu9b1opXd4ETQWc3BhuQGKgXgvU=
golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk=
golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
golang.org/x/term v0.27.0/go.mod h1:iMsnZpn0cago0GOrHO2+Y7u7JPn5AylBrcoWkElMTSM=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58=
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=