Classifier rewrite (#213)
The classifier has been re-implemented and now uses a DSL allowing for full customisation. Several bugs have also been fixed.

- Closes #182
- Closes #70
- Closes #68
- Hopefully fixes #126
mgdigital authored Apr 21, 2024
1 parent 7902b93 commit c16f761
Showing 163 changed files with 7,877 additions and 2,308 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/checks.yml
@@ -113,6 +113,12 @@ jobs:
         uses: actions/setup-node@v3
         with:
           node-version: 20.x
+      - name: Setup protoc
+        uses: arduino/setup-protoc@v3
+        with:
+          version: "23.4"
+      - name: Install protoc-gen-go
+        run: go install google.golang.org/protobuf/cmd/protoc-gen-go@…
       - name: Install web app, apply database migrations, generate code and build web app
         run: |
           (cd webui && npm ci); \
6 changes: 6 additions & 0 deletions .mockery.yml
@@ -4,9 +4,15 @@ mockname: "{{.InterfaceName}}"
 outpkg: "{{.PackageName}}_mocks"
 filename: "{{.InterfaceName}}.go"
 packages:
+  github.com/bitmagnet-io/bitmagnet/internal/classifier:
+    interfaces:
+      LocalSearch:
   github.com/bitmagnet-io/bitmagnet/internal/protocol/dht/ktable:
     interfaces:
       Table:
   github.com/bitmagnet-io/bitmagnet/internal/protocol/dht/responder:
     interfaces:
       Limiter:
+  github.com/bitmagnet-io/bitmagnet/internal/tmdb:
+    interfaces:
+      Client:
1 change: 1 addition & 0 deletions .prettierignore
@@ -1,2 +1,3 @@
+bitmagnet.io/schemas/**/*.*
 webui/dist/**/*.*
 webui/src/app/graphql/generated/**/*.*
13 changes: 13 additions & 0 deletions Taskfile.yml
@@ -8,7 +8,9 @@ tasks:
       - go run ./internal/gql/enums/gen/genenums.go
       - go run ./internal/torznab/gencategories/gencategories.go
       - go run github.com/99designs/gqlgen generate --config ./internal/gql/gqlgen.yml
+      - protoc --go_out=. ./internal/protobuf/bitmagnet.proto
       - go run github.com/vektra/mockery/v2
+      - go run . classifier schema --format json > ./bitmagnet.io/schemas/classifier-0.1.json
 
   lint:
     cmds:
@@ -82,3 +84,14 @@ tasks:
       - goose -s create {{.NAME}} sql
     vars:
       NAME: migration
+
+  install-protoc:
+    cmds:
+      - |
+        curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v{{.VERSION}}/protoc-{{.VERSION}}-{{.PLATFORM}}.zip
+        sudo unzip -o protoc-{{.VERSION}}-{{.PLATFORM}}.zip -d /usr/local bin/protoc
+        sudo unzip -o protoc-{{.VERSION}}-{{.PLATFORM}}.zip -d /usr/local 'include/*'
+        rm -f protoc-{{.VERSION}}-{{.PLATFORM}}.zip
+    vars:
+      VERSION: 23.4
+      PLATFORM: osx-x86_64
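The `install-protoc` task downloads a prebuilt `protoc` release and hard-codes `PLATFORM: osx-x86_64`, i.e. an Intel macOS build. As a rough sketch of how the download URL is put together on other machines, assuming the asset naming used on the protobuf GitHub releases page (`protoc-<version>-<platform>.zip`, with platforms such as `linux-x86_64`, `linux-aarch_64`, `osx-x86_64`, `osx-aarch_64` and `win64`), the helpers `protoc_platform` and `protoc_release_url` below are hypothetical and not part of **bitmagnet**:

```ruby
require "rbconfig"

# Hypothetical helper: map a Ruby host OS/CPU pair to the platform suffix
# assumed to be used in protoc release asset names. Returns nil when unknown.
def protoc_platform(host_os = RbConfig::CONFIG["host_os"], host_cpu = RbConfig::CONFIG["host_cpu"])
  arch =
    case host_cpu
    when /x86_64|amd64/ then "x86_64"
    when /arm64|aarch64/ then "aarch_64"
    end
  return nil if arch.nil?

  case host_os
  when /darwin/ then "osx-#{arch}"
  when /linux/ then "linux-#{arch}"
  when /mswin|mingw|cygwin/ then arch == "x86_64" ? "win64" : nil
  end
end

# Build the same URL that the task's curl command downloads.
def protoc_release_url(version, platform)
  "https://github.com/protocolbuffers/protobuf/releases/download/" \
    "v#{version}/protoc-#{version}-#{platform}.zip"
end

puts protoc_release_url("23.4", protoc_platform("darwin23", "x86_64"))
# prints https://github.com/protocolbuffers/protobuf/releases/download/v23.4/protoc-23.4-osx-x86_64.zip
```

Returning `nil` for unrecognised hosts keeps the sketch from silently guessing a wrong binary.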
1 change: 1 addition & 0 deletions bitmagnet.io/Gemfile
@@ -3,6 +3,7 @@ source 'https://rubygems.org'
 gem "just-the-docs", "~> 0.6"
 gem "jekyll", "~> 4.3"
 gem "jekyll-redirect-from", "~> 0.16"
+gem "jekyll-target-blank", "~> 2.0"
 gem "kramdown", "~> 2.3"
 gem "kramdown-parser-gfm", "~> 1.1"
 gem "webrick", "~> 1.8"
7 changes: 7 additions & 0 deletions bitmagnet.io/Gemfile.lock
@@ -39,6 +39,9 @@ GEM
       sass-embedded (~> 1.54)
     jekyll-seo-tag (2.8.0)
       jekyll (>= 3.8, < 5.0)
+    jekyll-target-blank (2.0.2)
+      jekyll (>= 3.0, < 5.0)
+      nokogiri (~> 1.10)
     jekyll-watch (2.2.1)
       listen (~> 3.0)
     just-the-docs (0.6.2)
@@ -55,9 +58,12 @@ GEM
       rb-fsevent (~> 0.10, >= 0.10.3)
       rb-inotify (~> 0.9, >= 0.9.10)
     mercenary (0.4.0)
+    nokogiri (1.16.4-arm64-darwin)
+      racc (~> 1.4)
     pathutil (0.16.2)
       forwardable-extended (~> 2.6)
     public_suffix (5.0.3)
+    racc (1.7.3)
     rake (13.0.6)
     rb-fsevent (0.11.2)
     rb-inotify (0.10.1)
@@ -78,6 +84,7 @@ PLATFORMS
 DEPENDENCIES
   jekyll (~> 4.3)
   jekyll-redirect-from (~> 0.16)
+  jekyll-target-blank (~> 2.0)
   just-the-docs (~> 0.6)
   kramdown (~> 2.3)
   kramdown-parser-gfm (~> 1.1)
1 change: 1 addition & 0 deletions bitmagnet.io/_config.yml
@@ -29,3 +29,4 @@ nav_external_links:
 favicon_ico: "/assets/images/favicon.png"
 plugins:
 - jekyll-redirect-from
+- jekyll-target-blank
11 changes: 11 additions & 0 deletions bitmagnet.io/_plugins/schemas.rb
@@ -0,0 +1,11 @@
+module Schemas
+  class Generator < Jekyll::Generator
+    def generate(site)
+      Dir.glob(File.join(site.source, 'schemas', '*.json')).each do |json_file|
+        File.open(File.join(site.dest, 'schemas', File.basename(json_file)), 'w') do |file|
+          file.write(File.read(json_file))
+        end
+      end
+    end
+  end
+end
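The new Jekyll generator copies every `schemas/*.json` file from the site source into the built site; this appears to be how the `classifier-0.1.json` schema produced by the `gen` task gets published. As committed, it opens `site.dest/schemas/<name>` for writing without creating the `schemas` directory first. A standalone sketch of the same copy step, assuming plain directory paths instead of a Jekyll `site` object (`copy_schemas` is a hypothetical name), which also creates the target directory:

```ruby
require "fileutils"

# Hypothetical standalone version of the Schemas::Generator copy step.
# Copies every *.json file from <source_dir>/schemas to <dest_dir>/schemas,
# creating the destination directory if needed, and returns the copied names.
def copy_schemas(source_dir, dest_dir)
  target = File.join(dest_dir, "schemas")
  FileUtils.mkdir_p(target)
  Dir.glob(File.join(source_dir, "schemas", "*.json")).sort.map do |json_file|
    FileUtils.cp(json_file, File.join(target, File.basename(json_file)))
    File.basename(json_file)
  end
end
```

Usage might look like `copy_schemas("bitmagnet.io", "bitmagnet.io/_site")`; the paths here are illustrative only.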
6 changes: 3 additions & 3 deletions bitmagnet.io/external-resources.md
@@ -8,7 +8,7 @@ nav_order: 7
 
 Community members have developed the following resources, tools and packages; these are not maintained under the **bitmagnet** project:
 
-- [@davispuh](https://github.com/davispuh){:target="\_blank"} has published an Arch package, `bitmagnet-git`, [in the AUR repository](https://aur.archlinux.org/packages/bitmagnet-git){:target="\_blank"}.
-- [@DyonR](https://github.com/DyonR){:target="\_blank"} has developed [magnetico2bitmagnet](https://github.com/DyonR/magnetico2bitmagnet){:target="\_blank"}, a collection of scripts for importing into **bitmagnet** from Magnetico and other sources.
-- [@DyonR](https://github.com/DyonR){:target="\_blank"} has written [a **bitmagnet** on Unraid guide](https://github.com/DyonR/bitmagnet-unraid){:target="\_blank"}.
+- [@davispuh](https://github.com/davispuh) has published an Arch package, `bitmagnet-git`, [in the AUR repository](https://aur.archlinux.org/packages/bitmagnet-git).
+- [@DyonR](https://github.com/DyonR) has developed [magnetico2bitmagnet](https://github.com/DyonR/magnetico2bitmagnet), a collection of scripts for importing into **bitmagnet** from Magnetico and other sources.
+- [@DyonR](https://github.com/DyonR) has written [a **bitmagnet** on Unraid guide](https://github.com/DyonR/bitmagnet-unraid).
 - Your link could be here!
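The Markdown diffs in this commit drop the per-link `{:target="\_blank"}` attributes, presumably because the newly enabled `jekyll-target-blank` plugin now adds `target="_blank"` to external links automatically. The sketch below only illustrates that idea and is not the plugin's implementation; it assumes a link is external when it uses `http`/`https` and points to a host other than the site's own. `external_link?` and `add_target_blank` are hypothetical names, and the regex rewrite only handles simple `<a href="...">` tags:

```ruby
require "uri"

# Hypothetical rule: a link is external when it has an http(s) scheme and a
# host different from the site's own host. Relative links (e.g. "/faq.html")
# have no scheme and are treated as internal.
def external_link?(href, site_host)
  uri = URI.parse(href)
  %w[http https].include?(uri.scheme) && uri.host != site_host
rescue URI::InvalidURIError
  false
end

# Append target="_blank" to simple <a href="..."> tags that point off-site and
# do not already carry a target attribute. Regex-based, for illustration only.
def add_target_blank(html, site_host)
  html.gsub(/<a\s+href="([^"]+)"([^>]*)>/) do |tag|
    href = Regexp.last_match(1)
    rest = Regexp.last_match(2)
    if external_link?(href, site_host) && !rest.include?("target=")
      %(<a href="#{href}"#{rest} target="_blank">)
    else
      tag
    end
  end
end

puts add_target_blank('<a href="https://mullvad.net/">Mullvad</a>', "bitmagnet.io")
# prints <a href="https://mullvad.net/" target="_blank">Mullvad</a>
```

Doing this once at build time, rather than per link in Markdown, is what lets the source files shed the kramdown attribute syntax.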
6 changes: 3 additions & 3 deletions bitmagnet.io/faq.md
@@ -12,7 +12,7 @@ No. **bitmagnet** does not download, store or distribute any content _at all_. I
 
 ## Should I use a VPN with **bitmagnet**?
 
-It is recommended to use a VPN: **bitmagnet** may download **metadata about** illegal and copyrighted content. It is possible that rudimentary law enforcement and anti-piracy tracking tools would incorrectly flag this activity, although we've never heard about anyone getting into trouble for using this or similar metadata crawlers. Setting up a VPN is simple and cheap, and it's better to be safe than sorry. We are not affiliated with any VPN providers, but if you're unsure which provider to choose, we can recommend [Mullvad](https://mullvad.net/){:target="\_blank"}.
+It is recommended to use a VPN: **bitmagnet** may download **metadata about** illegal and copyrighted content. It is possible that rudimentary law enforcement and anti-piracy tracking tools would incorrectly flag this activity, although we've never heard about anyone getting into trouble for using this or similar metadata crawlers. Setting up a VPN is simple and cheap, and it's better to be safe than sorry. We are not affiliated with any VPN providers, but if you're unsure which provider to choose, we can recommend [Mullvad](https://mullvad.net/).
 
 ## Is **bitmagnet** intended to be used as a public service?
 
@@ -47,7 +47,7 @@ Visit the metrics endpoint at `/metrics` and check the metric `bitmagnet_dht_cra
 
 ## How are the seeders/leechers numbers determined for torrents crawled from the DHT?
 
-The DHT crawler uses a [BEP33 scrape request](https://www.bittorrent.org/beps/bep_0033.html){:target="\_blank"} to provide a very rough estimate of the current seeders/leechers.
+The DHT crawler uses a [BEP33 scrape request](https://www.bittorrent.org/beps/bep_0033.html) to provide a very rough estimate of the current seeders/leechers.
 
 ## How do I know if a torrent crawled by **bitmagnet** is being actively seeded, and that I'll be able to download it?
 
@@ -59,7 +59,7 @@ No. The DHT crawler works by sampling random info hashes from the network, and w
 
 ## I'm seeing a lot of torrents in the "Unknown" category, that are clearly of a particular content type - what's wrong?
 
-**bitmagnet** is in early development, and improving the classifier will be an ongoing effort. When new versions are released, you can follow the [reclassify tutorial](/tutorials/reprocess-reclassify.html) to reclassify torrents.
+**bitmagnet** is in early development, and improving the classifier will be an ongoing effort. When new versions are released, you can follow the [reclassify tutorial](/tutorials/reprocess-reclassify.html) to reclassify torrents. If you'd like to [improve or customize the classifier](/tutorials/classifier.html), this is also possible.
 
 ## Can I run multiple **bitmagnet** instances pointing to the same database?
 
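Per BEP 33, a DHT scrape response reports seeders and downloaders as two 256-byte (2048-bit) Bloom filters rather than exact counts, which is why the FAQ above calls the numbers a very rough estimate. A sketch of the cardinality estimate that BEP 33 describes, c = ln(z / m) / (k · ln(1 − 1/m)) with z the number of zero bits, assuming m = 2048 bits, k = 2 hash functions and a raw byte string as input; `estimate_bep33_count` is a hypothetical name and this is not **bitmagnet**'s Go implementation:

```ruby
# Hypothetical helper: estimate how many distinct items were inserted into a
# BEP 33 style Bloom filter (assumed k = 2 hash functions, m = bytes * 8 bits).
# Uses the estimate c = ln(z / m) / (k * ln(1 - 1 / m)), z = count of zero bits.
def estimate_bep33_count(filter_bytes, k: 2)
  m = filter_bytes.bytesize * 8
  ones = filter_bytes.each_byte.sum { |b| b.to_s(2).count("1") }
  # Clamp z to at least 1: a completely full filter would otherwise give ln(0).
  z = [m - ones, 1].max
  (Math.log(z.to_f / m) / (k * Math.log(1 - 1.0 / m))).round
end

empty = "\x00".b * 256
puts estimate_bep33_count(empty) # prints 0
two_bits = "\x03".b + "\x00".b * 255
puts estimate_bep33_count(two_bits) # prints 1
```

Because different peers can set overlapping bits, the result is an approximation that degrades as the filter fills up.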
8 changes: 4 additions & 4 deletions bitmagnet.io/index.md
@@ -14,7 +14,7 @@ nav_order: -1
 
 > Important
 >
-> This software is currently in alpha. It is ready to preview some interesting and unique features, but there will likely be bugs, as well as API and database schema changes before the (currently theoretical) 1.0 release. If you'd like to support this project and help it gain momentum, **[please give it a star on GitHub](https://github.com/bitmagnet-io/bitmagnet){:target="\_blank"}**.
+> This software is currently in alpha. It is ready to preview some interesting and unique features, but there will likely be bugs, as well as API and database schema changes before the (currently theoretical) 1.0 release. If you'd like to support this project and help it gain momentum, **[please give it a star on GitHub](https://github.com/bitmagnet-io/bitmagnet)**.
 >
 > [If you're interested in getting involved and you're a backend GoLang or frontend TypeScript/Angular developer, or you're knowledgeable about BitTorrent protocols then **I'd like to hear from you**](/internals-development.html) - let's get this thing over the line!
@@ -51,7 +51,7 @@ This means that **bitmagnet** is not reliant on any external trackers or torrent
 - [ ] A more complete web UI
 - [ ] Saved searches for content of particular interest, enabling custom feeds in addition to the following feature
 - [ ] Smart deletion: there's a lot of crap out there; crawling DHT can quickly use lots of database disk space, and search becomes slower with millions of indexed torrents of which 90% are of no interest. A smart deletion feature would use saved searches to identify content that you're _not_ interested in, including low quality content (such as low resolution movies). It would automatically delete associated metadata and add the info hash to a bloom filter, preventing the torrent from being re-indexed in future.
-- [ ] Bi-directional integration with the [Prowlarr indexer proxy](https://prowlarr.com/){:target="\_blank"}: Currently **bitmagnet** can be added as an indexer in Prowlarr; bi-directional integration would allow **bitmagnet** to crawl content from any indexer configured in Prowlarr, unlocking many new sources of content
+- [ ] Bi-directional integration with the [Prowlarr indexer proxy](https://prowlarr.com/): Currently **bitmagnet** can be added as an indexer in Prowlarr; bi-directional integration would allow **bitmagnet** to crawl content from any indexer configured in Prowlarr, unlocking many new sources of content
 - [ ] More documentation and more tests!
 
 ### Pipe dream features
@@ -61,5 +61,5 @@ This is where things start to get a bit nebulous. For now all focus is on delive
 - [ ] In-place seeding: identify files on your computer that are part of an indexed torrent, and allow them to be seeded in place after having moved, renamed or deleted parts of the torrent
 - [ ] Integration with popular BitTorrent clients
 - [ ] Federation of some sort: allow friends to connect instances and pool the indexing effort, perhaps involving crowd sourcing manual content curation to supplement the automated classifiers
-- [ ] Something that looks like a decentralized private tracker; by this I probably mean something that's based partly on personal trust and manually weeding out any bad actors; I'd be wary of creating something that looks a bit like [Tribler](https://github.com/Tribler/tribler){:target="\_blank"}, which while an interesting project seems to have demonstrated that implementing trust, reputation and privacy at the protocol level carries too much overhead to be a compelling alternative to plain old BitTorrent, for all its imperfections
-- [ ] Support for the [BitTorrent v2 protocol](https://blog.libtorrent.org/2020/09/bittorrent-v2/){:target="\_blank"}: It remains to be seen if wider adoption will ever make this a valuable feature
+- [ ] Something that looks like a decentralized private tracker; by this I probably mean something that's based partly on personal trust and manually weeding out any bad actors; I'd be wary of creating something that looks a bit like [Tribler](https://github.com/Tribler/tribler), which while an interesting project seems to have demonstrated that implementing trust, reputation and privacy at the protocol level carries too much overhead to be a compelling alternative to plain old BitTorrent, for all its imperfections
+- [ ] Support for the [BitTorrent v2 protocol](https://blog.libtorrent.org/2020/09/bittorrent-v2/): It remains to be seen if wider adoption will ever make this a valuable feature
2 changes: 1 addition & 1 deletion bitmagnet.io/internals-development.md
@@ -8,4 +8,4 @@ has_children: true
 # Internals & Development
 
 {: .highlight }
-Are you an experienced developer with knowledge of GoLang, Postgres, TypeScript/Angular and/or BitTorrent protocols? I'm currently a lone developer with a full time job and many other commitments, and have been working on this in spare moments for the past few months. This project is too big for one person! If you're interested in contributing please [review the open issues](https://github.com/bitmagnet-io/bitmagnet/issues){:target="\_blank"} and feel free to open a PR!
+Are you an experienced developer with knowledge of GoLang, Postgres, TypeScript/Angular and/or BitTorrent protocols? I'm currently a lone developer with a full time job and many other commitments, and have been working on this in spare moments for the past few months. This project is too big for one person! If you're interested in contributing please [review the open issues](https://github.com/bitmagnet-io/bitmagnet/issues) and feel free to open a PR!
14 changes: 7 additions & 7 deletions bitmagnet.io/internals-development/dht-crawler.md
@@ -7,15 +7,15 @@ nav_order: 2
 
 # Architecture & Lifecycle of the DHT Crawler
 
-The DHT and BitTorrent protocols are (rather impenetrably) documented at [bittorrent.org](http://bittorrent.org/beps/bep_0000.html){:target="\_blank"}. Relevant resources include:
+The DHT and BitTorrent protocols are (rather impenetrably) documented at [bittorrent.org](http://bittorrent.org/beps/bep_0000.html). Relevant resources include:
 
-- [BEP 5: DHT Protocol](http://bittorrent.org/beps/bep_0005.html){:target="\_blank"}
-- [BEP 51: Infohash Indexing](https://www.bittorrent.org/beps/bep_0051.html){:target="\_blank"}
-- [BEP 33: DHT Scrapes](https://www.bittorrent.org/beps/bep_0033.html){:target="\_blank"}
-- [BEP 10: Extension Protocol](https://www.bittorrent.org/beps/bep_0010.html){:target="\_blank"}
-- [The Kademlia paper](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf){:target="\_blank"}
+- [BEP 5: DHT Protocol](http://bittorrent.org/beps/bep_0005.html)
+- [BEP 51: Infohash Indexing](https://www.bittorrent.org/beps/bep_0051.html)
+- [BEP 33: DHT Scrapes](https://www.bittorrent.org/beps/bep_0033.html)
+- [BEP 10: Extension Protocol](https://www.bittorrent.org/beps/bep_0010.html)
+- [The Kademlia paper](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf)
 
-The rest of what I've figured out about how to implement a DHT crawler was cobbled together from [the now archived **magnetico** project](https://github.com/boramalper/magnetico){:target="\_blank"} and [anacrolix's BitTorrent libraries](https://github.com/anacrolix){:target="\_blank"}.
+The rest of what I've figured out about how to implement a DHT crawler was cobbled together from [the now archived **magnetico** project](https://github.com/boramalper/magnetico) and [anacrolix's BitTorrent libraries](https://github.com/anacrolix).
 
 The following diagram illustrates roughly how the crawler has been implemented within **bitmagnet**. It's debatable if this will help stop anyone's brain from melting, including my own.
 
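BEP 5, following the Kademlia paper linked above, measures closeness between 160-bit node IDs and infohashes with the XOR metric: `distance(A, B) = |A xor B|`, read as an unsigned integer. Many routing table implementations then group nodes by how many leading bits they share with the local node ID. A minimal sketch of both calculations, assuming 20-byte binary IDs; `xor_distance` and `shared_prefix_bits` are hypothetical names, not functions from **bitmagnet**'s `ktable` package:

```ruby
# Hypothetical helpers for the BEP 5 / Kademlia XOR metric on 20-byte IDs.
ID_BYTES = 20

# XOR distance between two binary IDs, interpreted as an unsigned integer.
def xor_distance(a, b)
  unless a.bytesize == ID_BYTES && b.bytesize == ID_BYTES
    raise ArgumentError, "IDs must be #{ID_BYTES} bytes"
  end

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| (acc << 8) | (x ^ y) }
end

# Number of leading bits the two IDs have in common (0..160).
def shared_prefix_bits(a, b)
  d = xor_distance(a, b)
  d.zero? ? ID_BYTES * 8 : ID_BYTES * 8 - d.bit_length
end

local = "\x00".b * 20
other = "\x80".b + "\x00".b * 19
puts shared_prefix_bits(local, other) # prints 0
```

A smaller XOR distance means a longer shared prefix, which is why a node looking up an infohash repeatedly queries the nodes whose IDs share the most leading bits with it.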
16 changes: 8 additions & 8 deletions bitmagnet.io/internals-development/observability-telemetry.md
@@ -9,22 +9,22 @@ nav_order: 3
 
 ## Grafana stack & Prometheus integration
 
-**bitmagnet** can integrate with the [Grafana stack](https://grafana.com/){:target="\_blank"} and [Prometheus](https://prometheus.io/){:target="\_blank"} for monitoring and building observability dashboards for the DHT crawler and other components. See the "Optional observability services" section of the [example docker compose configuration](https://github.com/bitmagnet-io/bitmagnet/blob/main/docker-compose.yml){:target="\_blank"} and [example Grafana / Prometheus configuration files and a provisioned Grafana dashboard](https://github.com/bitmagnet-io/bitmagnet/tree/main/observability){:target="\_blank"}.
+**bitmagnet** can integrate with the [Grafana stack](https://grafana.com/) and [Prometheus](https://prometheus.io/) for monitoring and building observability dashboards for the DHT crawler and other components. See the "Optional observability services" section of the [example docker compose configuration](https://github.com/bitmagnet-io/bitmagnet/blob/main/docker-compose.yml) and [example Grafana / Prometheus configuration files and a provisioned Grafana dashboard](https://github.com/bitmagnet-io/bitmagnet/tree/main/observability).
 
 ![Grafana dashboard](/assets/images/grafana-1.png)
 
 The example integration includes:
 
-- [Grafana](https://grafana.com/oss/grafana/){:target="\_blank"} - A dashboarding and visualization tool
-- [Grafana Agent](https://grafana.com/oss/agent/){:target="\_blank"} - Collects metrics and logs, and forwards them to storage backends
-- [Prometheus](https://prometheus.io/){:target="\_blank"} - A time series database for metrics
-- [Loki](https://grafana.com/oss/loki/){:target="\_blank"} - A log aggregation system
-- [Pyroscope](https://pyroscope.io/){:target="\_blank"} - A continuous profiling tool
-- [Postgres exporter](https://github.com/prometheus-community/postgres_exporter){:target="\_blank"} - Exposes Postgres metrics to Prometheus
+- [Grafana](https://grafana.com/oss/grafana/) - A dashboarding and visualization tool
+- [Grafana Agent](https://grafana.com/oss/agent/) - Collects metrics and logs, and forwards them to storage backends
+- [Prometheus](https://prometheus.io/) - A time series database for metrics
+- [Loki](https://grafana.com/oss/loki/) - A log aggregation system
+- [Pyroscope](https://pyroscope.io/) - A continuous profiling tool
+- [Postgres exporter](https://github.com/prometheus-community/postgres_exporter) - Exposes Postgres metrics to Prometheus
 
 # Profiling with pprof
 
-**bitmagnet** exposes [Go pprof](https://golang.org/pkg/net/http/pprof/){:target="\_blank"} profiling endpoints at `/debug/pprof/*`, for example:
+**bitmagnet** exposes [Go pprof](https://golang.org/pkg/net/http/pprof/) profiling endpoints at `/debug/pprof/*`, for example:
 
 ```sh
 go tool pprof http://localhost:3333/debug/pprof/heap