Episode title detection failure when dashes (-) are used as separators. #1222

Closed
SnoutBaron opened this issue Dec 2, 2024 · 2 comments

@SnoutBaron

For example, this is completely undetectable to Taiga:
HeavenlyDelusion-S01E01-Heaven and Hell [1151EF50]

This is detectable:
HeavenlyDelusion_S01E01_Heaven and Hell [1151EF50]

The issue is that a lot of torrents use (-) as separators, so for the moment I'm using PowerRenamer to regex dashes to underscores, unless there's something in Taiga that I'm missing.
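The rename workaround described above can be sketched in Python. This is only an illustration, not something Taiga or PowerRenamer provides: the function name `dashes_to_underscores` and the pattern are hypothetical, and the sketch assumes the dashes to replace are the ones directly around an `SxxEyy` token, so that dashes inside a title are left alone.

```python
import re

# Hypothetical pattern: a season/episode token such as S01E01
# with a dash immediately before and after it.
SEPARATOR = re.compile(r"-(S\d+E\d+)-", re.IGNORECASE)


def dashes_to_underscores(name: str) -> str:
    """Replace the dashes around an SxxEyy token with underscores."""
    return SEPARATOR.sub(r"_\1_", name)


print(dashes_to_underscores("HeavenlyDelusion-S01E01-Heaven and Hell [1151EF50]"))
# prints: HeavenlyDelusion_S01E01_Heaven and Hell [1151EF50]
```

Names without an `SxxEyy` token between dashes are returned unchanged.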

@quanticism

quanticism commented Dec 31, 2024

I've been using Taiga for years and its title detection is impressive. I suspect training a local LLM to parse titles may be more resilient to unhandled edge cases, but I'm sure this idea has already crossed the mind of Taiga's devs.

@erengy erengy added the anitomy label Jan 12, 2025
@erengy erengy added this to the v2.0 milestone Jan 12, 2025
@erengy
Owner

erengy commented Jan 12, 2025

See #1215, #1071, #1069, #683 for similar issues.

Currently fixed in Anitomy's development branch:

>anitomy "HeavenlyDelusion-S01E01-Heaven and Hell [1151EF50]"
┌──────────────────────────────────┐
│ Element       │ Value            │
│──────────────────────────────────│
│ title         │ HeavenlyDelusion │
│ season        │ 01               │
│ episode       │ 01               │
│ episode_title │ Heaven and Hell  │
│ file_checksum │ 1151EF50         │
└──────────────────────────────────┘

> I suspect training a local LLM to parse titles may be more resilient to unhandled edge cases, but I'm sure this idea has already crossed the mind of Taiga's devs.

I thought about making use of ML, but it comes with its own challenges and it's hard to predict if it'd perform significantly better or not. Besides, you might still need a rule-based system for creating the initial training dataset. So I ended up rewriting Anitomy instead, but I'd be interested in the results if anyone else wants to go that route.
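To make the point about bootstrapping a training dataset concrete, here is a minimal sketch of a rule-based labeler in Python. It is not Anitomy's implementation: the regular expression, the `label` function, and the single filename layout it handles (`title`, separator, `SxxEyy`, separator, episode title, bracketed 8-digit hex checksum) are all assumptions made for illustration. The output keys mirror the element names in the table above.

```python
import re
from typing import Dict, Optional

# Hypothetical rule for one layout only:
#   <title><sep>S<season>E<episode><sep><episode title> [<checksum>]
# where <sep> is a dash, underscore, or space.
PATTERN = re.compile(
    r"^(?P<title>.+?)[-_ ]S(?P<season>\d+)E(?P<episode>\d+)[-_ ]"
    r"(?P<episode_title>.+?)\s*\[(?P<file_checksum>[0-9A-Fa-f]{8})\]$"
)


def label(filename: str) -> Optional[Dict[str, str]]:
    """Return labeled elements for a filename, or None if the rule does not apply."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


for name in (
    "HeavenlyDelusion-S01E01-Heaven and Hell [1151EF50]",
    "HeavenlyDelusion_S01E01_Heaven and Hell [1151EF50]",
):
    print(label(name))
```

Both sample names yield the same labels. Filenames the rule cannot handle return `None`, which is where a single hand-written rule stops being useful as a source of labels.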

@erengy erengy closed this as completed Jan 12, 2025