Amateur collab question #113
-
Hi, thanks for reaching out.
I remember you mentioning this in #76. I have been considering adding prompt templates that would allow users to dynamically inject existing tags or the file/folder name of each image into the prompt.
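If it helps to picture it, template substitution along those lines might look roughly like the sketch below. The placeholder syntax and function name are made up purely for illustration; this is not an existing or planned implementation.

```python
from pathlib import Path

def fill_prompt_template(template: str, image_path: str, tags: list[str]) -> str:
    """Replace placeholder tokens in a prompt template with per-image values.

    Supported placeholders (illustrative only): {tags}, {name}, {folder}.
    """
    path = Path(image_path)
    return (template
            .replace('{tags}', ', '.join(tags))
            .replace('{name}', path.stem)
            .replace('{folder}', path.parent.name))

# Example: inject existing tags and the file name into the prompt.
prompt = fill_prompt_template(
    'Describe this image of {name}. Known tags: {tags}.',
    'dataset/landscapes/sunset_01.png',
    ['sky', 'clouds', 'orange light'])
```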
Please share the link. If it's compatible with the existing models, I can add it.
The captioning logic is in […]. To create the new tab, you will have to make a […].

That said, I think the workflow you described consists of steps that are quite specific to you and your team, whereas I try to add features that can be applied more generally. Therefore, although I appreciate your willingness to contribute to the project, I think it would be best if you forked the repository and created your own version of the software tailored to your needs (the license allows this).

What I might implement myself in the future is a general framework for building complicated captioning workflows (your process would be a good example), but that seems like a big undertaking, so I'm not sure when I could get to it (a rough sketch of the idea is at the end of this reply).

Thank you again, and feel free to let me know if you have more questions or suggestions.
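To give a rough idea of what that workflow framework might mean in practice: it could boil down to running an ordered list of (prompt, output section) steps per image. Here is a minimal sketch; every name in it is hypothetical and nothing like this exists in the codebase yet.

```python
from dataclasses import dataclass

@dataclass
class CaptionStep:
    prompt: str   # prompt sent to the vision model for this step
    section: str  # key the answer is stored under in the per-image .yaml

def run_workflow(image_path: str, steps: list[CaptionStep], caption_fn) -> dict:
    """Run each step's prompt against one image and collect the answers.

    `caption_fn(image_path, prompt)` stands in for whatever vision-model
    call the application actually makes; it just needs to return a string.
    """
    return {step.section: caption_fn(image_path, step.prompt) for step in steps}

# Example steps resembling the workflow you described:
workflow = [
    CaptionStep('Describe the background scenery only.', 'scenery'),
    CaptionStep('Describe the scene based on the provided concept.', 'description'),
]
```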
-
First, I'd like to start off by saying great job on this project. I belong to a small group of enthusiasts developing a similar project, and I was wondering what the best way would be to merge some of our progress into yours. What we have noticed is that the coherence of the vision models' output increases with the added context from the (latest) WD taggers. There is even a furry-specific WD tagger that really helps when tagging that content (let me know if you want the link).
Full disclosure: I'm not a programmer, but I can usually figure things out with time and ChatGPT. Luckily, we have some programmers in our group, and I tend to lean on their expertise.
My question is this: as we port over our processes (and, by extension, produce some PRs to contribute), can you point me to the right parts of your code to start adapting our multishot approach (i.e. scripted steps) into the autocaptioner? I'm thinking this will live in the CLI until it's fully ported and working, and then we'll make an 'advanced auto caption' tab.
To add some clarity, the current process in our tagging script looks like this (a rough code sketch follows the example below):
1a) Specify the concept definition. This is used to focus the vision model, helps with complex concepts, and guarantees the model mentions the concept/visual style/whatever you're trying to train on.
1b) Specify other options used by the scripts.
2a) Tags are added to their own section in the .yaml, sorted by confidence.
Simplified example:
"Describe the background scenery only", save it to its own "scenery" section in the .yaml
"Describe the scene based on the provided 'concept' and use the provided 'wd tags' for inspiration." save to "description" section in the .yaml
As an amateur, I'd greatly appreciate some initial thoughts and guidance to help speed up my first attempts.
Thanks.