Amateur collab question #113
-
Hi, thanks for reaching out.
I remember you mentioning this in #76. I have been considering adding prompt templates that would allow users to dynamically inject existing tags or the file/folder name of each image into the prompt.
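If it helps to picture it, template substitution along those lines might look roughly like the sketch below. The placeholder syntax and function name are made up purely for illustration; this is not an existing or planned implementation.

```python
from pathlib import Path

def fill_prompt_template(template: str, image_path: str, tags: list[str]) -> str:
    """Replace placeholder tokens in a prompt template with per-image values.

    Supported placeholders (illustrative only): {tags}, {name}, {folder}.
    """
    path = Path(image_path)
    return (template
            .replace('{tags}', ', '.join(tags))
            .replace('{name}', path.stem)
            .replace('{folder}', path.parent.name))

# Example: inject existing tags and the file name into the prompt.
prompt = fill_prompt_template(
    'Describe this image of {name}. Known tags: {tags}.',
    'dataset/landscapes/sunset_01.png',
    ['sky', 'clouds', 'orange light'])
```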
Please share the link. If it's compatible with the existing models, I can add it.
The captioning logic is in […]. To create the new tab, you will have to make a […].

That said, I think the workflow you described consists of steps that are quite specific to you and your team, whereas I try to add features that can be applied more generally. Therefore, although I appreciate your willingness to contribute to the project, I think it would be best if you forked the repository and created your own version of the software tailored to your needs (the license allows this).

What I might implement myself in the future is a general framework for building complicated captioning workflows (your process would be a good example), but that seems like a big undertaking, so I'm not sure when I could get to it (a rough sketch of the idea is at the end of this reply).

Thank you again, and feel free to let me know if you have more questions or suggestions.
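To give a rough idea of what that workflow framework might mean in practice: it could boil down to running an ordered list of (prompt, output section) steps per image. Here is a minimal sketch; every name in it is hypothetical and nothing like this exists in the codebase yet.

```python
from dataclasses import dataclass

@dataclass
class CaptionStep:
    prompt: str   # prompt sent to the vision model for this step
    section: str  # key the answer is stored under in the per-image .yaml

def run_workflow(image_path: str, steps: list[CaptionStep], caption_fn) -> dict:
    """Run each step's prompt against one image and collect the answers.

    `caption_fn(image_path, prompt)` stands in for whatever vision-model
    call the application actually makes; it just needs to return a string.
    """
    return {step.section: caption_fn(image_path, step.prompt) for step in steps}

# Example steps resembling the workflow you described:
workflow = [
    CaptionStep('Describe the background scenery only.', 'scenery'),
    CaptionStep('Describe the scene based on the provided concept.', 'description'),
]
```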
-
First, I'd like to start off by saying great job on this project. I belong to a small group of enthusiasts developing a similar project, and I was wondering what the best way would be to merge some of our progress into yours. What we have noticed is that the coherence of the vision models' output increases with the added context from the (latest) WD taggers. There is even a furry-specific WD tagger that really helps when tagging that content (let me know if you want the link).
Full disclosure: I'm not a programmer, but I can usually figure things out with time and ChatGPT. Luckily, we have some programmers in our group, and I tend to lean on their expertise.
My question is this: as we port over our processes (and, by extension, produce some PRs to contribute), can you point me to the right parts of your code to start adapting our multishot approach (i.e. scripted steps) into the autocaptioner? I'm thinking this will live in the CLI until it's fully ported and working, and then we'll make an 'advanced auto caption' tab.
To add some clarity, the current process in our tagging script looks like this (a rough code sketch follows the example below):
1a) Specify the concept definition. This is used to focus the vision model, helps with complex concepts, and guarantees the model mentions the concept/visual style/whatever you're trying to train on.
1b) Specify other options used by the scripts.
2a) Tags are added to their own section in the .yaml, sorted by confidence.
Simplified example:
"Describe the background scenery only", save it to its own "scenery" section in the .yaml
"Describe the scene based on the provided 'concept' and use the provided 'wd tags' for inspiration." save to "description" section in the .yaml
As an amateur, I'd greatly appreciate some initial thoughts and guidance to help speed up my first attempts.
Thanks.