Support staged processing and/or continuing in GUI mode #26

Open
EdouardVuillard opened this issue Aug 7, 2024 · 6 comments

Comments

@EdouardVuillard

At the moment, any interruption means starting all over again. The program does not seem to recover any of its work if it is interrupted.
A related concern is that the program does not permit user control of its stages. The program works in a number of (it seems) self-contained stages. At the least there appear to me to be the following:
(1) Existing eBook is chosen by user.
(2) eBook is converted to plain text.
(3) eBook is separated into two CSV files, one containing dialog, one containing everything else.
(4) Each segment from the tables in the CSV files is separately converted to one or more temporary WAV files.
(5) The temporary WAV files for each segment are combined into a single WAV file for that segment, leading to a growing final collection of WAV files.
(6) The final collection of WAV files is combined together into an audiobook file.
On a long book, every stage takes hours. Stage (5) is the longest, but having waited hours for (1) to (4) to complete, it is a shame to have to do them again if the program has already completed them for a book once.
I have run the GUI far enough to get part way through stage (5). My problem is that the GUI estimates it is going to take 152 days to complete. Some serious tuning of the settings appears necessary. For example, during installation (my second attempt at installation) I skipped the NVidia installation. I would like to go back and install that component but if I do, I will lose all progress to date.
Also, given that for me the purpose is to produce an audiobook that I can listen to privately in my car, it would be nice to be able to produce an audiobook per chapter so that I can get started listening while the program keeps working through the remaining separate chapters. I appreciate that I probably could have divided up the base eBook myself.
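One way the stage-level resume described above could work is a small manifest file that records which stages have finished, so a restart skips them. This is only a sketch under assumptions: the stage names, the JSON manifest, and the `run_pipeline` helper are hypothetical and are not taken from VoxNovel's code.

```python
import json
from pathlib import Path

# Hypothetical stage names mirroring stages (2) to (6) in the list above.
STAGES = ["to_text", "split_csv", "segment_wavs", "merge_segments", "build_audiobook"]


def load_done(manifest: Path) -> set:
    """Return the set of stage names already recorded as finished."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def mark_done(manifest: Path, done: set, stage: str) -> None:
    """Record a finished stage, writing to a temp file first so a crash cannot leave a half-written manifest."""
    done.add(stage)
    tmp = manifest.with_name(manifest.name + ".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    tmp.replace(manifest)


def run_pipeline(manifest: Path, actions: dict) -> list:
    """Run stages in order, skipping those already done; return the stages executed on this call."""
    done = load_done(manifest)
    executed = []
    for stage in STAGES:
        if stage in done:
            continue
        actions[stage]()  # an exception here leaves earlier stages recorded as done
        mark_done(manifest, done, stage)
        executed.append(stage)
    return executed
```

Here `actions` maps each stage name to a zero-argument callable. If a run is interrupted part way through, calling `run_pipeline` again with the same manifest path picks up at the first unfinished stage.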

@DrewThomasson
Owner

DrewThomasson commented Aug 7, 2024

Before I get started heads up:

-The loading bar estimates are always wonky and inaccurate at the beginning

I can promise you that if it didn't take 152 days to get to the final 5th combining stage, then it will not take that long to get through the 5th stage of combining all the audio files into one.

Edit: Combining the audio files should take the least amount of time.

@DrewThomasson
Owner

Thank you so much for analyzing my code. I've never seen anyone do this before, and I'm very impressed at how in depth you have gone into the inner workings of my program.

  1. You probably don't need the NVIDIA driver thing: the fact that the audiobook is generating in hours rather than days shows that your CUDA cores are being utilized.

So 🤷

  1. You're right that I should implement a save file function of some kind to keep track of where it is, so you can pick up from where you left off.

  2. I probably won't be implementing a "generate by chapter" request. Sadly, that might add unnecessary complications to the program that I, as a single repo maintainer, would have trouble maintaining. :/ sorry

I'd LOVE to have anyone else helping out with the code for this project, but so far nothing :/

Anyway, thank you for your feedback, and I hope that helps answer your questions.

@EdouardVuillard
Author

Thanks Drew and I certainly appreciate the limits on your resources. You are right about the time estimate: after a few days running it has dropped back to a much more reasonable 40 days. I understand that stage (5) is by far the longest stage and stage (6) I expect will be short. However, it would still be nice to be able to resume anytime if stage (5) is interrupted.
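Resuming inside stage (5) could be as simple as skipping any segment whose output file is already on disk. The sketch below illustrates that idea only; the file naming scheme, the `synthesize` callback, and the `.part` suffix are assumptions for illustration, not VoxNovel's actual layout.

```python
import os
from pathlib import Path
from typing import Callable, Iterable


def render_missing_segments(
    segment_ids: Iterable[int],
    out_dir: Path,
    synthesize: Callable[[int], bytes],
) -> list:
    """Render only the segments whose WAV is not on disk yet; return the ids rendered now."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered = []
    for seg_id in segment_ids:
        final = out_dir / f"segment_{seg_id:05d}.wav"  # hypothetical naming scheme
        if final.exists():
            continue  # finished in an earlier run, so skip it
        partial = final.with_name(final.name + ".part")
        partial.write_bytes(synthesize(seg_id))
        # Rename only after the write succeeds, so a file named *.wav is always complete.
        os.replace(partial, final)
        rendered.append(seg_id)
    return rendered
```

Because an interrupted write only ever leaves a `.part` file behind, a restart redoes at most the one segment that was in flight.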

I have a possible temporary workaround for the problem of starting to listen before the complete book is ready. I copied out the finished WAV files and merged them together using ffmpeg. This produced a single 40-minute WAV that I can load onto my iPod Classic. Not pretty but probably good enough to get started.
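For anyone without ffmpeg to hand, a similar merge can be sketched with Python's standard `wave` module instead. This is a different tool from the ffmpeg approach described above, and it assumes every input file has the same channel count, sample width, and sample rate; the function name `concat_wavs` is made up for this example.

```python
import wave
from pathlib import Path


def concat_wavs(inputs: list, output: Path) -> int:
    """Concatenate WAV files that share one audio format; return the total frames written."""
    if not inputs:
        raise ValueError("no input files given")
    total = 0
    fmt = None
    with wave.open(str(output), "wb") as out:
        for path in inputs:
            with wave.open(str(path), "rb") as src:
                this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    # The first file decides the output format.
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"{path} has a different format: {this_fmt} vs {fmt}")
                # Reads one whole input file into memory at a time.
                out.writeframes(src.readframes(src.getnframes()))
                total += src.getnframes()
    return total
```

Pass the finished segment files in playback order, for example a sorted directory listing, since the output simply follows the order of `inputs`.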

@DrewThomasson
Owner

DrewThomasson commented Aug 9, 2024

Yeah honestly that's what I do when I'm impatient with it lol

Also, I now realize that you're running it on a CPU and not an NVIDIA GPU, which is probably why it's taking so long for you. :/

If you have an NVIDIA GPU with a minimum of 4 GB of VRAM, then you should be able to generate the audiobook at around real-time audiobook speed.
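A quick way to see whether a CUDA GPU is actually visible is to ask PyTorch directly. This sketch assumes the text-to-speech backend runs on PyTorch, which this thread does not confirm, and the helper name `describe_compute_device` is made up for illustration.

```python
import importlib.util


def describe_compute_device() -> str:
    """Report the CUDA device PyTorch would use, or explain why none is available."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check above can run without it

    if not torch.cuda.is_available():
        return "PyTorch sees no CUDA device; work will run on the CPU"
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    return f"{props.name} with {vram_gb:.1f} GB of VRAM"


print(describe_compute_device())
```

Run it inside the same Python environment the program uses. A card sold as 4 GB may report slightly less than 4.0 here.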

Edit: I'll see about getting some kind of save file working tho

@DrewThomasson
Owner

At the moment I'm designing a save system. Any suggestions would be helpful.

@DrewThomasson
Owner

DrewThomasson commented Oct 9, 2024

Oh, I forgot to mention this, but here is a temp workaround for pausing and resuming VoxNovel:

  • It only applies to the Docker image forms of VoxNovel, as seen here ⬇️
  • That should go over how to pause a running Docker image and resume it.
  • Probs easiest in the headless Docker image tho lol.

#21 (comment)


2 participants