local scan: report progress #9
OK, I've come as far as finding the one line I need:
and it seems a reasonable piece of logging info. I wonder, would a pull request for this be welcome? It's not what I'd call a temporally perfect pulse, given it emits as dirs are thrown onto the job queue for the worker threads (i.e. a dir has been found but not scanned yet), but it is plenty good for its intended purpose. Namely, on a huge library that takes a long time to scan, some pulse of life and progress is reassuring and vital. As it stands, the scan just stalls at the start with no output until the whole tree has been walked... and that looks stone dead when I point it at a mounted NAS with a huge library!
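For illustration, a one-line info log of this sort, emitted as directories are thrown onto the job queue, might look like the sketch below. The `walk_tree` function and plain-list queue are hypothetical stand-ins, not mopidy's actual scanner code:

```python
import logging
import os

logger = logging.getLogger(__name__)

def walk_tree(root, job_queue):
    """Enqueue subdirectories of root for worker threads, logging each
    one as it is found -- a pulse of life during a long tree walk.

    Hypothetical sketch of the one-line patch discussed above; the real
    mopidy tree walk lives in mopidy/local/commands.py and differs.
    """
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            # The proposed one-liner: log the dir before it is scanned.
            logger.info('Found %s', path)
            job_queue.append(path)
    return job_queue
```

The log line fires per directory found, not per directory scanned, which is why it is a "temporally imperfect" but still reassuring pulse.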
We have https://github.com/mopidy/mopidy/blob/develop/mopidy/local/commands.py#L157 which logs each file we process. Or is your problem with the initial scan of what files exist?
Thanks, I see that in the debug log indeed! And I noticed, as I tweaked my one line, that debug is forced to file and info ends up on the console (when running mopidy on the CLI to test). But yes, the issue I have is with the first-time experience that I had, and which I expect others will have too. It runs something like this:
I simply contend this is not likely an unusual first-time experience, and it turns from sour to pleasant with the one-line patch above. Yes, it's the initial scan: it seems the tree walk is done in a lump, an info message is written only at the end, and then, with the list in hand, mopidy goes on to scan the actual files for metadata. The latter is of course slow, but on a NAS with a large library (not uncommon, albeit less and less so, as frankly online streaming is - as I predicted a decade ago ;-) - slowly making large local media collections less relevant) this first tree walk is still very time consuming. I can go time it on my system maybe for input, but it's way beyond the threshold of human patience, and one wants a pulse or progress indicator to differentiate working from stuck. Not least because when working with a NAS, stuck is not all that uncommon either: the NAS can go down, the LAN might go down, the LAN or NAS might get flooded (say someone else in the house is streaming video off it or uploading large files ;-)...

So I'd still see this one-liner as a value add to the product, and recommend a pull into main. Though, given it's an initial directory walk and the later logging uses the word "scan", perhaps it should log ("Found %s ...", entry) here for clarity. The point is, when I run with this I get a very pleasing pulse of life as my rather large NAS tree is walked.

I haven't done a whole tree walk yet, so I don't know the timing, but to be honest that's on the list as I proceed. Not least because a feature I admire on any scanner (the cross I bear being the management of a large local library) is intelligent projection, which isn't hard to do for rescans (where a previous run is on record). I know that when I run a python tree walk over a backup library on a SATA drive (not NAS) it's still pretty time consuming, and adding timing records and feedback has been on my todo list, because I do need multiple passes for tag cleaning...
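The "intelligent projection" idea for rescans can be sketched very simply: if a previous run recorded the total file count, the current scan rate yields a time-remaining estimate. This is a hypothetical helper, not anything in mopidy:

```python
import time

def eta_seconds(done, total, started_at, now=None):
    """Estimate seconds remaining for a rescan.

    `total` is assumed known from a previous run's record; `done` is
    files processed so far, `started_at` a time.monotonic() timestamp.
    Returns None until at least one file has been processed.
    """
    now = time.monotonic() if now is None else now
    if done <= 0 or now <= started_at:
        return None
    rate = done / (now - started_at)  # files per second so far
    return max(total - done, 0) / rate
```

A first-ever scan has no `total` on record, so this only helps from the second scan onward, which matches the observation that projection "isn't hard to do for rescans".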
We also have https://github.com/mopidy/mopidy/blob/develop/mopidy/local/commands.py#L162 in the same file, which is the non-debug user logging. We could perhaps add a log callback to the tree walk that triggers every n files, and then just have it log "scanned M files" at info level? As for the tree walk being slow: it used to be much, much worse; now at least we have a worker pool of threads scanning, so that we can make progress while other threads are IO blocked. If you can think of other ways to make this less painful and speed it up, that would be great.
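The callback idea described above might be sketched like this: a closure the tree walk invokes once per file, which only logs every n calls. The hook point into mopidy's scanner is hypothetical; only the counting pattern is shown:

```python
import logging

logger = logging.getLogger(__name__)

def make_progress_callback(every=1000):
    """Return a per-file callback for the tree walk.

    The callback counts invocations, logs an INFO line every `every`
    files, and returns the running count (handy for testing). Sketch
    only -- where mopidy would call it is an assumption.
    """
    state = {'count': 0}

    def on_file(path):
        state['count'] += 1
        if state['count'] % every == 0:
            logger.info('Scanned %d files so far...', state['count'])
        return state['count']

    return on_file
```

Because the callback owns its counter, the walk itself needs no knowledge of logging policy; changing the pulse frequency is a one-argument change.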
Not sure what drawback one line of info logging poses? Perhaps it's bad to do logging IO in a worker thread? I'm having trouble imagining what you mean by a log callback: passing an arg to the worker thread which supplies a function that emits log info from the main thread?

I tried my simple one-liner above on my library to check performance. I'm kind of happy with what I see: it stalls a moment after the root dir is logged (while the other 9 worker threads are waiting, I guess, for the first sign of work on the queue), but not for a bothersome period. But in short order I get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 29: ordinal not in range(128)

suggesting unicode is fouling up that logging line (I do have folders with probable unicode names - the band Άβατον, for example). Alas, I can't easily see which entry is fouling it, as I can't emit the entry to the console ;-) and I'm done playing for tonight. Folder-by-folder logging pleases me, mind you: as I follow the age-old artist/album/song directory format (mostly), a folder is an album (mostly). But for the product proper, any pulse of life would please a newb trialling it too.

I love the worker pool, by the way. It took some reading to get my head around it (python threading is new to me too), but it looks slick (though "relative" seems not to be used anywhere, and was put in with some plan in mind that hasn't eventuated). Alas, I'm not a pro on speeding up dir walks; they are always a tad slow in my experience. I'm already impressed by your farming out to workers, so that while one thread is reading the filenames in a dir, another is actually checking a file. I have typically just used os.walk and been patient ;-). Little choice, as I'm walking on an ARM-based NAS on a shared LAN (with my wife, kids, tenants, guests). (As an aside - I'd PM but can't see how on your profile - I'm a Swiss expat ;-).
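That traceback is the classic Python 2 symptom of a byte-string path being implicitly decoded as ASCII. One defensive fix (shown here in Python 3 syntax, as a sketch rather than the actual patch) is to decode byte paths explicitly, replacing undecodable bytes instead of letting the log call raise mid-scan:

```python
def printable_path(path, encoding='utf-8'):
    """Render a filesystem path for logging without raising on odd bytes.

    Byte paths are decoded explicitly with errors='replace', so a name
    like Άβατον (or a mis-encoded one) never kills the logging line.
    Sketch only; not taken from mopidy's code.
    """
    if isinstance(path, bytes):
        return path.decode(encoding, errors='replace')
    return path
```

With this, the one-line pulse becomes `logger.info('Found %s', printable_path(entry))` and the scan survives whatever the filesystem serves up.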
The current codebase now offers progress logging: each time the scan flushes (every 100 files by default, following the value of the corresponding flush-threshold setting), progress is reported.
Is that enough detail to close this issue?
Logging during the scan was greatly improved for the 3.0 release. However, I still don't think we log anything showing progress during the initial file hierarchy traversal, found in |
I'd like to see this as well. I tried running a scan on my Why don't we incrementally add tracks as opposed to walking the tree, then adding all of them at once? |
I'm playing with `mopidy local scan`, and an initial observation is that I can't pull progress info out of mopidy in any way - not even with logging set to the DEBUG level.

Given a local media store can be large and take a long time to scan, I see it as imperative to get some pulse of life, or progress report, from the scanner. Not least because any number of things might cause it to hang, one of them being the local media store living on a mounted NAS (hardly an unusual scenario, I'd think). Though that stretches the definition of "local", it is nonetheless where a local media library might (and in my case, does) reside.