Skip already uploaded files and add synchronization functionality #353
lukaskremla
started this conversation in
Ideas
FWIW : mpremote v1.24.0+ by default skips uploading unchanged files to a board (if the device has
This hinges on the fact that reading from flash is often much faster than writing to it, so the cost of reading a file to compute a hash is much lower than the cost of writing files unnecessarily. If your approach is to reduce file writes to the device, then in my experience of attempting the same, you will also need to account for:
Timings for: ESP32_GENERIC_S3
mpremote rm :large.txt
PS D:\> measure-command {mpremote cp large.txt : }
TotalSeconds : 2,2762847
PS D:\> measure-command {mpremote cp large.txt : }
TotalSeconds : 0,789586
PS D:\> measure-command {mpremote cp Xlarge.txt : }
8
TotalSeconds : 9,1412448
PS D:\> measure-command {mpremote cp Xlarge.txt : }
TotalSeconds : 0,8132274
PS D:\> dir .\large.txt
Directory: D:\
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 18-12-2024 18:22 7291 large.txt
-a--- 18-12-2024 22:32 28967 Xlarge.txt
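As a rough reading of these timings (the decimal commas are locale formatting), the first copy of each file appears to be dominated by the write, while the repeated copy of an unchanged file costs a roughly constant amount of time. The sketch below only redoes the arithmetic on the figures listed above; the interpretation that the second run of each pair is a skipped upload is an assumption based on the mpremote behaviour described in this reply.

```python
# Figures copied from the measure-command output and the dir listing above.
runs = {
    "large.txt": {"size": 7291, "first": 2.2762847, "repeat": 0.789586},
    "Xlarge.txt": {"size": 28967, "first": 9.1412448, "repeat": 0.8132274},
}


def summarize(run):
    """Return (effective bytes/s on the first upload, speed-up of the repeat run)."""
    throughput = run["size"] / run["first"]
    speedup = run["first"] / run["repeat"]
    return throughput, speedup


for name, run in runs.items():
    throughput, speedup = summarize(run)
    print(f"{name}: ~{throughput:.0f} B/s on first upload, repeat is ~{speedup:.1f}x faster")
```

Both files come out at an effective rate on the order of 3 KB/s for the first upload, so the benefit of skipping grows with file size while the fixed overhead of the repeat run stays at about 0.8 s.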
EDIT: I've been working on this script and its implementation further. I already have a new UI for the excluded paths, and I've fine-tuned some details of the implementation. I won't be rewriting this post or modifying the files at the bottom; I'd prefer to fine-tune that in a PR if you're interested in this functionality. I can elaborate on the changes in the discussion if need be.
I've had an idea about a different approach to hashing files. My first approach read whole files into memory, which is bad design. I quickly changed it to a chunk-based approach, but that still wasn't great: it saved time, but not a significant amount.
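For readers unfamiliar with the chunk-based idea, here is a minimal sketch. It is not the attached script: the function name `sha256_of_file` and the default chunk size are illustrative assumptions. It relies only on `hashlib.sha256()`, `update()` and `digest()`, which exist in CPython and, where the port includes SHA256, in MicroPython.

```python
import hashlib

CHUNK_SIZE = 1024  # illustrative default; any size that fits in RAM works


def sha256_of_file(path, chunk_size=CHUNK_SIZE):
    """Hash a file piece by piece so memory use stays bounded by chunk_size."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            digest.update(chunk)
    return digest.digest()
```

The result is identical to hashing the whole file at once; only the peak memory use differs.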
I wasn't satisfied with the results so far, so I thought about the biggest bottlenecks of my approach. I realized that what slowed it down the most was executing the script before uploading every single file. That operation alone took long enough to negate most of the benefit of hashing. It even caused very frequent timeout errors on one board (my guess is that the serial converter used could be partly to blame, but I am unsure).
A more efficient approach is to reduce the communication between the computer and the target device to a minimum. My idea was to first pinpoint all of the files that already exist on the target device and remove them from the list of files to be uploaded right at the start. This reduces the number of hashing script executions from n (n being the number of uploaded files) to 1, regardless of how many files are uploaded.
The script is stored as a large string in the FileSystemWidget. It contains three placeholder values that the calling method must replace before the script is executed.
The first is a boolean that determines whether synchronization should happen, the second is a list of the files being uploaded, and the last is a list of paths to ignore.
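To illustrate the placeholder mechanism, here is a minimal sketch written in Python rather than Kotlin. The placeholder names, the template text and `render_script` are all hypothetical; the real markers are in the attached files.

```python
# Hypothetical template: the variable names are assumptions, not the
# identifiers used in the attached script.
SCRIPT_TEMPLATE = """\
should_synchronize = %(should_synchronize)s
files_to_upload = %(files_to_upload)s
paths_to_exclude = %(paths_to_exclude)s
# ... the device-side comparison logic would follow here ...
"""


def render_script(should_synchronize, files_to_upload, paths_to_exclude):
    """Fill the three placeholders with Python literals the device can parse."""
    return SCRIPT_TEMPLATE % {
        "should_synchronize": repr(bool(should_synchronize)),
        "files_to_upload": repr(list(files_to_upload)),
        "paths_to_exclude": repr(list(paths_to_exclude)),
    }


if __name__ == "__main__":
    print(render_script(True, [("/main.py", 120, "ab12")], ["/data"]))
```

Using `repr()` keeps quoting and escaping of unusual file names consistent, since the rendered text is itself Python source.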
The files being uploaded are passed to the script as tuples of target path, size and calculated hash. The script checks these values in that order and only calculates a hash at the end, if absolutely necessary. The hashes are compared on the microcontroller, and the calling method simply extracts the list of matching paths. It then looks up the already uploaded files by their target paths and removes them from the set of files being uploaded.
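The cheapest-check-first order described here could look roughly like the sketch below. It is an illustration under assumptions, not the attached script: `file_matches` and `find_already_uploaded` are hypothetical names, and hashes are assumed to travel as lowercase hex strings.

```python
import binascii
import hashlib
import os


def file_matches(path, expected_size, expected_hash, chunk_size=1024):
    """Check existence, then size, and only then compute the SHA256 hash."""
    try:
        size = os.stat(path)[6]  # index 6 of the stat tuple is the file size
    except OSError:
        return False  # file is not on the device yet
    if size != expected_size:
        return False  # different size, so no need to hash
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    # hexlify is used because it is available in both CPython and MicroPython
    return binascii.hexlify(digest.digest()).decode() == expected_hash


def find_already_uploaded(files_to_upload):
    """Return the target paths whose on-device copy is already identical."""
    return [
        path
        for path, size, file_hash in files_to_upload
        if file_matches(path, size, file_hash)
    ]
```

The caller would then drop every returned path from its upload set, so the whole comparison costs one script execution.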
I've been through several iterations of this script. At some point I thought it would be a nice feature if the upload could automatically remove old files I no longer want, so I added the synchronization toggle. Currently, in my version of the plugin, it is only implemented as another button in the configuration (just like "reset on success" and "switch to REPL tab"). But I have already thought about a more robust way to implement this.
My idea is to keep this first synchronize toggle. If the user enables it, another toggle appears, called "ignore certain paths" or something similar. If that is enabled as well, the user can manually configure paths that the synchronization script should ignore. A path can point to a folder or to a specific file.
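A minimal sketch of how synchronization with ignored paths could decide what to remove, assuming all paths are already normalized to a leading `/`. The helper names `is_excluded` and `paths_to_delete` are hypothetical and not taken from the attached files.

```python
def is_excluded(path, excluded_paths):
    """True if path equals an excluded entry or lies inside an excluded folder."""
    for excluded in excluded_paths:
        excluded = excluded.rstrip("/")
        if path == excluded or path.startswith(excluded + "/"):
            return True
    return False


def paths_to_delete(device_paths, uploaded_paths, excluded_paths):
    """Files on the device that are neither part of the upload nor ignored."""
    keep = set(uploaded_paths)
    return sorted(
        p for p in device_paths
        if p not in keep and not is_excluded(p, excluded_paths)
    )
```

Matching on `excluded + "/"` rather than a bare prefix means that ignoring `/data` protects `/data/log.txt` but not an unrelated `/database.txt`.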
For the dialog/configurator, I would create a table similar to the one that the run configuration's "Before launch" tasks use, with +, - and edit buttons. Some basic validation could be done for these paths. Paths with or without a leading / would both be supported; the plugin would format them in either case to match the MicroPython filesystem's expectations.
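One possible shape for that formatting step, shown in Python for brevity although the plugin side is Kotlin. The canonical form chosen here (leading `/`, no trailing `/`, no empty segments) and the rejection of empty input are assumptions, and `normalize_device_path` is a hypothetical name.

```python
def normalize_device_path(raw):
    """Return the path with a single leading '/' and no empty segments."""
    parts = [part for part in raw.strip().split("/") if part]
    if not parts:
        # Assumed validation rule: an empty entry is treated as a user error.
        raise ValueError("path must name a file or folder")
    return "/" + "/".join(parts)
```

With this, `lib/foo.py` and `/lib/foo.py` both map to `/lib/foo.py`, so the two input styles compare equal afterwards.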
I've made both a SHA256 and a CRC32 version of these scripts. The scripts themselves, with comments explaining my reasoning, and the modified Kotlin files are attached below for you to check.
Please do let me know if you'd do something differently, if you have questions about this, or if you'd like me to create a PR so we can discuss the details of merging there. The Kotlin implementation of this hashing script does rely on #350.
By the way, it is entirely possible that something is suboptimal on the Kotlin side of this script; please let me know if something should be done differently.
As of right now I don't have a finalized version of the synchronize option UI I've described, but it's something I plan to work on now, and I would provide it in the PR with these changes.
I tested both versions for speed on the ESP32 S3. They should take about the same amount of time, but I assume that CRC32 will be faster on microcontrollers without hardware optimizations for hashing. Both versions should work; so far my team and I have only used the SHA256 version of the script.
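For comparison with the SHA256 variant, a chunked CRC32 can be sketched with `binascii.crc32`, which accepts the running value as its second argument. This is an illustration, not the attached script; `crc32_of_file` is a hypothetical name, and whether `binascii.crc32` is present depends on the MicroPython port and build.

```python
import binascii


def crc32_of_file(path, chunk_size=1024):
    """Compute a CRC32 over a file by feeding it to binascii.crc32 in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = binascii.crc32(chunk, crc)  # carry the running CRC forward
    return crc & 0xFFFFFFFF  # keep the result an unsigned 32-bit value
```

CRC32 is not collision resistant in the way SHA256 is, which is acceptable for detecting unchanged files but worth keeping in mind when choosing between the two versions.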
I've run some tests to find the ideal chunk size, and unsurprisingly 1024 bytes seems to be the best choice. Larger chunk sizes (2048, 4096, 8192 and 16384) did yield a speedup of about a hundred ms each time the chunk size was doubled, but on the ESP32 S3 a 16 KB chunk often fails to allocate; my guess is that memory fragmentation could be to blame. 1024 was fast enough and should work on every microcontroller, so I went with it. I can share the tests I ran if you'd like. I also have a set of around 60 files totalling around 1.3 MB that I can share. They are filled with dummy text and include several folders, Python scripts, web files and difficult characters (the ones that caused problems with uploads back when I started testing this plugin).
I've also looked at the FNV1A32 hashing algorithm to evaluate whether it could be viable. With my test file sets it was usually 10x slower than SHA256 or CRC32 on the ESP32 S3, even when I tried making it process 4 bytes per iteration when creating the hash. The pure Python implementation simply can't beat the built-in hashing modules. It is possible that on small boards like the ESP8266, where neither SHA256 nor CRC32 is available, FNV1A32 could still be faster than uploading all of the files no matter what. I can investigate that later.
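For reference, a plain byte-at-a-time FNV-1a 32-bit hash in pure Python looks like the sketch below. It uses the standard FNV offset basis and prime; the function name `fnv1a32_update` is illustrative, and this is not the 4-bytes-per-iteration variant mentioned above.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193  # standard 32-bit FNV prime


def fnv1a32_update(data, state=FNV_OFFSET_BASIS):
    """Fold bytes into a running FNV-1a state: XOR first, then multiply."""
    for byte in data:
        state ^= byte
        state = (state * FNV_PRIME) & 0xFFFFFFFF  # stay within 32 bits
    return state
```

Because the state is passed in and returned, the function can be called once per file chunk. The per-byte interpreter loop is the likely reason it cannot compete with the built-in modules, which run in native code.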
MicroPythonRunConfiguration
FileSystemWidget SHA256
FileSystemWidget CRC32
MicroPython SHA256
MicroPython CRC32