Cross-platform, high level file I/O module #1171
Comments
For previous discussion, see #1158. |
Some additional design notes:
It may be worth investigating existing libraries like intake that exist to abstract data access from raw filesystem operations. |
What about having an API for the appdata folder path, Documents, etc.? |
@coolcoder613eb We have one of those - however, that API is necessary, but insufficient for the general problem. The issue is that on Android, some file locations aren't accessible using standard Unix file I/O APIs; and "cloud"-based locations require a different approach again. This ticket is covering the idea of providing an abstraction over those alternate file sources. |
I have been working on this issue and, as a proof-of-concept, I created a cross-platform uri_io package: The online documentation can be found here: |
Thanks for the update; however, I'm not sure I see how what you've got here meets the needs of a high level Python file I/O module (at least, from Toga's perspective). A local file in Python is accessed using:
To my understanding, what we need here is something that satisfies the API:
As an implementation note, this direct approach is likely to have poor performance; synchronously loading a web resource is going to be a blocking activity, so adopting an approach to async file I/O (like aiofiles) will likely be necessary in practice. On a separate implementation note, your code has an... eccentric package structure. Why have you adopted a |
Hello @freakboy3742, thanks for your comments about the package namespace. I will change it. And yes, the reading/writing should be async - I was just too lazy so far. Regarding your code expectation: Except for the naming I think we are not that far apart. This code works:
But this code does not:
I get this message: I have no idea how to add the context manager protocol. Is this difficult to implement? Is there some example somewhere? |
The context manager protocol is fairly straightforward to implement - you define 2 methods ( |
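As a minimal sketch of the protocol described above: in standard Python the two methods are `__enter__` and `__exit__`. The `InMemoryInputStream` class below is hypothetical and only stands in for a platform-backed stream; it is not the uri_io implementation.

```python
import io


class InMemoryInputStream:
    """Hypothetical stream wrapper; stands in for a platform-backed stream."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)
        self.closed = False

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)

    def close(self) -> None:
        self._buffer.close()
        self.closed = True

    def __enter__(self):
        # The value returned here is bound to the name after "as".
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the resource, even if the block raised.
        self.close()
        # Returning False lets any exception propagate to the caller.
        return False


with InMemoryInputStream(b"hello") as stream:
    content = stream.read()
```

Because `__exit__` runs whether or not the body raises, the stream is closed in both cases, which is the main reason to prefer `with` over a manual `close()` call.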
Sounds easy enough. In |
On file-like resources |
I changed the package namespace and I added the context manager protocol to UriInputStream and UriOutputStream. The online documentation can be found here: @freakboy3742 Now, the following code works, which is pretty close to what you expected:
As for the async implementation: |
Any reason for using As for progress indicators - I'm not sure I see why additional handling is required. If you're dealing with a small file, To that end - if f.read() is a blocking network access method, it should probably be an async call. |
Yes, I intend to add the following methods:
The newline argument is missing in open_text_inputstream because for me, only the "universal" newline mode makes sense. And yes, it would be possible to emulate the standard f.open() method. But I don't like f.open() because it is too broad and therefore, there are dependencies between the arguments |
Well, sure - Python's At the very least, Also: it doesn't matter if it's reading or writing - if it's text, it has encoding. If it doesn't have encoding, it's bytes. At which point, you need to add an Or, you can have a single |
@freakboy3742 I have now replaced the UriFile.open_xxx methods with a single open() method, and I added support for text encoding and decoding. For reading, I use a TextIOWrapper and a BufferedReader, which works great. But for some reason, using a TextIOWrapper and a BufferedWriter for writing did not work. The problem was that TextIOWrapper.write(content) expects content to be a str, and at the raw outputstream it was still a str and not bytes as the outputstream expected. Do you know why this is? What is still missing is the async operation mode. readall(), for example, should be async, but will it help to define that method as async when, on the platform level, it is just 1 function call (readAllBytes)? It will block anyway, won't it? To become non-blocking, I would need to replace readAllBytes with some read loop where I add an asyncio.sleep now and then, right? Also, read(4096) does not need to be async, but read(-1), which is the same as readall(), should be async. How should this be handled? |
TextIOWrapper does everything in text; if you want bytes, you need to use a BytesIO, or perform encoding on any output content.
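For reference, here is a sketch of how the standard library layers are normally stacked for writing: `TextIOWrapper` encodes `str` to bytes, `BufferedWriter` batches them, and the raw stream receives a bytes-like object. `CollectingRawStream` is a hypothetical sink used only for illustration; this does not diagnose the specific problem reported above.

```python
import io


class CollectingRawStream(io.RawIOBase):
    """Hypothetical raw sink that records the bytes it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        # The buffered layer hands over a bytes-like object, never a str.
        chunk = bytes(data)
        self.chunks.append(chunk)
        return len(chunk)


raw = CollectingRawStream()
# newline="\n" disables platform-specific newline translation on write.
text_stream = io.TextIOWrapper(
    io.BufferedWriter(raw), encoding="utf-8", newline="\n"
)
text_stream.write("héllo\n")
text_stream.flush()  # pushes the encoded bytes through both layers

received = b"".join(raw.chunks)
```

If the raw stream still sees a `str`, one thing worth checking is whether the text is being written to the `TextIOWrapper` or directly to a lower layer.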
I don't see an inherent problem with In terms of API, |
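On the question of whether a single blocking platform call can be made non-blocking: one common approach, sketched here with plain asyncio rather than any Toga or uri_io API, is to run the blocking call in a worker thread and await the result. `blocking_read_all` is a hypothetical stand-in for a call like readAllBytes.

```python
import asyncio
import io


def blocking_read_all(stream) -> bytes:
    # Stand-in for a platform call that blocks until all data has arrived.
    return stream.read()


async def read_all(stream) -> bytes:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_read_all, stream)


async def main() -> bytes:
    return await read_all(io.BytesIO(b"payload"))


result = asyncio.run(main())
```

The call still blocks its worker thread, but the event loop stays free to process other events while it waits, so no manual read loop with `asyncio.sleep` is needed.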
There's an existing library which is quite similar to this: |
Looks interesting, but I did not find an API for reading / writing files there |
It looks like it implements the whole pathlib API, including, |
@mhsmith Ooh - I knew about fsspec, but didn't realise there was a Pathlib wrapper that used it as well. I guess the downside is that it doesn't have an async mode... but fsspec does (albeit only implemented for HTTPS), and it should be a lot easier to add async features to an existing project than to start from scratch. |
Personally I find asyncio rather problematic to use, as async functions must always be run in an async event loop. Since asyncio only allows 1 asyncio event loop to be run at the same time, this can cause incompatibility issues when attempting to mix various APIs/Python modules that want to create their own asyncio event loops. Instead, I would suggest simply instructing the user to run the function asynchronously using a method of their own choice whenever they need to. This of course also assumes that the user is performance aware, but tbh even if you do use asyncio, there's nothing stopping a user from simply prefixing every function call with await anyway. I rarely see people actually use |
In an ideal world, I also think implementing pathlib & the file operations in the os module would be a good idea to allow use of 3rd party libraries built around pathlib/os. Partial support would be better than no support (I wouldn't say there's an absolute need to support all of the pathlib/os API from day 1 if you were to decide to go down this route). Also, |
It sounds to me like you've got a misunderstanding of how asyncio works. All Toga code can already run in an async context. This is fundamentally necessary to prevent beachballing. As soon as the app is running, you have access to the running event loop, and you can add coroutines to that event loop. Even if you use non-async callbacks in Toga, they're still being invoked in an async context. The "multiple event loop" problem is something that Toga hits because there's a conflict between the OS-level GUI event loop and the CPython asyncio event loop. I can't think of a single library in the CPython ecosystem that truly has its own main loop - because the main loop is fundamental infrastructure of the async system, not something that an end-library implements. If you've got an app that is using the
This fundamentally doesn't work, as "the method of their choice" won't be integrated with the GUI event loop (at least, not unless they've done a lot of extra work... and I can almost guarantee they haven't). |
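To illustrate the point about non-async callbacks being invoked in an async context, here is a sketch using plain asyncio only (no Toga APIs). `on_press` and `load_document` are hypothetical names; the loop calls the synchronous handler, which then schedules a coroutine on the same running loop.

```python
import asyncio

events = []


async def load_document():
    # Stand-in for slow I/O; control returns to the loop while waiting.
    await asyncio.sleep(0.01)
    events.append("loaded")


def on_press(done: asyncio.Event):
    # A plain (non-async) callback, but it is invoked by the running loop,
    # so it can schedule a coroutine on that same loop.
    task = asyncio.get_running_loop().create_task(load_document())
    task.add_done_callback(lambda _task: done.set())
    events.append("handler returned")


async def main():
    done = asyncio.Event()
    # Ask the loop to call the synchronous handler on its next iteration.
    asyncio.get_running_loop().call_soon(on_press, done)
    await done.wait()


asyncio.run(main())
```

The handler returns immediately and the coroutine finishes later on the same loop, so nothing here needs a second event loop.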
Is your feature request related to a problem? Please describe.
There is the problem that in Android 11 the standard file I/O will not be supported anymore on external storage.
All file access on external storage must be done with the SAF (Storage Access Framework)
https://developer.android.com/about/versions/11/privacy/storage
Describe the solution you'd like
I would like to have an abstract, cross-platform, high level module for handling file I/O.
Files should not be referenced with a file path but rather with a URI.
On Android, they would be content URIs (content://), on other platforms, we could use file URIs (file://) or whatever fits the platform.
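As a rough sketch of what URI-based references could look like using only the standard library: a local path can be converted to a `file://` URI and back, and a backend can be chosen from the URI scheme. `describe_uri` and the example `content://` URI are hypothetical and only illustrate the dispatch idea; they are not part of any proposed API.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def describe_uri(uri: str) -> str:
    """Hypothetical dispatcher: choose a backend from the URI scheme."""
    scheme = urlparse(uri).scheme
    if scheme == "file":
        return "local filesystem"
    if scheme == "content":
        return "Android content provider"
    return "unsupported"


# Round-trip a local path through a file:// URI.
local_path = Path("notes.txt").resolve()
file_uri = local_path.as_uri()
restored = Path(url2pathname(urlparse(file_uri).path))

# Made-up content URI, used only to exercise the scheme check.
example_content_uri = "content://com.example.provider/document/1"
backend = describe_uri(example_content_uri)
```

A `content://` URI cannot be turned into a filesystem path this way, which is why a scheme-based dispatch to a platform backend would be needed.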
The module should provide roughly the following functions: