Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic load #291

Open
madhavajay opened this issue Jul 6, 2022 · 7 comments
Open

Dynamic load #291

madhavajay opened this issue Jul 6, 2022 · 7 comments

Comments

@madhavajay
Copy link
Contributor

I want to be able to define the capnp definitions at run time.

It seems that .load won't accept anything but a real file due to this exception:

if not _os.path.isfile(file_name):
    raise IOError("File not found: " + file_name)

I can work around this by saving a file to /tmp and loading it which seems silly.
My question is other than only having the same module name once in the module tree is there any reason why this shouldn't be allowed? I am happy to open a pr which allows this work unless there is some reason why it should never be done.

I imagine that allowing a network user to define an arbitrary capnp definition and having that initialized miiiight be a concern? But I guess the actual serde code where the traversal_limit_in_words is set is the main concern?

import time
import capnp


def str_to_capnp(proto: str) -> type:
    temp_url = f"/tmp/{int(time.time())}.capnp"
    with open(temp_url, "w+") as f:
        f.write(proto)
    f.close()
    print(temp_url)
    return capnp.load(temp_url)


s = """
@0x8f9c60bd7a9842fc;

struct REPT {
  child @0 :Data;
  minVals @1 :Data;
  maxVals @2 :Data;
  entities @3 :Data;
}
"""

x = str_to_capnp(s)
print(x, type(x))
@madhavajay
Copy link
Contributor Author

@kentonv Thanks for the explanation here: capnproto/capnproto#1532

I know this is a different sub project to the C++ one, but as a first step would it be reasonable to provide a PR which prevents the file write / read hack above and simply allows a schema string to be passed in. The second step would be to start investigating any differences between the current code with SchemaParser and the SchemaLoader C++ class you mentioned? Or would you prefer that as a single PR which would ensure the dynamic loading safety is delegated to the schema.capnp part of the C++ library. Also, will this require updating the current C++ version to something newer like v0.10.x?

@haata any thoughts?

@kentonv
Copy link
Member

kentonv commented Aug 29, 2022

SchemaParser contains a SchemaLoader under the hood. SchemaParser parses from text .capnp files, whereas SchemaLoader loads already-compiled schemas in schema.capnp format.

Note that I would NOT recommend using the same SchemaLoader instance that is already being used as part of the SchemaParser that processes regular imports. You don't want to mix untrusted third-party schemas together with your imports, they could interfere with each other. What you want instead is the ability to create a separate SchemaLoader and load things into it.

@madhavajay
Copy link
Contributor Author

Okay, so if I understand correctly, the current implementation in pycapnp uses SchemaParser with a SchemaLoader under the hood but restricts non file loading intentionally to assume the user has intentionally chosen a .capnp file they trust.

Allowing dynamic loading can be implemented as a seperate function and should use a seperate instance of the SchemaLoader C++ class to ensure that dynamic imports cant effect the non dynamic ones?

Also, would they be namespaced somehow because of using a different SchemaLoader or is that something extra that should be considered / added?

Assuming all of that is correct, would it be okay to add a PR?

@kentonv
Copy link
Member

kentonv commented Aug 31, 2022

To be clear I only know the C++ side, I haven't looked at the Python implementation in about 8 years. @haata will have to comment on the Python side.

But I would guess that it wasn't an "intentional" decision to prevent importing from virtual files, I think it was probably just the easiest code to write. I think it would be fine to support importing virtual files, it's just some work that needs to be done. (And there should probably be a warning not to use it for untrusted input.)

For untrusted inputs, you need a separate SchemaLoader, yes (and you should only be accepting the schemas in compiled schema.capnp format, not in text .capnp format).

@haata
Copy link
Collaborator

haata commented Aug 31, 2022

I'm not opposed to something like this. Just need to make sure it doesn't affect the common use-case too much.

@madhavajay
Copy link
Contributor Author

Okay no worries, I think we will prototype this locally with the above workaround to see if what we are doing makes sense and then provide a PR to discuss in more detail. What we are hoping to do is provide an automatic recursive serde system in our python code base which uses capnp under the hood and then for instances where we need performance and zero copy semantics we will implement manually constructed capnp serde logic. The primary reason being we had protobuf for this but protobuf maxing at 2GB and not supporting zero copy became a bottleneck for us.

Right now we have both protobuf and capnp in our code base and we are going to remove protobuf and use capnp exclusively for our bytes format between systems (hopefully including js).

Ill report back when we have some more meaningful examples to show.

@Zentren
Copy link

Zentren commented Feb 21, 2023

With respect to dynamic schema loading discussed in this issue, a colleague of mine should be in touch in the coming weeks about a PR for potential inclusion (after some extension and clean up our end) based off tinkering done in this branch: master...Zentren:pycapnp:dynamic_schema_loading.

We need to load binary schemas sent over the wire to interpret arbitrary structs able to appear in specific RPCs and had to make library changes to support this use-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants