[doc] Setting up PerlIO callbacks when embedding a Perl interpreter using C (before any *.pm modules were ever loaded or any Perl code executed) #22571

Open
vadimkantorov opened this issue Sep 4, 2024 · 5 comments

Comments

@vadimkantorov

vadimkantorov commented Sep 4, 2024

Hi!

I managed to produce a fully hermetic single-file static build of perl by building all modules statically (following https://perldoc.perl.org/perlembed) and providing my own implementations of open/fopen/read/seek to serve *.pm system files from memory.

Is there a way to hook into Perl's own PerlIO layer system so that Perl only calls these functions (including for module/*.pm discovery and loading) and never falls through to libc's I/O functions or I/O syscalls? This would be a much cleaner and more robust solution.

It would be nice if setting up PerlIO in a perlembed scenario were covered in the docs.

I also wonder how the diamond operator is implemented and which functions from https://github.com/Perl/perl5/blob/blead/perlio.c it calls, and in what sequence (e.g. for perl -e 'open(f,"<","my.txt");print(<f>);' and for perl -e 'open(f,"<","my.txt");$line=<f>;print($line);').

Thanks!


If anyone's curious to see what my hack looks like - https://github.com/vadimkantorov/perlpack, but it's very much a WIP

My current problem: overriding open / close / read / stat / lseek / access / fopen / fileno was sufficient for perl -e 'use Cwd;print(Cwd::cwd(),"\n");', so Perl can successfully discover and load Cwd.pm from my virtual read-only FS. But perl -e 'open(F,"<","/mnt/perlpack/.../Cwd.pm");print(<F>);' does not work, probably because Perl tries an fcntl / ioctl / some variant of stat call that I don't implement. In any case, with the diamond operator my read function is currently never invoked, because of some failure along the way.

Which libc/stdio functions does Perl use when opening and reading a file in the typical case? strace shows open -> fcntl -> ioctl -> lseek -> fstat -> mmap -> read, but these are raw syscalls. I'd like to know the concrete libc/stdio functions (stat, for example, has many variants) so that I can override them. I imagine this is somewhere in perlio.c or doio.c, but there are quite a few layers of indirection, which makes it hard for a novice in Perl's codebase to follow.
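For readers following along, here is a minimal sketch of the kind of in-memory, read-only file table described above. It is not the code from perlpack: the names mem_open, mem_read, mem_lseek and mem_close, the handle table, and the path /mnt/perlpack/lib/Hello.pm are all hypothetical. It covers only open/read/lseek/close, so calls such as fstat, fcntl and ioctl from the strace output would still need their own handling.

```c
#include <stddef.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* One embedded file: path, contents and size (all hypothetical). */
struct mem_file { const char *path; const char *data; size_t size; };

static const char hello_pm[] = "package Hello;\n1;\n";

static const struct mem_file mem_files[] = {
    { "/mnt/perlpack/lib/Hello.pm", hello_pm, sizeof hello_pm - 1 },
};

/* Open handles: a file pointer plus a read cursor. */
#define MEM_MAX_OPEN 16
struct mem_handle { const struct mem_file *file; size_t pos; };
static struct mem_handle mem_handles[MEM_MAX_OPEN];

static int mem_valid(int h) {
    return h >= 0 && h < MEM_MAX_OPEN && mem_handles[h].file != NULL;
}

/* Returns a handle >= 0, or -1 if the path is not embedded
 * (an override would then fall back to the real open()). */
int mem_open(const char *path) {
    for (size_t i = 0; i < sizeof mem_files / sizeof mem_files[0]; i++) {
        if (strcmp(path, mem_files[i].path) != 0)
            continue;
        for (int h = 0; h < MEM_MAX_OPEN; h++) {
            if (mem_handles[h].file == NULL) {
                mem_handles[h].file = &mem_files[i];
                mem_handles[h].pos = 0;
                return h;
            }
        }
        return -1; /* no free handle slot */
    }
    return -1;
}

/* Copies up to count bytes from the cursor; returns 0 at end of file. */
long mem_read(int h, void *buf, size_t count) {
    if (!mem_valid(h)) return -1;
    struct mem_handle *mh = &mem_handles[h];
    size_t left = mh->file->size - mh->pos;
    if (count > left) count = left;
    memcpy(buf, mh->file->data + mh->pos, count);
    mh->pos += count;
    return (long)count;
}

/* Moves the cursor; rejects positions outside [0, size]. */
long mem_lseek(int h, long offset, int whence) {
    if (!mem_valid(h)) return -1;
    struct mem_handle *mh = &mem_handles[h];
    long base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long)mh->pos; break;
    case SEEK_END: base = (long)mh->file->size; break;
    default: return -1;
    }
    long target = base + offset;
    if (target < 0 || (size_t)target > mh->file->size) return -1;
    mh->pos = (size_t)target;
    return target;
}

int mem_close(int h) {
    if (!mem_valid(h)) return -1;
    mem_handles[h].file = NULL;
    return 0;
}
```

An override of open() would call mem_open() first and delegate to libc when it returns -1. A real implementation would also have to map these handles to descriptor numbers that cannot collide with real file descriptors.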

vadimkantorov changed the title from "[doc] Setting up PerlIO callbacks when embedding a Perl using C (before any *.pm modules were ever loaded or any Perl code executed)" to "[doc] Setting up PerlIO callbacks when embedding a Perl interpreter using C (before any *.pm modules were ever loaded or any Perl code executed)" on Sep 4, 2024
@tonycoz
Contributor

tonycoz commented Sep 5, 2024

There is PERL_IMPLICIT_SYS, but that replaces all I/O (not just module loading) and only has a host implementation on Windows.

If you just want modules to be loaded from memory you can add a hook to @INC that checks for a known name and loads that module from memory, see perldoc -f require.

@vadimkantorov
Author

vadimkantorov commented Sep 5, 2024

I'll check out PERL_IMPLICIT_SYS. Replacing all I/O is fine for my use case, as my custom I/O functions only serve from memory for some special prefixes like /mnt/perl. Are there any docs or examples of using PERL_IMPLICIT_SYS to override I/O? And which functions need to be overridden to cover both module loading and perl -e 'open(f,"<","my.txt");print(<f>);'? I'm only concerned with compiling and running on Linux for now.

If you just want modules to be loaded from memory you can add a hook to @INC that checks for a known name and loads that module from memory, see perldoc -f require.

Actually, I'm interested in both modules and regular, basic file reads. For modules, can such an @INC hook be added via the C perlembed interface (without executing Perl code)?

and only has a host implementation on Windows.

And regarding PerlIO infra, is it relevant for my usecase (module / *.pm loads and regular basic file reads)? Can it be configured via C perlembed interface? Or would you recommend using PERL_IMPLICIT_SYS? Or is using PERL_IMPLICIT_SYS on Linux impossible?

Thank you!

@Leont
Contributor

Leont commented Sep 5, 2024

I'll check out PERL_IMPLICIT_SYS. Replacing all I/O is fine for my use case, as my custom I/O functions only serve from memory for some special prefixes like /mnt/perl. Are there any docs or examples of using PERL_IMPLICIT_SYS to override I/O? And which functions need to be overridden to cover both module loading and perl -e 'open(f,"<","my.txt");print(<f>);'? I'm only concerned with compiling and running on Linux for now.

It's only ever been done before for Windows but there's no reason it would be impossible on Linux. See perlhost.h, win32.c and perllib.c in win32/ for prior art.
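To illustrate the general idea behind such a host layer: with PERL_IMPLICIT_SYS the interpreter reaches its I/O through tables of function pointers supplied by the host, so a host can route some paths elsewhere. The sketch below shows only that dispatch pattern. The struct host_lio, its single open_ro member, and the names vfs_open_ro and VFS_FD_BASE are hypothetical simplifications, not the real tables from iperlsys.h, which have many more entries and different signatures.

```c
#include <fcntl.h>
#include <string.h>

#define VFS_PREFIX  "/mnt/perl/"  /* special prefix served from memory */
#define VFS_FD_BASE 1000          /* hypothetical range for virtual descriptors */

/* Hypothetical, heavily simplified stand-in for a host I/O table. */
struct host_lio {
    int (*open_ro)(const char *path);
};

/* Default host: go straight to libc. */
int libc_open_ro(const char *path) {
    return open(path, O_RDONLY);
}

static int vfs_next_fd = VFS_FD_BASE;

/* Custom host: paths under the prefix get a virtual descriptor
 * (a placeholder for a real in-memory lookup); everything else
 * is delegated to libc. */
int vfs_open_ro(const char *path) {
    if (strncmp(path, VFS_PREFIX, strlen(VFS_PREFIX)) == 0)
        return vfs_next_fd++;
    return libc_open_ro(path);
}

const struct host_lio libc_host = { libc_open_ro };
const struct host_lio vfs_host  = { vfs_open_ro };

/* The "interpreter side": it only ever calls through the table,
 * so swapping the table swaps the I/O implementation. */
int host_open_ro(const struct host_lio *host, const char *path) {
    return host->open_ro(path);
}
```

The appeal of this design is that the routing decision lives in one place (the host table) instead of in symbol overrides scattered across libc entry points.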

@vadimkantorov
Author

Thanks for the pointers! I'll look into what using PERL_IMPLICIT_SYS on Linux entails.

And maybe one last question: do you know whether PerlIO can also be used for this I/O override goal? And if so, can it be configured via a C API before any Perl code gets executed?

@tonycoz
Contributor

tonycoz commented Sep 6, 2024

For modules, can such an @INC hook be added via the C perlembed interface (without executing Perl code)?

After perl_construct() something like:

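/* xs_my_hook_xs is an XSUB defined elsewhere with XS(xs_my_hook_xs) */
CV *hook = newXS("MyPackage::my_hook", xs_my_hook_xs, __FILE__);
AV *inc = get_av("INC", GV_ADD);
av_unshift(inc, 1);
/* newXS leaves the CV owned by its glob, so take a new reference */
av_store(inc, 0, newRV_inc((SV *)hook));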

You could also define the hook sub in perl with eval_pv()/eval_sv().
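A minimal sketch of the eval_pv() variant, assuming a perl built for embedding as described in perlembed. The module name Hello::Mem, its source string, and the hook body are hypothetical. The hook returns a reference to a scalar holding the module source, which is one of the return forms documented in perldoc -f require.

```c
#include <EXTERN.h>
#include <perl.h>
#include <stdio.h>

static PerlInterpreter *my_perl;

/* Hypothetical @INC hook written in Perl: serves one module from a string. */
static const char hook_src[] =
    "unshift @INC, sub {"
    "  my ($self, $file) = @_;"
    "  return unless $file eq 'Hello/Mem.pm';"   /* not ours: try the rest of @INC */
    "  my $src = q{package Hello::Mem; sub answer { 42 } 1;};"
    "  return \\$src;"                           /* scalar ref holding the source */
    "};";

int main(int argc, char **argv, char **env)
{
    char *embedding[] = { "", "-e", "0", NULL };

    PERL_SYS_INIT3(&argc, &argv, &env);
    my_perl = perl_alloc();
    perl_construct(my_perl);
    perl_parse(my_perl, NULL, 3, embedding, NULL);
    PL_exit_flags |= PERL_EXIT_DESTRUCT_END;
    perl_run(my_perl);

    /* Install the hook, then load the module through it. */
    eval_pv(hook_src, TRUE);
    SV *result = eval_pv("require Hello::Mem; Hello::Mem::answer()", TRUE);
    printf("answer = %d\n", (int)SvIV(result));

    perl_destruct(my_perl);
    perl_free(my_perl);
    PERL_SYS_TERM();
    return 0;
}
```

As in perlembed, the compiler and linker flags come from perl -MExtUtils::Embed -e ccopts -e ldopts. Unlike the newXS() variant, this approach does execute a small amount of Perl code to install the hook.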

And maybe one last question: do you know whether PerlIO can also be used for this I/O override goal? And if so, can it be configured via a C API before any Perl code gets executed?

You might be able to do it by modifying PL_def_layerlist or via PERLIO in the environment, but I've never tried it.

It also won't allow you to hook operations like stat() and fcntl().
