Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ebuild backend #274

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Kangie
Copy link
Contributor

@Kangie Kangie commented Dec 2, 2023

I stumbled across fapolicyd and thought "that sounds pretty cool" and "I bet I could teach it to read the portage database".

After a month or so of "Fun" this PR is the result.

It's currently a WIP but I figured that good enough to request feedback and see if (once I write some tests and validate that I didn't break anything else...) we can sneak it in before the next release.

I am a C novice, so please let me know what can be improved!

Highlights:

  • New ebuild backend
  • init -> extra (and categorised some files)

TODOs:

  • Investigate use of libmd
  • Go over this code after some sleep
  • Tests!
  • Documentation
  • Work out how to whitelist portage properly (we can handle this downstream)
  • Add openrc init files
  • Fix ASAN identified memory leaks in new code
  • Work out how I broke unescape_shell during the refactoring (I didn't, just noticed something extant with asan enabled, will investigate further)

@Kangie Kangie force-pushed the emerge-backend branch 3 times, most recently from 8a345e5 to a13fe6b Compare December 3, 2023 05:06
Copy link

@thesamesam thesamesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments. I'd consider rebasing and cleanly splitting the refactoring before adding the ebuild backend as well, like the changes to the Debian side & md5 parts

AS_HELP_STRING([--with-ebuild],[Use the ebuild database as a trust source]),
use_ebuild=$withval,use_ebuild=no)

if test x$use_ebuild = xyes ; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use AS_IF. Bare if has issues with quoting and is no longer recommended in the autoconf manual.

configure.ac Outdated Show resolved Hide resolved
src/Makefile.am Show resolved Hide resolved
src/library/ebuild-backend.c Show resolved Hide resolved
@Kangie Kangie force-pushed the emerge-backend branch 6 times, most recently from ff73fd5 to 1796a0f Compare December 9, 2023 01:30
@Kangie
Copy link
Contributor Author

Kangie commented Dec 10, 2023

@stevegrubb

Here's the other work I'm hoping to land before the next release, but if we don't make it I can snapshot the code for Gentoo after it's merged.

#277 has been raised to separate out the deb-backend changes; I'll rebase once that's merged so that there's less to review here.

Would you like me to split out this commit too, or should we squash all of the ebuild-related mess in the middle down into a few logical commits?

Edit: I have my TODO up the top, but aside from getting derailed by some 'not caused by my changes' ASAN stuff that I'm debugging in the background, I need to:

  • fix some memory leaks in the ebuild-backend code
  • Write some basic ebuild tests

On my todo, but not part of this PR, is adding a Gentoo CI/CD stage - there's some work that I'll need to do to come up with something suitable for downstream use. I suppose a short-term alternative, as the ebuild backend adds no new dependencies, would be to tarball that up as test data and just extract it in the test container?

I'll ping you again once I think the code is ready for review, but if you have any early feedback I'm willing to address it. :)

Edit the second: It won't make this PR / release, but would you be receptive to a Meson build system as an alternative to (or replacement for) autotools? It's starting to get to me a bit, so I'm willing to put in the porting work.

@Kangie Kangie force-pushed the emerge-backend branch 4 times, most recently from bffb243 to e4eabce Compare December 10, 2023 05:09
@stevegrubb
Copy link
Member

PR #277 appears ready to merge. Do we want to do that first and then rework this patch (if needed)? I have not yet looked this patch over.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from 7c256e3 to 2f3e330 Compare December 16, 2023 02:59
@Kangie
Copy link
Contributor Author

Kangie commented Dec 17, 2023

PR #277 appears ready to merge. Do we want to do that first and then rework this patch (if needed)? I have not yet looked this patch over.

Yes, please - that's a prereq for this PR. I've had some people better at C than I provide some feedback offline and I'll be addressing it over the next few days.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from b239ec1 to 08b0dfa Compare December 18, 2023 12:17
@Kangie
Copy link
Contributor Author

Kangie commented Dec 18, 2023

I've adapted the deb test to provide some basic assurances that the ebuild backend can read a vdb. I spent a bit of time today working on generating a psuedo-vdb that would enable us to test this without relying on a real Portage installation / database, but I didn't get there in the end. I'll keep plugging away at it in the background, but it may not make it in here.

I'm reasonably happy with the current state, reading the VDB is fast enough, though I could stand to work out if there's anything else on a Gentoo system can be safely filtered to reduce the amount of md5ing that happens when populating the trust db.

Happy to address any feedback!

image

@Kangie Kangie changed the title [WIP] Ebuild backend Ebuild backend Dec 18, 2023
@stevegrubb
Copy link
Member

stevegrubb commented Dec 18, 2023

OK, the other patch was merged which appears to create a couple conflicts that need fixing. I'll see if I can look this PR over soon.

@Kangie Kangie force-pushed the emerge-backend branch 2 times, most recently from 3149256 to 65041a5 Compare December 19, 2023 12:43
@stevegrubb
Copy link
Member

Was it really necessary to rename init.d to extra? That added a whole lot of churn.

There is a lot of code in /usr/share. It's in the HFS docs as the place for purely interpreted libraries.

Have you compiled and run this with address/undefined behavior sanitizers?

@Kangie
Copy link
Contributor Author

Kangie commented Jan 15, 2024

Apologies for the delayed response, I got deployed to help out with the response to the local severe weather event - today is my first day back at the keyboard.

Was it really necessary to rename init.d to extra? That added a whole lot of churn.

It seemed like the best way to separate the systemd init scripts, new openrc init stuff, and the application data / configuration that was all stored together under the 'init' folder. If you have a better way of organising it (or just one that you prefer) I'm happy to amend that commit.

There is a lot of code in /usr/share. It's in the HFS docs as the place for purely interpreted libraries.

In Gentoo it's 'architecture independent application data which is not modified at runtime.', but I do see some python, JS, and... C header files in amongst all the other binary files installed there on my development box.

I'll have to come up with a better filter. I'll force push and drop that for the short term.

Have you compiled and run this with address/undefined behavior sanitizers?

Yes. I actually have a branch where I'm tracking down some weirdness that I thought I caused but apparently existed in the code before I started touching it. I have a minimal broken example so if I can't work that out I'll log a bug:

library/escape.c:100:9: runtime error: load of address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
 72 6f 6f 74 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00
             ^ 
    #0 0x55bd93a119a6 in unescape_shell library/escape.c:100
    #1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
    #2 0x55bd939f8661 in main daemon/fapolicyd.c:637
    #3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
    #4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
    #5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)

library/escape.c:116:7: runtime error: store to address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
 72 6f 6f 74 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00
             ^ 
    #0 0x55bd93a11cad in unescape_shell library/escape.c:116
    #1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
    #2 0x55bd939f8661 in main daemon/fapolicyd.c:637
    #3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
    #4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
    #5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)

I am currently leaking 13 bytes allocated here that I haven't been able to track down, I'll try and look into that soon but I'm still recovering from working a week and a half of my vacation!

@stevegrubb
Copy link
Member

stevegrubb commented Jan 16, 2024

Hello, sounds like you've been busy. I think it might be best to not change the name of the directory as part of this patch set. That can/should be done separately. The idea is to minimize the changes in case we need to bisect back to this. The original idea was that everything related to config and initialization would live in the init directory. Let's address this after this patch is merged. Also, I think we should schedule this patch for after the 1.3.3 release so that we can get other bug fixes out to the community. Maybe since this is a new platform being on-boarded, we call the next release 1.4.

Regarding /usr/share, you might look at how we do this for rpm. We use a config file, fapolicyd-filter.conf. There is a library/filter.h file that shows the API that uses it. We had to resort to this because we found that we may be getting rid of files that the user really wants to trust. So, we gave them a way to take control.

As for the address sanitizer issue, I think that should be filed separately and treated as a 1.3.3 release blocker. I can see an issue it my testing too. Thanks for spotting it. Issue #281 was opened for this. @radosroka , have you seen the above address sanitizer issue?

@Kangie
Copy link
Contributor Author

Kangie commented Jan 18, 2024

I think it might be best to not change the name of the directory as part of this patch set. That can/should be done separately. The idea is to minimize the changes in case we need to bisect back to this. The original idea was that everything related to config and initialization would live in the init directory

Would you be happy for a patch/PR that moves the existing files into the new structure so that the 'gentoo PR' can just add the openrc files, or is there a strong preference for sticking with the existing approach?

Also, I think we should schedule this patch for after the 1.3.3 release so that we can get other bug fixes out to the community. Maybe since this is a new platform being on-boarded, we call the next release 1.4.

No worries, less time pressure to look at filtering. 😄

Gentoo has a binary package host now, so it will be sane to add a containerised CI/CD option where it still wasn't late last year - I'll look into that too if I have time!

Regarding /usr/share, you might look at how we do this for rpm. We use a config file, fapolicyd-filter.conf. There is a library/filter.h file that shows the API that uses it. We had to resort to this because we found that we may be getting rid of files that the user really wants to trust. So, we gave them a way to take control.

Will do.

@radosroka
Copy link
Member

library/escape.c:100:9: runtime error: load of address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
72 6f 6f 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
#0 0x55bd93a119a6 in unescape_shell library/escape.c:100
#1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
#2 0x55bd939f8661 in main daemon/fapolicyd.c:637
#3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
#4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
#5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)

library/escape.c:116:7: runtime error: store to address 0x7fb4ae604175 with insufficient space for an object of type 'char'
0x7fb4ae604175: note: pointer points here
72 6f 6f 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
#0 0x55bd93a11cad in unescape_shell library/escape.c:116
#1 0x55bd939f9c8f in handle_mounts daemon/fapolicyd.c:373
#2 0x55bd939f8661 in main daemon/fapolicyd.c:637
#3 0x7fb4b325fee9 in __libc_start_call_main (/usr/lib64/libc.so.6+0x23ee9)
#4 0x7fb4b325ffa4 in __libc_start_main_alias_1 (/usr/lib64/libc.so.6+0x23fa4)
#5 0x55bd939f96c0 in _start (/data/development/temp/fapolicyd/src/fapolicyd+0x946c0)


I am currently leaking 13 bytes [allocated here](https://github.com/linux-application-whitelisting/fapolicyd/pull/274/files#diff-b2e2388c4837a311031ffe6082dfdc129c60bad8def1a9f1fbe5f73ad3f90622R284) that I haven't been able to track down, I'll try and look into that soon but I'm still recovering from working a week and a half of my vacation!

Is this from the daemon itself or CLI?

@Kangie
Copy link
Contributor Author

Kangie commented Jan 25, 2024

It's from the daemon itself.

@radosroka
Copy link
Member

It's from the daemon itself.

And was it on this branch or main?

@Kangie
Copy link
Contributor Author

Kangie commented Apr 12, 2024

Not forgotten, just had a few other projects eat up my time. I'll aim to get this finished off for late April / Early may; I have some leave coming up :)

Kangie and others added 8 commits October 26, 2024 16:51
- add ebuild backend to backend manager
- add ebuilddb to database
- remove `MD5` call from `USE_DEB` gating in `get_hash_from_fd2`
Based on examples from debian backend (md5) and RPM backend.
- ./init -> ./extra (there's more than just init stuff here)
- systemd unit -> extra/systemd
- new ./extra/openrc/{init,conf}.d/fapolicy
- exclude operc files from RPM build
Also:
 - properly nest FAQ content
 - fix indentation of code block under notes
 - fix table formatting (at least, as good as GFM tables get)
@Kangie
Copy link
Contributor Author

Kangie commented Oct 26, 2024

OK I put fvwm3 to bed and finally have some time for this.

Thanks for the pointer on the filter - I'm almost there with it's implementation - I seem to be having some difficulty getting anything to match the filters though, which is annoying. I'll try and track that down over the next little bit.

All that I can tell for now is that the filter file is loaded (just a slightly modified fedora config because our kernel path is /usr/src/linux-*gentoo/) , does not seem to be invalid (as filter_load_file seems happy) however none of the full paths that I pass to filter_check seem to result in a match.

 - usr/src/linux-*/
  + */scripts/*
  + */tools/objtool/*

My "sophisticated debugging" techniques here seem to reveal that filter_check always returns 1:

10/26/24 19:06:22 [ DEBUG ]: filter_check: /usr/src/linux-6.10.1-gentoo/drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp10b.c
10/26/24 19:06:22 [ DEBUG ]: filter_check: result 1
10/26/24 19:06:22 [ DEBUG ]: Adding /usr/src/linux-6.10.1-gentoo/drivers/gpu/drm/nouveau/nvkm/subdev/acr/gp10b.c
10/26/24 19:06:22 [ DEBUG ]:    Expected MD5: 8ff4c8c354f0ea101adb83290e56679b
10/26/24 19:06:22 [ DEBUG ]:    File size: 2076

I also note that /usr/include, a default excluded path in the fedora config (if I'm reading that correctly) is also returning a 1:

+ /
 - usr/include/
10/26/24 19:44:06 [ INFO ]: Adding app-text/ghostscript-gpl-10.03.1:0/10.03 (::gentoo) to the ebuild backend; 341 files
10/26/24 19:44:06 [ DEBUG ]: filter_check: /usr/include/ijs/ijs_server.h
10/26/24 19:44:06 [ DEBUG ]: filter_check: result 1
10/26/24 19:44:06 [ DEBUG ]: Adding /usr/include/ijs/ijs_server.h
10/26/24 19:44:06 [ DEBUG ]:    Expected MD5: 61b6bc496441226d94acbf3bec7c4c65
10/26/24 19:44:06 [ DEBUG ]:    File size: 4223

Otherwise we seem to be looking pretty good; after 15 minutes or so in permissive mode; it's even happy with the package manager (at least in terms of it removing things!):

10/26/24 19:24:05 [ INFO ]: shutting down...
10/26/24 19:24:05 [ DEBUG ]: rule=1 dec=allow perm=open auid=1000 pid=459150 exe=/usr/bin/python3.12 : path=/var/tmp/portage/._unmerge_/sys-kernel/gentoo-sources-6.11.4/temp/build.log ftype=text/plain trust=0
10/26/24 19:24:05 [ DEBUG ]: Exiting decision thread
10/26/24 19:24:05 [ DEBUG ]: Update poll interrupted
10/26/24 19:24:05 [ DEBUG ]: Update poll timeout expired
10/26/24 19:24:06 [ DEBUG ]: Inter-thread max queue depth 7
10/26/24 19:24:06 [ DEBUG ]: Allowed accesses: 110003
10/26/24 19:24:06 [ DEBUG ]: Denied accesses: 12
10/26/24 19:24:06 [ DEBUG ]: Trust database max pages: 256000
10/26/24 19:24:06 [ DEBUG ]: Trust database pages in use: 66174 (25%)
10/26/24 19:24:06 [ DEBUG ]: Subject cache size: 1549
10/26/24 19:24:06 [ DEBUG ]: Subject slots in use: 1469 (94%)
10/26/24 19:24:06 [ DEBUG ]: Subject hits: 108546
10/26/24 19:24:06 [ DEBUG ]: Subject misses: 4798
10/26/24 19:24:06 [ DEBUG ]: Subject evictions: 3329 (3%)
10/26/24 19:24:06 [ DEBUG ]: Object cache size: 8191
10/26/24 19:24:06 [ DEBUG ]: Object slots in use: 8184 (99%)
10/26/24 19:24:06 [ DEBUG ]: Object hits: 101831
10/26/24 19:24:06 [ DEBUG ]: Object misses: 59834
10/26/24 19:24:06 [ DEBUG ]: Object evictions: 51650 (50%)

library/md5-backend.c \
library/md5-backend.h
library/md5-backend.h \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since WITH_EBUILD pulls in NEED_MD5 now, this should be removable. I would similarly suggest that WITH_DEB should be checked to ensure it does, and remove that from the WITH_DEB chunk as well.

typedef struct contents {
char *md5;
char *path;
} ebuildfiles;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but this describes a single file, so it should properly be ebuildfile - otherwise it is confusing if the object is an array of file objects or just a single file.

@@ -462,11 +462,7 @@ char *get_hash_from_fd2(int fd, size_t size, const int is_sha)
if (is_sha) {
SHA256(mapped, size, (unsigned char *)&hptr);
} else {
#ifdef USE_DEB
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely use an MD5 preprocessor definition instead of becoming unconditional.

Comment on lines +32 to +36
#include "fapolicyd-backend.h" // for SRC_EBUILD, backend
#include "filter.h" // for filter_destroy, filter_init, filter_l...
#include "llist.h" // for list_empty, list_init
#include "md5-backend.h" // for add_file_to_backend_by_md5
#include "message.h" // for msg
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From other code in this repository, 8 column tabs are used; these comments don't align with the others when using 8 column tabs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants