new(libsinsp): add concatenated process lineage filter fields + sinsp_filter_check_thread helpers cleanup #1625

incertum · 2024-01-17T01:18:03Z

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area libsinsp

Does this PR require a change in the driver versions?

What this PR does / why we need it:

Fewer loops and more efficient if you want to export all process ancestors up to a certain level anyway
More convenient and intuitive display of information: security analysts prefer output like /bin/java->/bin/bash->/bin/python->/bin/bash to quickly understand process origins
Enhanced threat detection capabilities: rules can string-match an exact lineage / sequence
Needed in the planned anomaly detection framework

Note: I don't think we need this for the other proc.a* fields, i.e., only for the process name, exe, and exepath fields.
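To illustrate the "fewer loops" point, here is a minimal C++ sketch, not the code in this PR. It assumes a hypothetical, simplified thread table (`proc_entry`, `thread_table` and `concat_lineage` are illustrative names, not libsinsp APIs) and builds the whole concatenated lineage in a single walk up the parent chain, instead of evaluating one indexed field per ancestor level. It renders the oldest ancestor first to match the /bin/java->/bin/bash->/bin/python->/bin/bash example; the ordering the PR's field actually uses is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified thread-table entry; this is NOT the sinsp_threadinfo API.
struct proc_entry {
    std::int64_t pid;
    std::int64_t ppid;
    std::string exepath;
};

using thread_table = std::unordered_map<std::int64_t, proc_entry>;

// Walks from `pid` towards the root exactly once, collecting the process itself plus
// up to `max_levels` ancestors, then renders them oldest ancestor first, joined by "->".
std::string concat_lineage(const thread_table& table, std::int64_t pid, std::size_t max_levels) {
    std::vector<const proc_entry*> chain;
    auto it = table.find(pid);
    while (it != table.end() && chain.size() <= max_levels) {
        chain.push_back(&it->second);
        if (it->second.ppid == it->second.pid) {
            break;  // self-parented entry: stop instead of looping
        }
        it = table.find(it->second.ppid);
    }
    std::string out;
    for (auto rit = chain.rbegin(); rit != chain.rend(); ++rit) {
        if (rit != chain.rbegin()) {
            out += "->";
        }
        out += (*rit)->exepath;
    }
    return out;
}
```

For example, with a table where /bin/java (pid 1) spawned /bin/bash (2), which spawned /bin/python (3), which spawned /bin/bash (4), `concat_lineage(t, 4, 10)` yields `/bin/java->/bin/bash->/bin/python->/bin/bash`, while `concat_lineage(t, 4, 1)` stops after the parent and yields `/bin/python->/bin/bash`.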

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

new(libsinsp): add concatenated process lineage fields as filter / display fields

poiana · 2024-01-17T01:18:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: incertum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

incertum · 2024-01-17T01:20:25Z

CC @loresuso @darryk10

Andreagit97

I proposed some code cleanups. This is not an issue introduced by this PR, of course, but since we are not in a rush, maybe we can find a way to reduce code duplication.

userspace/libsinsp/sinsp_filtercheck_thread.cpp

incertum · 2024-01-17T15:21:02Z

Yeah I had similar thoughts that we are getting to a point where there is lots of code duplication in this regard (not limited to the type of fields touched in this PR).

I'll check what makes sense to clean up in this PR!

incertum · 2024-01-20T02:29:23Z

@Andreagit97 I added 3 cleanup commits on top, performing significant consolidation with new helpers, beyond the initial request. WDYT, and do you have additional ideas?

leogr · 2024-03-07T16:40:12Z

@incertum

Can you rebase pls? Then the sanitizer CI jobs should run.

…splay fields

* Less loops and more efficient if you want to export all process ancestors up to a certain level anyways
* More convenient and intuitive display of information -> security analysts prefer an output like this /bin/java->/bin/bash->/bin/python->/bin/bash to quickly understand the process origins
* Enhanced threat detection capabilities -> can string match exact lineage / sequence
* Needed in the planned anomaly detection framework

Signed-off-by: Melissa Kilby <[email protected]>

Signed-off-by: Melissa Kilby <[email protected]>

incertum · 2024-03-07T17:22:47Z

It seems that, while simply rebasing worked, some updates are needed because, I suspect, a few things changed ... I will look into it soon.

Andreagit97 · 2024-03-11T11:01:57Z

Considering what we have today for the process name (the same applies to exepath, exe, ...), I would try to understand what the end goal is here...

Today we have:

proc.name
proc.pname
proc.sname
proc.vpgid.name
proc.aname
proc.concat_aname // in this PR

A possible solution could be

proc.name
proc.aname[]
proc.lineage[]

Where:

proc.name is the same as today
proc.aname[], which always takes an argument. It could be an index (0,1,2,...) or a specific key (SESSION, PARENT, GROUP). This covers proc.pname, proc.sname, proc.vpgid.name, and proc.aname.
proc.lineage[], which is a list and could take an argument. If used without arguments, it returns a list with the whole process lineage; with an argument (an index 0,1,2,...), it returns the lineage starting from that index. This covers proc.aname and proc.concat_aname.
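A rough sketch of the semantics proposed for proc.lineage[] and index-based proc.aname[], assuming a hypothetical simplified thread table (`proc_entry`, `lineage()` and `aname()` are illustrative, not libsinsp APIs). Index 0 is the process itself and the list runs child-to-root. The key-based variants (PARENT, SESSION, GROUP) are not shown: PARENT would be equivalent to index 1, while SESSION and GROUP would need a leader lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified process record; not the libsinsp data model.
struct proc_entry {
    std::int64_t pid;
    std::int64_t ppid;
    std::string name;
};
using thread_table = std::unordered_map<std::int64_t, proc_entry>;

// Safety bound for the walk in case the table contains a parent cycle.
constexpr std::size_t kMaxDepth = 256;

// proc.lineage[start]: process names from index `start` upwards, where index 0 is the
// process itself, 1 its parent, and so on (child-to-root order).
std::vector<std::string> lineage(const thread_table& t, std::int64_t pid, std::size_t start = 0) {
    std::vector<std::string> out;
    std::size_t level = 0;
    for (auto it = t.find(pid); it != t.end() && level < kMaxDepth; ++level) {
        if (level >= start) {
            out.push_back(it->second.name);
        }
        if (it->second.ppid == it->second.pid) {
            break;  // self-parented entry: nothing further up
        }
        it = t.find(it->second.ppid);
    }
    return out;
}

// proc.aname[N]: the single ancestor at index N, if it exists.
std::optional<std::string> aname(const thread_table& t, std::int64_t pid, std::size_t n) {
    std::vector<std::string> from_n = lineage(t, pid, n);
    if (from_n.empty()) {
        return std::nullopt;
    }
    return from_n.front();
}
```

With a table where systemd spawned containerd-shim, which spawned bash, which spawned tail, `lineage(t, tail_pid)` returns {tail, bash, containerd-shim, systemd}, `lineage(t, tail_pid, 2)` returns {containerd-shim, systemd}, and `aname(t, tail_pid, 1)` returns bash.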

Moreover, we need to implement other operators on lists, like =, startswith, ... to cover all the use cases we want with the aforementioned filter checks.

In this way, we can make our filter checks uniform and solve the issues we have highlighted in this PR with ->. WDYT?

leogr · 2024-03-11T14:09:07Z

Moreover, we need to implement other operators on lists, like =, startswith, ... to cover all the use cases we want with the aforementioned filter checks.

👍

incertum · 2024-03-13T03:31:45Z

re #1625 (comment)

proc.aname[] ... proc.aname[] which always takes an argument

This would break the existing use of proc.aname in the filter expression where we traverse all levels up.

proc.name
proc.pname
proc.sname
proc.vpgid.name
proc.aname

Frankly, I think the current variants are actually pretty clear; I don't see direct benefits in changing the naming. If you don't mind me asking, what are the benefits? It could also backfire and be less user-friendly. WDYT?
From my perspective, if anything, we should create a super generic filtercheck class that lets folks access any thread property at any level, etc. That is definitely beyond the more immediate improvement suggested in this PR, but likely worth it, and it would be somewhat in line with what @Andreagit97 suggested here.

solve the issues we have highlighted in this PR with ->

Forgive me, I still don't understand the issue with using -> to display a lineage / linkage. It appears to be the best choice, as a process lineage encodes order and direction and is not just a list. Should we instead talk about EBPF_IS_LINKED_LIST, or something closer to our internal concept of IP tuples, i.e., anything that directly implies direction? If we want an unordered list as well, that's OK, but we definitely need the new EBPF_IS_LINKED_LIST concept. Graphics and tutorials about linked lists always use -> to explain the links.

@leogr

The use case is legit, but the proposed solution is sub-optimal. Performing a full-text search for zsh->bash in a very long string containing the whole lineage will create performance penalties. I would rather go for a new operator for lists to satisfy the specific requirement of this use case.

I believe it comes down to the algorithm. Many rules search in proc.cmdline; perhaps we could up our game and make string searching much more performant across the board? What are our current algorithm and Big O?
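On the Big O question: this sketch does not describe what libsinsp uses today. It only contrasts `std::string::find` with the Boyer-Moore-Horspool searcher provided by the C++17 standard library (`std::boyer_moore_horspool_searcher` in `<functional>`), where the pattern is preprocessed once and reused. `bmh_matcher` is a hypothetical wrapper name.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Baseline: std::string::find. The standard does not mandate an algorithm; common
// implementations can degrade to O(n * m) for haystack length n and needle length m.
bool contains_find(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}

// Boyer-Moore-Horspool via the C++17 standard library. The needle is preprocessed once
// (e.g. when a rule is compiled) and reused for every event. On a mismatch the search can
// skip ahead by up to m characters, so the average cost is often sublinear in n; the
// worst case is still O(n * m).
class bmh_matcher {
public:
    explicit bmh_matcher(std::string needle)
        : needle_(std::move(needle)), searcher_(needle_.cbegin(), needle_.cend()) {}

    // The searcher keeps iterators into needle_, so copying or moving would dangle.
    bmh_matcher(const bmh_matcher&) = delete;
    bmh_matcher& operator=(const bmh_matcher&) = delete;

    bool operator()(const std::string& haystack) const {
        return std::search(haystack.cbegin(), haystack.cend(), searcher_) != haystack.cend();
    }

private:
    std::string needle_;  // declared before searcher_ so it is initialized first
    std::boyer_moore_horspool_searcher<std::string::const_iterator> searcher_;
};
```

Preprocessing the needle once per rule rather than per event is what makes this kind of searcher attractive for fields like proc.cmdline; whether it actually beats the current implementation on real payloads would need benchmarking.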
Strings are extremely powerful: for example, we could now also search for ->zsh->bash, implying there was a parent to that sub-lineage of the process tree, or for zsh->bash->, implying that there was a child. A sublist search couldn't support that, BUT ...
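For comparison, a minimal sketch of how a list operator could express the same anchors, assuming a hypothetical `has_sequence` helper over a lineage ordered oldest ancestor first (not an existing libsinsp operator). A leading "->" becomes "the match has an element before it", and a trailing "->" becomes "the match has an element after it".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// `lineage` is ordered oldest ancestor first, matching an "a->b->c" rendering.
// Returns true if `seq` occurs as a contiguous run of whole elements. `need_before`
// mirrors a leading "->" in the string pattern (the run has a parent), and
// `need_after` mirrors a trailing "->" (the run has a child).
bool has_sequence(const std::vector<std::string>& lineage,
                  const std::vector<std::string>& seq,
                  bool need_before, bool need_after) {
    if (seq.empty() || seq.size() > lineage.size()) {
        return false;
    }
    for (std::size_t i = 0; i + seq.size() <= lineage.size(); ++i) {
        auto start = lineage.begin() + static_cast<std::ptrdiff_t>(i);
        if (!std::equal(seq.begin(), seq.end(), start)) {
            continue;
        }
        const bool before_ok = !need_before || i > 0;
        const bool after_ok = !need_after || i + seq.size() < lineage.size();
        if (before_ok && after_ok) {
            return true;
        }
    }
    return false;
}
```

One difference worth noting: a plain substring search for zsh->bash also matches a lineage rendered as myzsh->bash, because it does not respect element boundaries, whereas the element-wise comparison does not.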

@Andreagit97

Moreover, we need to implement other operators on lists, like =, startswith, ... to cover all the use cases we want with the aforementioned filter checks.

This also opens up new searches, as we could search for sublists where, for example, the executable path ends with a given string, something the string representation cannot truly handle the way we would want.
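A minimal sketch of such an element-wise sublist search, assuming a hypothetical `has_suffix_sequence` helper (not an existing libsinsp operator): it looks for a contiguous run of lineage entries, ordered oldest ancestor first, where each exepath ends with the corresponding suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if `s` ends with `suffix`.
bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// True if some contiguous window of `lineage` matches `suffixes` element by element,
// i.e. lineage[i + k] ends with suffixes[k] for every k.
bool has_suffix_sequence(const std::vector<std::string>& lineage,
                         const std::vector<std::string>& suffixes) {
    if (suffixes.empty() || suffixes.size() > lineage.size()) {
        return false;
    }
    for (std::size_t i = 0; i + suffixes.size() <= lineage.size(); ++i) {
        bool all = true;
        for (std::size_t k = 0; k < suffixes.size() && all; ++k) {
            all = ends_with(lineage[i + k], suffixes[k]);
        }
        if (all) {
            return true;
        }
    }
    return false;
}
```

With the string representation, the same check would need a pattern that encodes the -> element boundaries itself.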

In summary:

Understanding that the filterchecks are the core of using Falco effectively, I would rather expose too many variants and empower end users. Comparatively, filterchecks have been much easier to maintain; historically, we have had many more issues in our parsers.

Let's support:

string representation of the lineage
new linked list concept
and why not an unordered list concept as well (internally) for more performant intersection searches (it could make proc.aname in (x, y, z) even more performant)?
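A minimal sketch of the unordered-set idea for proc.aname in (x, y, z), assuming a hypothetical `any_in_matcher` built once from the value list; it does not reflect how libsinsp evaluates `in` today. Hashing the values once keeps each evaluation at O(number of ancestors) on average, instead of comparing every ancestor against every value.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical matcher for "any ancestor name in (x, y, z)".
// The value set is hashed once (e.g. when the rule is compiled), so each
// evaluation costs O(|lineage|) on average instead of O(|lineage| * |values|).
class any_in_matcher {
public:
    explicit any_in_matcher(const std::vector<std::string>& values)
        : values_(values.begin(), values.end()) {}

    bool operator()(const std::vector<std::string>& lineage) const {
        for (const auto& name : lineage) {
            if (values_.count(name) != 0) {
                return true;
            }
        }
        return false;
    }

private:
    std::unordered_set<std::string> values_;
};
```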

Andreagit97 · 2024-03-13T10:08:28Z

re #1625 (comment)

proc.aname[] ... proc.aname[] which always takes an argument

This would break the existing use of proc.aname in the filter expression where we traverse all levels up.

According to its description in sinsp

When used without any arguments, proc.aname is applicable only in filters and matches any of the process ancestors. For instance, you can use `proc.aname=bash` to match any process ancestor whose name is `bash`

This behavior should be covered by the new proc.lineage, which would provide a list with the whole lineage, so you could write something like (proc.lineage) intersects bash.

proc.name
proc.pname
proc.sname
proc.vpgid.name
proc.aname
Frankly, I think the current variants are actually pretty clear; I don't see direct benefits in changing the naming. If you don't mind me asking, what are the benefits? It could also backfire and be less user-friendly. WDYT? From my perspective, if anything, we should create a super generic filtercheck class that lets folks access any thread property at any level, etc. That is definitely beyond the more immediate improvement suggested in this PR, but likely worth it, and it would be somewhat in line with what @Andreagit97 suggested here.

I just think that these filter checks are doing almost the same thing in the end: they access a single ancestor in the lineage. Providing users with just one method for doing it is more intuitive IMO. Of course, this is just my idea; let's see what others think. Moreover, having just one unique method will simplify our lives and help us keep the code "bug-free". BTW, yes, this would be a long-term plan.

solve the issues we have highlighted in this PR with ->

Forgive me, I still don't understand the issue with using -> to display a lineage / linkage. It appears to be the best choice, as a process lineage encodes order and direction and is not just a list. Should we instead talk about EBPF_IS_LINKED_LIST, or something closer to our internal concept of IP tuples, i.e., anything that directly implies direction? If we want an unordered list as well, that's OK, but we definitely need the new EBPF_IS_LINKED_LIST concept. Graphics and tutorials about linked lists always use -> to explain the links.

Uhm, this is not an issue; I used the wrong word, sorry. It's just a matter of what Jason explained well here: #1625 (comment). Why do we need to support a new string representation of the lineage with -> when we can do a very similar thing with what we have today?

Let's consider again the newly proposed filter check proc.lineage[] plus the startswith operation between lists. Writing something like proc.lineage startswith (tail, bash) should do exactly what we are talking about. Please note that proc.lineage will already contain all the ancestors in order, so, for example, proc.lineage would be (tail, bash, containerd-shim, systemd). The only thing that changes here is the representation, so (tail, bash) instead of (tail->bash), or maybe I'm missing something...
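A minimal sketch of a list-level startswith, assuming a hypothetical `list_startswith` helper and a lineage ordered child-to-root as in the example (tail, bash, containerd-shim, systemd); neither is an existing libsinsp API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical list operator: true if `prefix` matches the first elements of `list`
// exactly, comparing whole elements (not characters).
bool list_startswith(const std::vector<std::string>& list,
                     const std::vector<std::string>& prefix) {
    return prefix.size() <= list.size() &&
           std::equal(prefix.begin(), prefix.end(), list.begin());
}
```

Because elements are compared whole, (tail, bas) does not match, whereas a character-level startswith on the string tail->bash->... would accept tail->bas.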

Moreover, as a side note, we are currently working on the possibility of comparing 2 filter checks, e.g. proc.name = proc.exe, and on adding new modifiers like toupper(evt.arg.name) = toupper(fd.path), so having fewer corner cases to handle would surely simplify things. To give you an example, at the moment we cannot implement the 2 aforementioned features for all fields, e.g. proc.aexe or proc.aexepath, because they use custom comparison logic, so we need to make them uniform with the others before supporting these new features. So I'm not saying we shouldn't add new approaches or new custom logic; I'm just saying that maybe we could obtain all we need from what we already have in place or will try to add in the next weeks.

leogr · 2024-03-26T14:31:42Z

Hey folks,

It looks like we have a lot of ideas here, but we haven't reached a consensus, so I will try to summarize the discussion, collect feedback, and come back with a design proposal.

/assign

incertum · 2024-03-27T10:10:54Z

Hey folks,

It looks like we have a lot of ideas here, but we haven't reached a consensus, so I will try to summarize the discussion, collect feedback, and come back with a design proposal.

/assign

SGTM Leo.

incertum · 2024-03-27T10:12:09Z

Also depending on the proposal we may end up using this PR just for some sinsp_filter_check_thread helpers cleanup and create a new one for the new fields? We will see, whichever makes the most sense.

leogr · 2024-04-12T15:31:11Z

Proposal [1/2]: introducing `join(<list>, <sep>)` for concatenating list with a custom separator

This first part of my proposal aims to address the displaying issue raised in this discussion. Later, I will post the [2/2] part of the proposal, which focuses on the filtering issue.

Requirements: This proposal depends on #1789, which needs to be implemented first.

Use case: string representation of the process lineage (as reported by @incertum)

Solution:

introducing proc.lineage, which is a list (i.e., EPF_IS_LIST) with the whole process lineage (similar to @Andreagit97's proposal; we can discuss separately whether this field should support arguments)
introducing the join(<list>, <sep>) transformer, as proposed by me in this comment

As a result, we can use join(proc.lineage, "->") in the output: field of a rule.

Note: This solution might also address the filtering part, but with some performance penalty. For instance, one could use join(proc.lineage, "->") contains ->zsh->bash to match a sequence. At the functional level, this is equivalent to what is already implemented by this PR. However, I'd discourage users from doing so, because performing a full-text search in a long string usually comes with severe performance penalties (we have seen this in the past). It might still be a legit workaround until we have a specific implementation to deal with sequences (I will post my solution in the 2nd part of this proposal).
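A minimal sketch of what a join(<list>, <sep>) transformer computes, assuming the list values are plain strings; `join` here is an illustrative free function, and the real transformer (#1925) may differ in details such as how empty lists are rendered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical join(<list>, <sep>) transformer: concatenates the list elements with
// `sep` between consecutive elements, producing a single string value.
std::string join(const std::vector<std::string>& list, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < list.size(); ++i) {
        if (i > 0) {
            out += sep;
        }
        out += list[i];
    }
    return out;
}
```

In a naive implementation, join(proc.lineage, "->") contains ->zsh->bash therefore amounts to building this string and then running a substring search over it on every evaluation, which is where the performance concern comes from.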

Action items:

If we reach a consensus on this part of the proposal, I'd recommend rescoping this PR to only add the required field(s) (e.g., proc.lineage).

🙏

incertum · 2024-04-12T15:50:27Z

If we reach a consensus on this part of the proposal, I'd recommend rescoping this PR to only add the required field(s) (e.g., proc.lineage).

SGTM. My only suggestion would be to follow the existing, more specific naming conventions, for instance:

Option A: proc.exe.lineage, proc.name.lineage or variants
Option B: proc.lineage[name], proc.lineage[exe]. I personally don't like that, as it would be something entirely new, but if we intend to expand this new concept, I would be OK with it.
...

leogr · 2024-04-12T16:32:35Z

If we reach a consensus on this part of the proposal, I'd recommend rescoping this PR to only add the required field(s) (e.g., proc.lineage).

SGTM. My only suggestion would be to follow the existing, more specific naming conventions, for instance:

Option A: proc.exe.lineage, proc.name.lineage or variants

Option B: proc.lineage[name], proc.lineage[exe]. I personally don't like that, as it would be something entirely new, but if we intend to expand this new concept, I would be OK with it.

...

Option A is acceptable for me as an extension of my proposal.

poiana · 2024-07-11T21:54:34Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

incertum · 2024-07-11T23:50:57Z

/remove-lifecycle stale

incertum · 2024-08-28T02:43:11Z

This PR is too mixed with cleanups and new features.

Will split this off into 2 PRs in the new dev cycle.

We can keep the cleanups here
and the new PR will add the new filtercheck fields (exposing the proc lineage as list in original order),
while the .join() operator will be introduced by someone else in yet another PR.

/milestone 0.19.0

loresuso · 2024-09-11T15:46:44Z

Hey! I wanted to add something to this discussion.
Navigating the thread table from children to parents is not the only "direction" we are interested in. We might want to add another dimension: navigating to a process's siblings (or to processes in the same group).
Shells use process groups to put together a bunch of processes, so that they can wait for all the processes in the group or send the same signal to each of them. We can also notice that all these processes are siblings of each other. So, for instance, when we execute:

wget https://something | base64 -d | python3 ..

we might be able to navigate to the wget and base64 -d processes when we see python3 being spawned with its proc.stdin.type=pipe.
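A minimal sketch of navigating the group dimension, assuming a hypothetical simplified process record with a pgid field, and an interactive shell with job control, which typically places every stage of a pipeline in one process group led by the first stage. `proc_entry` and `group_peers` are illustrative names, not libsinsp APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified process record (not sinsp_threadinfo).
struct proc_entry {
    std::int64_t pid;
    std::int64_t ppid;
    std::int64_t pgid;
    std::string name;
};

// Collects the names of the other processes sharing `self`'s process group, which is
// how a job-control shell groups the stages of a pipeline like `a | b | c`.
std::vector<std::string> group_peers(const std::vector<proc_entry>& table,
                                     const proc_entry& self) {
    std::vector<std::string> peers;
    for (const auto& p : table) {
        if (p.pgid == self.pgid && p.pid != self.pid) {
            peers.push_back(p.name);
        }
    }
    return peers;
}
```

For wget | base64 -d | python3 started from an interactive bash, the peers of python3 would be wget and base64. A linear scan is used for brevity; a real implementation would likely want an index keyed by group id.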

Just an idea that we might want to take into account when we design this better!

FedeDP · 2024-11-13T09:13:24Z

/milestone 0.20.0

leogr · 2025-01-07T11:11:57Z

Since we are working on the new transformers (see #2025 and #1925), I propose re-evaluating this once they are ready, so

/milestone 0.21.0

poiana added release-note kind/feature New feature or request dco-signoff: yes area/libsinsp labels Jan 17, 2024

poiana requested review from Andreagit97 and hbrueckner January 17, 2024 01:18

poiana added approved size/L labels Jan 17, 2024

incertum force-pushed the concat-lineage-fields branch from 7dd75f2 to ec010c9 January 17, 2024 01:24

Andreagit97 added this to the 0.15.0 milestone Jan 17, 2024

Andreagit97 reviewed Jan 17, 2024


userspace/libsinsp/sinsp_filtercheck_thread.cpp

userspace/libsinsp/sinsp_filtercheck_thread.cpp (outdated)

incertum force-pushed the concat-lineage-fields branch from ec010c9 to df1a23a January 20, 2024 02:26

poiana added size/XXL and removed size/L labels Jan 20, 2024

incertum changed the title ~~new(libsinsp): add concatenated process lineage fields as filter / display fields~~ new(libsinsp): add concatenated process lineage filter fields + sinsp_filter_check_thread helpers cleanup Jan 20, 2024

leogr requested review from Andreagit97, FedeDP, LucaGuerra and jasondellaluce March 7, 2024 16:40

incertum added 4 commits March 7, 2024 16:57

cleanup(libsinsp): add sinsp_filter_check_thread::get_main_thread helper

f19ca75

Signed-off-by: Melissa Kilby <[email protected]>

cleanup(libsinsp): add concat_attribute_thread_hierarchy helper

6b99e18

Signed-off-by: Melissa Kilby <[email protected]>

cleanup(libsinsp): add extract_leader_attribute_thread_hierarchy helper

8a35134

Signed-off-by: Melissa Kilby <[email protected]>

incertum force-pushed the concat-lineage-fields branch from df1a23a to 8a35134 March 7, 2024 16:57

Andreagit97 modified the milestones: 0.15.0, 0.16.0 Mar 11, 2024

incertum modified the milestones: 0.16.0, TBD Mar 25, 2024

poiana assigned leogr Mar 26, 2024

leogr mentioned this pull request Jun 20, 2024

New transformer: join(<list>, <sep>) #1925

Open

poiana added the lifecycle/stale label Jul 11, 2024

poiana removed the lifecycle/stale label Jul 11, 2024

poiana modified the milestones: TBD, 0.19.0 Aug 28, 2024

poiana modified the milestones: 0.19.0, 0.20.0 Nov 13, 2024

poiana modified the milestones: 0.20.0, 0.21.0 Jan 7, 2025
