Spike: Analyze Impact and Alternatives for Reducing Inventory Events #573

Closed
cborla opened this issue Feb 4, 2025 · 8 comments
Labels
level/task Task issue module/agent module/inventory Inventory module spike Spike type/enhancement Enhancement issue

cborla commented Feb 4, 2025

Objective

Investigate the impact of disabling the inventory of active processes and ports in use, as well as explore potential alternative solutions to reduce the volume of inventory events without affecting key functionalities.

Research Areas

  1. Event Volume Analysis

    • Measure the number of create/update/delete events generated by active processes and ports in use under normal agent operation.
    • Identify how often these events are triggered and their contribution to overall event load.
  2. Performance Impact

    • Evaluate the CPU and memory footprint when these inventories are enabled versus disabled.
    • Determine if reducing these events significantly improves agent performance.
  3. Alternative Solutions

    • Investigate the feasibility of event aggregation instead of direct disablement.
    • Consider implementing sampling strategies to reduce event frequency.
    • Explore configurable thresholds to detect relevant process or port changes without flooding the system.

Deliverables

  • A report summarizing the findings, including event volume metrics and performance impact.
  • A list of recommended solutions with pros and cons.
  • If applicable, a proposal for a new feature to optimize inventory event generation.

Expected Outcome

The results of this spike will guide the final decision on whether to simply disable these inventories or implement a more refined approach to event reduction.

@cborla cborla added level/task Task issue module/agent module/inventory Inventory module type/change Change performed in a resource or Wazuh Cloud environment labels Feb 4, 2025
@wazuhci wazuhci moved this to In progress in XDR+SIEM/Release 5.0.0 Feb 4, 2025
@cborla cborla added spike Spike and removed type/change Change performed in a resource or Wazuh Cloud environment labels Feb 5, 2025
LucioDonda commented Feb 5, 2025

Basic time analysis

  • Measuring the time spent in each function gives a simple comparison of how ports_all and processes affect the whole inventory scan. Where more than one run was measured, both values are shown:

| Case | ScanProcesses time (s) | ScanPorts time (s) | All scans (s) |
|---|---|---|---|
| Full Scan | 0.397699 - 0.837886 | 0.054255 - 0.127410 | 3.0684 - 6.531224 |
| No All Ports | 0.384774 - 0.836661 | 0.025591 - 0.049919 | 3.015369 - 6.389182 |
| No Ports | 0.818377 | 0 | 6.310172 |
| No Processes | 0 | 0.113063 | 4.511051 |
| No Processes and Ports | 0 | 0 | 2.626694 - 5.477683 |

  • As can be seen, the results vary widely between runs because of other OS processes running outside the agent. That is why a profiling or benchmarking tool is needed.
  • Besides that, we can compute each step's share of the total scan time for each run:

| Case | ScanProcesses time | ScanPorts time | All scans |
|---|---|---|---|
| Full Scan | 12.9% - 12.82% | 1.76% - 1.95% | 100% - 100% |
| No All Ports | 12.7% - 13.1% | 0.84% - 0.78% | 100% - 100% |
| No Ports | 13.0% | 0 | 100% |
| No Processes | 0 | 2.5% | 100% |
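
The percentages above are each step's time divided by the total scan time for the same run. A minimal sketch of that calculation follows. The names `ScanTiming`, `sharePercent` and `printShares` are hypothetical helpers written for this comment, not part of the agent code:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// One measured run: time spent in a single scan step and in the whole scan, in seconds.
// (Hypothetical type, used only for this illustration.)
struct ScanTiming {
    std::string label;
    double stepSeconds;
    double totalSeconds;
};

// Share of the whole scan taken by one step, as a percentage.
// Returns 0 when the total is not positive, to avoid dividing by zero.
double sharePercent(double stepSeconds, double totalSeconds) {
    if (totalSeconds <= 0.0) {
        return 0.0;
    }
    return 100.0 * stepSeconds / totalSeconds;
}

// Prints one line per run with the step's share of the total scan time.
void printShares(const std::vector<ScanTiming>& runs) {
    for (const auto& run : runs) {
        std::printf("%-32s %6.2f%%\n",
                    run.label.c_str(),
                    sharePercent(run.stepSeconds, run.totalSeconds));
    }
}
```

For example, `sharePercent(0.397699, 3.0684)` gives the ScanProcesses share for the first Full Scan run, which is close to 13%.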

LucioDonda commented Feb 5, 2025

Benchmarking and analysis tools

After some research, a couple of tools were used:

  • gprof for CPU time and call graphs:
    It shows which functions consume the most CPU time and produces a call graph (how functions call each other), which helps identify CPU bottlenecks in the code.
Example run for the all-scans case:

root@pm-ubuntu24-server:/home/pm-vagrant/workspace/wazuh-agent/build# cat analysis.txt 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  8.33      0.02     0.02                             std::vector<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > >* nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>::create<std::vector<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > >, std::vector<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, 
std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > > const&>(std::vector<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > > const&)
  8.33      0.04     0.02                             _init
  8.33      0.06     0.02                             btreeParseCellPtr
  8.33      0.08     0.02                             sqlite3RunParser
  4.17      0.09     0.01                             EVP_DigestFinal_ex
  4.17      0.10     0.01                             MultiTypeQueue::storedItems(MessageType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  4.17      0.11     0.01                             nlohmann::json_abi_v3_11_3::detail::serializer<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> >::dump_escaped(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
  4.17      0.12     0.01                             nlohmann::json_abi_v3_11_3::detail::lexer<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>, nlohmann::json_abi_v3_11_3::detail::iterator_input_adapter<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::scan_string()
  4.17      0.13     0.01                             nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>::dump(int, char, bool, nlohmann::json_abi_v3_11_3::detail::error_handler_t) const
  4.17      0.14     0.01                             std::locale::id::_M_id() const
  4.17      0.15     0.01                             std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int)
  4.17      0.16     0.01                             std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > > >::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void> > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  4.17      0.17     0.01                             cleanup_old_md_data
  4.17      0.18     0.01                             fe51_sub
  4.17      0.19     0.01                             gcm_ghash_4bit
  4.17      0.20     0.01                             lengthFunc
  4.17      0.21     0.01                             pcache1Fetch
  4.17      0.22     0.01                             sqlite3SrcListAppend
  4.17      0.23     0.01                             sqlite3VdbeExec
  4.17      0.24     0.01                             sqlite3VdbeSetNumCols

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
	   else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
	   function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
	   the function in the gprof listing. If the index is
	   in parenthesis it shows where it would appear in
	   the gprof listing if it were to be printed.


Copyright (C) 2012-2024 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

  • perf for system-wide and hardware-level performance:
    It reports CPU usage, cache misses, branch mispredictions and other hardware events, for both kernel and user space.
    Best for: analyzing application and system-level performance together, and identifying hardware-related bottlenecks (e.g., cache misses, CPU stalls).
Result of perf report for the all-scans case:

Overhead Command Shared Object Symbol
2.99% wazuh-agent libc.so.6 [.] malloc
2.86% wazuh-agent libc.so.6 [.] __memmove_ssse3
2.39% wazuh-agent wazuh-agent [.] sqlite3VdbeExec
2.23% wazuh-agent wazuh-agent [.] nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool, long, unsigned int, double, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >
1.98% wazuh-agent libc.so.6 [.] _int_malloc
1.90% wazuh-agent libc.so.6 [.] cfree@GLIBC_2.2.5
1.80% wazuh-agent libc.so.6 [.] _int_free
1.59% wazuh-agent wazuh-agent [.] std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<unsigned long, std::pair<unsigned long, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >
1.58% wazuh-agent wazuh-agent [.] sqlite3RunParser
1.57% wazuh-agent wazuh-agent [.] nlohmann::json_abi_v3_11_3::detail::serializer<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool, long, unsigned int, double, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >
1.42% wazuh-agent wazuh-agent [.] nlohmann::json_abi_v3_11_3::detail::lexer<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool, long, unsigned int, double, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >
1.36% wazuh-agent wazuh-agent [.] lengthFunc
1.33% wazuh-agent libc.so.6 [.] __memcmp_sse2
1.07% wazuh-agent libc.so.6 [.] malloc_consolidate
0.82% wazuh-agent [kernel.kallsyms] [k] asm_sysvec_apic_timer_interrupt
0.74% wazuh-agent wazuh-agent [.] yy_reduce.isra.0
0.71% wazuh-agent wazuh-agent [.] nlohmann::json_abi_v3_11_3::detail::lexer<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool, long, unsigned int, double, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >
0.70% wazuh-agent wazuh-agent [.] std::vector<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >
0.69% wazuh-agent libc.so.6 [.] __memchr_sse2
0.68% wazuh-agent libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
0.67% wazuh-agent libc.so.6 [.] __strlen_sse2
0.65% wazuh-agent wazuh-agent [.] std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::pair<unsigned long, std::pair<unsigned long, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >, std::_Identity<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::pair<unsigned long, std::pair<unsigned long, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::pair<unsigned long, std::pair<unsigned long, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > > >
0.57% wazuh-agent wazuh-agent [.] resolveExprStep
0.56% wazuh-agent wazuh-agent [.] sqlite3Insert
0.54% wazuh-agent wazuh-agent [.] findElementWithHash.constprop.0
0.54% wazuh-agent wazuh-agent [.] nlohmann::json_abi_v3_11_3::detail::serializer<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool, long, unsigned int, double, std::allocator<nlohmann::json_abi_v3_11_3::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits, std

  • Callgrind (valgrind) for detailed instruction and cache analysis:
    It reports the number of instructions executed by each function, cache misses and memory access patterns. That is not quite what we need in this case.

Files

@LucioDonda

Update 05/02

  • Done: basic time analysis, plus a review of tools that can be used to rule out system-specific variations.
  • Pending analysis:
    • Add the number of events to the time analysis.
    • Apply the perf function analysis to each case.

@LucioDonda

Basic analysis

Additional analysis made on the same case:

  • By number of events:

| | Processes | Ports | All |
|---|---|---|---|
| Events | 214 (13.7%) | 26 (1.67%) | 1555 (100%) |

  • By size of events:

| | Processes | Ports | All |
|---|---|---|---|
| Full size | 107KB (18%) | 10.8KB (1.85%) | 581KB (100%) |

@LucioDonda

Event Volume Analysis

  • Measured during a one-hour run of the agent with a 60s scan interval:
# cat /etc/wazuh-agent/wazuh-agent.yml | grep -A 11 inventory:
inventory:
  enabled: true
  interval: 60s
  scan_on_start: true
  hardware: true
  system: true
  networks: true
  packages: true
  ports: true
  ports_all: true
  processes: true 
  hotfixes: true

Summary of statistics gathered:

  • Number of scans: 61 (1h).
  • Inventory events: 2912.
  • Processes inventory events: 928 (31.9%).
  • Ports inventory events: 1136 (39%).
  • Hardware inventory events: 120 (4.1%).
  • Networks inventory events: 728 (25%).
  • Events with operation field (1512): create 483 (31.9%), delete 339 (22.4%), update 710 (46.9%).
  • Average seconds taken by each scan: all 0.1810495738, processes 0.03864167213, ports 0.04535318033.

LucioDonda commented Feb 7, 2025

Event Volume Analysis

  • The same one-hour analysis was repeated to address the missing packages events.

Summary of statistics gathered:

  • Number of scans: 63 (~1h).
  • Events with operation field (2905): Create 1842(63.4%), delete 310 (10.67%), update 753 (25.9%)
    • Inventory events: 4270
    • Processes events: 983 (23%).
    • Ports events: 1118 (26%).
    • Hardware events: 123 (2.9%).
    • Networks events: 738 (17.3%).
    • Packages events: 1307 (30.6%).
  • The update operations come from hardware 61 (8.1%), networks 366 (48.6%), ports 25 (3.32%) and processes 301 (39.9%).
  • Ports updates are caused by changes in:
    • network/egress or network/ingress.
    • The same happens with network events.
    • Waiting for an answer on whether removing them can cause issues in other modules.
  • Processes updates are all caused by kworker processes. These are generated by the kernel and shouldn't be deleted.

@vikman90 vikman90 added the type/enhancement Enhancement issue label Feb 10, 2025
@LucioDonda

Closing comment

Possible feature improvements

  • A simpler approach is to disable ports_all and processes by default in the configuration:
    • A PR will be opened for this.
  • Statistical analysis of events for future changes (like in 4.x):
    • This way, users and developers can get a clear idea of which events are being generated and how fast they change.
  • Benchmarking of some key event-sending methods to see whether any of them affects the agent's resource usage.
    IMO this is outside the scope of the analysis requested.
  • Interface address stats can be disabled (these are all the fields behind the ingress/egress changes seen in ports and network events):
    const auto stats { m_interfaceAddress->stats() };
    network["tx_packets"] = stats.txPackets;
    network["rx_packets"] = stats.rxPackets;
    network["tx_bytes"] = stats.txBytes;
    network["rx_bytes"] = stats.rxBytes;
    network["tx_errors"] = stats.txErrors;
    network["rx_errors"] = stats.rxErrors;
    network["tx_dropped"] = stats.txDropped;
    network["rx_dropped"] = stats.rxDropped;

A cleaner option is to tell dbsync not to check these fields for changes. Internally this can be done with:

input["options"]["ignore"].push_back("tx_bytes");
input["options"]["ignore"].push_back("tx_packets");
input["options"]["ignore"].push_back("rx_bytes");
input["options"]["ignore"].push_back("rx_packets");
input["options"]["ignore"].push_back("tx_dropped");
input["options"]["ignore"].push_back("tx_errors");
input["options"]["ignore"].push_back("rx_dropped");
input["options"]["ignore"].push_back("rx_errors");

Events will still be generated with these fields, but a change in them alone won't trigger a notification.
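
To illustrate the idea of an ignore list (this is not dbsync's actual implementation), here is a minimal sketch that compares two snapshots of a record and reports a change only when a non-ignored field differs. `Record` and `hasRelevantChange` are hypothetical names, and field values are kept as strings for simplicity:

```cpp
#include <map>
#include <set>
#include <string>

// A record snapshot as field name -> value (hypothetical, simplified type).
using Record = std::map<std::string, std::string>;

// Returns true when `current` differs from `previous` in at least one field
// that is not listed in `ignored`. A non-ignored field that exists in only
// one of the two records also counts as a change.
bool hasRelevantChange(const Record& previous,
                       const Record& current,
                       const std::set<std::string>& ignored) {
    // Fields added or modified in the current snapshot.
    for (const auto& entry : current) {
        if (ignored.count(entry.first) != 0) {
            continue;
        }
        const auto it = previous.find(entry.first);
        if (it == previous.end() || it->second != entry.second) {
            return true;
        }
    }
    // Fields that disappeared from the current snapshot.
    for (const auto& entry : previous) {
        if (ignored.count(entry.first) == 0 &&
            current.find(entry.first) == current.end()) {
            return true;
        }
    }
    return false;
}
```

With `rx_bytes` in the ignore set, two snapshots that differ only in `rx_bytes` are treated as unchanged, so no update would be reported for them, while a change in any other field still is.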

Additional notes:

  • The frequency of events is directly related to the update interval and to which fields are compared.
  • Over a one-hour run, updates are not a significant part of the total communication: in the second run there were 753 updates against 1842 create operations.
    • This was measured in an environment with a 60-second refresh interval.
    • Real-world scenarios will differ and the numbers can vary.

cborla commented Feb 10, 2025

LGTM!

@cborla cborla closed this as completed Feb 10, 2025
@wazuhci wazuhci moved this from In progress to Done in XDR+SIEM/Release 5.0.0 Feb 10, 2025