improve efficiency (catch all) #84

Open
jtmoon79 opened this issue Mar 17, 2023 · 7 comments
Labels
code improvement enhancement not seen by the user

Comments

@jtmoon79
Owner

jtmoon79 commented Mar 17, 2023

Summary

Improve efficiency, in either CPU usage or memory usage. This is a catch-all Issue for efficiency improvements that are too small to merit an individual Issue.

Current behavior

Some implementations could be faster.

Many of these areas are marked in code comments with the magic word cost-savings, like

// TODO: cost-savings: ...

For example,

pub(crate) fn captures_to_buffer_bytes(
    buffer: &mut [u8],
    // ...
    year_opt: &Option<Year>,
    // ...
) {
    // ...
    match year_opt {
        Some(year) => {
            // TODO: 2022/07/11 cost-savings: pass in `Option<&[u8]>`, avoid creating `String`
            let year_s: String = year.to_string();
            debug_assert_eq!(year_s.len(), 4, "Bad year string {:?}", year_s);
            defo!("using fallback year {:?}", year_s);
            copy_slice_to_buffer!(year_s.as_bytes(), buffer, at);
        }
    }
    // ...
}
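
A minimal sketch of what that TODO suggests, assuming the caller can format the fallback year once and hand the bytes down as `Option<&[u8]>`. The function name `copy_year_bytes` and its signature are hypothetical and do not come from the project; the project's `copy_slice_to_buffer!` macro is replaced here with a plain `copy_from_slice`.

```rust
/// Copy a 4-byte fallback year into `buffer` starting at offset `at`.
/// Returns the offset just past the copied bytes (unchanged when `None`).
/// Hypothetical helper: the name and signature are illustrative only.
fn copy_year_bytes(buffer: &mut [u8], at: usize, year_opt: Option<&[u8]>) -> usize {
    match year_opt {
        Some(year) => {
            debug_assert_eq!(year.len(), 4, "Bad year bytes {:?}", year);
            let end = at + year.len();
            // No `String` is created here; the bytes are copied directly.
            buffer[at..end].copy_from_slice(year);
            end
        }
        None => at,
    }
}

fn main() {
    // The year is formatted once, outside the per-line hot path.
    let year_s: String = 2023_i32.to_string();
    let year_bytes: &[u8] = year_s.as_bytes();

    let mut buffer = [b' '; 16];
    let mut at = 0;
    // Each call reuses the same bytes instead of allocating a new `String`.
    for _ in 0..2 {
        at = copy_year_bytes(&mut buffer, at, Some(year_bytes));
    }
    println!("{:?}", std::str::from_utf8(&buffer[..at]).unwrap());
}
```

The saving is one heap allocation per call; whether that is measurable depends on how often the fallback-year path is taken.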

Suggested behavior

Change the implementation to be faster or more efficient.

Other

These changes may violate recommended practice for when to optimize (too much work; too little gain).

jtmoon79 added the code improvement enhancement not seen by the user label Mar 17, 2023
jtmoon79 added a commit that referenced this issue Mar 18, 2023
jtmoon79 added a commit that referenced this issue May 2, 2023
Pre-create strings of the FixedOffset used by SyslineReader.
Avoid creating a new String for every sysline processed.

Issue #84
jtmoon79 added a commit that referenced this issue May 2, 2023
Use PfhMap compile-time map from timezone names, e.g. "PST", to
timezone values, e.g. "-07:00"

Issue #84
jtmoon79 added a commit that referenced this issue May 6, 2023
Remove inefficient hashset creation/destruction for tracking
keys found in `next_short`. Just use local `bool`s.

Issue #84
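
To illustrate the idea in the commit message above, here is a minimal sketch of tracking a small, fixed set of keys with local `bool`s rather than building a `HashSet` on every call. The key names, the field representation, and the function `found_all_keys` are assumptions for illustration; the actual `next_short` code is not shown in this thread.

```rust
/// Hypothetical key names; not taken from the project.
const KEY_A: &str = "MESSAGE";
const KEY_B: &str = "PRIORITY";

/// Returns true when both wanted keys appear in `fields`.
fn found_all_keys(fields: &[(&str, &str)]) -> bool {
    // One `bool` per key replaces a `HashSet<&str>` that would otherwise
    // be allocated and dropped on every call.
    let mut found_a = false;
    let mut found_b = false;
    for (key, _value) in fields {
        match *key {
            KEY_A => found_a = true,
            KEY_B => found_b = true,
            _ => {}
        }
        if found_a && found_b {
            // Stop early once everything has been seen.
            return true;
        }
    }
    false
}

fn main() {
    let fields = [("PRIORITY", "6"), ("_PID", "1"), ("MESSAGE", "hello")];
    println!("all keys found: {}", found_all_keys(&fields));
}
```

This only works when the set of keys is known at compile time and small, which appears to be the case the commit describes.
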
@jtmoon79
Owner Author

jtmoon79 commented May 21, 2023

@jtmoon79
Owner Author

jtmoon79 commented May 22, 2023

Great article comparing thread memory usage. Apparently tokio has very efficient memory use for each thread.

@jtmoon79
Owner Author

jtmoon79 commented Mar 3, 2024

StringZilla looks promising. However, I'd need to review the flamegraph to see where it should be applied. I could apply it everywhere, but that's a fair amount of work, so I should be a little smarter about it.

Update: benchmarked here. It does well, but not as well as I had hoped; memchr does better. Implemented in 55b8777

jtmoon79 added a commit that referenced this issue Apr 10, 2024
Compile RegEx on-demand with the help of `once_cell`

Issue: #84
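
A minimal sketch of the on-demand compilation pattern that commit describes. To keep the example self-contained it uses `std::sync::OnceLock` (the standard-library counterpart of `once_cell`'s cell types) and a hypothetical `Pattern` type standing in for a compiled `Regex`; `SOURCES`, `COMPILED`, and `pattern` are illustrative names, not the project's.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive "compile" step runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regular expression.
struct Pattern {
    needle: &'static str,
}

impl Pattern {
    /// Stand-in for an expensive compilation step.
    fn compile(needle: &'static str) -> Pattern {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        Pattern { needle }
    }

    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.needle)
    }
}

/// Pattern sources; nothing is compiled at program start.
const SOURCES: [&str; 3] = ["UTC", "PST", "+00:00"];

/// One lazily filled slot per pattern source.
static COMPILED: [OnceLock<Pattern>; 3] =
    [OnceLock::new(), OnceLock::new(), OnceLock::new()];

/// Compile the pattern at `index` on first use, then reuse it.
fn pattern(index: usize) -> &'static Pattern {
    COMPILED[index].get_or_init(|| Pattern::compile(SOURCES[index]))
}

fn main() {
    // Only pattern 1 is ever used, so only pattern 1 is compiled.
    println!("{}", pattern(1).is_match("12:00:00 PST"));
    println!("{}", pattern(1).is_match("12:00:00 UTC"));
    println!("compiled {} time(s)", COMPILE_COUNT.load(Ordering::SeqCst));
}
```

The benefit is that patterns never needed for a given input are never compiled, at the cost of a cheap initialization check on each access.
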
@jtmoon79
Owner Author

After chronotope/chrono#1559 is implemented and released, https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.6.70/src/data/datetime.rs#L5002 should be faster, and I don't think any changes will be necessary.

  • verify chrono refactor of parsing is used in s4

@jtmoon79
Owner Author

jtmoon79 commented Apr 15, 2024

This comment moved to its own Issue #288


A very worthwhile improvement is combining the multiple regular expressions for various numeric timezones into one regular expression.
So instead of three regular expressions for the timezone patterns +00:00, +0000, and +00, have one regular expression that handles all three.

This would significantly reduce the number of hardcoded regular expressions, reducing run-time and memory footprint, and would simplify some of the code in datetime.rs
https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.6.70/src/data/datetime.rs#L2503-L2509
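
As an illustration of handling all three forms in one place, a single pattern along the lines of `[+-]\d{2}(?::?\d{2})?` would cover +00:00, +0000, and +00; this pattern is a suggestion and has not been checked against the project's existing patterns. The sketch below expresses the same acceptance rule as a small hand-written parser rather than with the regex crate, so it compiles with the standard library alone. The function name `parse_numeric_tz` is hypothetical.

```rust
/// Parse a numeric timezone in the form "+HH:MM", "+HHMM", or "+HH"
/// (or with a leading '-') into an offset in seconds.
/// Hypothetical helper; hour and minute ranges are not validated here.
fn parse_numeric_tz(s: &str) -> Option<i32> {
    let b = s.as_bytes();
    let sign = match *b.first()? {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    // Convert exactly two ASCII digits to a number.
    let two_digits = |x: &[u8]| -> Option<i32> {
        if x.len() == 2 && x.iter().all(|c| c.is_ascii_digit()) {
            Some(((x[0] - b'0') as i32) * 10 + (x[1] - b'0') as i32)
        } else {
            None
        }
    };
    // The total length selects which of the three forms is expected.
    let (hours, minutes) = match b.len() {
        3 => (two_digits(&b[1..3])?, 0),
        5 => (two_digits(&b[1..3])?, two_digits(&b[3..5])?),
        6 if b[3] == b':' => (two_digits(&b[1..3])?, two_digits(&b[4..6])?),
        _ => return None,
    };
    Some(sign * (hours * 3600 + minutes * 60))
}

fn main() {
    for tz in ["+00:00", "+0530", "-07", "PST"] {
        println!("{:?} -> {:?}", tz, parse_numeric_tz(tz));
    }
}
```

Whether one combined pattern is actually faster than three separate ones would need to be confirmed with a benchmark.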

@jtmoon79
Owner Author

pre-compile Regex

It looks like it's not possible to pre-compile a Regex.

The full story here really is not only a monumental amount of work, but it's not a one-time investment. By exposing a stable binary representation, it also makes internal evolution so much harder because any changes need to be reconciled with the binary format. I really just do not know if it will ever happen.

It's possible to pre-compile a DFA.

Moreover, regex-automata 0.2 (and especially 0.3 once it's released) will support serialization of regex DFAs.

However, a DFA does not support capture groups.

This crate does not support sub-match extraction, which can be achieved with the regex crate's "captures" API. This may be added in the future, but is unlikely.
