High memory usage on default allocator #93

Open
i1i1 opened this issue Sep 8, 2022 · 4 comments

Comments

i1i1 (Contributor) commented Sep 8, 2022

After switching from RocksDB to ParityDB we discovered high memory usage with the system allocator:

        system   jemalloc
btree   5.93G    4.34G
kv      7.27G    5.49G

Here is the code for reproducing the issue:

// Global jemalloc allocator; comment out these two lines to measure the
// system allocator instead.
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    let batch_size = 1024 * 1024; // 1M entries per commit
    // Write to four databases concurrently, one per thread.
    let handles = (0..4)
        .map(|i| {
            let p = format!("some-path/{i}");
            let _ = std::fs::remove_dir_all(&p);
            let opts = parity_db::Options {
                path: p.into(),
                columns: vec![parity_db::ColumnOptions {
                    preimage: false,
                    btree_index: true,
                    uniform: false,
                    ref_counted: false,
                    compression: parity_db::CompressionType::NoCompression,
                    compression_threshold: 4096,
                }],
                sync_wal: true,
                sync_data: true,
                stats: false,
                salt: None,
            };
            std::thread::spawn(move || {
                let db = parity_db::Db::open_or_create(&opts).unwrap();

                // 100 commits of batch_size small key/value pairs each.
                for range in (0..100u64).map(|i| i * batch_size..(i + 1) * batch_size) {
                    db.commit(range.map(|i| (0, i.to_be_bytes(), Some(i.to_le_bytes().to_vec()))))
                        .unwrap();
                }
            })
        })
        .collect::<Vec<_>>();
    for handle in handles {
        handle.join().unwrap();
    }
}

arkpar (Member) commented Sep 8, 2022

The problem is the combination of a large batch size and small individual values; there is a lot of per-entry overhead for this use case.

Most of the overhead comes from storing modified index pages in IndexLogOverlay. Since we store full pages, each modified index page requires at least 512 + 24 bytes of memory until the record is flushed. This is not something we can fix easily.
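To put rough numbers on that overhead (a back-of-envelope sketch of mine, not figures from the parity-db code; it assumes the worst case where every entry in a commit dirties a distinct index page):

fn main() {
    // Worst-case IndexLogOverlay footprint for the repro above: each of the
    // 1M entries in a commit dirties a distinct index page, and each dirty
    // page costs 512 + 24 bytes until the record is flushed.
    let batch_size: u64 = 1024 * 1024;
    let bytes_per_dirty_page: u64 = 512 + 24;
    let per_db = batch_size * bytes_per_dirty_page; // ≈ 562 MB
    let total = 4 * per_db; // four databases in the repro: ≈ 2.2 GB
    println!("per-db overlay: {per_db} bytes, total: {total} bytes");
}

In practice many entries share a page, so the real overlay is smaller, but this is the right order of magnitude for the numbers in the table above.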
I'd advise you to simply reduce batch_size.
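A minimal sketch of that workaround, keeping the repro's total write volume but splitting it into smaller commits (the 64 * 1024 batch size and the function name write_in_small_batches are mine, for illustration):

// Replacement for the commit loop in the repro: same keys and values, but
// committed in smaller batches so fewer dirty index pages sit in the
// overlay at any one time.
fn write_in_small_batches(db: &parity_db::Db) {
    let batch_size: u64 = 64 * 1024; // 16x smaller than the original 1024 * 1024
    let total_keys: u64 = 100 * 1024 * 1024; // 100 batches of 1M, as in the repro
    for range in (0..total_keys / batch_size).map(|i| i * batch_size..(i + 1) * batch_size) {
        db.commit(range.map(|i| (0, i.to_be_bytes(), Some(i.to_le_bytes().to_vec()))))
            .unwrap();
    }
}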


nazar-pc (Contributor) commented Sep 8, 2022

Is there something on the ParityDB side that prevents the allocator from later reclaiming most of this memory, by the way? Our app uses ~1.3G of RAM up to this step, but afterwards it doesn't drop back to quite the same level.


arkpar (Member) commented Sep 9, 2022

Yes, some of the internal overlays simply stay at peak usage. We'll add a patch to release them when writes are idle.


nazar-pc (Contributor) commented Sep 9, 2022

Hm... that explains the further growth as we open more and more databases (typically about 10 of that kind).
