
Implement Bulk Loading/Writing for Chunks #535

Open
wants to merge 100 commits into base: master

Conversation

Mili-ssm
Contributor

@Mili-ssm Mili-ssm commented Feb 9, 2025

This PR implements a bulk API for chunk loading/writing, together with function changes that leverage the bulk operations (reducing allocations, locks, and more).

It also includes some Anvil refactoring and introduces traits/an API that can be reused by the entity storage system and to implement different Writers/Loaders (such as a DB-backed cache, a cloud-storage backend, or other optimizations).

Current Improvements

  • Don't write a chunk to disk when it was just loaded from disk.

  • Lock the chunk cache entry while generating a new chunk.
    ( If more threads want that same chunk, they can wait without doing any work until the chunk is generated. )

  • Every time a batch of chunks (from the same player) is deallocated, the file is written only once.

  • Files that have ongoing write/read operations are kept in memory (to avoid disk reads and file locking).

  • We have a DashMap-based file cache to avoid write collisions and file corruption. (Not usually an Anvil issue, but it was a problem with the Linear format, which enforces a FULL FILE WRITE.)

  • Chunk write operations are now made to a path.tmp, so a write can't corrupt files. ( When the write finishes, the file is renamed, so the real file is never swapped in before the write is complete. )

  • The file lock/cache now implements a clean_cache fn that ensures there are no memory leaks over the server's run.
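The "lock the chunk cache entry while generating" point above can be sketched with std types only (the PR itself uses a DashMap; `ChunkCache` and `get_or_generate` here are invented names for illustration, not the PR's actual API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Invented-name sketch: the first thread to ask for a position inserts
// the cell and runs the generator; later threads asking for the same
// position block on the same cell instead of doing the work again.
struct ChunkCache {
    cells: Mutex<HashMap<(i32, i32), Arc<OnceLock<String>>>>,
}

impl ChunkCache {
    fn get_or_generate(&self, pos: (i32, i32), generate: impl FnOnce() -> String) -> String {
        // Hold the map lock only long enough to grab (or insert) the cell,
        // never while generating, so other positions are not blocked.
        let cell = {
            let mut map = self.cells.lock().unwrap();
            map.entry(pos)
                .or_insert_with(|| Arc::new(OnceLock::new()))
                .clone()
        };
        // `get_or_init` runs `generate` at most once per cell; concurrent
        // callers for the same position wait for the result.
        cell.get_or_init(generate).clone()
    }
}

fn main() {
    let cache = ChunkCache { cells: Mutex::new(HashMap::new()) };
    let first = cache.get_or_generate((0, 0), || "generated".to_string());
    // The second closure never runs: the cached value is returned instead.
    let second = cache.get_or_generate((0, 0), || "regenerated".to_string());
    assert_eq!(first, "generated");
    assert_eq!(second, "generated");
    println!("ok");
}
```

Keeping a per-entry cell rather than locking the whole map is what lets other chunk positions proceed while one chunk is generating.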

Current Changes

  • The system behind ChunkReader and ChunkWriter is now a ChunkIO that uses a ChunkSerializer.
  • ChunkIO is a trait that uses a Serializer.
  • ChunkSerializer is a trait that helps serialize a given type of chunk data (like chunk terrain or chunk entities).
  • ChunkIO now returns LoadedData<Data, Error> instead of Result<Data, Err>, which helps with:
    • LoadedData::Error((position, Err)): handles errors per chunk loaded in the bulk.
    • LoadedData::Missing(position): handles missing chunks (not previously generated), since the order of returned chunks is not enforced.
    • LoadedData::Loaded(Data): handles normally returned chunks.
  • The fetch made by client threads uses par_chunk instead of iter; this keeps the requested order but groups chunks in batches of 64, taking advantage of the file cache and the bulk API.
  • Now when cleaning chunks, we check that a chunk is not being watched before removing it (to avoid the undesirable instant reload of removed chunks); if we can't remove it, we still write it to the file, so the chunk write stays worthwhile and no changes are lost.
  • The file lock/cache is refactored and embedded into the ChunkFileManager (the current ChunkIO implementation for Anvil and Linear).
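A rough sketch of the `LoadedData` shape described above (variant names follow the list; `Vector2` is reduced to an `(i32, i32)` tuple for brevity, so this is illustrative rather than the PR's exact type):

```rust
#[derive(Debug, PartialEq)]
enum LoadedData<D, E> {
    // A normally loaded chunk.
    Loaded(D),
    // A chunk that was never generated; the position is carried along
    // because bulk results are not returned in request order.
    Missing((i32, i32)),
    // A per-chunk error, so one bad chunk does not fail the whole batch.
    Error(((i32, i32), E)),
}

fn main() {
    let batch: Vec<LoadedData<&str, &str>> = vec![
        LoadedData::Loaded("terrain bytes"),
        LoadedData::Missing((3, -7)),
        LoadedData::Error(((5, 2), "corrupt header")),
    ];
    // A caller can split one bulk result into the three cases.
    let loaded = batch.iter().filter(|r| matches!(r, LoadedData::Loaded(_))).count();
    let missing = batch.iter().filter(|r| matches!(r, LoadedData::Missing(_))).count();
    assert_eq!((loaded, missing), (1, 1));
    println!("ok");
}
```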

Mili and others added 30 commits January 24, 2025 21:56
@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

Also, when flying at max speed (which you can set using Mod Menu), the chunks lag behind. It seems like it first tries to load the old ones; it would make more sense to just cancel the old chunk-loading "job" and load the new ones.

2025-03-01.16-20-58.mp4

@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

Also, when quitting using CTRL+C it crashes:

thread 'main' panicked at pumpkin-world/src/chunk/io/chunk_file_manager.rs:309:18:
We initialize the once cells immediately
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 1, 2025

@Snowiiii that's a little hard, because from the batch of chunks you don't know:

  • which ones you want.
  • which ones you want to drop.

For that, we would have to fetch all of the chunks the player can watch but send only the needed ones; otherwise cancelling a job could cancel chunks that will be in your field of view.
I think this means changing a lot of the client connection logic. It's also not entirely bad as it is: if you decide to go backwards, those chunks will be cached for fractions of a second.
Also, with this PR, once you load the file and semi-parse it, retrieving chunks is very fast, so cancelling the job could be error-prone and hurt performance. (This is my overall opinion for now.)

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 1, 2025

Also, when quitting using CTRL+C it crashes:

thread 'main' panicked at pumpkin-world/src/chunk/io/chunk_file_manager.rs:309:18:
We initialize the once cells immediately
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That issue is "known"; I will push a change for it now.
This error happens because some chunks were still loading while you were shutting down the server. I will log warnings about that instead of crashing.
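A hedged sketch of the fix described here (the real code path is in chunk_file_manager.rs; `flush_on_shutdown` and the cell layout are invented for illustration): instead of `.expect("We initialize the once cells immediately")`, a cell that is still uninitialized at shutdown gets a warning and is skipped.

```rust
use std::sync::OnceLock;

// Invented-name sketch: walk the cached file cells at shutdown, flush
// the ones that finished loading, and warn (instead of panicking) about
// the ones that were still mid-load when CTRL+C arrived.
fn flush_on_shutdown(cells: &[(&str, OnceLock<Vec<u8>>)]) -> usize {
    let mut flushed = 0;
    for (name, cell) in cells {
        match cell.get() {
            Some(_data) => {
                // ...the serialized file would be written to disk here...
                flushed += 1;
            }
            None => eprintln!("warning: {name} was still loading at shutdown; skipping"),
        }
    }
    flushed
}

fn main() {
    let ready: OnceLock<Vec<u8>> = OnceLock::new();
    ready.set(vec![1, 2, 3]).unwrap();
    let still_loading: OnceLock<Vec<u8>> = OnceLock::new();
    // One file is flushed; the in-flight one is skipped with a warning.
    let n = flush_on_shutdown(&[("r.0.0.mca", ready), ("r.0.1.mca", still_loading)]);
    assert_eq!(n, 1);
    println!("ok");
}
```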

@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

@Snowiiii I saw that it is an error from parsing the chunks, when the chunk is not detected as Full (the chunk status != Full).

[ERROR] Failed to load chunk at Vector2 { x: 23, z: -19 }: Failed to parse Chunk from bytes: The chunk isn't generated yet (regenerating)

When I used this branch to generate a world I didn't see that error, nor when creating a world with master and opening it with this branch. Could this be an error that comes from using a pre-existing save, or from loading many semi-generated chunks?

And yes, I used a pre-generated world.

@Mili-ssm Mili-ssm requested a review from kralverde March 1, 2025 19:19
@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

Could someone create a benchmark of how long it takes to load/save a chunk/chunk group on master, so we can compare this branch with master?

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 1, 2025

Could someone create a benchmark of how long it takes to load/save a chunk/chunk group on master, so we can compare this branch with master?

master

image

Bulk Api

image

  • Disclaimer
    The master branch can't reach 100% CPU usage (on my laptop it was around 25%), while the Bulk API PR gets 85-95% CPU usage.

@kralverde
Contributor

More CPU usage is better (as long as it's not inefficient), because it means we are doing more work in parallel.

@kralverde
Contributor

kralverde commented Mar 2, 2025

You need to create a new Level from the root folder for each test, or else it will cache.

Also, why are you sorting stuff?

Also, you uploaded the test folder; maybe add it to .gitignore?

@kralverde
Contributor

kralverde commented Mar 2, 2025

Also you added the temp dir... read the comment right above it

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 2, 2025

You need to create a new Level from the root folder for each test, or else it will cache.

Also, why are you sorting stuff?

Also, you uploaded the test folder; maybe add it to .gitignore?

This is because the return order from the fetch is random, so if you only retrieve a subset of all the generated chunks, it is possible that the ones used in the benchmark are grouped (for example, all of them belonging to the same file).

About the caching, that's why I use level.clean_memory(); otherwise we need to test with more chunks, or the benchmark becomes more about level instantiation.

Anyway, I will fix the temp dirs and the test folder, thanks. This actually explains some things I was seeing in my experiments <3.

@kralverde
Contributor

Clean memory only happens once, if I'm reading it correctly. Also, level initialization has no overhead.

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 3, 2025

Clean memory only happens once if I'm reading it correctly. Also level initialization has no overhead

Sorry, that's true; I misunderstood something in the previous code. Level initialization has almost zero overhead, and doing clean_memory is also not enough to avoid some types of caching.

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 3, 2025

These are the results with the updated/fixed benchmark (and running the equivalent one on master).

Master Bench

image

BulkApi Bench

image


Master Bench code
use std::{fs, num::NonZeroU8, path::PathBuf, sync::Arc};

use criterion::{BenchmarkId, Criterion, criterion_group, criterion_main};
use pumpkin_util::math::vector2::Vector2;
use pumpkin_world::{
    GlobalProtoNoiseRouter, GlobalRandomConfig, NOISE_ROUTER_ASTS, bench_create_and_populate_noise,
    chunk::ChunkData, cylindrical_chunk_iterator::Cylindrical, global_path, level::Level,
};
use tokio::sync::RwLock;

fn bench_populate_noise(c: &mut Criterion) {
    let seed = 0;
    let random_config = GlobalRandomConfig::new(seed);
    let base_router =
        GlobalProtoNoiseRouter::generate(NOISE_ROUTER_ASTS.overworld(), &random_config);

    c.bench_function("overworld noise", |b| {
        b.iter(|| bench_create_and_populate_noise(&base_router, &random_config));
    });
}

async fn test_reads(root_dir: PathBuf, positions: Vec<Vector2<i32>>) {
    let level = Level::from_root_folder(root_dir);
    let (send, mut recv) = tokio::sync::mpsc::channel(positions.len());

    let rt = tokio::runtime::Handle::current();
    level.fetch_chunks(&positions, send, &rt);

    while let Some(x) = recv.recv().await {
        // Don't compile me away!
        let _ = x;
    }
}

async fn test_writes(root_dir: PathBuf, chunks: Vec<(Vector2<i32>, Arc<RwLock<ChunkData>>)>) {
    let level = Level::from_root_folder(root_dir);
    for item in chunks {
        level.write_chunk(item).await;
    }
}

// Depends on config options from `./config`
fn bench_chunk_io(c: &mut Criterion) {
    // System temp dirs are in-memory, so we can't use temp_dir
    let root_dir = global_path!("./bench_root_tmp");
    let _ = fs::remove_dir_all(&root_dir); // delete if it exists
    fs::create_dir(&root_dir).unwrap(); // create the directory

    let async_handler = tokio::runtime::Builder::new_current_thread()
        .build()
        .unwrap();

    println!("Initializing data...");
    // Initial writes
    let mut chunks = Vec::new();
    let mut positions = Vec::new();

    async_handler.block_on(async {
        let (send, mut recv) = tokio::sync::mpsc::channel(10);

        // Our data dir is empty, so we're generating new chunks here
        let level = Level::from_root_folder(root_dir.clone());
        tokio::spawn(async move {
            let cylindrical = Cylindrical::new(Vector2::new(0, 0), NonZeroU8::new(32).unwrap());
            let chunk_positions = cylindrical.all_chunks_within();
            let rt = tokio::runtime::Handle::current();
            level.fetch_chunks(&chunk_positions, send, &rt);
            level.clean_chunks(&chunk_positions).await;
        });

        while let Some((chunk, _)) = recv.recv().await {
            let pos = chunk.read().await.position;
            chunks.push((pos, chunk));
            positions.push(pos);
        }
    });

    // Sort by distance from origin to ensure a fair selection
    // when using a subset of the total chunks for the benchmarks
    chunks.sort_unstable_by_key(|chunk| chunk.0.x * chunk.0.x + chunk.0.z * chunk.0.z);
    positions.sort_unstable_by_key(|pos| pos.x * pos.x + pos.z * pos.z);

    // These test worst case: no caching done by `Level`
    // testing with 16, 64, 256 chunks
    let mut write_group = c.benchmark_group("write_chunks");
    for n_chunks in [16, 64, 256] {
        let chunks = &chunks[..n_chunks];
        println!("Testing with {} chunks", n_chunks);
        write_group.bench_with_input(
            BenchmarkId::from_parameter(n_chunks),
            &chunks,
            |b, chunks| {
                b.to_async(&async_handler)
                    .iter(|| test_writes(root_dir.clone(), chunks.to_vec()))
            },
        );
    }
    write_group.finish();

    // These test worst case: no caching done by `Level`
    // testing with 16, 64, 256 chunks
    let mut read_group = c.benchmark_group("read_chunks");
    for n_chunks in [16, 64, 256] {
        println!("Testing with {} chunks", n_chunks);
        let positions = &positions[..n_chunks];
        read_group.bench_with_input(
            BenchmarkId::from_parameter(n_chunks),
            &positions,
            |b, positions| {
                b.to_async(&async_handler)
                    .iter(|| test_reads(root_dir.clone(), positions.to_vec()))
            },
        );
    }
    read_group.finish();

    fs::remove_dir_all(&root_dir).unwrap(); // cleanup
}

criterion_group!(benches, bench_populate_noise, bench_chunk_io);
criterion_main!(benches);
