
Implement Bulk Loading/Writing for Chunks #535

Open
wants to merge 100 commits into base: master

Conversation

Mili-ssm
Contributor

@Mili-ssm Mili-ssm commented Feb 9, 2025

This PR implements a bulk API for chunk loading/writing, together with function changes that leverage the bulk operations (reducing allocations, locks, and more).

It also includes some Anvil refactoring and introduces traits/an API that can be reused by the entity storage system and to implement different Writers/Loaders (such as a DB-backed cache, a cloud-storage backend, or other optimizations).

Current Improvements

  • Don't write a chunk to disk when it was just loaded from disk.

  • Lock the chunk cache entry while generating a new chunk.
    ( If more threads want that same chunk, they can wait without doing any work until the chunk is generated. )

  • Every time a batch of chunks (from the same player) is deallocated, the file is written only once.

  • Files that have ongoing write/read operations are kept in memory (to avoid disk reads and file locking).

  • We have a DashMap-based file cache to avoid write collisions and file corruption. (Not usually an Anvil issue, but it was a problem with the Linear format, which enforces a FULL FILE WRITE.)

  • Chunk write operations are now made to a path.tmp, so a write can't corrupt files. ( When the write finishes, the file is renamed, so the real file is never swapped in before the write is complete. )

  • The file lock/cache now implements a clean_cache fn that ensures there are no memory leaks over the server's run.
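The "lock the chunk cache entry while generating" point above can be sketched with std types only (the PR itself uses a DashMap; `ChunkCache` and `get_or_generate` here are invented names for illustration, not the PR's actual API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Invented-name sketch: the first thread to ask for a position inserts
// the cell and runs the generator; later threads asking for the same
// position block on the same cell instead of doing the work again.
struct ChunkCache {
    cells: Mutex<HashMap<(i32, i32), Arc<OnceLock<String>>>>,
}

impl ChunkCache {
    fn get_or_generate(&self, pos: (i32, i32), generate: impl FnOnce() -> String) -> String {
        // Hold the map lock only long enough to grab (or insert) the cell,
        // never while generating, so other positions are not blocked.
        let cell = {
            let mut map = self.cells.lock().unwrap();
            map.entry(pos)
                .or_insert_with(|| Arc::new(OnceLock::new()))
                .clone()
        };
        // `get_or_init` runs `generate` at most once per cell; concurrent
        // callers for the same position wait for the result.
        cell.get_or_init(generate).clone()
    }
}

fn main() {
    let cache = ChunkCache { cells: Mutex::new(HashMap::new()) };
    let first = cache.get_or_generate((0, 0), || "generated".to_string());
    // The second closure never runs: the cached value is returned instead.
    let second = cache.get_or_generate((0, 0), || "regenerated".to_string());
    assert_eq!(first, "generated");
    assert_eq!(second, "generated");
    println!("ok");
}
```

Keeping a per-entry cell rather than locking the whole map is what lets other chunk positions proceed while one chunk is generating.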

Current Changes

  • The system behind ChunkReader and ChunkWriter is now a ChunkIO that uses a ChunkSerializer.
  • ChunkIO is a trait that uses a Serializer.
  • ChunkSerializer is a trait that helps serialize a given type of chunk data (like chunk terrain or chunk entities).
  • ChunkIO now returns LoadedData<Data, Error> instead of Result<Data, Err>, which helps with:
    • LoadedData::Error((position, Err)): handles errors per chunk loaded in the bulk.
    • LoadedData::Missing(position): handles missing chunks (not previously generated), since the order of returned chunks is not enforced.
    • LoadedData::Loaded(Data): handles normally returned chunks.
  • The fetch made by client threads uses par_chunk instead of iter; this keeps the requested order but groups chunks in batches of 64, taking advantage of the file cache and the bulk API.
  • Now when cleaning chunks, we check that a chunk is not being watched before removing it (to avoid the undesirable instant reload of removed chunks); if we can't remove it, we still write it to the file, so the chunk write stays worthwhile and no changes are lost.
  • The file lock/cache is refactored and embedded into the ChunkFileManager (the current ChunkIO implementation for Anvil and Linear).
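A rough sketch of the `LoadedData` shape described above (variant names follow the list; `Vector2` is reduced to an `(i32, i32)` tuple for brevity, so this is illustrative rather than the PR's exact type):

```rust
#[derive(Debug, PartialEq)]
enum LoadedData<D, E> {
    // A normally loaded chunk.
    Loaded(D),
    // A chunk that was never generated; the position is carried along
    // because bulk results are not returned in request order.
    Missing((i32, i32)),
    // A per-chunk error, so one bad chunk does not fail the whole batch.
    Error(((i32, i32), E)),
}

fn main() {
    let batch: Vec<LoadedData<&str, &str>> = vec![
        LoadedData::Loaded("terrain bytes"),
        LoadedData::Missing((3, -7)),
        LoadedData::Error(((5, 2), "corrupt header")),
    ];
    // A caller can split one bulk result into the three cases.
    let loaded = batch.iter().filter(|r| matches!(r, LoadedData::Loaded(_))).count();
    let missing = batch.iter().filter(|r| matches!(r, LoadedData::Missing(_))).count();
    assert_eq!((loaded, missing), (1, 1));
    println!("ok");
}
```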

Mili and others added 30 commits January 24, 2025 21:56
@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

Also, when flying at max speed (which you can set using Mod Menu), the chunks lag behind. It seems like it first tries to load the old ones; it would make more sense to just cancel the old chunk-loading "job" and load the new ones.

2025-03-01.16-20-58.mp4

@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

Also, when quitting using CTRL+C it crashes:

thread 'main' panicked at pumpkin-world/src/chunk/io/chunk_file_manager.rs:309:18:
We initialize the once cells immediately
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 1, 2025

@Snowiiii that's a little hard, because from the batch of chunks you don't know:

  • which ones you want.
  • which ones you want to drop.

For that, we would have to fetch all of the chunks the player can watch but send only the needed ones; otherwise cancelling a job could cancel chunks that will be in your field of view.
I think this means changing a lot of the client connection logic. It's also not entirely bad as it is: if you decide to go backwards, those chunks will be cached for fractions of a second.
Also, with this PR, once you load the file and semi-parse it, retrieving chunks is very fast, so cancelling the job could be error-prone and hurt performance. (This is my overall opinion for now.)

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 1, 2025

Also, when quitting using CTRL+C it crashes:

thread 'main' panicked at pumpkin-world/src/chunk/io/chunk_file_manager.rs:309:18:
We initialize the once cells immediately
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That issue is "known"; I will push a change for it now.
This error happens because some chunks were still loading while you were shutting down the server. I will log warnings about that instead of crashing.
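A hedged sketch of the fix described here (the real code path is in chunk_file_manager.rs; `flush_on_shutdown` and the cell layout are invented for illustration): instead of `.expect("We initialize the once cells immediately")`, a cell that is still uninitialized at shutdown gets a warning and is skipped.

```rust
use std::sync::OnceLock;

// Invented-name sketch: walk the cached file cells at shutdown, flush
// the ones that finished loading, and warn (instead of panicking) about
// the ones that were still mid-load when CTRL+C arrived.
fn flush_on_shutdown(cells: &[(&str, OnceLock<Vec<u8>>)]) -> usize {
    let mut flushed = 0;
    for (name, cell) in cells {
        match cell.get() {
            Some(_data) => {
                // ...the serialized file would be written to disk here...
                flushed += 1;
            }
            None => eprintln!("warning: {name} was still loading at shutdown; skipping"),
        }
    }
    flushed
}

fn main() {
    let ready: OnceLock<Vec<u8>> = OnceLock::new();
    ready.set(vec![1, 2, 3]).unwrap();
    let still_loading: OnceLock<Vec<u8>> = OnceLock::new();
    // One file is flushed; the in-flight one is skipped with a warning.
    let n = flush_on_shutdown(&[("r.0.0.mca", ready), ("r.0.1.mca", still_loading)]);
    assert_eq!(n, 1);
    println!("ok");
}
```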

@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

@Snowiiii I saw that it is an error from parsing the chunks, when the chunk is not detected as Full (the chunk status != Full).

[ERROR] Failed to load chunk at Vector2 { x: 23, z: -19 }: Failed to parse Chunk from bytes: The chunk isn't generated yet (regenerating)

When I used this branch to generate a world I didn't see that error, nor when creating a world with master and opening it with this branch. Could this be an error that comes from using a pre-existing save, or from loading many semi-generated chunks?

And yes, I used a pre-generated world.

@Mili-ssm Mili-ssm requested a review from kralverde March 1, 2025 19:19
@Snowiiii
Collaborator

Snowiiii commented Mar 1, 2025

Could someone create a benchmark of how long it takes to load/save a chunk/chunk group on master, so we can compare this branch with master?

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 1, 2025

Could someone create a benchmark of how long it takes to load/save a chunk/chunk group on master, so we can compare this branch with master?

master

image

Bulk Api

image

  • Disclaimer
    The master branch can't reach 100% CPU usage (on my laptop it was around 25%), while the Bulk API PR gets 85-95% CPU usage.

@kralverde
Contributor

More CPU usage is better (as long as it's not inefficient), because it means we are doing more work in parallel.

@kralverde
Contributor

kralverde commented Mar 2, 2025

You need to create a new Level from the root folder for each test, or else it will cache.

Also, why are you sorting stuff?

Also, you uploaded the test folder; maybe add it to .gitignore?

@kralverde
Contributor

kralverde commented Mar 2, 2025

Also you added the temp dir... read the comment right above it

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 2, 2025

You need to create a new Level from the root folder for each test, or else it will cache.

Also, why are you sorting stuff?

Also, you uploaded the test folder; maybe add it to .gitignore?

This is because the return order from the fetch is random, so if you only retrieve a subset of all the generated chunks, it is possible that the ones used in the benchmark are grouped (for example, all of them belonging to the same file).

About the caching, that's why I use level.clean_memory(); otherwise we need to test with more chunks, or the benchmark becomes more about level instantiation.

Anyway, I will fix the temp dirs and the test folder, thanks. This actually explains some things I was seeing in my experiments <3.

@kralverde
Contributor

Clean memory only happens once, if I'm reading it correctly. Also, level initialization has no overhead.

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 3, 2025

Clean memory only happens once if I'm reading it correctly. Also level initialization has no overhead

Sorry, that's true; I misunderstood something in the previous code. Level initialization has almost zero overhead, and doing clean_memory is also not enough to avoid some types of caching.

@Mili-ssm
Contributor Author

Mili-ssm commented Mar 3, 2025

These are the results with the updated/fixed benchmark (and running the equivalent one on master).

Master Bench

image

BulkApi Bench

image


Master Bench code
use std::{fs, num::NonZeroU8, path::PathBuf, sync::Arc};

use criterion::{BenchmarkId, Criterion, criterion_group, criterion_main};
use pumpkin_util::math::vector2::Vector2;
use pumpkin_world::{
    GlobalProtoNoiseRouter, GlobalRandomConfig, NOISE_ROUTER_ASTS, bench_create_and_populate_noise,
    chunk::ChunkData, cylindrical_chunk_iterator::Cylindrical, global_path, level::Level,
};
use tokio::sync::RwLock;

fn bench_populate_noise(c: &mut Criterion) {
    let seed = 0;
    let random_config = GlobalRandomConfig::new(seed);
    let base_router =
        GlobalProtoNoiseRouter::generate(NOISE_ROUTER_ASTS.overworld(), &random_config);

    c.bench_function("overworld noise", |b| {
        b.iter(|| bench_create_and_populate_noise(&base_router, &random_config));
    });
}

async fn test_reads(root_dir: PathBuf, positions: Vec<Vector2<i32>>) {
    let level = Level::from_root_folder(root_dir);
    let (send, mut recv) = tokio::sync::mpsc::channel(positions.len());

    let rt = tokio::runtime::Handle::current();
    level.fetch_chunks(&positions, send, &rt);

    while let Some(x) = recv.recv().await {
        // Don't compile me away!
        let _ = x;
    }
}

async fn test_writes(root_dir: PathBuf, chunks: Vec<(Vector2<i32>, Arc<RwLock<ChunkData>>)>) {
    let level = Level::from_root_folder(root_dir);
    for item in chunks {
        level.write_chunk(item).await;
    }
}

// Depends on config options from `./config`
fn bench_chunk_io(c: &mut Criterion) {
    // System temp dirs are in-memory, so we can't use temp_dir
    let root_dir = global_path!("./bench_root_tmp");
    let _ = fs::remove_dir_all(&root_dir); // delete if it exists
    fs::create_dir(&root_dir).unwrap(); // create the directory

    let async_handler = tokio::runtime::Builder::new_current_thread()
        .build()
        .unwrap();

    println!("Initializing data...");
    // Initial writes
    let mut chunks = Vec::new();
    let mut positions = Vec::new();

    async_handler.block_on(async {
        let (send, mut recv) = tokio::sync::mpsc::channel(10);

        // Our data dir is empty, so we're generating new chunks here
        let level = Level::from_root_folder(root_dir.clone());
        tokio::spawn(async move {
            let cylindrical = Cylindrical::new(Vector2::new(0, 0), NonZeroU8::new(32).unwrap());
            let chunk_positions = cylindrical.all_chunks_within();
            let rt = tokio::runtime::Handle::current();
            level.fetch_chunks(&chunk_positions, send, &rt);
            level.clean_chunks(&chunk_positions).await;
        });

        while let Some((chunk, _)) = recv.recv().await {
            let pos = chunk.read().await.position;
            chunks.push((pos, chunk));
            positions.push(pos);
        }
    });

    // Sort by distance from origin to ensure a fair selection
    // when using a subset of the total chunks for the benchmarks
    chunks.sort_unstable_by_key(|chunk| chunk.0.x * chunk.0.x + chunk.0.z * chunk.0.z);
    positions.sort_unstable_by_key(|pos| pos.x * pos.x + pos.z * pos.z);

    // These test worst case: no caching done by `Level`
    // testing with 16, 64, 256 chunks
    let mut write_group = c.benchmark_group("write_chunks");
    for n_chunks in [16, 64, 256] {
        let chunks = &chunks[..n_chunks];
        println!("Testing with {} chunks", n_chunks);
        write_group.bench_with_input(
            BenchmarkId::from_parameter(n_chunks),
            &chunks,
            |b, chunks| {
                b.to_async(&async_handler)
                    .iter(|| test_writes(root_dir.clone(), chunks.to_vec()))
            },
        );
    }
    write_group.finish();

    // These test worst case: no caching done by `Level`
    // testing with 16, 64, 256 chunks
    let mut read_group = c.benchmark_group("read_chunks");
    for n_chunks in [16, 64, 256] {
        println!("Testing with {} chunks", n_chunks);
        let positions = &positions[..n_chunks];
        read_group.bench_with_input(
            BenchmarkId::from_parameter(n_chunks),
            &positions,
            |b, positions| {
                b.to_async(&async_handler)
                    .iter(|| test_reads(root_dir.clone(), positions.to_vec()))
            },
        );
    }
    read_group.finish();

    fs::remove_dir_all(&root_dir).unwrap(); // cleanup
}

criterion_group!(benches, bench_populate_noise, bench_chunk_io);
criterion_main!(benches);
