Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary Empty ListTag Type Persistence #40

Closed
Offroaders123 opened this issue Nov 28, 2023 · 5 comments
Closed

Binary Empty ListTag Type Persistence #40

Offroaders123 opened this issue Nov 28, 2023 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@Offroaders123
Copy link
Owner

While looking into Schematic, Litematica, and Structure Block files, I wanted to see how NBTify's current support for them ranges. As of diffing the outputs from NBTify compared to the original source data, the tests didn't pass for the Schematic demo, and it ended up being because some of the ListTag values were types with a specific item type, which is something NBTify doesn't carry over during serialization if the array is empty. So if the ListTag in the file is typed for StringTag values, and the ListTag for them is empty, NBTify just uses EndTag as the item type, meaning the output diffs are inaccurate. This is not incorrect, and it's still properly accessible in terms of the spec and what the game can load, but it's not as accurate as being a perfect 1:1 deserialization-reserialization process, which is something I heavily strive for here with NBTify. If the exact same file isn't output, then NBTify is missing something, as I want it to be completely compatible with anything thrown at it.

This hasn't been a problem until now, because all of the other NBT files I have seen and tested against, all use EndTag for the empty array types. So I mostly haven't uncovered this just because I haven't come across a file that's implemented this way. Now that I have an example of one, I think it's easier to figure out how I should implement this.

I think this can actually work nicely with a custom Symbol() property on the array, which allows this persistence to be optional, and maybe even handled automatically by NBTify.

The other option I only recently found to be plausible, is that using Proxy objects to define custom NBTify primitive wrapper objects to limit/validate what kinds of properties can be assigned to a given wrapper primitive implementation. I didn't actually know about how Proxy objects worked, were implemented, or used, so this is a big hope to look into, and it gives me more ideas of ways to validate property assignment to these primitives, before passing them off to the write/serialization process.

My take for the longest time, since moving to using primitives as closely as possible, is to leave the run-time validation up to the time it is used, rather than when it is built/assigned to. Since JS doesn't have type validation when assigning to plain objects, arrays, or things like that, I didn't think it would be worth constructing full-blown custom structure primitives (objects, arrays) just to add type validation support, since that can be handled safely down the line anyways. Now that Proxy objects are in the picture, it might actually be feasible to validate types of things as they are assigned!

These were some links I used to explore this, I don't know too much as to what the best practices are for using Proxies with classes, say if you wanted to use them for these primitive wrappers, or for Web Components, something like that. My idea for using them with Web Components would help with not needing to use get() and set() accessors for everything, and in addition an internal private #field to store that value behind those public validators.

https://www.javascripttutorial.net/es6/javascript-proxy/
https://javascript.info/proxy
https://stackoverflow.com/questions/47779762/proxy-a-webcomponents-constructor-that-extends-htmlelement
https://grrr.tech/posts/2023/typescript-proxy-objects/

Listening to Devin Townsend Infinity for the first time, let's see how it goes! I'm already really liking it.
Wow, I'm already excited to listen to this again :)

@Offroaders123 Offroaders123 added the bug Something isn't working label Nov 28, 2023
@Offroaders123 Offroaders123 self-assigned this Nov 28, 2023
Offroaders123 added a commit that referenced this issue Nov 28, 2023
Right now the tests aren't passing, and that's because of my current handling for ListTag item types, for empty lists. If a ListTag is empty, right now NBTify will just type the list with an EndTag as the item type. So to make things compile symmetrically, I need to find a way to persist the intended item type for empty ListTags.

https://www.reddit.com/r/technicalminecraft/comments/gf62x4/how_to_convert_litematic_file_to_schematic_file/
https://github.com/GoldenDelicios/Lite2Edit
https://minecraft.wiki/w/Schematic_file_format
https://github.com/SmylerMC/litemapy
https://github.com/maruohon/litematica
https://github.com/Lunatrius/Schematica
https://minecraft.wiki/w/Structure_file
https://learn.microsoft.com/en-us/minecraft/creator/documents/introductiontostructureblocks?view=minecraft-bedrock-stable
https://github.com/microsoft/minecraft-samples/blob/main/structure_blocks_sample_behavior_pack/README.md
https://www.youtube.com/watch?v=fQZOPAoxw9Q
https://www.planetminecraft.com/project/small-medieval-house-lobby/

Devin Townsend Infinity, I'm loving the sound of this one. The story behind it is very driving too.

#40
#41

I really need to work on an improved organizational setup for file type subtests, right now they are all grouped together into a single one, and it's hard to determine what is doing what now. Originally it wasn't so bad because I was only testing against binary NBT files, but now I have things like these world save types, SNBT, and things like that. Each one needs a little bit different of handling for how it should be tested.

Another important thing I noticed is that I tried to open a fairly large `.schem` file, and it crashed Dovetail, either because of the UI, or because JS can't handle it's decompressed size, or it's parsed NBT object size, something along those lines. So it afterall may be too hard on the JS implementation of mine to try to do heavy lifting work on world data if it's having trouble opening a Schematic file. I'm hoping I can manage it effectively, but I'm not totally sure. For this big project to work, I may *need* to move to another language for it to work. Up until now, I haven't had any issues with NBTify's power to parse and save things, but it is concerning to have run into them now. Part of the song I'm listening to sounds like it's saying 'don't give up' though, right as I'm writing this, so maybe I should keep trying anyways though :P
@Offroaders123
Copy link
Owner Author

Ok, almost got it working, but realized part of an issue I don't think I can fix without going heavier into things. The ListTag item types can't be persisted when saved as SNBT, I can't do anything about saving to and from that stage. My code for using the Symbol("nbt.list.type") optional property on arrays appears to possibly be working when only using the NBT in JS land, and saving back to the binary format, but the SNBT step doesn't account for this handling. This can likely be another test case to handle in #41, since this is specific to the handling of the binary format, and it can't be managed when also using SNBT. I think it's a fair trade-off to support it in JS-only land, and not deal with it if you convert the NBT to SNBT and back. Mojang's use of SNBT itself has no way of handling that kind of info, so I don't think my implementation of SNBT should have to harp over it either. It's already a minor detail for keeping track of item types, for empty arrays, so I don't think it's something that specifically needs to be handled too tightly. One should already be validating their use of NBT keys and value types if they want to ensure everything is ship-shape anyways, so I think using TypeScript to virtually manage that insurance, or your own class validator, I think it's realistic for that information to be handled by the serializer itself, like I'm currently doing.

@Offroaders123 Offroaders123 changed the title Binary ListTag Type Persistence Binary Empty ListTag Type Persistence Nov 28, 2023
Offroaders123 added a commit that referenced this issue Nov 28, 2023
Check out the message about my findings for this commit, in #40 down below, above when this was referenced in that feed when you click on it.

Note how the tests are still not passing, it's because SNBT has no way of tracking the types for empty ListTag entries.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol
https://stackoverflow.com/questions/31897015/what-is-global-symbol-registry
https://stackoverflow.com/questions/49619054/whats-the-purpose-of-registry-symbols-symbol-for-in-javascript-es6

#40
#41

And woah! Who knew the fonts for road signs are standardized! The main one is called Highway Gothic, and it's been around since the 50s. The new one was going to be Clearview, but looks like they reversed making that one the new standard. It's present across a lot of the southwest, I see it a lot when crossing the river into Arizona. I really like it! I hope it stays around.
Offroaders123 added a commit that referenced this issue Nov 28, 2023
This is a recompiled version of the Schematic file, I'm going to use this for the proper diffing of the file for the test. The only changes to the actual output having used it in NBTify is the gzip compression format layout, and the empty ListTag type instance. I'm going to manage that on a check on it's own, outside the scope of this file specifically.

Ants is a good song I think, it's my kind of jam for sure. I really like how many places this album goes. I'll have to check out Ocean Machine at some point too, he mentions that one a good deal, it seems to be one of his more widely known early projects.

#40
#41
Offroaders123 added a commit that referenced this issue Nov 28, 2023
I need to better account for compression differences with my tests, as right now it depends on which zlib compression implementation you write your NBT files with, will depend on whether the tests will pass or not. This should be excluded I think, I thought it would be perfect to ensure that the compressed versions are the exact same as well, but I don't think that's worthwhile if you can check the raw bytes themselves anyways.

Say, to make this file recompiled with the matching zlib layouts, and with the newly updated empty ListTag types ( #40 ), simply opening it with Dovetail and re-saving it didn't work, because it looks like Chrome's zlib backing is different than Node's, even though they both use V8, and they are both on the same macOS ARM machine. So I can rule those things out. I already knew this was the case between Linux and macOS, but I thought this wasn't different between Chrome and Node on the same machine for some reason. I guess I assumed that they would *happen* to be using the same compression implementations, even if they may be two separate installations of that. That would make sense, I didn't expect them to use the same install of it. Just the same or nearly same versions.

#40
#41
Offroaders123 added a commit that referenced this issue Nov 28, 2023
I need to better account for compression differences with my tests, as right now it depends on which zlib compression implementation you write your NBT files with, will depend on whether the tests will pass or not. This should be excluded I think, I thought it would be perfect to ensure that the compressed versions are the exact same as well, but I don't think that's worthwhile if you can check the raw bytes themselves anyways.

Say, to make this file recompiled with the matching zlib layouts, and with the newly updated empty ListTag types ( #40 ), simply opening it with Dovetail and re-saving it didn't work, because it looks like Chrome's zlib backing is different than Node's, even though they both use V8, and they are both on the same macOS ARM machine. So I can rule those things out. I already knew this was the case between Linux and macOS, but I thought this wasn't different between Chrome and Node on the same machine for some reason. I guess I assumed that they would *happen* to be using the same compression implementations, even if they may be two separate installations of that. That would make sense, I didn't expect them to use the same install of it. Just the same or nearly same versions.

Another weird thing I encountered. Since I discovered the zlib differences between Node and the browser, I thought I'd use my local global install of NBTify in the CLI to resave the file, using Node as well. That actually encountered an error, and it was only happening with the `--pipe` flag to write it back to the original file. I'm not sure what it was about, that was frustrating though. Going to look into debugging that one. #25
@Offroaders123
Copy link
Owner Author

In relation to Bedrock's Block Entity actor storage format, it has come up that the game may indeed validate the item type for empty List tag values, which as this issue covers, NBTify doesn't currently support.

@Offroaders123
Copy link
Owner Author

This was an actor file @JaylyDev sent me to help debug why the read-write process wasn't being symmetrical.

input.bin is the raw file from a Bedrock world's LevelDB database, and output.bin is what that file looks like when re-written to disk again using NBTify.

Since this uses some custom file formatting that isn't just a single plain NBT tag, you have to read it in chunks. The reader function is from my Bedrock-LevelDB project, and JaylyDev inspired my writer function here to write the NBT back to a single actor file again.

input.bin.gz
output.bin.gz

Reader / Writer Code
import { readFile } from "node:fs/promises";
import { read, write } from "nbtify";

import type { RootTagLike, NBTData, ReadOptions } from "nbtify";

const format = {
  name: "",
  endian: "little",
  compression: null,
  bedrockLevel: null,
  strict: true
} as const satisfies Required<ReadOptions>;

async function readNBTList<T extends RootTagLike>(data: Uint8Array): Promise<NBTData<T>[]> {
  const entries: NBTData<T>[] = [];

  while (true){
    if (data.byteLength === 0) break;
    try {
      const entry: NBTData<T> = await read<T>(data,format);
      entries.push(entry);
      break;
    } catch (error){
      const message: string = (error as Error).message ?? `${error}`;
      const length: number = parseInt(message.slice(46));
      const entry: NBTData<T> = await read<T>(data,{ ...format, strict: false });
      entries.push(entry);
      data = data.subarray(length);
    }
  }

  return entries;
}

async function writeNBTList<T extends RootTagLike>(entries: NBTData<T>[]): Promise<Uint8Array> {
  const results: Uint8Array[] = await Promise.all(entries.map(entry => write(entry,format)));
  const data = new Uint8Array(results.reduce<number>((previous,current) => previous + current.byteLength,0));
  let byteOffset = 0;

  for (const result of results){
    data.set(result,byteOffset);
    byteOffset += result.byteLength;
  }

  return data;
}

const old = new URL("./old.nbt",import.meta.url); // This is `input.bin`

const input = await readFile(old);
console.log(input.join(" "));

const nbtList = await readNBTList(input);
console.log(nbtList);

const output = Buffer.from(await writeNBTList(nbtList)); // This is `output.bin`
console.log(output.join(" "));

const reinput = await readNBTList(output);
console.log(reinput);

When loading the re-written actor data back to the world save, the Block Entity data appears to be parsed incorrectly, maybe because Bedrock validates the Items List tag type, even though the NBT spec says that it will accept empty List tags typed with an End tag as the value type.

Loading Invalid Block Entity Actor

This is a diff of the byte content between input.bin and output.bin. You can see that only the item types for each of the Chest's Items key are changed, everything is byte-symmetrical.

Block Entity Actor Diff

Block Entity Actor Content

Offroaders123 added a commit that referenced this issue Jan 2, 2024
This updates the empty list file test so I can properly add support for this handling.

#40
Offroaders123 added a commit that referenced this issue Jan 2, 2024
Yay, it's working!

Was going to use `Symbol.for()` to implement this, but I think I'm going to use NBTify's exports for it instead. Since you are likely going to import features from NBTify to read and write NBT files anyways, I think it makes more sense to import the Symbol for List tags from there if you needed to use it for something.

https://stackoverflow.com/questions/34587901/difference-between-symbol-and-symbol-for
https://nodejs.org/api/util.html#utilinspectcustom
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol

Notes on how this works for the library itself:

Going to use an optional Symbol() property to keep track of the type on the array

You don't have to define it manually yourself when writing NBT files from scratch, this is mostly meant just to ensure existing files like these can retain their existing type information
When reading files from NBT files, it will always attach this property to your `ListTag` arrays, and it will use them in the writing stage if they are present
I want them to be optional because I still want you to build your own NBT shapes using plain arrays, and not need to assign that property yourself every time

(This is from Discord DMs, so that's why this is a little weird of context)
Ooooh
I bet I can define that as a hidden property too, so it won't clog up things when logging it to the console
Of course when you manually add it yourself like that it will show up, but from the reader it could do that for you

(Back to normal commit discussion)

As mentioned in the commit "SNBT Empty ListTag Types", SNBT doesn't have a viable way to persist this information, so this is an unfortunate case where going back and forth from SNBT with empty list tags will lose the original item type. If you have any ideas on how this could be handled, that would be great! I think Bedrock's lack of SNBT support is more of an example/correlation as to why their parsing code requires specific List item types, because if they supported SNBT in commands and such, they would need to allow un-typed (End tag) empty List tags as well, since they would have the same problem there with SNBT, since the format itself just doesn't have a way to represent the item type for empty lists. I guess the difference between my use and the game's implementation though is that theirs is likely driven by runtime types, so it gets validated or checked there instead. I'm not sure. Still a bit funky that it may not support un-typed empty Lists, a bit peculiar indeed.

#40

rBnaodn "so, many, mistakes" (Me!!! Started working on releasing it just last week or so)
Marco Minnemann "My Sister" (Remembering how much I love this album, and just how detailed it is!)
Offroaders123 added a commit that referenced this issue Jan 2, 2024
I think this pretty much fixes the issues with #40!
@Offroaders123
Copy link
Owner Author

I think this is just about fixed! Going to wait to close this a little bit, just to test it out for a little while beforehand. Everything passes my tests, so I think we're good on that front for now at least.

@Offroaders123
Copy link
Owner Author

Here are a few screenshots to go along with the commits I linked above these two comments:

List Type Test Failed

Tag Type Symbol Addition

Array Symbol Property Example

This shows the difference between defining this new Symbol property as as regular key, compared to defining it as non-enumerable with Object.defineProperty().

List Type Test Passing

Offroaders123 added a commit that referenced this issue Jan 3, 2024
- Binary Empty ListTag Type Persistence #40
- Runtime `typeof` Validation, now type-validated in TS too! Essentially, my API type validation code is now type-checked as well, since I want to ensure the runtime checks properly match that of their TS types/shapes. This is done for function parameters, and things like that, where NBTify is designed to throw when passed in incorrect values or primitive types.
- Options object spreading / defining. This moves things away from destructuring values of API options objects in the function definition into the body itself, in favor of allowing the options to be passed in as a single object again in other places in the body, rather than having to build the same options object again later (restructuring? lol, hehe). This helps aid in not forgetting to add newly added config options to the restructuring call, because I had actually accidentally done this before, since TS won't catch it because the config objects allow config types to be optional. Now you would have to explicitly set it to `undefined` if you didn't want the value to be passed in from the parent options parameter. Here's a better demo for this crazy talk:

```ts
export interface Hi {
  nice: number;
  haha: number;
}

export interface HiOptions extends Partial<Hi> {}

export function hiLegacy({ nice, haha }: HiOptions = {}): Hi {
  if (nice === undefined){
    return hi({ nice: 10, haha });
  }

  nice satisfies Hi["nice"];

  if (haha === undefined){
    // forgot `nice`, no errors, eek!
    return hi({ haha: 25 });
  }

  haha satisfies Hi["haha"];

  return { nice, haha };
}

export function hi(options: HiOptions = {}): Hi {
  const { nice, haha } = options;

  if (nice === undefined){
    return hi({ ...options, nice: 10 });
  }

  nice satisfies Hi["nice"];

  if (haha === undefined){
    return hi({ ...options, haha: 25 });
  }

  haha satisfies Hi["haha"];

  return { nice, haha };
}
```

- RootName + Bedrock Level Primitive
- NBTReader/NBTWriter State Handling

Another thing tested during this update period was MUTF-8 support, but that's not quite solved out yet, so I'm going to hold it forward until another release. #42

Clouseaupolice...!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant