Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): implement DependencyGraph #4929

Merged
merged 4 commits into from
Jan 22, 2025
Merged

Conversation

arendjr
Copy link
Contributor

@arendjr arendjr commented Jan 20, 2025

Summary

Implements a major part of #3307. Through the DependencyGraph we can conveniently look up which file a given import statement or expression is pointing towards. Note this only works for JS/TS files (no CSS @import for now).

Based on an unmerged PR for oxc-resolver (oxc-project/oxc-resolver#360), so I'll wait with merging this until that one is merged. Update: Bumped oxc-resolver to 4.0.

I major part of getting this to work with oxc-resolver was connecting our types for package.json and tsconfig.json with the new traits used by oxc-resolver. For full compatibility with package.json this even required storing and querying arbitrary JSON values, so I've created the biome_json_value crate for this, exporting types such as JsonValue, JsonArray, JsonObject and JsonString. These can be created relatively cheaply from our CST representations, while JsonString also contains fully unescaped string values.

Caveats

  • The DependencyGraph is not yet available to lint rules, so we require one more step before making use of it.
  • We still have no method of updating the DependencyGraph in the LSP, so information is likely to get stale when files are created/moved/deleted.

Test Plan

Tests added, but I'm still iterating on this a bit: It's a bit of a hassle that symlinks don't work in the memory FS...

@arendjr arendjr requested review from a team January 20, 2025 16:22
@github-actions github-actions bot added A-CLI Area: CLI A-Core Area: core A-Project Area: project A-Linter Area: linter A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools A-LSP Area: language server protocol L-JavaScript Language: JavaScript and super languages L-CSS Language: CSS L-JSON Language: JSON and super languages A-Diagnostic Area: diagnostocis A-Changelog Area: changelog L-HTML Language: HTML L-Grit Language: GritQL labels Jan 20, 2025
@arendjr arendjr changed the base branch from main to next January 20, 2025 16:24
@github-actions github-actions bot removed A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools A-LSP Area: language server protocol L-CSS Language: CSS A-Diagnostic Area: diagnostocis A-Changelog Area: changelog L-HTML Language: HTML L-Grit Language: GritQL labels Jan 20, 2025
Boshen pushed a commit to oxc-project/oxc-resolver that referenced this pull request Jan 20, 2025
This introduces `PackageJson` and `TsConfig` traits. The old structs
have been renamed to `PackageJsonSerde` and TsConfigSerde` respectively.
Both are behind the `fs_cache` feature flag now. `serde` as a dependency
has become optional and is only needed for the `fs_cache` feature too.

For now, I have opted not to replace the `FileSystem::read_to_string()`
method with trait-specific versions yet, since this functionality is now
all encapsulated within the `FsCache`. Consumers that wish to use custom
implementation of the manifest traits most likely want to use a custom
cache altogether (I know this will be true for Biome at least), so I
didn't see much reason for additional complexity and breaking changes
there now.

~~I'm still working on a Biome PoC based on this, so I'll keep the PR on
Draft until I can verify everything works. Feedback is certainly already
welcome!~~ **Update:** See biomejs/biome#4929

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Copy link

codspeed-hq bot commented Jan 20, 2025

CodSpeed Performance Report

Merging #4929 will not alter performance

Comparing arendjr:dependency-graph (9a03e7e) with next (7dc32e6)

Summary

✅ 95 untouched benchmarks

Copy link
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left questions to better understand the direction of the PR

@github-actions github-actions bot added the A-Formatter Area: formatter label Jan 21, 2025
@arendjr
Copy link
Contributor Author

arendjr commented Jan 21, 2025

I've addressed most of the feedback, but I'm leaving improvements for other languages out of scope for now. I feel it's more productive now to iterate on getting JS support right, than to introduce a whole bunch of additional traits to make CSS support possible.

@arendjr arendjr requested a review from ematipico January 21, 2025 19:37
Comment on lines 386 to 387
if let Some(value) = value.into_json_value() {
result.raw_json.insert(key.into(), value.into());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need raw JSON here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s because oxc-resolver wants us to implement these getter functions that can pull fields out of the package.json using arbitrary paths.

I considered just limiting our support to a fixed set of paths, so we can pull it from pre-parsed static structures, but I tried a similar thing in the oxc-resolver repo when I introduced the traits, and it made the tests fail. I haven’t gone too deep into figuring out why, but I figured I just play it safe :)

@Conaclos
Copy link
Member

Oh finally a dependency graph 🍬 ❤️

For full compatibility with package.json this even required storing and querying arbitrary JSON values, so I've created the biome_json_value crate for this, exporting types such as JsonValue, JsonArray, JsonObject and JsonString. These can be created relatively cheaply from our CST representations, while JsonString also contains fully unescaped string values.

Could we directly create JsonValue, JsonArray, JsonObject and JsonString from DeserailizableValue (simply by implementing Deserializable on a sum type of these types)? This could avoid introducing the into_json_value() method on the DeseraiizableValue trait.

@arendjr
Copy link
Contributor Author

arendjr commented Jan 21, 2025

Could we directly create JsonValue, JsonArray, JsonObject and JsonString from DeserailizableValue (simply by implementing Deserializable on a sum type of these types)? This could avoid introducing the into_json_value() method on the DeseraiizableValue trait.

That’s what I originally tried to use, but I couldn’t get it to work. I think it was actually based on a misunderstanding of how these traits are supposed to work. JsonValue already is a sum type, and AnyJsonValue already is too. That’s why the into_json_value() method works and can simply return Some(self). But JsonValue already implements Deserializable, so I tried to use it as you suggest, but couldn’t get that to work with the deserialize method of PackageJson. Maybe I missed something?

Copy link
Member

@Conaclos Conaclos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use it as you suggest, but couldn’t get that to work with the deserialize method of PackageJson. Maybe I missed something?

I made a code suggestion.
Let me know if it works and If you have any question. biome_deserialize is not an easy crate. I am quite satisfied of the provided abstractions. This why shy I am picky 😅

I also left more suggestions for the PR.

Comment on lines 385 to 388
key => {
if let Some(value) = value.into_json_value() {
result.raw_json.insert(key.into(), value.into());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JsonValue already is a sum type
JsonValue already implements Deserializable, so I tried to use it as you suggest, but couldn’t get that to work with the deserialize method of PackageJson. Maybe I missed something?

So you already did the hard work 😅
I think you just need to write something like:

Suggested change
key => {
if let Some(value) = value.into_json_value() {
result.raw_json.insert(key.into(), value.into());
}
key_text => {
if let (Some(key), Some(value)) = (
JsonString::deserialize(ctx, &key, ""),
JsonValue::deserialize(ctx, &value, key_text),
) {
result.raw_json.insert(key, value);
}
}

or using type inference:

                key_text => {
                    if let (Some(key), Some(value)) = (
                        Deserializable::deserialize(ctx, &key, ""),
                        Deserializable::deserialize(ctx, &value, key_text),
                    ) {
                        result.raw_json.insert(key, value);
                    }
                }

I could also rename raw_json as rest (this will match the naming convention of the rest biome_deserialize_macros field attribute) or extra.

This should allow you to remove the DeserializableValue::into_json_value and also the dependency between biome_json_value crate and biome_json_syntax (assuming you also remove the From<AnyJsonValue> impl and related impls).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were totally right, thanks! I guess I just chased my tail a bit before fixing it 🤣

///
/// See the [module-level documentation](self) for more info.
#[derive(Clone, Debug, Default, Deserializable, PartialEq)]
pub struct JsonObject(IndexMap<JsonString, JsonValue, FxBuildHasher>);
Copy link
Member

@Conaclos Conaclos Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of IndexMap is certainly a good fit here in terms of data modeling.
However, looking at the usage in the impl of oxc_resolver::ImportsExportsMap, I think it could be better to just use Vec<(JsonString, JsonValue)> or even Box<[(JsonString, JsonValue)]>.
This avoids a dependency on the indexmap crate.
Eventually I think Biome should completely get rid of its dependency on indexmap.

Note that using Box<[(JsonString, JsonValue)]> will require some change in impl Deserializable for PackageJson because we will have to build a vec and after the loop to turn it into a boxed slice before assigning it to PackageJson::raw_json.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for getting rid of indexmap? I'm not strongly opinionated towards it, but I think we should at least have an alternative that provides equivalent performance for lookups, which a Vec wouldn't provide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is order preserving an important thing in our context? We could use FxHashMap instead.

What's the motivation for getting rid of indexmap?

We primarily used ordered set (indexmap::IndexSet) in our configuration system.
However, ordered set was not a good fit in the places it was used (It created erroneous configuration merging).
Removing this usage, this leads to the use of the indexmap only in some places.
And, most of the time, a Vec, a HashSet or HashMap is sufficient.

Now we only have a few places that use the indexmap crate.
This raises the question of Biome's dependency on this crate.
If we only use indexmap two or three times in the codebase, it might be worth removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do rely on the order preservation here, yes. This is because the resolving of imports and exports in package.json iterates through the keys to find matching entries.

@arendjr arendjr merged commit 481ad24 into biomejs:next Jan 22, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CLI Area: CLI A-Core Area: core A-Formatter Area: formatter A-Linter Area: linter A-Project Area: project L-JavaScript Language: JavaScript and super languages L-JSON Language: JSON and super languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants