
Next gen api formula json v3 #16541

Merged

Conversation

apainintheneck
Contributor

@apainintheneck apainintheneck commented Jan 27, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This is my third attempt at this PR. The goal is to add a new, more compact JSON representation for formulae so that less data has to go over the network and less data has to be loaded at runtime.

The difference this time is that we want to be more explicit about which fields should be included. This means we shouldn't base the JSON representation on the internal #to_hash method because that can change independently.

As a result of these changes, we expect the brew generate-formula-api command to be significantly slower (probably 2x slower) than before because we're not able to memoize as much information as we did in #16460. I'll explore if there are any easy wins in this area.

Update: There were none. It's mostly just slow Formula loading, which is unsurprising.

Code Changes:

  • Add #to_api_hash method
  • Modify #to_hash_with_variations to work with both hash methods
  • Modify #bottle_hash to have an optional compact version
  • Add #urls_hash method to share urls serialization logic
  • Add #serialized_requirements to share serialization logic
  • Add #dependencies_hash to share serialization logic

API Changes:
I've tested the current API hash generation for backwards compatibility and the only difference now is that the optional head_dependencies key comes right after the other dependency keys instead of at the end of each resulting hash after variations. Not sure if we care about that but it should be functionally equivalent.

Related links

@@ -69,11 +77,16 @@ def generate_formula_api
File.write("api/formula/#{name}.json", FORMULA_JSON_TEMPLATE)
File.write("formula/#{name}.html", html_template_name)
end

homebrew_core_tap_hash["formulae"][formula.name] =
formula.to_hash_with_variations(hash_method: :to_api_hash)
Member

Suggested change
formula.to_hash_with_variations(hash_method: :to_api_hash)
formula.to_hash_with_variations(hash_method: :to_api_internal_v3_hash)

or similar? may also be worth renaming the existing one, too

Contributor Author

I doubt we'll run into a situation where we have more than 2 hash types we're generating with this method and if we do run into that problem, we can change the names then.

Member
@MikeMcQuaid MikeMcQuaid left a comment

Looking good so far!

Member
@Bo98 Bo98 left a comment

The dependencies hash might be worth iterating on to combine e.g. uses_from_macos_bounds, but I think that should be a separate PR as it may need testing to see whether it makes performance/size better or worse.

"tag" => stable_spec.specs[:tag],
"revision" => stable_spec.specs[:revision],
"using" => (stable_spec.using if stable_spec.using.is_a?(Symbol)),
"checksum" => stable_spec.checksum&.to_s,
Member
@Bo98 Bo98 Jan 28, 2024

Bit of a nit but this is the only checksum in the API that doesn't explicitly say sha256 and I feel this should, either by renaming this key or by using the ruby_source_checksum hash format.

Member

agreed with renaming here (and avoiding the stable nesting as mentioned above)

Contributor Author

How important is this name change really? I don't think it's worth complicating the logic especially since it reflects the internal name of the method.

Member
@MikeMcQuaid MikeMcQuaid left a comment

Sorry, more comments, messed with the PR review the first time.

"urls" => urls_hash.transform_values(&:compact),
"post_install_defined" => post_install_defined?,
"ruby_source_path" => ruby_source_path,
"ruby_source_checksum" => { "sha256" => ruby_source_checksum&.hexdigest },
Member

Suggested change
"ruby_source_checksum" => { "sha256" => ruby_source_checksum&.hexdigest },
"ruby_source_sha256" => ruby_source_checksum&.hexdigest,

general principle: remove most nesting where there's only a single value like this
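A minimal sketch of the difference (values and keys here are illustrative stand-ins, not the merged code): the nested form spends an extra hash object and an extra key string on a single value.

```ruby
require "json"

# Hypothetical checksum value; only the shape matters.
nested = { "ruby_source_checksum" => { "sha256" => "abc123" } }
flat   = { "ruby_source_sha256" => "abc123" }

puts JSON.generate(nested) # {"ruby_source_checksum":{"sha256":"abc123"}}
puts JSON.generate(flat)   # {"ruby_source_sha256":"abc123"}
```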

Contributor Author

I believe this was added by @Bo98 on purpose to potentially allow for different hashing algorithms in the future. Not sure if we still need that flexibility though.

Member
@Bo98 Bo98 Jan 28, 2024

It's good to design things in a generic fashion as it makes things so much easier down the line with hopefully little upfront cost (though obviously is a bad idea if it increases complexity greatly). Like how we have a generic OS that realistically isn't actually used, but makes things much easier down the line should we need it.

With that said consistency should be the first priority here so we should probably consistently do one way or consistently the other way.

Member

It's good to design things in a generic fashion as it makes things so much easier down the line

Disagree, particularly in this case of an internal API we're building to be bumpable in future. YAGNI.

Like how we have a generic OS that realistically isn't actually used

Somewhat off-topic: it is used, although perhaps confusingly. It exists to ensure that neither Linux nor macOS platform-specific code accidentally creep in elsewhere. It's a fairly standard pattern in cross-platform applications. It will make things easier if we ever had a Windows port (not unlikely), WSL optimisations (likely) or FreeBSD port (unlikely) in future and has caught a few bugs in the past.

Contributor Author

I guess if we move to a new hashing algorithm it wouldn't be hard to add a new key but since that's not something we plan on doing anytime soon I'll update it to match @MikeMcQuaid's suggestion.

Contributor Author

Addressed in 3a70b6a

Comment on lines +2306 to +2314
dep_hash = dependencies_hash
.except("recommended_dependencies", "optional_dependencies")
Member

Maybe worth this being an argument to dependencies_hash?
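One way the suggestion could look, as a sketch (the method body and dependency values are stand-ins for the real #dependencies_hash, not the merged code): an `except:` keyword lets the caller drop keys instead of calling #except on the result.

```ruby
# Hypothetical stand-in for #dependencies_hash with an except: keyword.
def dependencies_hash(except: [])
  {
    "build_dependencies"       => ["pkg-config"],
    "dependencies"             => ["openssl@3"],
    "recommended_dependencies" => [],
    "optional_dependencies"    => [],
  }.reject { |key, _| except.include?(key) }
end

dep_hash = dependencies_hash(except: ["recommended_dependencies", "optional_dependencies"])
# dep_hash now contains only "build_dependencies" and "dependencies"
```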

Contributor Author

I'll hold off on making more changes here because of @Bo98's proposal to serialize dependencies differently.

Library/Homebrew/formula.rb (outdated review thread, resolved)
Comment on lines +2336 to +2347
api_hash["conflicts_with"] = conflicts.map(&:name)
api_hash["conflicts_with_reasons"] = conflicts.map(&:reason)
Member

Maybe overkill but it might be nice to have these as an array of name:, reason:

Contributor Author

I'm hesitant to add more keys to the resulting JSON blob.

Member

@apainintheneck can you explain the motivation? Not disagreeing necessarily, just curious!

Contributor Author

It's probably not a big deal in this case but increased nesting will probably increase the number of keys which in turn will increase the number of strings that need to get allocated when the JSON gets loaded into memory. This field is most likely not set for the majority of formulae so it probably doesn't matter much either way though. I just think it's easy to get excited about making a new format and forget about the original goal which is to reduce the size of the JSON blob.
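To make that trade-off concrete, a rough sketch (illustrative key names, not Homebrew's real output) counting how many hash keys, and therefore key strings, each shape allocates when parsed:

```ruby
require "json"

flat   = JSON.parse('{"conflicts_with":["a","b"],"conflicts_with_reasons":["r1","r2"]}')
nested = JSON.parse('{"conflicts_with":[{"name":"a","reason":"r1"},{"name":"b","reason":"r2"}]}')

# Count every hash key in a parsed JSON document, recursively.
def key_count(obj)
  case obj
  when Hash  then obj.size + obj.values.sum { |v| key_count(v) }
  when Array then obj.sum { |v| key_count(v) }
  else 0
  end
end

puts key_count(flat)   # 2 keys total
puts key_count(nested) # 5 keys: the list key plus name/reason per entry
```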

Member

@apainintheneck That's fair. I think I'm either in team "let's nest everything that makes sense" or "let's remove all nesting" but not really a middle ground.

Member
@Bo98 Bo98 Jan 31, 2024

My opinion on it is grouping per Ruby DSL makes most sense as that aligns with how the reading side works.

It also makes the conditionals more obvious, rather than picking one key to be the primary check for multiple.

Contributor Author

None of the disable, deprecation and conflicts with code is super complex in Formulary with the current format. There's also no obvious benefit to changing it.

if (deprecation_date = json_formula["deprecation_date"].presence)
  reason = DeprecateDisable.to_reason_string_or_symbol json_formula["deprecation_reason"], type: :formula
  deprecate! date: deprecation_date, because: reason
end
if (disable_date = json_formula["disable_date"].presence)
  reason = DeprecateDisable.to_reason_string_or_symbol json_formula["disable_reason"], type: :formula
  disable! date: disable_date, because: reason
end
json_formula["conflicts_with"]&.each_with_index do |conflict, index|
  conflicts_with conflict, because: json_formula.dig("conflicts_with_reasons", index)
end

For more complex DSLs like the bottle, url and dependency ones, the code in Formulary gets quite complex and so the gains from modeling the JSON after the DSL are greater in my opinion.

Member

As mentioned before: let's either remove nesting or add more. The state of the existing formula JSON API should be completely ignored: it's irrelevant whether it nests or does not nest things and we should decide what's best based on the best output rather than the simpler implementation or hewing closer to the existing version.

Comment on lines 2341 to 2342
api_hash["deprecation_date"] = deprecation_date
api_hash["deprecation_reason"] = deprecation_reason # this might need to be checked too
Member

Similarly here perhaps: anything where we've got field_foo, field_bar smells like it might be nicer as field: { foo:, bar: }

Library/Homebrew/formula.rb (two outdated review threads, resolved)
"tag" => stable_spec.specs[:tag],
"revision" => stable_spec.specs[:revision],
"using" => (stable_spec.using if stable_spec.using.is_a?(Symbol)),
"checksum" => stable_spec.checksum&.to_s,
Member

agreed with renaming here (and avoiding the stable nesting as mentioned above)

Comment on lines +2516 to +2547
dep_hash["build_dependencies"] = spec_deps.select(&:build?)
.reject(&:uses_from_macos?)
.map(&:name)
.uniq
dep_hash["dependencies"] = spec_deps.reject(&:optional?)
.reject(&:recommended?)
.reject(&:build?)
.reject(&:test?)
.reject(&:uses_from_macos?)
.map(&:name)
.uniq
dep_hash["test_dependencies"] = spec_deps.select(&:test?)
.reject(&:uses_from_macos?)
.map(&:name)
.uniq
dep_hash["recommended_dependencies"] = spec_deps.select(&:recommended?)
.reject(&:uses_from_macos?)
.map(&:name)
.uniq
dep_hash["optional_dependencies"] = spec_deps.select(&:optional?)
.reject(&:uses_from_macos?)
.map(&:name)
.uniq
Member

@Bo98 you've historically had opinions about these: can you share here? Something like having the tags for the dependencies rather than duplicating.

Member
@Bo98 Bo98 Jan 28, 2024

Earliest I can get something together is Wednesday.

Basic idea I hoped for is to use what we do for Requirements and have the same for dependencies, e.g.

"dependencies": [
  {
    "name": "a",
    "tags": ["build", "test"]
  }
]

Motivation being it will reduce the logic massively on both the generation and reading side as we don't need to do a mess of select, reject etc and reading and combining several different arrays. uses_from_macos and uses_from_macos_bounds would also fold into this. The code has always looked horrible to me.

What effect it has on the JSON size is unclear, so I'm ok with this being a second PR if people are concerned and want an easier before/after comparison.
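A sketch of what the generation side could look like under this proposal (Dep is a stand-in Struct with made-up dependencies; the real Dependency class and tag handling differ):

```ruby
# Hypothetical stand-in for Homebrew's Dependency; only name and tags matter.
Dep = Struct.new(:name, :tags)

spec_deps = [
  Dep.new("pkg-config", [:build]),
  Dep.new("openssl@3", []),
  Dep.new("cmake", [:build, :test]),
]

# One pass replaces the per-category select/reject chains.
dependencies = spec_deps.map do |dep|
  { "name" => dep.name, "tags" => dep.tags.map(&:to_s) }
end
```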

Contributor Author

I think it should be possible to see the difference either way and I'm happy with waiting. It might be easier to review in a separate PR though.

I wonder if this format would make sense too.

"dependencies": {
  "a": ["build", "test"]
}

The current file size is around 10MB.
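For what it's worth, the two candidate shapes convert into each other trivially; a sketch with hypothetical data:

```ruby
array_form = [
  { "name" => "a", "tags" => ["build", "test"] },
  { "name" => "b", "tags" => [] },
]

# Collapse to the name => tags shape, dropping the repeated "name"/"tags" keys.
hash_form = array_form.to_h { |dep| [dep["name"], dep["tags"]] }
# hash_form == { "a" => ["build", "test"], "b" => [] }
```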

Contributor Author

We also wouldn't need to share that logic between both hash functions anymore.

Member
@Bo98 Bo98 Jan 29, 2024

I wonder if this format would make sense too.

Perhaps - it looks like a decent option. Depends if we want to combine head_dependencies and uses_from_macos(_bounds) into it or not, e.g.

"dependencies": [
  {
    "name": "a",
    "tags": ["build", "test"],
    "specs": ["stable", "head"],
    "uses_from_macos": { "since": "catalina" }
  }
]

Internally in brew all dependencies (including uses_from_macos) exist in the same dependency list (hence the logic cleanup if we match that for tags at least), though that dependency list is separated between specs. Specs combining and separation would therefore require a little bit of logic, but doing so could significantly reduce the overhead of new format in the JSON itself. So there's some trade offs here and there to consider.

Member

Earliest I can get something together is Wednesday.

No rush here 👍🏻

Motivation being it will reduce the logic massively on both the generation and reading side as we don't need to do a mess of select, reject etc and reading and combining several different arrays. uses_from_macos and uses_from_macos_bounds would also fold into this. The code has always looked horrible to me.

This makes a lot of sense to me 👍🏻

Specs combining and separation would therefore require a little bit of logic, but doing so could significantly reduce the overhead of new format in the JSON itself.

Seems nice if the reading logic doesn't blow up too much as a result 👍🏻

@apainintheneck
Contributor Author

I'm seeing a lot of suggestions here to change the formula hash structure by reducing nesting and renaming keys. This makes sense in the abstract but will require more compatibility code or an entirely new method for loading formulae from the v3 API. As long as we're okay with that, I'm fine with making those changes.

@MikeMcQuaid
Member

I'm seeing a lot of suggestions here to change the formula hash structure by reducing nesting and renaming keys. This makes sense in the abstract but will require more compatibility code or an entirely new method for loading formulae from the v3 API. As long as we're okay with that, I'm fine with making those changes.

@apainintheneck Thanks for the explicit mention. Yes, I'm fine with that. I'd rather we get the JSON looking 😘 here and have the reading/writing a little more 🤮 for a while.

I expect whatever internal API v3 looks like: it'll end up informing the external API v3, too.

@apainintheneck
Contributor Author

Okay, I addressed some of the comments about reducing nesting with the stable version and bottle along with the Ruby source checksum. Beyond that I renamed the service serialization names.

I'm hesitant to add more nesting for the deprecation, disable and conflicts with fields. IMO, readability shouldn't be a big priority when it comes to an internal data representation. We should prioritize size and simplicity.

@MikeMcQuaid
Member

IMO, readability shouldn't be a big priority when it comes to an internal data representation. We should prioritize size and simplicity.

Agreed 👍🏻

Comment on lines 2228 to 2232
"versions" => {
"stable" => stable&.version&.to_s,
"head" => head&.version&.to_s,
"bottle" => bottle_defined?,
},
Member

Suggested change
"versions" => {
"stable" => stable&.version&.to_s,
"head" => head&.version&.to_s,
"bottle" => bottle_defined?,
},
"stable_version" => stable&.version&.to_s,
"head_version" => head&.version&.to_s,
"bottle_defined" => bottle_defined?,

to kill some nesting

Member
@Bo98 Bo98 Jan 31, 2024

head_version and bottle_defined seem unnecessary to have at all.

IMO we should have a stable hash that has url version sha256 together as that aligns with how the DSL works:

stable do
  url "..."
  version ".."
  sha256 "..."
end

"stable": {
  "url": "...",
  "version": "...",
  "sha256": "..."
}

Member

head_version and bottle_defined seem unnecessary to have at all.

I think I said earlier but just to restate: we should not have any keys included here that are not used/needed at all in the Formulary API loader.

IMO we should have a stable hash that has url version sha256 together as that aligns with how the DSL works:

I agree (I think?) with @apainintheneck that for the internal API: I think it's nicer to have this be either nicely nested everywhere or entirely flat everywhere rather than half and half. I am assuming that entirely flat is "more efficient" to download and/or read into memory; if this is not the case (or the opposite is the case): let's nest everything.

I also think the public API warrants a (separate, later) debate about human readability vs. programmatic parsing vs. size.

My $0.02 as a consumer of the Homebrew API many times over across a bunch of languages at this point: JSON nesting always makes my life harder rather than easier for a very, very small "this looks nice".

Member
@Bo98 Bo98 Feb 1, 2024

My $0.02 as a consumer of the Homebrew API many times over across a bunch of languages at this point: JSON nesting always makes my life harder rather than easier for a very, very small "this looks nice".

That's because it was backwards a lot of the time

I am assuming that entirely flat is "more efficient" to download and/or read into memory

In theory yes, but we're getting into territory where it's so marginal that it might not even be measurable. How this all affects gzip compression done by GitHub Pages is another factor. The frozen string change I made should also in theory reduce the effect of duplicated strings.
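For reference, the json gem (2.2+) exposes this mechanism directly: parsing with `freeze: true` freezes the parsed objects, and on CRuby identical strings may then be shared rather than duplicated. This is only an illustration of the mechanism, not necessarily the exact change referred to above.

```ruby
require "json"

# The same dependency name appears twice, as it often does in the formula API.
doc = '{"dependencies":["openssl@3"],"build_dependencies":["openssl@3"]}'
parsed = JSON.parse(doc, freeze: true)

parsed.frozen?                        # => true
parsed["dependencies"].first.frozen?  # => true
```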

We're probably at the point where we just need to go with what we've got as a baseline and try iterative PRs to see what actually makes a difference. Ultimately, benchmarks are the reason we started this PR in the first place and we've not seen any benchmarks on any of the changes so far.

Member

That's because it was backwards a lot of the time

Sorry, to clarify: I'd also extend this to "literally any nesting at any time".

We're probably at the point where we just need to go with what we've got as a baseline and try iterative PRs to see what actually makes a difference. Ultimately, benchmarks are the reason we started this PR in the first place and we've not seen any benchmarks on any of the changes so far.

That works for me, just wanted to make sure folks know I may not want to roll this out fully until I've gone a final pass of the JSON like this 👍🏻

Contributor Author

This is in the current external API, which is out of scope for this PR. If we want to change it, we will have to go through the whole deprecation cycle because it will be a breaking change.

The only change to the output of the #to_hash method will be keys changing order which shouldn't matter when parsing JSON. This change is already mentioned in the PR description.

Contributor Author

I am assuming that entirely flat is "more efficient" to download and/or read into memory

In theory yes, but we're getting into territory where it's so marginal that it might not even be measurable. How this all affects gzip compression done by GitHub Pages is another factor. The frozen string change I made should also in theory reduce the effect of duplicated strings.

We're probably at the point where we just need to go with what we've got as a baseline and try iterative PRs to see what actually makes a difference. Ultimately, benchmarks are the reason we started this PR in the first place and we've not seen any benchmarks on any of the changes so far.

100% agree with this. As long as we don't change the old format in this PR and the brew generate-formula-api command is not too slow (those are benchmarks I need to add to this PR) I think we're good.

Member

This is in the current external API which is out of the scope for this PR. If we want to change it, we will have to go through the whole deprecation cycle because it will be a breaking change.

@apainintheneck I agree we should not change the existing external API. It is/was not obvious from this diff where the external API is being changed and where the new one is. As I mentioned before: I don't think we should be blocked on or relying on formatting from the existing API when creating this one. If it seems like I'm suggesting breaking the existing API: I am not, I am suggesting the change in the new one.

Contributor Author

The output of Formula#to_api_hash reflects the new internal API format. The Formula#to_hash output needs to remain backwards compatible since it's used to generate the current API and the output for brew info --json=. All changes to Formula#to_hash in the diff are related to sharing logic with Formula#to_api_hash for things that shouldn't differ too much between them. We could potentially separate them further but that would mean more code duplication.

The suggestion in this thread would change the format of the Formula#to_hash method which is off limits here. Let me know if there is anything else that is unclear.

Member

Ok, thanks for explaining ❤️.

I think there may be a benefit in separating them more but that doesn't need to block this PR.

Library/Homebrew/formula.rb (two outdated review threads, resolved)
@Bo98
Member

Bo98 commented Jan 31, 2024

IMO, readability shouldn't be a big priority when it comes to an internal data representation. We should prioritize size and simplicity.

It can affect the readability of the code on the Formulary side.

@apainintheneck
Contributor Author

I agree that this will not necessarily be the final version of the internal API. I also see @Bo98's point that nesting can be helpful at times when we want to group things similarly to how they are grouped in the Formula DSL. In all likelihood, those changes will not drastically change the size of the JSON or the loading speed.

It's also true that changing the JSON format may allow us to simplify the serialization/deserialization logic which would be nice too. We'll have to weigh the positives and negatives once we see the implementation and the resulting file size.

@MikeMcQuaid
Member

I agree that this will not necessarily be the final version of the internal API.

That's fine. I don't think we should be generating it in formulae.brew.sh, though, until we believe we're at or close to the final version.

As I mentioned in the previous version of this PR: I'm finding it very difficult to review the actual JSON output without a test that fully outputs it and checks for equality.

I also see @Bo98's point that nesting can be helpful at times when we want to group things similarly to how they are grouped in the Formula DSL.

I agree too. What I said before is: if you're saying "let's not nest this for memory/efficiency reasons" in one place and "let's leave this nesting as-is" in another place: I'm going to point out the contradiction here and I'm going to keep requesting it be changed until we hit the final implementation of the internal API.

Similarly, aside from changes for size/performance reasons and dropping unneeded keys: I am also planning that this internal v3 API look quite a lot like what the new v3 API will look like.

If it makes life easier: I will hold off doing any more review on this PR until it's further along and I'm specifically requested for that.

@apainintheneck
Contributor Author

I agree that this will not necessarily be the final version of the internal API.

That's fine. I don't think we should be generating it in formulae.brew.sh, though, until we believe we're at or close to the final version.

It would help us better test the format and its merits if we could generate the JSON. That was how I was assuming we were planning on iterating on this.

As I mentioned in the previous version of this PR: I'm finding it very difficult to review the actual JSON output without a test that fully outputs it and checks for equality.

The JSON output is pretty easy to check in the REPL. pp :jrnl.f.to_api_hash should do the trick. You can also run the brew generate-formula-api command locally to see what the entire JSON file will look like. I'd rather not add tests like that until we've decided on a final format; otherwise we'll just have to keep changing them. My thought was that that could wait until I start writing the loading code in a separate PR. Since we're planning on conditionally including data in the hash, the format will be variable, which means that even a few tests won't necessarily reflect how it will look for any individual formula.

I also see @Bo98's point that nesting can be helpful at times when we want to group things similarly to how they are grouped in the Formula DSL.

I agree too. What I said before is: if you're saying "let's not nest this for memory/efficiency reasons" in one place and "let's leave this nesting as-is" in another place: I'm going to point out the contradiction here and I'm going to keep requesting it be changed until we hit the final implementation of the internal API.

The format will never be a perfect representation of the formula DSL or a completely flat data structure. It's not contradictory to weigh the positives and negatives of each section when deciding on the format. What @Bo98 seems to be suggesting is that adding additional nesting in some cases could simplify the loader code in Formulary which might be worth the trade-off.

Similarly, aside from changes for size/performance reasons and dropping unneeded keys: I am also planning that this internal v3 API look quite a lot like what the new v3 API will look like.

I don't think that's a good assumption. The goals of the two APIs are not the same. It can be helpful to have a more verbose and readable format for public consumption but that's not what we're shooting for here.

@apainintheneck apainintheneck marked this pull request as ready for review February 3, 2024 04:09
@apainintheneck apainintheneck added the install from api Relates to API installs label Feb 3, 2024
@apainintheneck apainintheneck changed the title [WIP] Next gen api formula json v3 Next gen api formula json v3 Feb 3, 2024
Member
@MikeMcQuaid MikeMcQuaid left a comment

Thanks @apainintheneck! Fine with me to merge as-is and then we can iterate more after that.

Member
@Bo98 Bo98 left a comment

Agreed to merge as non-final.

We need benchmarks so we know we've actually made an improvement here (I hope so!)
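A minimal harness for that kind of before/after comparison, using synthetic stand-ins for the two formats (the data shapes and sizes are made up, not Homebrew's real output):

```ruby
require "benchmark"
require "json"
require "zlib"

# Synthetic stand-ins: nested "old" shape vs. flat "new" shape.
old_format = 2000.times.to_h { |i| ["formula#{i}", { "versions" => { "stable" => "1.0.#{i}" } }] }
new_format = 2000.times.to_h { |i| ["formula#{i}", { "stable_version" => "1.0.#{i}" }] }

sizes = {}
[["old", old_format], ["new", new_format]].each do |label, hash|
  json = JSON.generate(hash)
  sizes[label] = json.bytesize
  gzipped = Zlib::Deflate.deflate(json).bytesize
  parse_time = Benchmark.realtime { JSON.parse(json) }
  puts format("%s: %d bytes raw, %d bytes gzipped, %.4fs to parse", label, json.bytesize, gzipped, parse_time)
end
```

Gzipped size matters most here since GitHub Pages compresses responses, so raw-size wins don't always translate one-to-one.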

@apainintheneck
Contributor Author

Okay, I'll double-check again that brew generate-formula-api is generating the same output as before and see if there are any simple changes I can make to keep the hash keys in the same order. After that I'll merge this in probably sometime tomorrow.

@apainintheneck
Contributor Author

I changed it so that the hash keys stay in the same order and that helps make the diff a bit easier to follow as well.

This will be used internally to generate a slimmer api hash representation
of formulas that will require less space and be faster to load.

Changes:
- Added #to_api_hash
- Modified #to_hash_with_variations to work with both #to_hash and #to_api_hash
- Modified #bottle_hash to have compact representation for the api hash
- Added #urls_hash to share url hash generation logic between the hash methods

Note: This changes where the "head_dependencies" key in the hash
shows up but not the hash's contents.

"head_dependencies" now shows up directly after all of the other
dependency keys. Before it was always at the end of the hash after
variations.

This adds the code to generate the homebrew-core.json file which
represents the entire tap instead of just the previous array of
formula hashes. Any shared logic has been moved into the top-level
hash scope including aliases, renames, tap_git_head and tap_migrations.

I also added a check to skip adding the variations hash to the api
hash if it is empty.

Now we're down to 10MB from 24MB!!!

- move caveats serialization to a method
- remove unnecessary nesting where the hash would only have one key

Also, removed comment about checking disable and deprecation reasons.

Now the output of commands like `brew info --json=` and
`brew generate-formula-api` should be the same as before along with
the additional files for the internal API. Before this commit the
hash key order had changed.
@p-linnane
Member

Rebased to resolve hanging jobs with the new macOS 14 runner.

@apainintheneck
Contributor Author

Interesting that the macOS 14 runner seems so much slower than everything else.

@apainintheneck apainintheneck merged commit d947721 into Homebrew:master Feb 4, 2024
24 checks passed
@p-linnane
Member

@Bo98
Member

Bo98 commented Feb 4, 2024

Wasn't that slow initially so something seems wrong, though not entirely sure on whose side

@github-actions github-actions bot added the outdated PR was locked due to age label Mar 6, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 6, 2024