Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tags use #+ syntax instead of //+ #4203

Merged
merged 16 commits into from
Sep 19, 2024

Conversation

karl-zylinski
Copy link
Contributor

@karl-zylinski karl-zylinski commented Sep 5, 2024

This draft PR makes tags such as

//+build darwin, linux

instead look like this:

#+build darwin, linux

I.e. it no longer looks like a comment. A bigger example:

#+private
#+build darwin, linux
#+no-instrumentation
// Important documentation about the file.
package game

In this discussion post I outlined why using a comment for this is a problem: #4200

Here's what I've done

  • Added a new token for #+ called FileTag
  • Reusing the same code for parsing the tags as before, with some minor changes.
  • Backwards compatible with //+ with a deprecation warning
  • Error if you put anything else than lines starting with // or #+ before package line
  • Added package_line_seen bool to the tokenizer. This is not strictly necessary. But it ensures you get less crazy errors if you write !+ in the code after the package line. By having package_line_seen I can make sure to treat #+ as a "FileTag" token only if the package line has not yet been seen. Since I only allow comments or file tags before the package line, it all makes sense in the end.

This is a draft PR and a proof-of-concept. I think this tag stuff has to change in some way, but it doesn't have to be #+.

Also if we were to go through with something like this I would of course search-replace all the //+ tags in core and base to the new format. And I'd also help with updating core:odin

Note: I did try using + instead of #+. It worked fine after I introduced package_line_seen. However it makes it hard to create syntax highlighting if you do it that way.

UPDATE: Since starting this PR I have changed from !+ to #+ based on the comments below. Currently the tag parsing still works like before. I.e. the only difference from before is that //+ has become #+. See #4203 (comment) for an up-to-date summary of what I've changed.

@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 5, 2024

There was a suggestion in the discussion to use # instead.

I could keep the syntax the same other than that.

Then it'd be

#private
#build darwin, linux
#no-instrumentation
package game

However perhaps I'd change #no-instrumentation to #no_instrumentation, because the - looks inconsistent with regard to other # usages.

Or it could move to use a more #assert(...) style syntax:

#private
#build(darwin, linux)
#no_instrumentation
package game

A third idea is to not have anything at all in front the the tags:

private
build darwin linux,
no-instrumentation
package game

That should also be possible since we know if the tokenizer has seen the package line yet, so there can be a set of keywords that are only used in pre-package-line-land. However, that will again not play nicely with syntax highlighting etc since you must have very smart editors to know if you're before package or not.

@gingerBill
Copy link
Member

I like the idea of having something separate. I originally did the comment thing because it was trivial to do so with the current parser. But a custom token is not out of the question.

!+ is okay, but So could #+ too. I don't want it to be just # because I don't want to overload its meaning.

# is for directives to modify the behaviour of something or a built-in thing (e.g. #load, #config, etc)

#+ would be fine in my opinion and it visually distinct.

@karl-zylinski
Copy link
Contributor Author

@gingerBill I prefer #+ too since its distinct like you said. Also it means we are not introducing ! as yet another major way to start a line. A good balance.

If we do this, should we just preserve the current syntax of the tags? It would then be

#+private
#+build linux, darwin
#+no-instrumentation

We could also change it so the style is similar to that of #assert(...):

#+private
#+build(linux, darwin)
#+no_instrumentation

The above has two differences: parenthesis around build and the - in no-instrumentation has been changed to _

I feel like if we had used only # without the +, then moving to the #assert(...)-style would look good. But something about the + inbetween makes me prefer the first style (which is just the current style but replacing // with #).

@jakubtomsu
Copy link
Contributor

I really like this proposal, especially if it could also do error checking for unknown package attributes (or parameters)!

I like Karl's suggestion for the syntax similar to #assert, it's the most consistent with other things in the language. And #+ is a good idea as well IMO.

@laytan
Copy link
Collaborator

laytan commented Sep 6, 2024

This is already great, I wonder if we could also (not necessarily in this pr because it's already a huge improvement) make it more like normal expressions, something like #+build(darwin || (linux && amd64)) instead of its own distinct rules about these specific expressions. And indeed like @jakubtomsu said, have proper errors for undeclared things and bad expressions.

@laytan
Copy link
Collaborator

laytan commented Sep 6, 2024

That would involve operator precedence and things like that which probably balloons the scope of this^

@karl-zylinski
Copy link
Contributor Author

@laytan Having more advanced syntax is a bit of trouble because currently the parsing of the tags happens directly in parse_file. If we need more advanced syntax then we need to somehow involve later stages of the compiler at that point, or have some mini-dsl-thingy for the tags.

For example, look at parse_build_tag in the source right now. It's kind-of a hacky manual parsing. But it works and we can replace it later with something more fancy if we wish. However, if we want fancy syntax now then we must probably replace it sooner. I can probably get the parenthesis in for so we can write #+build(linux, darwin), but anything more than that will probably be a lot more work. Which is fine! But I'm not sure that I can do that work.

One thing to consider here is that changing this several times is probably a bad idea, I approach this with the mindset that whatever we change these tags to be now, is how it is gonna be forever.

@Kelimion
Copy link
Member

Kelimion commented Sep 6, 2024

@laytan Having more advanced syntax is a bit of trouble because currently the parsing of the tags happens directly in parse_file. If we need more advanced syntax then we need to somehow involve later stages of the compiler at that point, or have some mini-dsl-thingy for the tags.

And while I would prefer the same expression syntax as Laytan proposes, it would also deluge us with people asking why they can't use #config(FOO, true) or other constants defined somewhere in the package on that first line, as they might in a when statement.

Which isn't something that can be cured without a catch-22 of having to parse the package before you know you whether you should parse the file in the package. It would necessarily have to be a restricted expression subset that itself would induce people to ask "why can I not X?", simply because it would work in a when statement.

Given the required parser overhaul and the "why can't I" maintenance burden, I'd rather keep the current specific syntax with the newly proposed line prefix.

@laytan
Copy link
Collaborator

laytan commented Sep 6, 2024

I'm fine with leaving it, I just have to always look up what the syntax was again for any non-trivial case.

Like the difference between these two already get my head spinning:

//+build !linux
//+build !windows

and

//+build !linux, !windows

@Feoramund
Copy link
Contributor

And while I would prefer the same expression syntax as Laytan proposes, it would also deluge us with people asking why they can't use #config(FOO, true) or other constants defined somewhere in the package on that first line, as they might in a when statement.

That's a good point; I hadn't thought of that when I suggested the #(...) syntax.

I also suggested the possibility of no symbol in the discussion, which would be possible because these statements are valid only before the package declaration. It's probably better to have a visually distinct line initiator though, and likely any two characters that have been suggested would suffice, like #!, !+, #+.

I think at that point it might just be a matter of style. The #! shebang makes a bit of sense with how it's used in shell scripting but not quite. The !+ and #+ are both distinct prefixes. #+ has the + we already use and the # which we use for various compile-time features. The !+ is very distinct, but maybe the ! would be confusing with statements like //+build !darwin turning into !+build !darwin?

Thinking about it some more, #+ might be the best compromise from this perspective.

Like the difference between these two already get my head spinning:

Those two have different results?

@karl-zylinski
Copy link
Contributor Author

@Feoramund Great point about ! might being confusing along with the !darwin etc. I'm all for #+ now.

I'm still very on the fence if I prefer #+build darwin, linux or #+build(darwin, linux) or #+build(darwin && linux) -- the problem is that the "symbols" like darwin and linux etc are not symbols at all, so in line with what @Kelimion said it feels like having a too similar style might trick people into thinking that they can use other constants here. The style with && makes it look like these words are booleans in the code, but they are just platform names that don't exist outside the tag system.

If it did look like code then perhaps it should us real constants like #+build(ODIN_OS == .Windows), but that is of much greater scope than what we currently have, and also quite messy to have at the top of files when there are a lot of platforms.

@laytan I don't understand how the two examples you posted about have a different result, can you elaborate? I mean, you are on a certain platform. Either your platform is on the list or forbidden using the !, how can other platforms affect the one you are on right now?

@karl-zylinski karl-zylinski changed the title Make tags use !+ syntax instead of //+ Make tags use #+ syntax instead of //+ Sep 6, 2024
@karl-zylinski
Copy link
Contributor Author

I have pushed some changes that makes this use #+ instead of !+.

I've also made the usage of unknown tags into an error instead of a warning.

@karl-zylinski
Copy link
Contributor Author

Did some tests with sublime syntax highlight to see how it feels. Looks nice to me!
image

@Yawning
Copy link
Contributor

Yawning commented Sep 7, 2024

This is already great, I wonder if we could also (not necessarily in this pr because it's already a huge improvement) make it more like normal expressions, something like #+build(darwin || (linux && amd64)) instead of its own distinct rules about these specific expressions. And indeed like @jakubtomsu said, have proper errors for undeclared things and bad expressions.

I really want this (and this is a big reason why Go changed from //+build to //go:build <logical expression>, because it was a huge foot+gun and they surveyed packages and found a lot of places where things were getting it wrong).

Changing the syntax would be a good opportunity to make sure that core is getting it right at least (and there were cases where we weren't in the past), so consider this a vote for future work ("on my wishlist enough that I will go further than an emoji").

@Yawning
Copy link
Contributor

Yawning commented Sep 7, 2024

Like the difference between these two already get my head spinning:

Those two have different results?

Build if !linux AND !windows:

//+build !linux
//+build !windows

Build if !linux OR !windows:

//+build !linux, !windows

The foot+gun potential here is quite high. While having complex-ish conditionals is likely rare, I am in the camp of "Go tried this syntax, and explicitly changed it because it was error prone, we should learn from their experiences".

@Kelimion
Copy link
Member

Kelimion commented Sep 7, 2024

Like the difference between these two already get my head spinning:

Those two have different results?

Build if !linux AND !windows:

//+build !linux
//+build !windows

Build if !linux OR !windows:

//+build !linux, !windows

The foot+gun potential here is quite high. While having complex-ish conditionals is likely rare, I am in the camp of "Go tried this syntax, and explicitly changed it because it was error prone, we should learn from their experiences".

This isn't consistent with where clauses, in which a comma separated list still all has to be true, so more of an AND.

@karl-zylinski
Copy link
Contributor Author

I see, while I agree that having better conditionals would a good idea, one comment about the above is this:

Build if !linux AND !windows:

//+build !linux
//+build !windows

If you are on Windows you can never be on Linux at the same time. So isn't one solution to this to not allow more than one line with a build tag on it? Allowing more than one build tag line sounds like a bug to me. Or am I missing some benefit of allowing it?

@Yawning
Copy link
Contributor

Yawning commented Sep 7, 2024

I see, while I agree that having better conditionals would a good idea, one comment about the above is this:

Build if !linux AND !windows:

//+build !linux
//+build !windows

If you are on Windows you can never be on Linux at the same time. So isn't one solution to this to not allow more than one line with a build tag on it? Allowing more than one build tag line sounds like a bug to me. Or am I missing some benefit of allowing it?

Both logical OR and AND are useful and arguably required for this use-case. The existing syntax does not allow for this unless you have multiple lines.

@laytan
Copy link
Collaborator

laytan commented Sep 7, 2024

I see, while I agree that having better conditionals would a good idea, one comment about the above is this:

Build if !linux AND !windows:

//+build !linux
//+build !windows

If you are on Windows you can never be on Linux at the same time. So isn't one solution to this to not allow more than one line with a build tag on it? Allowing more than one build tag line sounds like a bug to me. Or am I missing some benefit of allowing it?

That's not linux and not windows, so everything else, I don't think you can do that with one line.

See how confusing these expressions are?

@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 7, 2024

@laytan Thanks for the example, I get the issue now.

For anyone who hasn't understood:

//+build  !windows, !linux

The above will, contrary to what the user might think, build when you are on windows, because it will match the !linux one because this expression is like !windows || !linux. So to get that working correctly you must use a separate line, which gives an AND behavior. This is very easy to mess up and I agree that it should be changed.

What I think that the user expects when they write something like the line above is that !platform uses AND and platform uses OR. So for this:

//+build  !windows, !linux, freebsd, netbsd

The user expects this:

//+build !windows && !linux && (freebsd || netbsd)

I.e. when they look at the following

//+build  !windows, !linux, freebsd, netbsd

then the users expects that they cannot be on any of the platforms with a ! but they must be on one of the platforms without the !.

@Kelimion As you see this can't work like when, because we are only on a single platform at a time. when wants all the conditions to be true. While for platforms without ! we only need one of them to be true.

Adding in this magic AND / OR stuff depending on the presence of ! is probably too much black magic, so I agree that we need proper conditional operators instead.

@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 7, 2024

I realized that you never mix !os and os, not even between separate tags. It is never done in any code, because it makes not sense. Look at this:

//+build !linux
//+build !windows
//+build !darwin
//+build netbsd

the above makes no sense. The moment you have something without the exclamation mark in there then all the other ones are not needed. Whenever we use build tags with ! we don't have a single on without the !. It never happens because it makes no sense.

Therefore the examples I've been using here such as //+build !windows && !linux && (freebsd || netbsd) make no sense. That example is just //+build free_bsd || net_bsd

Because of this, perhaps we should not have ! at all. This also means that perhaps we shouldn't have the conditionals that involve && and ||. The solution I would propose is to use two separate tags:

#+build linux, darwin

and

#+no_build linux, windows, darwin

build uses OR and no_build uses AND. I don't think its confusing, because what else would it do?

Also, with this style having both build and no_build is forbidden. It makes no sense:

#+build linux
#+no_build windows

This says it builds only on linux but also says that it doesn't build on windows. The moment you have a build tag the no_build tag should be an error.

Also with this style you cannot have multiple build and no_build lines, clearing any confusion about what that would mean.

As an example, this stuff from rand_generic.odin:

//+build !linux
//+build !windows
//+build !openbsd
//+build !freebsd
//+build !netbsd
//+build !darwin
//+build !js

would be

#+no_build linux, windows, openbsd, freebsd, netbsd, darwin, js

@karl-zylinski
Copy link
Contributor Author

From a discussion in the discord I have just learned that this uses OR:

//+build linux, windows

and this uses AND:

//+build windows linux

The difference is the comma. This is used to combine OSes with architectures:

//+build js wasm64p32, js wasm32

I.e. this gives us (js && wasm64p32) || (js && wasm32)

I should have read parse_build_tags more properly when working on this. But in any case, there was a bunch of people that didn't know this.

An idea on how to make this clearer is:

#+build js_wasm64p32, js_wasm32

So you can never take apart an OS and an architecture.
If that is too limiting then I'm afraid I'd recommend using conditionals again:

#+build js && (wasm64p32 || wasm32)

Note that this is an uncommon case, but it needs to be supported.

@gingerBill
Copy link
Member

To keep things simple, things should just be the following:

#+build os
#+build arch
#+build os arch, os arch
#+build os arch, !os arch

etc

@gingerBill
Copy link
Member

And error messages should happen when some one does something like

#+build windows linux

which is an impossible combination

@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 7, 2024

@gingerBill I'll get on with making it like so. All the other stuff we talked about here went a bit over my head anyways.

One question I have is about how we should handle the tokenization. Currently when I hit #+ I skip the whole line and parse it later just like before. That worked fine when it was a comment. But now if you have #+build windows, linux // hello then the comment at the end gets swallowed and becomes an error in the tag parser. Should I add some "skip line, but only up until comment" in the tokenizer?

Update: I did like I said, skip line to newline or until start of comment.

…ike a comment. Old style still works but is deprecated with a warning. Using unknown tags is now an error instead of a warning. There is a new token for #+ which consumes the whole line (or until it hits a comment). The tags are parsed like before. There are errors to tell you if you use something invalid in the pre-package-line block.
@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 8, 2024

Edit: Nevermind this message, I had misread in the code how the ! worked. I thought it was per (os arch) group because of a silly assumption.

@gingerBill Would you mind if I changed the syntax to os_arch?

#+build os
#+build arch
#+build os_arch, os_arch
#+build os_arch, !os_arch

This has the benefits that it:

  • Makes !os arch less confusing. With !os_arch it's clearer that the combo is notted and not just the OS.
  • Makes it look the same as file suffixes (I can make arch_os work too, so it works in an identical way).
  • The && between the os and arch is apparent.
  • Less confusion between what space and comma does because spaces don't do anything anymore. In each comma separated group there can be only a single os, arch or os_arch, with an optional ! in front. This also takes care of the impossible windows linux case.
  • Easier to parse.

@@ -6091,7 +6097,7 @@ gb_internal String build_tag_get_token(String s, String *out) {
}

gb_internal bool parse_build_tag(Token token_for_pos, String s) {
String const prefix = str_lit("+build");
String const prefix = str_lit("build");
Copy link
Contributor Author

@karl-zylinski karl-zylinski Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ is now removed earlier because we do a deprecation check outside of parse_file_tag that also skips over the +

bool first_invalid_token_set = false;
Token first_invalid_token = {};

while (f->curr_token.kind != Token_package && f->curr_token.kind != Token_EOF) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we need to find the file tag tokens and skip comment groups we now loop until we find the package token.

} else {
if (!first_invalid_token_set) {
first_invalid_token_set = true;
first_invalid_token = f->curr_token;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used so we can get nice error message in both situations where there are incorrect things before package line but also so that "Expected a package declaration" ends up at the top token instead at the bottom of the file.

} else {
docs = nullptr;
}
}
Copy link
Contributor Author

@karl-zylinski karl-zylinski Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check is needed anymore? It must have been in case there was some invalid between the comments at the top and the package line, right? Now we have the new stuff earlier in this proc that makes sure we only have file tags and comments above the package line.

@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 9, 2024

The compiler parts of this is ready for review. I've done what Bill said, including putting in some new error messages. I put in a few review comments of my own to explain what I've done.

If it all looks good then I can move on to fix the parser in core:odin. Lastly I also need to migrate all the base, core and vendor files, but that I will do just before merging so that I don't get wild merge conflicts / miss files.

@karl-zylinski karl-zylinski marked this pull request as ready for review September 14, 2024 18:01
@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 14, 2024

This is ready for review now.

Summary of everything in this PR:

  • Introduced #+ token as "FileTag" token in the compiler
  • Made compiler read tags that use this new token and use mostly the old code path for parsing tags
  • Added errors in build tag reading when you create impossible combinations or use more than one OS or more than one arch per comma separated group
  • Added error if you type anything else than comments or tags above the package line
  • Added deprecation warning: Syntax Warning: '//+' is deprecated: Use '#+' instead
  • Introduced #+ token into core:odin/tokenizer and made the ast.File contain the tags so that the parsing of file tags can use that stuff later.
  • Migrated all .odin files in core, base, vendor, example and tests to use the new style.
  • Found and fixed a few errors where the old //+build OS tag was after the package line and thus silently ignored.

To help you find stuff in this jungle of changed things:

  • Compiler changes are in src/parser.cpp and src/tokenizer.cpp
  • Odin core parser changes are in core/odin/parser/parser.odin, core/odin/parser/file_tags.odin and core/odin/tokenizer/tokenizer.odin
  • The rest of the changes should be all //+ -> #+ migrations.

Sorry for all the confusion on my side earlier. I hadn't written much C in several years, so I misread some code and jumped to incorrect conclusions :)

@gingerBill
Copy link
Member

I want to merge this in at the end of the week. It appears to be absolutely fine!

@karl-zylinski
Copy link
Contributor Author

karl-zylinski commented Sep 17, 2024

Great! Thanks for the review. I just pushed some additional //+ -> #+ changes that change recent //+ additions or that I missed previously. I think that should be all of them.

@gingerBill gingerBill merged commit 6bbeb0a into odin-lang:master Sep 19, 2024
7 checks passed
@karl-zylinski karl-zylinski deleted the file-tags-without-comments branch September 19, 2024 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants