Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve variable name conflict detection in GOOL #3867

Open
B-rando1 opened this issue Jul 22, 2024 · 6 comments
Open

Improve variable name conflict detection in GOOL #3867

B-rando1 opened this issue Jul 22, 2024 · 6 comments

Comments

@B-rando1
Copy link
Collaborator

This conversation came out of work on the fix to #2179.

In GOOL, there are a number of places where GOOL checks to see if a variable name has been used in order to create a new variable. Examples include:

  • List slicing - temp is used to temporarily store the slice in, and begIdx and endIdx may be used to store the beginning or ending index, if they need to be determined at run-time.
  • thunkAssign - i is generated for a for-loop.
  • Made constDecDef capitalize constants in Python #3858 - we check to see if it is safe to capitalize the variable name before we do so.

In each of these cases, genVarName or one of its derivatives is used to make sure that the generated variable name has not been used before. For example, if we create GOOL code that declares a variable temp and then does a list slice, the Java code would look something like this:

int temp;

ArrayList<Double> temp0 = new ArrayList<Double>(0);for (int j = 1; j < 4; j += 2) {
    temp0.add(myOtherList.get(j));
}
mySlicedList2 = temp0;

This works well, but the problem is it only looks backwards; it doesn't look over the whole program. So if we did the list slice first and then declared the variable, the Java code would look like this:

ArrayList<Double> temp = new ArrayList<Double>(0);for (int j = 1; j < 4; j += 2) {
    temp.add(myOtherList.get(j));
}
mySlicedList2 = temp;

int temp;

This would throw an error, since we're declaring temp multiple times.

Ideally we would do two passes so that we know ahead of time which variable names are 'reserved by the user' - i.e. not to be generated.

If we want to do something like this, what approach should we use?

@B-rando1 B-rando1 changed the title Improve name conflict detection in GOOL Improve variable name conflict detection in GOOL Jul 22, 2024
@B-rando1
Copy link
Collaborator Author

We had discussed in our meeting that a more concrete example of this would be helpful, so I modified NameGenTest.hs on a separate branch. The GOOL code can be seen here:

[block [listDecDef temp [litInt 1, litInt 2, litInt 3],
listDec 2 result],
listSlice result (valueOf temp) (Just (litInt 1)) (Just (litInt 3)) Nothing,
block [
comment "This shadows a generated name:",
listDecDef temp0 [litInt 1, litInt 2, litInt 3],
comment "This shadows a user-given name:",
listDec 2 result]]

The generated Java code can be seen here:

ArrayList<Integer> temp = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
ArrayList<Integer> result = new ArrayList<Integer>(2);
ArrayList<Integer> temp0 = new ArrayList<Integer>(0);
for (int i = 1; i < 3; i++) {
temp0.add(temp.get(i));
}
result = temp0;
// This shadows a generated name:
ArrayList<Integer> temp0 = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
// This shadows a user-given name:
ArrayList<Integer> result = new ArrayList<Integer>(2);

As can be seen, the generated names with the for loop avoid conflicts with the first declared variables by appending a 0 to the end of the name. Right after that, however, more user-defined names create conflicts - one with a generated name, and one with another user-defined name.

This was generated by GOOL without any issues, using functionality that has been in place for a while now (I modified the listSlice implementation, but the way it's used here wasn't affected). It shows that while our generated names attempt to be unique, they aren't perfect; and that when conflicts happen GOOL doesn't do anything about it.

From our meeting, we came up with two solutions:

  • We can modify GOOL to throw an error if two variables are ever given the same name. We could modify the implementation of each declaration function for each renderer, or better yet we might be able to modify the useVarName implementation, which would catch everything in one place.
    • This would ensure that generated code is 'safe' in this respect, but it would not be user-friendly. We don't really want to get an error if a user-defined variable name conflicts with an auto-generated one, and instead it would be better to generate a different name. Still, it's better that GOOL throws an error than the generated code throwing an error when they try to run it, which is what currently happens.
  • A more robust solution would be to add some sort of 'initial pass' where we get a complete set of every user-defined variable name. This would allow us to only throw an error if the user made a mistake, i.e. they declared two variables with the same name. This would take more work, but is probably a better long-term solution.
    • I need to look more into how we'd do this, but my initial thoughts were outlined in my first comment above.

@JacquesCarette
Copy link
Owner

Amusingly, this is somewhere we could use Haskell's laziness to our advantage: we could lazily generate names and only force them to be actually generated once the full block of code has been processed, so that it would then have the full context in place in which to do correct unique generation. I've definitely seen some code "out there" that does this (I think I recall code for the generation of labels for a one-pass generation of assembly-level code).

@smiths
Copy link
Collaborator

smiths commented Jul 23, 2024

Great work @B-rando1. You took what we discussed in the meeting and showed results very quickly! I like the intermediate step of your first option - showing an error if two variables have the same name. From our discussion yesterday, this doesn't seem to be that difficult a change, and it would make GOOL safer. I'm not sure if it is even that bad from the user's point of view, assuming that the error message is descriptive. The user might want control over what their variable is called, instead of relying on the system making a unique name. If we do automatically make unique names in the future from the user input, there should be a log message that lets the user know, or they might be confused by the generated code.

@B-rando1
Copy link
Collaborator Author

@JacquesCarette that's a cool idea! I'm not quite sure what we need to do to get it there. My understanding is that Haskell is lazy by default, but some operations will force evaluation at a certain point. Is there something we need to change to allow Haskell to defer name generation?

@JacquesCarette
Copy link
Owner

In theory, nothing to do. In practice, you need to make sure you're not doing certain pattern matches "too early" as that will force evaluation. Plus someone's work (Noah?) on making things strict will definitely interact with this. https://wiki.haskell.org/Maintaining_laziness is a useful resource. I think http://comonad.com/reader/2014/fast-circular-substitution/ is one example of what I mean.

@B-rando1
Copy link
Collaborator Author

B-rando1 commented Jul 25, 2024

I tried locating the issue by using ['a' | _ <- [0..]] in a few places regarding genVarName. This allowed me to find the following:

  • genVarName is not lazy.
  • splitVarName is not lazy.
    • I saw it uses reverse, which the Haskell wiki page warned against. Maybe that's the reason.
  • Map.lookup is lazy.
  • bumpVarName is not lazy.
    • Nothing jumped out at me too hard here.
  • Data.State is lazy as far as I can tell, though it was hard to check so I might be wrong.

I'll need to do some more digging tomorrow to see what the root causes are, and if there's a good fix for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants