Improve variable name conflict detection in GOOL #3867

B-rando1 · 2024-07-22T16:12:28Z

This conversation came out of work on the fix to #2179.

In GOOL, there are a number of places where GOOL checks to see if a variable name has been used in order to create a new variable. Examples include:

List slicing - temp is used to temporarily store the slice in, and begIdx and endIdx may be used to store the beginning or ending index, if they need to be determined at run-time.
thunkAssign - i is generated for a for-loop.
Made constDecDef capitalize constants in Python #3858 - we check to see if it is safe to capitalize the variable name before we do so.

In each of these cases, genVarName or one of its derivatives is used to make sure that the generated variable name has not been used before. For example, if we create GOOL code that declares a variable temp and then does a list slice, the Java code would look something like this:

int temp;

ArrayList<Double> temp0 = new ArrayList<Double>(0);for (int j = 1; j < 4; j += 2) {
    temp0.add(myOtherList.get(j));
}
mySlicedList2 = temp0;

This works well, but the problem is it only looks backwards; it doesn't look over the whole program. So if we did the list slice first and then declared the variable, the Java code would look like this:

ArrayList<Double> temp = new ArrayList<Double>(0);for (int j = 1; j < 4; j += 2) {
    temp.add(myOtherList.get(j));
}
mySlicedList2 = temp;

int temp;

This would throw an error, since we're declaring temp multiple times.

Ideally we would do two passes so that we know ahead of time which variable names are 'reserved by the user' - i.e. not to be generated.

If we want to do something like this, what approach should we use?

I know that CodeInfo.hs already does some preprocessing, e.g. I believe it creates the function call map that the Java renderer uses to add throws declarations to methods. We could look there to see if it's the right place to add this.
@balacij started discussing stateful rendering in Stateful artifact rendering: Verify it makes sense to implement (and implications), design, and implement #3864. I'm not exactly sure what that means to be honest, but @samm82 said it might give us the functionality we need.

The text was updated successfully, but these errors were encountered:

B-rando1 · 2024-07-22T21:29:19Z

We had discussed in our meeting that a more concrete example of this would be helpful, so I modified NameGenTest.hs on a separate branch. The GOOL code can be seen here:

Drasil/code/drasil-code/test/NameGenExample.hs

Lines 18 to 25 in 5b8e1cd

    
           [block [listDecDef temp [litInt 1, litInt 2, litInt 3], 
        
             listDec 2 result], 
        
             listSlice result (valueOf temp) (Just (litInt 1)) (Just (litInt 3)) Nothing, 
        
             block [ 
        
               comment "This shadows a generated name:", 
        
               listDecDef temp0 [litInt 1, litInt 2, litInt 3], 
        
               comment "This shadows a user-given name:", 
        
               listDec 2 result]]

The generated Java code can be seen here:

Drasil/code/stable/gooltest/java/NameGenExample/NameGenExample/NameGenExample.java

Lines 9 to 21 in 5b8e1cd

    
           ArrayList<Integer> temp = new ArrayList<Integer>(Arrays.asList(1, 2, 3)); 
        
           ArrayList<Integer> result = new ArrayList<Integer>(2); 
        
           ArrayList<Integer> temp0 = new ArrayList<Integer>(0); 
        
           for (int i = 1; i < 3; i++) { 
        
               temp0.add(temp.get(i)); 
        
           } 
        
           result = temp0; 
        
           // This shadows a generated name: 
        
           ArrayList<Integer> temp0 = new ArrayList<Integer>(Arrays.asList(1, 2, 3)); 
        
           // This shadows a user-given name: 
        
           ArrayList<Integer> result = new ArrayList<Integer>(2);

As can be seen, the generated names with the for loop avoid conflicts with the first declared variables by appending a 0 to the end of the name. Right after that, however, more user-defined names create conflicts - one with a generated name, and one with another user-defined name.

This was generated by GOOL without any issues, using functionality that has been in place for a while now (I modified the listSlice implementation, but the way it's used here wasn't affected). It shows that while our generated names attempt to be unique, they aren't perfect; and that when conflicts happen GOOL doesn't do anything about it.

From our meeting, we came up with two solutions:

We can modify GOOL to throw an error if two variables are ever given the same name. We could modify the implementation of each declaration function for each renderer, or better yet we might be able to modify the useVarName implementation, which would catch everything in one place.
- This would ensure that generated code is 'safe' in this respect, but it would not be user-friendly. We don't really want to get an error if a user-defined variable name conflicts with an auto-generated one, and instead it would be better to generate a different name. Still, it's better that GOOL throws an error than the generated code throwing an error when they try to run it, which is what currently happens.
A more robust solution would be to add some sort of 'initial pass' where we get a complete set of every user-defined variable name. This would allow us to only throw an error if the user made a mistake, i.e. they declared two variables with the same name. This would take more work, but is probably a better long-term solution.
- I need to look more into how we'd do this, but my initial thoughts were outlined in my first comment above.

JacquesCarette · 2024-07-23T09:16:21Z

Amusingly, this is somewhere we could use Haskell's laziness to our advantage: we could lazily generate names and only force them to be actually generated once the full block of code has been processed, so that it would then have the full context in place in which to do correct unique generation. I've definitely seen some code "out there" that does this (I think I recall code for the generation of labels for a one-pass generation of assembly-level code).

smiths · 2024-07-23T14:34:43Z

Great work @B-rando1. You took what we discussed in the meeting and showed results very quickly! I like the intermediate step of your first option - showing an error if two variables have the same name. From our discussion yesterday, this doesn't seem to be that difficult a change, and it would make GOOL safer. I'm not sure if it is even that bad from the user's point of view, assuming that the error message is descriptive. The user might want control over what their variable is called, instead of relying on the system making a unique name. If we do automatically make unique names in the future from the user input, there should be a log message that lets the user know, or they might be confused by the generated code.

B-rando1 · 2024-07-25T15:21:34Z

@JacquesCarette that's a cool idea! I'm not quite sure what we need to do to get it there. My understanding is that Haskell is lazy by default, but some operations will force evaluation at a certain point. Is there something we need to change to allow Haskell to defer name generation?

JacquesCarette · 2024-07-25T17:49:30Z

In theory, nothing to do. In practice, you need to make sure you're not doing certain pattern matches "too early" as that will force evaluation. Plus someone's work (Noah?) on making things strict will definitely interact with this. https://wiki.haskell.org/Maintaining_laziness is a useful resource. I think http://comonad.com/reader/2014/fast-circular-substitution/ is one example of what I mean.

B-rando1 · 2024-07-25T20:58:16Z

I tried locating the issue by using ['a' | _ <- [0..]] in a few places regarding genVarName. This allowed me to find the following:

genVarName is not lazy.
splitVarName is not lazy.
- I saw it uses reverse, which the Haskell wiki page warned against. Maybe that's the reason.
Map.lookup is lazy.
bumpVarName is not lazy.
- Nothing jumped out at me too hard here.
Data.State is lazy as far as I can tell, though it was hard to check so I might be wrong.

I'll need to do some more digging tomorrow to see what the root causes are, and if there's a good fix for them.

B-rando1 changed the title ~~Improve name conflict detection in GOOL~~ Improve variable name conflict detection in GOOL Jul 22, 2024

B-rando1 mentioned this issue Jul 22, 2024

Made constDecDef capitalize constants in Python #3858

Merged

B-rando1 mentioned this issue Jul 26, 2024

Drasil Research Group Meeting, 3pm, Monday July 29, 2024, ITB/225 #3879

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve variable name conflict detection in GOOL #3867

Improve variable name conflict detection in GOOL #3867

B-rando1 commented Jul 22, 2024

B-rando1 commented Jul 22, 2024

JacquesCarette commented Jul 23, 2024

smiths commented Jul 23, 2024

B-rando1 commented Jul 25, 2024

JacquesCarette commented Jul 25, 2024

B-rando1 commented Jul 25, 2024 •

edited

Loading

Improve variable name conflict detection in GOOL #3867

Improve variable name conflict detection in GOOL #3867

Comments

B-rando1 commented Jul 22, 2024

B-rando1 commented Jul 22, 2024

JacquesCarette commented Jul 23, 2024

smiths commented Jul 23, 2024

B-rando1 commented Jul 25, 2024

JacquesCarette commented Jul 25, 2024

B-rando1 commented Jul 25, 2024 • edited Loading

B-rando1 commented Jul 25, 2024 •

edited

Loading