Only decide the bitsize of an integer literal when interpreting, not when parsing #49252

Seelengrab · 2023-04-02T10:31:12Z

Seelengrab
Apr 2, 2023
Collaborator

Currently, the parser decides the numeric type of a literal integer like 1 to be the native integer type of the platform the parser runs on:

julia> Meta.@dump 1
Int64 1

This is an issue for cross compilation to e.g. AVR, where the native integer size is 8 bit. This has the consequence that code like eachindex("foo") or 1:10 returns an iterable with eltype Int64 when compiling on a 64 bit host, forcing code intended for AVR to emulate 64 bit integer semantics when it's not really necessary to do so, using 8 registers per Int64 instead of just one for Int8. Likewise, cross compilation from 64 bit to 32 bit experiences the same issue, just with 2 registers.

One possible solution to this is to move that decision point (i.e., that a parsed literal needs to be an Int64 or Int32 on 32 bit platforms) from the parser to the interpreter/type inference, when we already know the target architecture. The idea is to have the parser emit Int 1, without introducing an additional type, i.e. make Int !== Int64 for the parser only, and resolve this to the native integer size of the target architecture before running type inference. This keeps the existing behavior that Int === Int64 on 64 bit systems, as well as Int === Int32 on 32 bit systems, while simultaneously allowing Int === Int8 in cross compilation for AVR. Conceptually, no new types need to be introduced for this - type inference just sees a different type depending on the target architecture. This behavior should be the same that users on current 64 bit/32 bit platforms can already observe.

This also preserves all existing promotion rules & behavior, i.e. zero(Int32) + 1 still returns one(Int64) on 64 bit platforms and one(Int32) on 32 bit platforms. zero(Int) + 1 would now return one(Int), which will (for non-cross compilation) stay the same as it is now.

It may be desirable to introduce this change as part of a broader cross compilation API in codegen, since it's not clear how the following would behave with just this change:

module Foo

const x = 1
const y = x*2

end

Semantically, we need to assign a value to y, but can't really do this without knowing the concrete runtime type of x. Having a dedicated cross compilation API would solve this, due to being able to ask the interpreter to evaluate the parsed module expression in the context of a target machine different from the host machine (thereby replacing the occurences of Int in the parsed expression with the appropriate integer type before evaluating the expression). Similar issues exist with the use of macros on the top level, which may also need to be evaluated in the context of the target instead of the host, to account for differences in sizeof(Int).

JeffBezanson · 2023-04-04T15:19:39Z

JeffBezanson
Apr 4, 2023
Maintainer Sponsor

The "default" integer type also comes up when asking for the length of an Array or Tuple.

1 reply

Seelengrab Apr 4, 2023
Collaborator Author

Yes. Tuples are (usually) less problematic (in the context of AVR at least), due to constant propagation and loop unrolling optimizing them away. Arrays are more of an issue, due to (often) requiring a ccall and GC, as well as using pointers, the size of which of course also depends on the target platform..

(Pointers are a bit of an additional orthogonal problem since ~most pointers on AVR are 16 bit (e.g. the stack pointer is implemented as two 8-bit registers, even though sometimes only the low byte is needed), while the vast majority of registers are only 8 bit (except for 3 16-bit general purpose registers, which are usually used for things like indirect stores, again exposed as two 8-bit registers). Extracting/using those is.. brittle.)

Seelengrab · 2024-05-08T07:30:58Z

Seelengrab
May 8, 2024
Collaborator Author

This discourse thread had me think a bit more about this, and there's a usecase that complicates this.

Say you have a microcontroller like the ATmega328p with its word size of Int8. Currently, Int in julia source code ends up as whatever the host compiler has as its word size, Int64 on my machine. Structs on the microcontroller (albeit overly-large for some cases) and on that host machine have the exact same layout (padding and all), because the julia compiler is oblivious to the difference in architecture. This has an advantage: if you want to communicate between the microcontroller & the host machine, you can (in theory) just serialize into the communication channel of your choice, deserialize on the other end and you're done. Very convenient!

Now, if we were to make the compiler aware on the Int level of the other architecture, we couldn't do that anymore. The padding in structs wouldn't necessarily match up anymore, a field declared as Int would be 8 byte on one machine and 1 byte on the microcontroller - you get the idea. That is, the following struct:

struct Foo
    a::Int8
    b::Int
end

is 16 bytes (an Int8 + 7 bytes of padding + an Int64) on my x86_64 machine, but would be 2 bytes (and Int8 and another Int8) on the ATmega, even though the definition is exactly the same! This would cause a mismatch when deserializing to the same struct with a different type. This is of course also mentioned in the docstring of serialize:

  In some cases, the word size (32- or 64-bit) of the reading and writing machines must
  match. In rarer cases the OS or architecture must also match, for example when using
  packages that contain platform-dependent code.

I'm unsure of general solutions to this. To cover all cases, I suspect we'd have to not only write a type tag, but actively check that the expected layouts are the same in some sort of initialization handshake. Or we could write the layout before serializing the actual value. Alternatively, we'd have to not use Int/wordsize-generic types at all for structs that are intended to be shared across architectures.

As far as the original motivation for the discussion/issue is concerned - the type of integer literals should IMO still be flexible. That is, 1 should still be an Int64 on 64 bit machines, an Int32 on 32-bit machines and so on. It's the alias Int that's difficult to reconcile.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only decide the bitsize of an integer literal when interpreting, not when parsing #49252

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 1 reply

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Only decide the bitsize of an integer literal when interpreting, not when parsing #49252

Seelengrab Apr 2, 2023 Collaborator

Replies: 2 comments · 1 reply

JeffBezanson Apr 4, 2023 Maintainer Sponsor

Seelengrab Apr 4, 2023 Collaborator Author

Seelengrab May 8, 2024 Collaborator Author

Seelengrab
Apr 2, 2023
Collaborator

Replies: 2 comments 1 reply

JeffBezanson
Apr 4, 2023
Maintainer Sponsor

Seelengrab Apr 4, 2023
Collaborator Author

Seelengrab
May 8, 2024
Collaborator Author