Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive type alias segfaults #1867

Closed
evantypanski opened this issue Sep 20, 2024 · 4 comments · Fixed by #1881
Closed

Recursive type alias segfaults #1867

evantypanski opened this issue Sep 20, 2024 · 4 comments · Fixed by #1881
Assignees
Labels
Bug Something isn't working

Comments

@evantypanski
Copy link
Contributor

evantypanski commented Sep 20, 2024

The following code segfaults during compilation via spicyc:

module segfault;

type Data = Data;

Like:

etyp:/Users/etyp/src/tests/spicy $ spicyc -c segfault-recursive.spicy
[1]    16815 segmentation fault  spicyc -c segfault-recursive.spicy

Offending line just loops infinitely trying to resolve (unsurprisingly)

I think there's merit in having this be possible, but it can also be a difficult problem sometimes, so just not segfaulting would work too.

EDIT: To be clear, that minimal case above doesn't seem useful, but this kind of type may be:

# This still segfaults
type Data = tuple<
    # Some random fields...
    optional<real>,
    optional<int64>,
    optional<bytes>,
    # Or a vector!
    optional<vector<Data>>,
>;
@evantypanski evantypanski added the Bug Something isn't working label Sep 20, 2024
@rsmmr
Copy link
Member

rsmmr commented Sep 24, 2024

I believe we should actually reject both cases, and more generally such self-recursive types.
A type defined through itself seems ill-defined in our type system, and cannot be expressed by the generated C++ code either. The way to make that work would be a reference:

type Data = tuple<
     ...
    optional<vector<Data&>>,
>;

@evantypanski
Copy link
Contributor Author

evantypanski commented Sep 25, 2024

With the reference (exactly the provided example) also seems to segfault

FWIW, C++17 (I think) does allow incomplete types in vectors:

#include <iostream>
#include <vector>

struct Data {
  std::vector<Data> v;
  int i = 0;
};

int main() {
  Data dat;
  dat.v.push_back({.i = 5});
  dat.v.push_back(dat);

  for (auto &inner : dat.v) {
    std::cout << inner.i << std::endl; // prints 5 then 0
  }
}

but that may be hard to replicate, I don't know

@rsmmr
Copy link
Member

rsmmr commented Sep 25, 2024

With the reference (exactly the provided example) also seems to segfault

Darn. :-) Now we definitely have a bug.

FWIW, C++17 (I think) does allow incomplete types in vectors:

Interesting, I never realized that C++17 added that. But yeah, I don't immediately see how we'd use that, seems the standard supports only vectors and lists for this. However, if there was a way to make this work for our containers and structs, then that could have quite an impact on our generated code because many instances of our ValueReference wrapping would no longer be needed; also see #1781, which addresses the same thing (though there are also new problems with putting more stuff on the heap: it has a performance impact on ours fibers)

@bbannier
Copy link
Member

I think @timwoj reported some issues around incomplete types in Spicy when he tried to bump Zeek to C++20, I wonder (but didn't check) whether that was due to how we emit defaulted functions, see e.g., https://www.lukas-barth.net/blog/cpp20-vector-incomplete/#sec-speculation.

@evantypanski evantypanski self-assigned this Sep 30, 2024
evantypanski added a commit that referenced this issue Sep 30, 2024
This is part of #1867 - there is a fair bit more work to do, but this
recursion limit gives a simple way to prevent overflows without a pretty
unecessary cycle detector.

Types within containers (like vectors) or references will continue to
segfault - that happens at type unification. Then, in order to allow
references, more will be necessary.
evantypanski added a commit that referenced this issue Oct 3, 2024
This is part of #1867 - there is a fair bit more work to do, but this
recursion limit gives a simple way to prevent overflows without a pretty
unecessary cycle detector.

Types within containers (like vectors) or references will continue to
segfault - that happens at type unification. Then, in order to allow
references, more will be necessary.
evantypanski added a commit that referenced this issue Oct 3, 2024
Closes #1867

There are two different cases where infinite loops happen with recursive
types. First, a type may reference itself (`type Data = Data`). Second,
a type may reference itself inside some other type (`type Data =
vector<Data>`).

The first is fixed with a recursion limit. Since the type simply cannot
resolve, it doesn't get anywhere near codegen. You could detect cycles,
but that introduces some extra overhead and complexity that shouldn't
be needed in a "simple" function.

The second is fixed with an ad-hoc "occurs" check in type unification.
That just detects cycles and aborts if one is present. This could be
placed at some other place in the "resolve until convergence" loop, but
it seems best put closest to the source of the issue.
evantypanski added a commit that referenced this issue Oct 4, 2024
Closes #1867

There are two different cases where infinite loops happen with recursive
types. First, a type may reference itself (`type Data = Data`). Second,
a type may reference itself inside some other type (`type Data =
vector<Data>`).

The first is fixed with a recursion limit. Since the type simply cannot
resolve, it doesn't get anywhere near codegen. You could detect cycles,
but that introduces some extra overhead and complexity that shouldn't
be needed in a "simple" function.

The second is fixed with an ad-hoc "occurs" check in type unification.
That just detects cycles and aborts if one is present. This could be
placed at some other place in the "resolve until convergence" loop, but
it seems best put closest to the source of the issue.
evantypanski added a commit that referenced this issue Oct 4, 2024
Closes #1867

There are two different cases where infinite loops happen with recursive
types. First, a type may reference itself (`type Data = Data`). Second,
a type may reference itself inside some other type (`type Data =
vector<Data>`).

The first is fixed with a recursion limit. Since the type simply cannot
resolve, it doesn't get anywhere near codegen. You could detect cycles,
but that introduces some extra overhead and complexity that shouldn't
be needed in a "simple" function.

The second is fixed with an ad-hoc "occurs" check in type unification.
That just detects cycles and aborts if one is present. This could be
placed at some other place in the "resolve until convergence" loop, but
it seems best put closest to the source of the issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants