UDTF crash server if return != len(input) #422

Open
tupui opened this issue Jan 27, 2022 · 3 comments
Labels
bug Something isn't working heavydb Related to heavydb server

Comments

Contributor

tupui commented Jan 27, 2022

The server crashes on the following example when the input column has more than 2 rows and the function returns 2:

```python
@omnisci('int32(Column<int32>, OutputColumn<int32>)')
def example(input, out):
    size = len(input)
    for i in range(size):
        out[i] = input[i]
    return 2
```
Contributor

pearu commented Jan 27, 2022

The server likely crashes because no memory has been allocated to the output parameter out. Use:

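```python
@omnisci('int32(Column<int32>, OutputColumn<int32>)')
def example(input, out):
    size = len(input)
    # Allocate memory for `size` output rows before writing to `out`.
    set_output_row_size(size)
    for i in range(size):
        out[i] = input[i]
    # Return the number of output rows.
    return size
```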

On the other hand, avoiding the server crash in the issue example may require analyzing the generated code and the corresponding signature. For instance, when the inputs include no sizer argument and the body makes no call to the set_output_row_size function, the resulting operator will likely crash the server. Another approach would be to implement a range check on indexing input and output columns, so that running the issue example on the server would raise an index error but keep the server alive.
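
To illustrate the range-check idea, here is a minimal pure-Python sketch. It is only a model of the behavior, not rbc or server code: `CheckedColumn` and `run_checked` are hypothetical names, and in practice the check would have to live in the generated code. The point is that an out-of-range write turns into an `IndexError` that the caller can report, instead of touching unallocated memory.

```python
class CheckedColumn:
    """Hypothetical stand-in for a column buffer with a fixed allocated size."""

    def __init__(self, allocated_rows):
        self._data = [0] * allocated_rows

    def __len__(self):
        return len(self._data)

    def _check(self, i):
        # Reject any index outside the allocated range (no negative wrap-around).
        if not 0 <= i < len(self._data):
            raise IndexError(
                f"row index {i} out of range for column with "
                f"{len(self._data)} allocated rows"
            )

    def __getitem__(self, i):
        self._check(i)
        return self._data[i]

    def __setitem__(self, i, value):
        self._check(i)
        self._data[i] = value


def example(input, out):
    # Same body as the example in the issue: writes len(input) rows, returns 2.
    size = len(input)
    for i in range(size):
        out[i] = input[i]
    return 2


def run_checked(udtf, values, allocated_rows):
    """Run `udtf` on checked columns; return (result, error message or None)."""
    inp = CheckedColumn(len(values))
    for i, v in enumerate(values):
        inp[i] = v
    out = CheckedColumn(allocated_rows)
    try:
        return udtf(inp, out), None
    except IndexError as exc:
        return None, str(exc)
```

With no output rows allocated, `run_checked(example, [1, 2, 3], 0)` returns an error message rather than crashing; with three rows allocated it returns the function's result.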

Contributor Author

tupui commented Jan 28, 2022

But then how to only return a slice? I thought this return could be used for that?

For crashing the server, would there be a way to use some sort of sandbox or pre-validation? It would be good to check the function when it's being registered so that a user cannot crash the server.

Contributor

pearu commented Jan 28, 2022

> But then how to only return a slice? I thought this return could be used for that?

There are a number of ways (perhaps too many) to specify the size of output columns, and each has its advantages and disadvantages. I'll give a summary elsewhere.

> For crashing the server, would there be a way to use some sort of sandbox or pre-validation? It would be good to check the function when it's being registered so that a user cannot crash the server.

Sure, it would be desirable, but technically it is not trivial. For instance, pre-validation requires generating sample inputs for table functions, which means that if a table function defines restrictions on its arguments, the samples must obey these as well. And even then, one can likely construct a function that crashes the server on specific inputs while executing fine on the samples.
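
A rough sketch of that limitation, in plain Python rather than the actual registration path: `prevalidate` and `fragile` are hypothetical names, and ordinary Python lists stand in for bounds-checked columns. The function passes validation on the chosen samples, yet goes out of bounds on an input the samples never covered.

```python
def prevalidate(udtf, samples, allocated_rows):
    """Run `udtf` on each sample against a bounds-checked output list.

    Returns True if no sample triggers an IndexError.
    """
    for values in samples:
        out = [0] * allocated_rows
        try:
            udtf(list(values), out)
        except IndexError:
            return False
    return True


def fragile(input, out):
    # Writes far out of bounds only when the first input value is negative.
    n = len(input)
    offset = 1000 if n > 0 and input[0] < 0 else 0
    for i in range(n):
        out[i + offset] = input[i]
    return n
```

Here `prevalidate(fragile, [[1, 2, 3], [4, 5]], 3)` accepts the function, while a call with `[-1, 2, 3]` as input would write past the end of the output.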
