Seg fault when parsing data larger than ~1.2 GB #5

Open
brbatv opened this issue Jun 28, 2021 · 0 comments

brbatv commented Jun 28, 2021

Hello,
The parser works nicely until the data gets too large. This is not a RAM problem, since I have plenty of it (~64 GB).
I'm parsing data with a query like this:

SELECT {fields} FROM {table} WHERE "T" >= {t0} AND "T" <= {tf} ORDER BY "T" ASC
where the fields have the types BIGINT, FLOAT, FLOAT.

I'm using the following schema:

test_schema = Schema("table", [
    num('T', int=True) # ,
    #num('p'),
    # num('q')
])

If I add 'p' and 'q', the parser still works, but when I inspect the dataframe, the code produces a seg fault shortly after the data gets bigger than ~1.2 GB. The limit seems to depend on the total size: if I parse only T, the range tf_ms - t0_ms can be larger than if I parse T, p and q.

I don't think this is a data corruption problem, because I can parse the data if I reduce or shift the range [t0_ms ; tf_ms]. For example, I could parse approximately 1 GB of data at a time and do it in several shots.
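As a workaround, the "several shots" approach could look roughly like the sketch below. It only splits the time range into non-overlapping windows and builds one query per window from the template above. The names time_windows and chunked_queries are hypothetical helpers, not part of the parser, and the step has to be tuned by hand so that each window stays below the size that crashes.

```python
# Hypothetical helpers (not part of the parser): split [t0, tf] into
# inclusive integer windows and build one query per window.
QUERY = ('SELECT {fields} FROM {table} '
         'WHERE "T" >= {t0} AND "T" <= {tf} ORDER BY "T" ASC')


def time_windows(t0, tf, step):
    """Yield inclusive, non-overlapping (start, end) windows covering [t0, tf]."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = t0
    while start <= tf:
        end = min(start + step - 1, tf)
        yield start, end
        start = end + 1


def chunked_queries(fields, table, t0, tf, step):
    """Yield one SQL string per window, in ascending time order."""
    for start, end in time_windows(t0, tf, step):
        yield QUERY.format(fields=fields, table=table, t0=start, tf=end)


if __name__ == "__main__":
    # Example values only; table name and bounds are placeholders.
    for sql in chunked_queries('"T", "p", "q"', "my_table", 0, 9, 4):
        print(sql)
```

Because "T" is an integer column and the bounds are inclusive, consecutive windows start at end + 1, so no row is read twice. Each query can be parsed on its own and the resulting dataframes concatenated afterwards.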

When running under gdb, I see that the segfault happens here:

start copy expert 1624877358.4578984
sql query copied to store 1624877401.6580575
start BINARY read
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffd2361a4e in __pyx_f_11psql_binary_read_reversed2 (__pyx_v_pos=<synthetic pointer>, __pyx_v_src=0x7ffed63a8030 "PGCOPY\n\377\r\n", __pyx_v_target=0x7fffffffd1c2 "\001") at psql_binary.c:3705
3705        __pyx_f_11psql_binary_read_reversed2(((char *)(&__pyx_v_column_count)), (&(*((char *) ( /* dim=0 */ (__pyx_v_f.data + __pyx_t_15 * __pyx_v_f.strides[0]) )))), (&__pyx_v_pos));
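For context, the faulting call fills column_count through read_reversed2, which I assume byte-swaps a 2-byte value, and src starts with the PGCOPY signature. In the PostgreSQL binary COPY format, the file begins with an 11-byte signature, a 32-bit flags field and a 32-bit header extension length, and every tuple then starts with a 16-bit big-endian field count. The pure-Python sketch below reads those same fields with bounds checks, which may help to compare offsets against the Cython code. The function names are hypothetical and this is not how psql_binary.pyx is implemented.

```python
import struct

# 11-byte signature of the PostgreSQL binary COPY format.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def read_copy_header(buf):
    """Validate the header; return (flags, offset of the first tuple)."""
    if buf[:len(SIGNATURE)] != SIGNATURE:
        raise ValueError("not a PGCOPY binary stream")
    pos = len(SIGNATURE)
    if pos + 8 > len(buf):
        raise ValueError("truncated header")
    # Flags and header extension length, both 32-bit big-endian.
    flags, ext_len = struct.unpack_from(">II", buf, pos)
    return flags, pos + 8 + ext_len


def read_field_count(buf, pos):
    """Read the 16-bit big-endian field count that starts each tuple.

    A value of -1 marks the file trailer.
    """
    if pos < 0 or pos + 2 > len(buf):
        raise ValueError("offset %d is outside the buffer" % pos)
    (count,) = struct.unpack_from(">h", buf, pos)
    return count, pos + 2
```

In Python, an out-of-range offset raises an exception instead of crashing, so running this on the same buffer could show whether the position passed to the reader is still valid at the point where the Cython code fails.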

Also, when compiling psql_binary.pyx, I get the following warning. I'm not sure whether it's related:

warning: psql_binary.pyx:289:47: Buffer unpacking not optimized away.
warning: psql_binary.pyx:289:47: Buffer unpacking not optimized away.

Please share your thoughts if you have any idea about the issue. This is such a great tool.

Thank you for your attention
