
Segfault when parsing data larger than ~1.2 GB #5

Open
@brbatv

Description


Hello,
The parser works nicely until the data gets too large. This is not a RAM problem: I have plenty of it (~64 GB).
I'm fetching the data with a query like this:

SELECT {fields} FROM {table} WHERE "T">= {t0} AND "T" <= {tf} ORDER BY "T" ASC
where the fields are BIGINT, FLOAT, FLOAT.

I'm using the following schema:

test_schema = Schema("table", [
    num('T', int=True),
    # num('p'),
    # num('q'),
])

If I add 'p' and 'q', the parser still works, but when I inspect the dataframe, the code segfaults shortly after the data exceeds ~1.2 GB. So when parsing only T, the range tf_ms - t0_ms can be larger than when parsing T, p and q.
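To illustrate why the usable time range shrinks as columns are added, here is a rough per-row size estimate for PostgreSQL's binary COPY format, where each tuple is a 2-byte field count followed by a 4-byte length plus payload per field. This is a sketch under assumptions: no NULLs, 8-byte BIGINT and FLOAT (PostgreSQL's FLOAT defaults to double precision), header/trailer ignored, and the ~1.2 GB threshold taken to apply to the raw COPY bytes, which has not been confirmed. `copy_row_bytes` and `max_rows` are hypothetical helpers.

```python
# Rough per-row size of PostgreSQL binary COPY data (hypothetical helpers).
# Assumes no NULL values and ignores the file header and trailer.


def copy_row_bytes(field_widths: list[int]) -> int:
    """Bytes per tuple: 2-byte field count, then a 4-byte length + payload per field."""
    return 2 + sum(4 + width for width in field_widths)


def max_rows(budget_bytes: int, field_widths: list[int]) -> int:
    """Number of whole rows that fit in budget_bytes."""
    return budget_bytes // copy_row_bytes(field_widths)


BUDGET = 1_200_000_000  # ~1.2 GB from this issue, ASSUMED to be raw COPY bytes

t_only = [8]       # "T" BIGINT
t_p_q = [8, 8, 8]  # "T" BIGINT, "p" FLOAT, "q" FLOAT (8-byte double precision)

print(copy_row_bytes(t_only), max_rows(BUDGET, t_only))  # 14 85714285
print(copy_row_bytes(t_p_q), max_rows(BUDGET, t_p_q))    # 38 31578947
```

Under these assumptions, parsing only T fits about 38/14 ≈ 2.7 times as many rows into the same byte budget, which is consistent with the wider tf_ms - t0_ms window observed for T alone.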

I don't think this is a data corruption problem, because I can shrink or shift [t0_ms; tf_ms] and parse the data successfully (for example, about 1 GB at a time, in several passes).
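The several-passes workaround can be automated by splitting [t0_ms; tf_ms] into fixed-size windows and issuing one query per window. A minimal sketch, assuming T is an integer (BIGINT) so that consecutive inclusive windows neither overlap nor leave gaps; `time_windows` and `build_query` are hypothetical helpers, not part of the library. Each query would then go through the parser as before, and the per-window dataframes could be combined afterwards (for example with `pandas.concat`).

```python
from typing import Iterator


def time_windows(t0_ms: int, tf_ms: int, step_ms: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive, non-overlapping (start, end) windows covering [t0_ms, tf_ms]."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    start = t0_ms
    while start <= tf_ms:
        end = min(start + step_ms - 1, tf_ms)  # inclusive upper bound, clipped to tf_ms
        yield start, end
        start = end + 1  # T is an integer, so the next window starts right after this one


def build_query(fields: list[str], table: str, start: int, end: int) -> str:
    """Same query shape as in this issue, restricted to one window."""
    return (
        f'SELECT {", ".join(fields)} FROM {table} '
        f'WHERE "T" >= {start} AND "T" <= {end} ORDER BY "T" ASC'
    )


for start, end in time_windows(0, 10, 4):
    print(build_query(['"T"', '"p"', '"q"'], "table", start, end))
```

Keeping the windows inclusive on both ends mirrors the `>=` / `<=` bounds of the original query; with a non-integer T, half-open windows (`"T" < end`) would be needed instead.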

When running under gdb, the segfault occurs here:

start copy expert 1624877358.4578984
sql query copied to store 1624877401.6580575
start BINARY read

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffd2361a4e in __pyx_f_11psql_binary_read_reversed2 (__pyx_v_pos=<synthetic pointer>, __pyx_v_src=0x7ffed63a8030 "PGCOPY\n\377\r\n", __pyx_v_target=0x7fffffffd1c2 "\001") at psql_binary.c:3705
3705        __pyx_f_11psql_binary_read_reversed2(((char *)(&__pyx_v_column_count)), (&(*((char *) ( /* dim=0 */ (__pyx_v_f.data + __pyx_t_15 * __pyx_v_f.strides[0]) )))), (&__pyx_v_pos));
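For context, the crash happens while reading the 2-byte field count at the start of a tuple; judging from its name and call site, `read_reversed2` appears to copy two bytes in reversed order, i.e. convert the big-endian int16 to host byte order. The sketch below is not the library's code: it is a pure-Python reader for PostgreSQL's binary COPY format that checks every read against the buffer length, so a bad offset raises an error reporting the position instead of segfaulting. Running something like it on the same buffer might help show whether the offset goes out of range on large inputs. `iter_copy_rows` and `need` are hypothetical names.

```python
import struct
from typing import Iterator, Optional

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"  # 11-byte binary COPY signature


def iter_copy_rows(buf: bytes) -> Iterator[list[Optional[bytes]]]:
    """Yield each tuple as a list of raw field payloads (None for NULL), with bounds checks."""
    if buf[:len(SIGNATURE)] != SIGNATURE:
        raise ValueError("missing PGCOPY signature")
    pos = len(SIGNATURE)

    def need(n: int) -> None:
        # Fail loudly instead of reading past the end of the buffer.
        if pos + n > len(buf):
            raise ValueError(f"need {n} bytes at offset {pos}, buffer has {len(buf)}")

    need(8)
    _flags, ext_len = struct.unpack_from(">iI", buf, pos)  # flags, header-extension length
    pos += 8
    need(ext_len)
    pos += ext_len
    while True:
        need(2)
        (n_fields,) = struct.unpack_from(">h", buf, pos)  # big-endian int16 field count
        pos += 2
        if n_fields == -1:  # file trailer
            return
        row: list[Optional[bytes]] = []
        for _ in range(n_fields):
            need(4)
            (length,) = struct.unpack_from(">i", buf, pos)  # -1 means NULL
            pos += 4
            if length == -1:
                row.append(None)
                continue
            if length < 0:
                raise ValueError(f"invalid field length {length} at offset {pos - 4}")
            need(length)
            row.append(buf[pos:pos + length])
            pos += length
        yield row


# Tiny hand-built buffer: one row with a single BIGINT value 42, then the trailer.
sample = (
    SIGNATURE
    + struct.pack(">iI", 0, 0)  # flags, no header extension
    + struct.pack(">h", 1)      # one field
    + struct.pack(">i", 8) + struct.pack(">q", 42)
    + struct.pack(">h", -1)     # trailer
)
for row in iter_copy_rows(sample):
    print([struct.unpack(">q", f)[0] for f in row])  # [42]
```

Being pure Python, this is far slower than the Cython parser; it is only meant as a reference for where each offset should land.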

Also, when compiling psql_binary.pyx, I get the following warning; I'm not sure whether it's related:

warning: psql_binary.pyx:289:47: Buffer unpacking not optimized away.

If you have any idea what might be causing this, please share your thoughts. This is such a great tool.

Thank you for your attention
