I have two equivalent functions, one in Python and one in Nim (exported to Python via nimpy):
from typing import List, Sequence

def py_myparser(data: str) -> List[Sequence[str]]:
    cols = ["colA", "colB", "colC", "colD", "colE"]
    rows: List[Sequence[str]] = []
    reading = False  # True while we are inside a table body
    for line in (ll for ll in data.splitlines() if ll and ll.strip()):
        if not reading:
            # A header line matching cols starts a table
            if [ll.strip() for ll in line.split("|")][0:-1] == cols:
                reading = True
                continue
        else:
            row: Sequence[str] = [w.strip() for w in line.split("|")][0:-1]
            if len(row) == len(cols):
                rows.append(row)
            else:
                # Any line with a different column count ends the table
                reading = False
    return rows

import std/[strutils]
import zero_functional
import nimpy

proc nim_myparser(data: string): seq[seq[string]] {.exportpy.} =
  const cols = ["colA","colB","colC","colD","colE"]
  var reading = false
  data.splitLines --> filter(it.strip.len > 0) --> createIter(lines)
  for line in lines():
    if not reading:
      if line.split('|') --> map(it.strip())[0..^2] == cols:
        reading = true
        continue
    else:
      let row = line.split('|') --> map(it.strip())[0..^2]
      if len(row) == len(cols):
        result.add(row)
      else:
        reading = false

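To make the expected input concrete, here is a minimal sketch that runs the Python version on a small made-up sample. The sample text is hypothetical and not my real data; `py_myparser` is repeated only so the block runs on its own. Note that every table line needs a trailing `|`, because the last split element is dropped.

```python
from typing import List, Sequence

def py_myparser(data: str) -> List[Sequence[str]]:
    # Copy of the parser above, repeated so this example is self-contained.
    cols = ["colA", "colB", "colC", "colD", "colE"]
    rows: List[Sequence[str]] = []
    reading = False
    for line in (ll for ll in data.splitlines() if ll and ll.strip()):
        if not reading:
            if [ll.strip() for ll in line.split("|")][0:-1] == cols:
                reading = True
                continue
        else:
            row: Sequence[str] = [w.strip() for w in line.split("|")][0:-1]
            if len(row) == len(cols):
                rows.append(row)
            else:
                reading = False
    return rows

# Hypothetical sample: two tables separated by a non-table line.
sample = """
some preamble text
colA | colB | colC | colD | colE |
a1 | b1 | c1 | d1 | e1 |
a2 | b2 | c2 | d2 | e2 |
end of table
colA | colB | colC | colD | colE |
a3 | b3 | c3 | d3 | e3 |
"""

rows = py_myparser(sample)
for row in rows:
    print(row)
```

The header lines and the two non-table lines are skipped, so only the three data rows are returned.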
I compile the Nim module into a Python extension with these options:
--gc: "arc"
--d: "danger"
--app: "lib"
--passC: "-flto"
--passL: "-flto"
The final result is correct for both, but running %timeit -n 10 on fairly large data gives surprising timings. It seems that converting the List[List[str]] result to a pd.DataFrame is nearly free for the Python version but very expensive for the Nim version:
%timeit -n 10 py_myparser(data)
# 265 ms ± 4.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 pd.DataFrame(py_myparser(data))
# 268 ms ± 15.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 nim_myparser(data)
# 160 ms ± 7.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 pd.DataFrame(nim_myparser(data))
# 306 ms ± 8.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The same timing occurs if I do the pd.DataFrame conversion via pyImport("pandas") on the Nim/nimpy side at the end of the function (changing the return type accordingly).
Any idea what's happening here? I'm trying to replace Python functions with Nim functions to speed things up, but I need pandas on the Python side, and this behavior is confusing to me.
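Subtracting the timings above, the DataFrame step adds roughly 3 ms for the Python rows (268 - 265) and roughly 146 ms for the Nim rows (306 - 160). One way to measure that step directly is to parse once outside the timed region and time only the conversion. Below is a minimal sketch using only the standard library. `time_stage` is a hypothetical helper, and `parse` and `convert` are stand-ins for `nim_myparser` and `pd.DataFrame` so the block runs without either. It reports the best run rather than the mean that %timeit prints.

```python
import timeit
from typing import Any, Callable, Dict, List, Tuple

def time_stage(fn: Callable[..., Any], *args: Any,
               repeat: int = 7, number: int = 10) -> float:
    """Return the best average time per call of fn(*args), in seconds."""
    totals = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    return min(totals) / number

def parse(data: str) -> List[List[str]]:
    # Stand-in for nim_myparser / py_myparser (assumption, not the real parser).
    return [line.split("|")[:-1] for line in data.splitlines() if line.strip()]

def convert(rows: List[List[str]]) -> Dict[int, Tuple[str, ...]]:
    # Stand-in for pd.DataFrame(rows): transpose the rows into columns.
    return {i: col for i, col in enumerate(zip(*rows))}

data = "a|b|c|d|e|\n" * 1000   # made-up input
rows = parse(data)             # parse once, outside the timed region

parse_s = time_stage(parse, data)
convert_s = time_stage(convert, rows)
print(f"parse only:   {parse_s * 1e3:.3f} ms per call")
print(f"convert only: {convert_s * 1e3:.3f} ms per call")
```

In the real session the equivalent would be `rows = nim_myparser(data)` followed by `%timeit -n 10 pd.DataFrame(rows)`, and the same for `py_myparser`, so the two conversion costs can be compared without the parsing time mixed in.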