Calling Python functions from within query formulas (e.g. in update()) is extremely slow, whether they are DH-provided or user-provided functions. Add some benchmarks to capture this.
(This issue is becoming more relevant as customers are being pushed in the direction of Python more than Groovy.)
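As a starting point, a minimal sketch of the timing side of such a benchmark is shown below. It runs outside Deephaven, so it only measures the cost of calling the function once per row in the Python interpreter, not the JVM-to-Python boundary crossing this issue is about. The names `rounding_udf` and `time_per_row` are hypothetical; the function body follows the Rounding example from this issue.

```python
import time


def rounding_udf(x):
    # Same rule as the Rounding example in this issue:
    # round to the nearest 100 (halves up), never returning 0.
    out = 100 * int(x / 100) + 100 * ((x % 100) >= 50)
    return 100 if out == 0 else out


def time_per_row(func, values, repeats=3):
    """Return the best wall-clock time in seconds for calling func once per value."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for v in values:
            func(v)
        best = min(best, time.perf_counter() - start)
    return best


values = list(range(1, 200_001))
elapsed = time_per_row(rounding_udf, values)
rows_per_sec = len(values) / elapsed
print(f"{len(values)} rows in {elapsed:.4f}s ({rows_per_sec:,.0f} rows/s)")
```

A real benchmark would wrap the equivalent update() call on a table and report the same rows-per-second figure, so that pure-Python and in-query numbers can be compared.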
Notes On User-Defined Functions: (From JianFeng)
Has not been scheduled yet
Improving UDF performance could be a huge amount of work (and very risky), and it has not really been scoped yet. However, the recent partial vectorization work has achieved a 4x to 20x performance gain, depending on how much time a specific UDF takes
Translating a Python wrapper function into a Java static method call isn’t trivial either, but because it eliminates the boundary crossing between the JVM and the Python interpreter, I think we can confidently say that the performance will be much better
Translations are being added to convert Python date-time utils to equivalents that are already benchmarked, so there is no need to test all of them. Pick a much-used one and make a test for that.
Paul provided some examples comparing Python and Groovy. The benchmark does not use Groovy, but that could be added in the future.
Python:
tb_out=db.historical_table("FeedOS", "EquityQuoteL1") \
.where("Date=`2023-05-31`") \
.update("Open_Size = (int)AskSize")
def Rounding(x):
    out=100*int(x/100)+100*((x%100)>=50)
    if out==0:
        out=100
        return out
    else:
        return out
open_1=tb_out.where("!isNull(Open_Size)").update(["Date","LocalCodeStr","Open_Size= (int)(Rounding.call(Open_Size))"])
Groovy:
tb_out=db.t("FeedOS", "EquityQuoteL1") \
.where("Date=`2023-05-31`") \
.update("Open_Size = (int)AskSize")
Rounding={ int x ->
    out=100*(int)(x/100)+100*((x%100) >= 50 ? 1 : 0)
    return out==0 ? 100 : out
}
open_1=tb_out.where("!isNull(Open_Size)").update("Date","LocalCodeStr","Open_Size= Rounding.call(Open_Size)")
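Both snippets apply the same rule: round Open_Size to the nearest 100 (halves up), and return 100 instead of 0. The standalone sketch below, runnable without Deephaven, works through a few values and checks an integer-only variant against the original formula. The names `rounding` and `rounding_original` are hypothetical, and non-negative inputs are assumed.

```python
def rounding_original(x):
    # Formula as written in the Python snippet above (uses float division).
    out = 100 * int(x / 100) + 100 * ((x % 100) >= 50)
    return 100 if out == 0 else out


def rounding(x: int) -> int:
    # Integer-only variant; assumes x >= 0.
    out = 100 * (x // 100) + (100 if x % 100 >= 50 else 0)
    return out if out != 0 else 100


# Worked values: anything below 150 maps to 100, then halves round up.
samples = [0, 49, 50, 149, 150, 250, 12345]
results = {x: rounding(x) for x in samples}
print(results)

# The two forms agree over a range of non-negative sizes.
agree = all(rounding(x) == rounding_original(x) for x in range(100_000))
print("variants agree:", agree)
```

Using `//` avoids the float conversion in `int(x/100)`, which matters only for very large values but keeps the function in pure integer arithmetic.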
An improvement to the Python snippet:
import numpy as np
tb_out=db.historical_table("FeedOS", "EquityQuoteL1") \
.where("Date=`2023-05-31`") \
.update("Open_Size = (long)AskSize")
def Rounding(x) -> np.int32:
    out=100*int(x/100)+100*((x%100)>=50)
    if out==0:
        out=100
        return out
    else:
        return out
open_1=tb_out.where("!isNull(Open_Size)").update(["Date","LocalCodeStr","Open_Size=Rounding(Open_Size)"])