Aligning timezones using YFData.download #573
Replies: 2 comments
-
No one?
-
The first solution I'd recommend is using your own higher-quality data that has been manually cleaned. However, for yfinance with vectorbt, the code below should work for 1d data. Something to be aware of with 1d Yahoo data is that each bar is timestamped at midnight local to the exchange the index's components trade on, adjusted for daylight saving time. If the data were aggregated properly, the timestamp would be at the close of regular trading hours. This can create serious inaccuracies, depending on what your model or strategy is sensitive to.

The code below creates the price dataframes individually, removes the time portion of the datetime index so that the symbols can be aligned, renames the columns to include the symbol name, and merges the dataframes:

import pandas as pd
import vectorbt as vbt
yf_args = dict(start="2023-1-1", end="2023-03-31", interval="1d")
ohlcv_list = ["Open", "High", "Low", "Close", "Volume"]
symbols = ["^GSPC", "^GDAXI"]
df_list = []
for symbol in symbols:
    # Download each symbol separately
    df = vbt.YFData.download(symbols=symbol, **yf_args).get(ohlcv_list)
    # Drop the time portion so rows from different exchanges align by date
    df.index = df.index.date
    # Prefix each column with the symbol, e.g. "^GSPC_Close"
    df.columns = [f"{symbol}_{col}" for col in ohlcv_list]
    df_list.append(df)
data = pd.concat(df_list, axis=1)

If you only want close data, or want different indexes, you can edit the lists.

This issue likely comes from the fact that when you pull Yahoo data with vectorbt, it tries to set the timezone to UTC by default:

import vectorbt as vbt
spx = vbt.YFData.download(
symbols=["^GSPC"],
start="2023-4-5",
end="2023-4-6",
interval="1d"
).get()
dax = vbt.YFData.download(
symbols=["^GDAXI"],
start="2023-4-5",
end="2023-4-6",
interval="1d"
).get()
print(spx.index.tzinfo)
print(dax.index.tzinfo)

I suspect that because the yfinance API uses the local timezone of the index's relevant exchange(s), shifting with daylight saving time accordingly, it creates conflicts when converting both indexes' datetimes at once, particularly since US daylight saving time and Central European Summer Time begin and end on different calendar days. You can check that yfinance returns datetime values this way by running the requests below and comparing the datetime values:

import yfinance as yf
dax_nodl = yf.download("^GDAXI", start="2023-03-23", end="2023-03-24", interval="1h")
print(dax_nodl["Open"].head())
dax_dl = yf.download("^GDAXI", start="2023-03-27", end="2023-03-28", interval="1h")
print(dax_dl["Open"].head())
spx_nodl = yf.download("^GSPC", start="2023-03-10", end="2023-03-11", interval="1h")
print(spx_nodl["Open"].head())
spx_dl = yf.download("^GSPC", start="2023-03-13", end="2023-03-14", interval="1h")
print(spx_dl["Open"].head())

Hope this helps 😁
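For an offline check of the daylight saving mismatch described above, here is a minimal sketch that uses only pandas and no Yahoo requests. The helper name `utc_offset_hours` and the three sample dates are my own choices for illustration. It prints the UTC offset of local midnight in New York and Berlin before the US switch, between the two switches, and after the European switch:

```python
import pandas as pd


def utc_offset_hours(day: str, tz: str) -> float:
    """UTC offset, in hours, of local midnight on `day` in timezone `tz`."""
    ts = pd.Timestamp(day, tz=tz)
    return ts.utcoffset().total_seconds() / 3600


# Dates chosen around the March 2023 clock changes (illustrative only)
for day in ["2023-03-10", "2023-03-20", "2023-03-27"]:
    ny = utc_offset_hours(day, "America/New_York")
    de = utc_offset_hours(day, "Europe/Berlin")
    print(f"{day}  New York: {ny:+.0f}h  Berlin: {de:+.0f}h  gap: {de - ny:.0f}h")
```

The gap between the two exchanges' local midnights shrinks by an hour during the weeks when only the US has switched, which is the window where a single fixed shift between the two indexes stops lining up.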
-
What am I supposed to do in order to align data that has different timezones, like S&P 500 and DAX? Let's say I wanted them to be converted to "UTC".
How do I apply the tz_convert argument?
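A minimal sketch of one way to approach this with plain pandas, under some assumptions: the daily bars arrive with a tz-aware UTC index, each bar is stamped at exchange-local midnight, and the prices below are synthetic placeholders rather than real quotes. The helper `to_exchange_dates` is hypothetical, not part of vectorbt or yfinance. In pandas, `tz_localize` attaches a timezone to a naive index, while `tz_convert` moves an already tz-aware index to another timezone. Converting everything to UTC gives both symbols the same timezone, but the daily rows still sit at different instants, so this sketch converts each index back to its exchange's timezone and aligns on the calendar date instead:

```python
import pandas as pd


def to_exchange_dates(df: pd.DataFrame, exchange_tz: str) -> pd.DataFrame:
    """Return a copy indexed by tz-naive, exchange-local calendar dates."""
    out = df.copy()
    local = out.index.tz_convert(exchange_tz)  # requires a tz-aware index
    out.index = local.tz_localize(None).normalize()
    return out


# Synthetic daily bars stamped at exchange-local midnight, then moved to UTC
days = pd.DatetimeIndex(["2023-04-05", "2023-04-06"])
spx = pd.DataFrame(
    {"SPX_Close": [1.0, 2.0]},
    index=days.tz_localize("America/New_York").tz_convert("UTC"),
)
dax = pd.DataFrame(
    {"DAX_Close": [10.0, 20.0]},
    index=days.tz_localize("Europe/Berlin").tz_convert("UTC"),
)

# Pitfall: taking the date straight from the UTC index shifts Berlin back a day
print(dax.index[0], "->", dax.index[0].date())

aligned = pd.concat(
    [
        to_exchange_dates(spx, "America/New_York"),
        to_exchange_dates(dax, "Europe/Berlin"),
    ],
    axis=1,
)
print(aligned)
```

The timezone names are assumptions about where each index trades. Whether vectorbt's own `tz_convert` argument can do the equivalent in one step is not something this sketch tests.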