README.md (+1 -1)
@@ -162,7 +162,7 @@ We recommend users to prepare their own data if they have a high-quality dataset
 ### Automatic update of daily frequency data(from yahoo finance)
 > It is recommended that users update the data manually once (--trading_date 2021-05-25) and then set it to update automatically.

-> For more information refer to: [yahoo collector](https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo#Automatic-update-of-daily-frequency-data)
+> For more information refer to: [yahoo collector](https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo#automatic-update-of-daily-frequency-datafrom-yahoo-finance)

 * Automatic update of data to the "qlib" directory each trading day(Linux)
examples/benchmarks/README.md (+4)
@@ -4,6 +4,10 @@ Here are the results of each benchmark model running on Qlib's `Alpha360` and `A
 The numbers shown below demonstrate the performance of the entire `workflow` of each model. We will update the `workflow` as well as models in the near future for better results.

+> If you need to reproduce the results below, please use the **v1** dataset: `python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_cn_1d --region cn --version v1`
+>
+> In the new version of qlib, the default dataset is **v2**. Since the data is collected from the YahooFinance API (which is not very stable), the results of *v2* and *v1* may differ
+
 ## Alpha360 dataset
 | Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
> `qlib-data` from *YahooFinance*, is the data that has been dumped and can be used directly in `qlib`
+
+- get data: `python scripts/get_data.py qlib_data`
+- parameters:
+  - `target_dir`: save dir, by default *~/.qlib/qlib_data/cn_data*
+  - `version`: dataset version, value from [`v1`, `v2`], by default `v1`
+    - `v2` end date is *2021-06*, `v1` end date is *2020-09*
+    - user can append data to `v2`: [automatic update of daily frequency data](#automatic-update-of-daily-frequency-datafrom-yahoo-finance)
+    - **the [benchmarks](https://github.com/microsoft/qlib/tree/main/examples/benchmarks) for qlib use `v1`**, *due to the unstable access to historical data by YahooFinance, there are some differences between `v2` and `v1`*
+  - `interval`: `1d` or `1min`, by default `1d`
+  - `region`: `cn` or `us`, by default `cn`
+  - `delete_old`: delete existing data from `target_dir` (*features, calendars, instruments, dataset_cache, features_cache*), value from [`True`, `False`], by default `True`
+  - `exists_skip`: if data already exists in `target_dir`, skip `get_data`; value from [`True`, `False`], by default `False`
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_us_1d --region us --interval 1d
+# us 1min
+python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_us_1min --region us --interval 1min
+```
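The `get_data.py` invocations above differ only in a few parameter values, so a small helper can assemble them. The sketch below is illustrative only: `build_get_data_cmd` is a hypothetical name, it uses only the flags that appear in the commands shown in this diff (`--target_dir`, `--region`, `--interval`, `--version`), and it builds the argument list without running anything.

```python
# Values taken from the parameter list in the README diff.
VALID_INTERVALS = {"1d", "1min"}
VALID_REGIONS = {"cn", "us"}
VALID_VERSIONS = {"v1", "v2"}


def build_get_data_cmd(target_dir, region="cn", interval="1d", version=None):
    """Return the argv list for `scripts/get_data.py qlib_data`.

    Hypothetical helper: it only validates the documented values and
    assembles the command; it does not download anything.
    """
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {sorted(VALID_INTERVALS)}")
    if region not in VALID_REGIONS:
        raise ValueError(f"region must be one of {sorted(VALID_REGIONS)}")
    if version is not None and version not in VALID_VERSIONS:
        raise ValueError(f"version must be one of {sorted(VALID_VERSIONS)}")

    cmd = [
        "python", "scripts/get_data.py", "qlib_data",
        "--target_dir", target_dir,
        "--region", region,
        "--interval", interval,
    ]
    # Only pass --version when the caller asks for a specific dataset version.
    if version is not None:
        cmd += ["--version", version]
    return cmd


if __name__ == "__main__":
    cmd = build_get_data_cmd(
        "~/.qlib/qlib_data/qlib_us_1min", region="us", interval="1min"
    )
    # Joined with plain spaces so the shell can still expand "~".
    print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` from the repository root, although `~` would then need expanding first (for example with `os.path.expanduser`), since no shell is involved.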
+
+### Collect *YahooFinance* data to qlib
+> collect *YahooFinance* data and *dump* it into `qlib` format
+1. download data to csv: `python scripts/data_collector/yahoo/collector.py download_data`
+
+    - parameters:
+        - `source_dir`: directory in which the downloaded csv files are saved
+        - `interval`: `1d` or `1min`, by default `1d`
+            > **due to the limitation of the *YahooFinance API*, only the last month's data is available in `1min`**
+        - `region`: `CN` or `US`, by default `CN`
+        - `delay`: pause in seconds passed to `time.sleep(delay)`, by default *0.5*
+        - `start`: start datetime, by default *"2000-01-01"*; *closed interval (including start)*
+        - `end`: end datetime, by default `pd.Timestamp(datetime.datetime.now() + pd.Timedelta(days=1))`; *open interval (excluding end)*
+        - `max_workers`: number of symbols fetched concurrently, by default *1*; changing this parameter is not recommended, in order to maintain the integrity of the symbol data
+        - `check_data_length`: minimum number of rows expected per *symbol*, by default `None`
+            > if `len(symbol_df) < check_data_length`, the symbol is re-fetched, with the number of re-fetches coming from the `max_collector_count` parameter
+        - `max_collector_count`: number of retries for *"failed"* symbols, by default 2
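The interaction between `check_data_length`, `max_collector_count`, and the half-open `start`/`end` window can be sketched in a few lines. This is not the collector's implementation: `collect_symbol`, `in_window`, and the `fetch` callable are hypothetical, and the sketch assumes that `max_collector_count` bounds the total number of fetch attempts per symbol.

```python
from datetime import date


def in_window(day, start, end):
    """Closed at start, open at end: start <= day < end."""
    return start <= day < end


def collect_symbol(fetch, symbol, check_data_length=None, max_collector_count=2):
    """Fetch `symbol`, re-fetching while the result is too short.

    `fetch` is a hypothetical callable that returns a list of rows.
    Assumption: max_collector_count is the maximum number of attempts.
    Returns the last rows fetched and the number of attempts made.
    """
    rows = []
    for attempt in range(1, max_collector_count + 1):
        rows = fetch(symbol)
        # With check_data_length=None any result is accepted immediately.
        if check_data_length is None or len(rows) >= check_data_length:
            return rows, attempt
    return rows, max_collector_count


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_fetch(symbol):
        # First call returns a short result, later calls a full one.
        calls["n"] += 1
        return [symbol] * (1 if calls["n"] == 1 else 3)

    rows, attempts = collect_symbol(flaky_fetch, "AAPL", check_data_length=3)
    print(len(rows), attempts)
    print(in_window(date(2000, 1, 1), date(2000, 1, 1), date(2000, 1, 2)))
```

Under these assumptions, a symbol whose first download is too short is fetched once more and accepted on the second attempt, and a row dated exactly `end` is excluded from the window.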
"If normalize 1min, the qlib_data_1d_dir parameter must be set: --qlib_data_1d_dir <user qlib 1d data >, Reference: "
+"If normalize 1min, the qlib_data_1d_dir parameter must be set: --qlib_data_1d_dir <user qlib 1d data >, Reference: https://github.com/zhupr/qlib/tree/support_extend_data/scripts/data_collector/yahoo#automatic-update-of-daily-frequency-datafrom-yahoo-finance"