
Commit b90bd66

Merge branch 'main' into Fix_collector_doc
2 parents 63d05e4 + 0b11dc5 commit b90bd66

30 files changed, with 444 additions and 88 deletions

README.md

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -82,13 +82,10 @@ This table demonstrates the supported Python version of `Qlib`:
8282
2. For Python 3.9, `Qlib` supports running workflows such as training models, running backtests and plotting most of the related figures (those included in [notebook](examples/workflow_by_code.ipynb)). However, plotting the *model performance* is not supported for now; we will fix this when the dependent packages are upgraded.
8383

8484
### Install with pip
85-
**Note**: Due to the latest numpy release (version 1.20.0), unexpected errors will occur if you install or run Qlib with `numpy==1.20.0`. We recommend using the lower version `numpy==1.19.5` for now, and we will fix this incompatibility in the near future.
86-
8785
Users can easily install ``Qlib`` by pip according to the following command.
8886

8987
```bash
90-
pip install numpy==1.19.5
91-
pip install pyqlib --ignore-installed numpy
88+
pip install pyqlib
9289
```
9390

9491
**Note**: pip will install the latest stable qlib. However, the main branch of qlib is in active development. If you want to test the latest scripts or functions in the main branch, please install qlib with the methods below.
@@ -121,7 +118,12 @@ Also, users can install the latest dev version ``Qlib`` by the source code accor
121118
## Data Preparation
122119
Load and prepare data by running the following code:
123120
```bash
121+
# get 1d data
124122
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
123+
124+
# get 1min data
125+
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min
126+
125127
```
126128
127129
This dataset is created by public data collected by [crawler scripts](scripts/data_collector/), which have been released in

docs/advanced/serial.rst

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
.. _serial:
2+
3+
=================================
4+
Serialization
5+
=================================
6+
.. currentmodule:: qlib
7+
8+
Introduction
9+
===================
10+
``Qlib`` supports dumping the state of ``DataHandler``, ``DataSet``, ``Processor``, ``Model``, etc. to disk and reloading it.
11+
12+
Serializable Class
13+
========================
14+
15+
``Qlib`` provides a base class ``qlib.utils.serial.Serializable``, whose state can be dumped into or loaded from disk in `pickle` format.
16+
When users dump the state of a ``Serializable`` instance, the attributes of the instance whose names **do not** start with `_` will be saved to disk.
17+
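The attribute rule described above can be illustrated with a minimal sketch. The class and method names here (`PublicStateOnly`, `public_state`, `dump_state`, `load_state`, `ToyNormalizer`) are hypothetical and are not part of `Qlib`; the sketch only mimics the documented behaviour of skipping attributes that start with `_`, and makes no claim about how `qlib.utils.serial.Serializable` is actually implemented.

```python
import pickle


class PublicStateOnly:
    """Hypothetical stand-in; this is not qlib.utils.serial.Serializable."""

    def public_state(self) -> dict:
        # Keep only attributes whose names do not start with "_".
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def dump_state(self) -> bytes:
        # Pickle the filtered attribute dict rather than the whole object.
        return pickle.dumps(self.public_state())

    @classmethod
    def load_state(cls, payload: bytes):
        # Rebuild an instance without calling __init__, then restore the state.
        obj = cls.__new__(cls)
        obj.__dict__.update(pickle.loads(payload))
        return obj


class ToyNormalizer(PublicStateOnly):
    def __init__(self):
        self.mean = 0.5               # state: saved
        self.std = 2.0                # state: saved
        self._cached_frame = [1, 2, 3]  # private data: skipped


restored = ToyNormalizer.load_state(ToyNormalizer().dump_state())
print(sorted(vars(restored)))  # ['mean', 'std']
```

The private attribute is absent after the round trip, which is why bulky derived data is usually kept under a leading underscore while normalization statistics are kept public.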
18+
Example
19+
==========================
20+
``Qlib``'s serializable classes include ``DataHandler``, ``DataSet``, ``Processor``, ``Model``, etc., which are subclasses of ``qlib.utils.serial.Serializable``.
21+
Specifically, ``qlib.data.dataset.DatasetH`` is one of them. Users can serialize ``DatasetH`` as follows.
22+
23+
.. code-block:: Python
24+
25+
##=============dump dataset=============
26+
dataset.to_pickle(path="dataset.pkl") # dataset is an instance of qlib.data.dataset.DatasetH
27+
28+
##=============reload dataset=============
29+
with open("dataset.pkl", "rb") as file_dataset:
30+
    dataset = pickle.load(file_dataset)
31+
32+
.. note::
33+
Only the state of ``DatasetH`` should be saved to disk, such as the `mean` and `variance` used for data normalization.
34+
35+
After reloading the ``DatasetH``, users need to reinitialize it. This means that users can reset some states of ``DatasetH`` or ``QlibDataHandler``, such as `instruments`, `start_time`, `end_time` and `segments`, and generate new data according to those states (data is not state and should not be saved to disk).
36+
37+
A more detailed example is in this `link <https://github.com/microsoft/qlib/tree/main/examples/highfreq>`_.
38+
39+
40+
API
41+
===================
42+
Please refer to `Serializable API <../reference/api.html#module-qlib.utils.serial.Serializable>`_.

docs/component/data.rst

Lines changed: 7 additions & 2 deletions
@@ -31,7 +31,7 @@ Qlib Format Data
3131
We've specially designed a data structure to manage financial data. Please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
3232
Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data.
3333

34-
``Qlib`` provides two different off-the-shelf dataset, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:
34+
``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:
3535

3636
======================== ================= ================
3737
Dataset US Market China Market
@@ -41,15 +41,20 @@ Alpha360 √ √
4141
Alpha158 √ √
4242
======================== ================= ================
4343

44+
Also, ``Qlib`` provides a high-frequency dataset. Users can run a high-frequency dataset example through this `link <https://github.com/microsoft/qlib/tree/main/examples/highfreq>`_.
4445

4546
Qlib Format Dataset
4647
--------------------
4748
``Qlib`` provides an off-the-shelf dataset in `.bin` format; users can use the script ``scripts/get_data.py`` to download the China-Stock dataset as follows.
4849

4950
.. code-block:: bash
5051
52+
# download 1d
5153
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
5254
55+
# download 1min
56+
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/qlib_cn_1min --region cn --interval 1min
57+
5358
In addition to China-Stock data, ``Qlib`` also includes a US-Stock dataset, which can be downloaded with the following command:
5459

5560
.. code-block:: bash
@@ -167,7 +172,7 @@ The `trade unit` defines the unit number of stocks can be used in a trade, and t
167172
168173
169174
- If users use ``Qlib`` in US-stock mode, US-stock data is required. ``Qlib`` also provides a script to download US-stock data. Users can use ``Qlib`` in US-stock mode according to the following steps:
170-
- Download china-stock in qlib format, please refer to section `Qlib Format Dataset <#qlib-format-dataset>`_.
175+
- Download US-stock data in qlib format; please refer to section `Qlib Format Dataset <#qlib-format-dataset>`_.
171176
- Initialize ``Qlib`` in US-stock mode
172177
Suppose that users have prepared their Qlib format data in the directory ``~/.qlib/csv_data/us_data``. Users only need to initialize ``Qlib`` as follows.
173178

docs/component/recorder.rst

Lines changed: 46 additions & 0 deletions
@@ -94,6 +94,52 @@ The ``RecordTemp`` class is a class that enables generate experiment results suc
9494

9595
- ``SignalRecord``: This class generates the `prediction` results of the model.
9696
- ``SigAnaRecord``: This class generates the `IC`, `ICIR`, `Rank IC` and `Rank ICIR` of the model.
97+
98+
Here is a simple example of what is done in ``SigAnaRecord``, which users can refer to if they want to calculate IC, Rank IC, Long-Short Return with their own prediction and label.
99+
100+
.. code-block:: Python
101+
102+
from qlib.contrib.eva.alpha import calc_ic, calc_long_short_return
103+
104+
ic, ric = calc_ic(pred.iloc[:, 0], label.iloc[:, 0])
105+
long_short_r, long_avg_r = calc_long_short_return(pred.iloc[:, 0], label.iloc[:, 0])
106+
97107
- ``PortAnaRecord``: This class generates the results of `backtest`. For detailed information about `backtest` and the available `strategy`, users can refer to `Strategy <../component/strategy.html>`_ and `Backtest <../component/backtest.html>`_.
98108

109+
Here is a simple example of what is done in ``PortAnaRecord``, which users can refer to if they want to do backtest based on their own prediction and label.
110+
111+
.. code-block:: Python
112+
113+
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
114+
from qlib.contrib.evaluate import (
115+
backtest as normal_backtest,
116+
risk_analysis,
117+
)
118+
119+
# backtest
120+
STRATEGY_CONFIG = {
121+
"topk": 50,
122+
"n_drop": 5,
123+
}
124+
BACKTEST_CONFIG = {
125+
"verbose": False,
126+
"limit_threshold": 0.095,
127+
"account": 100000000,
128+
"benchmark": BENCHMARK,
129+
"deal_price": "close",
130+
"open_cost": 0.0005,
131+
"close_cost": 0.0015,
132+
"min_cost": 5,
133+
}
134+
135+
strategy = TopkDropoutStrategy(**STRATEGY_CONFIG)
136+
report_normal, positions_normal = normal_backtest(pred_score, strategy=strategy, **BACKTEST_CONFIG)
137+
138+
# analysis
139+
analysis = dict()
140+
analysis["excess_return_without_cost"] = risk_analysis(report_normal["return"] - report_normal["bench"])
141+
analysis["excess_return_with_cost"] = risk_analysis(report_normal["return"] - report_normal["bench"] - report_normal["cost"])
142+
analysis_df = pd.concat(analysis) # type: pd.DataFrame
143+
print(analysis_df)
144+
99145
For more information about the APIs, please refer to `Record Template API <../reference/api.html#module-qlib.workflow.record_temp>`_.
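To make the `IC` and `Rank IC` figures reported by ``SigAnaRecord`` more concrete, here is a minimal pandas sketch of a per-day information coefficient. It assumes that predictions and labels share a two-level index named `datetime` and `instrument`; the helper name `daily_ic` and the toy data are hypothetical, and the sketch is not a copy of `calc_ic` from `qlib.contrib.eva.alpha`, whose exact behaviour should be checked against the source.

```python
import pandas as pd


def daily_ic(pred: pd.Series, label: pd.Series, date_level: str = "datetime"):
    """Return per-day Pearson IC and rank-based (Spearman-style) IC."""
    df = pd.DataFrame({"pred": pred, "label": label})
    grouped = df.groupby(level=date_level)
    # IC: Pearson correlation between prediction and label within each day.
    ic = grouped.apply(lambda day: day["pred"].corr(day["label"]))
    # Rank IC: Pearson correlation of the within-day ranks.
    rank_ic = grouped.apply(lambda day: day["pred"].rank().corr(day["label"].rank()))
    return ic, rank_ic


# Toy data: two trading days, three instruments (all values are made up).
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
pred = pd.Series([0.1, 0.2, 0.3, 0.1, 0.2, 0.3], index=index)
label = pd.Series([0.01, 0.02, 0.04, 0.03, 0.02, 0.01], index=index)

ic, rank_ic = daily_ic(pred, label)
print(ic)
print(rank_ic)
```

On the first toy day the ordering of predictions matches the labels, so the rank-based value is 1 while the Pearson value is slightly below 1; on the second day the ordering is reversed and both are -1.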

docs/component/workflow.rst

Lines changed: 5 additions & 5 deletions
@@ -90,12 +90,12 @@ Below is a typical config file of ``qrun``.
9090
test: [2017-01-01, 2020-08-01]
9191
record:
9292
- class: SignalRecord
93-
module_path: qlib.workflow.record_temp
94-
kwargs: {}
93+
module_path: qlib.workflow.record_temp
94+
kwargs: {}
9595
- class: PortAnaRecord
96-
module_path: qlib.workflow.record_temp
97-
kwargs:
98-
config: *port_analysis_config
96+
module_path: qlib.workflow.record_temp
97+
kwargs:
98+
config: *port_analysis_config
9999
100100
After saving the config into `configuration.yaml`, users could start the workflow and test their ideas with a single command below.
101101

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ Document Structure
4949

5050
Building Formulaic Alphas <advanced/alpha.rst>
5151
Online & Offline mode <advanced/server.rst>
52+
Serialization <advanced/serial.rst>
5253

5354
.. toctree::
5455
:maxdepth: 3

docs/reference/api.rst

Lines changed: 10 additions & 0 deletions
@@ -152,4 +152,14 @@ Recorder
152152
Record Template
153153
--------------------
154154
.. automodule:: qlib.workflow.record_temp
155+
:members:
156+
157+
158+
Utils
159+
====================
160+
161+
Serializable
162+
--------------------
163+
164+
.. automodule:: qlib.utils.serial.Serializable
155165
:members:

examples/highfreq/README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
# High-Frequency Dataset
2+
3+
This dataset is an example for RL high frequency trading.
4+
5+
## Get High-Frequency Data
6+
7+
Get high-frequency data by running the following command:
8+
```bash
9+
python workflow.py get_data
10+
```
11+
12+
## Dump & Reload & Reinitialize the Dataset
13+
14+
15+
The High-Frequency Dataset is implemented as `qlib.data.dataset.DatasetH` in `workflow.py`. `DatasetH` is a subclass of [`qlib.utils.serial.Serializable`](https://qlib.readthedocs.io/en/latest/advanced/serial.html), whose state can be dumped to or loaded from disk in `pickle` format.
16+
17+
### About Reinitialization
18+
19+
After reloading `Dataset` from disk, `Qlib` also supports reinitializing the dataset. This means that users can reset some states of `Dataset` or `DataHandler`, such as `instruments`, `start_time`, `end_time` and `segments`, and generate new data according to those states.
20+
21+
The example is given in `workflow.py`; users can run the code as follows.
22+
23+
### Run the Code
24+
25+
Run the example by running the following command:
26+
```bash
27+
python workflow.py dump_and_load_dataset
28+
```

examples/highfreq/__init__.py

Whitespace-only changes.

examples/highfreq/highfreq_handler.py

Lines changed: 12 additions & 10 deletions
@@ -62,9 +62,9 @@ def get_feature_config(self):
6262
def get_normalized_price_feature(price_field, shift=0):
6363
"""Get normalized price feature ops"""
6464
if shift == 0:
65-
template_norm = "{0}/Ref(DayLast({1}), 240)"
65+
template_norm = "Cut({0}/Ref(DayLast({1}), 240), 240, None)"
6666
else:
67-
template_norm = "Ref({0}, " + str(shift) + ")/Ref(DayLast({1}), 240)"
67+
template_norm = "Cut(Ref({0}, " + str(shift) + ")/Ref(DayLast({1}), 240), 240, None)"
6868

6969
feature_ops = template_norm.format(
7070
template_if.format(
@@ -90,7 +90,7 @@ def get_normalized_price_feature(price_field, shift=0):
9090
names += ["$open_1", "$high_1", "$low_1", "$close_1", "$vwap_1"]
9191

9292
fields += [
93-
"{0}/Ref(DayLast(Mean({0}, 7200)), 240)".format(
93+
"Cut({0}/Ref(DayLast(Mean({0}, 7200)), 240), 240, None)".format(
9494
"If(IsNull({0}), 0, If(Or(Gt({1}, Mul(1.001, {3})), Lt({1}, Mul(0.999, {2}))), 0, {0}))".format(
9595
template_paused.format("$volume"),
9696
template_paused.format(simpson_vwap),
@@ -101,7 +101,7 @@ def get_normalized_price_feature(price_field, shift=0):
101101
]
102102
names += ["$volume"]
103103
fields += [
104-
"Ref({0}, 240)/Ref(DayLast(Mean({0}, 7200)), 240)".format(
104+
"Cut(Ref({0}, 240)/Ref(DayLast(Mean({0}, 7200)), 240), 240, None)".format(
105105
"If(IsNull({0}), 0, If(Or(Gt({1}, Mul(1.001, {3})), Lt({1}, Mul(0.999, {2}))), 0, {0}))".format(
106106
template_paused.format("$volume"),
107107
template_paused.format(simpson_vwap),
@@ -112,7 +112,7 @@ def get_normalized_price_feature(price_field, shift=0):
112112
]
113113
names += ["$volume_1"]
114114

115-
fields += [template_paused.format("Date($close)")]
115+
fields += ["Cut({0}, 240, None)".format(template_paused.format("Date($close)"))]
116116
names += ["date"]
117117
return fields, names
118118

@@ -149,18 +149,20 @@ def get_feature_config(self):
149149
# Because there is no vwap field in the yahoo data, a method similar to Simpson integration is used to approximate vwap
150150
simpson_vwap = "($open + 2*$high + 2*$low + $close)/6"
151151
fields += [
152-
template_fillnan.format(template_paused.format("$close")),
152+
"Cut({0}, 240, None)".format(template_fillnan.format(template_paused.format("$close"))),
153153
]
154154
names += ["$close0"]
155155
fields += [
156-
template_if.format(
157-
template_fillnan.format(template_paused.format("$close")),
158-
template_paused.format(simpson_vwap),
156+
"Cut({0}, 240, None)".format(
157+
template_if.format(
158+
template_fillnan.format(template_paused.format("$close")),
159+
template_paused.format(simpson_vwap),
160+
)
159161
)
160162
]
161163
names += ["$vwap0"]
162164
fields += [
163-
"If(IsNull({0}), 0, If(Or(Gt({1}, Mul(1.001, {3})), Lt({1}, Mul(0.999, {2}))), 0, {0}))".format(
165+
"Cut(If(IsNull({0}), 0, If(Or(Gt({1}, Mul(1.001, {3})), Lt({1}, Mul(0.999, {2}))), 0, {0})), 240, None)".format(
164166
template_paused.format("$volume"),
165167
template_paused.format(simpson_vwap),
166168
template_paused.format("$low"),
