Commit

Merge branch 'main' into update-hypothesis-key
you-n-g committed Aug 2, 2024
2 parents f7e9c94 + a5c96a4 commit de6fe61
Showing 10 changed files with 114 additions and 28 deletions.
29 changes: 17 additions & 12 deletions README.md
Expand Up @@ -18,8 +18,8 @@ We believe that the automatic evolution of R&D will lead to solutions of signifi
<!-- Tag Cloud -->
R&D is a very general scenario. The advent of RDAgent can be your
- [🎥Automatic Quant Factory]()
- Data mining agent: iteratively proposing [🎥data]() & [models]() and implementing them by gaining knowledge from data.
- Research copilot: Auto read [🎥research papers]()/[🎥reports]() and implement model structures or building datasets.
- 🤖Data mining agent: iteratively proposing [🎥data]() & [models]() and implementing them by gaining knowledge from data.
- 🦾Research copilot: Auto read [🎥research papers]()/[🎥reports]() and implement model structures or building datasets.
- ...

You can click the [🎥link]() above to view the demo. More methods and scenarios are being added to the project to empower your R&D processes and boost productivity.
Expand Down Expand Up @@ -107,9 +107,9 @@ Here is our supported scenarios

| Scenario/Target | Model Implementation | Data Building |
| -- | -- | -- |
| 💹 Finance | Iteratively Proposing Ideas & Evolving | - Auto reports reading & implementation <br/> - Iteratively Proposing Ideas & Evolving |
| 🩺 Medical | Iteratively Proposing Ideas & Evolving | - |
| 🏭 General | Auto paper reading & implementation | - |
| 💹 Finance | 🤖Iteratively Proposing Ideas & Evolving | - 🦾Auto reports reading & implementation <br/> - 🤖Iteratively Proposing Ideas & Evolving |
| 🩺 Medical | 🤖Iteratively Proposing Ideas & Evolving | - |
| 🏭 General | 🦾Auto paper reading & implementation | - |

Different scenarios vary in entrance and configuration. Please check the detailed setup tutorial in the scenarios documents.

Expand All @@ -118,7 +118,8 @@ TODO: Scenario Gallary

# ⚙️Framework

![image](https://github.com/user-attachments/assets/c622704c-377a-4361-b956-c1eb9cf6a736)
![image](https://github.com/user-attachments/assets/98fce923-77ab-4982-93c8-a7a01aece766)


Automating the R&D process in data science is a highly valuable yet underexplored area in industry. We propose a framework to push the boundaries of this important research field.

Expand All @@ -135,13 +136,15 @@ We believe that the key to delivering high-quality solutions lies in the ability
# 📃Paper/Work list

## Benchmark
- TODO: adding link;
- [Towards Data-Centric Automatic R&D](https://arxiv.org/abs/2404.11276);
```BibTeX
@article{chen2024rd2bench,
title={RD2Bench: Toward Data-Centric Automatic R\&D},
author={Chen, Haotian and Shen, Xinjie and Ye, Zeqi and Yang, Xiao and Yang, Xu and Liu, Weiqing and Bian, Jiang},
journal={arXiv preprint arXiv:2404.11276},
year={2024}
@misc{chen2024datacentric,
title={Towards Data-Centric Automatic R&D},
author={Haotian Chen and Xinjie Shen and Zeqi Ye and Wenjun Feng and Haoxue Wang and Xiao Yang and Xu Yang and Weiqing Liu and Jiang Bian},
year={2024},
eprint={2404.11276},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
![image](https://github.com/user-attachments/assets/494f55d3-de9e-4e73-ba3d-a787e8f9e841)
Expand Down Expand Up @@ -176,6 +179,8 @@ You can find issues in the issues list or simply running `grep -r "TODO:"`.

Making contributions is not hard. Solving an issue (even just answering a question raised in the issues list), fixing or reporting a bug, improving the documentation, and even fixing a typo are all important contributions to RDAgent.

# Disclaimer
**The RD-agent is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. The RD-agent is aimed to facilitate research and development process in the financial industry and not ready-to-use for any financial investment or advice. Users shall independently assess and test the risks of the RD-agent in a specific use scenario, ensure the responsible use of AI technology, including but not limited to developing and integrating risk mitigation measures, and comply with all applicable laws and regulations in all applicable jurisdictions. The RD-agent does not provide financial opinions or reflect the opinions of Microsoft, nor is it designed to replace the role of qualified financial professionals in formulating, assessing, and approving finance products. The inputs and outputs of the RD-agent belong to the users and users shall assume all liability under any theory of liability, whether in contract, torts, regulatory, negligence, products liability, or otherwise, associated with use of the RD-agent and any inputs and outputs thereof.**

<img src="https://img.shields.io/github/contributors-anon/microsoft/RD-Agent"/>

Expand Down
16 changes: 12 additions & 4 deletions docs/project_framework_introduction.rst
Expand Up @@ -5,11 +5,19 @@ Framework Design & Components
Framework & Components
=========================

- TODO: Components & Feature Level
![image](https://github.com/user-attachments/assets/c622704c-377a-4361-b956-c1eb9cf6a736)
.. NOTE: This depends on the correctness of `c-v` of github.
.. image:: https://github.com/user-attachments/assets/98fce923-77ab-4982-93c8-a7a01aece766
:alt: Components & Feature Level

The image above shows the overall framework of RDAgent.


.. image:: https://github.com/user-attachments/assets/60cc2712-c32a-4492-a137-8aec59cdc66e
:alt: Class Level Figure

For those interested in the detailed code, the figure above illustrates the main classes and aligns them with the workflow.

- Class Level Figure
![image](https://github.com/user-attachments/assets/60cc2712-c32a-4492-a137-8aec59cdc66e)

Detailed Design
=========================
Expand Down
Expand Up @@ -213,10 +213,11 @@ def __init__(


class FactorGraphRAGStrategy(RAGStrategy):
prompt = Prompts(file_path=Path(__file__).parent.parent / "prompts.yaml")

def __init__(self, knowledgebase: FactorGraphKnowledgeBase) -> None:
super().__init__(knowledgebase)
self.current_generated_trace_count = 0
self.prompt = Prompts(file_path=Path(__file__).parent.parent / "prompts.yaml")

def generate_knowledge(
self,
Expand Down
6 changes: 4 additions & 2 deletions rdagent/components/coder/model_coder/CoSTEER/evaluators.py
Expand Up @@ -47,8 +47,10 @@ def value_evaluator(
prediction: torch.Tensor,
target: torch.Tensor,
) -> Tuple[torch.Tensor, bool]:
if target is None or prediction is None:
return "No output generated from the model. No value evaluation conducted.", False
if prediction is None:
return "No output generated from the model. Skip value evaluation", False
elif target is None:
return "No ground truth output provided. Value evaluation is impractical", False
else:
# Calculate the mean absolute difference
diff = torch.mean(torch.abs(target - prediction)).item()
Expand Down
12 changes: 10 additions & 2 deletions rdagent/core/proposal.py
Expand Up @@ -25,12 +25,20 @@ class Hypothesis:
- Belief
"""

def __init__(self, hypothesis: str, reason: str, concise_reason: str, concise_observation: str, concise_justification: str, concise_knowledge: str) -> None:
def __init__(
self,
hypothesis: str,
reason: str,
concise_reason: str,
concise_observation: str,
concise_justification: str,
concise_knowledge: str,
) -> None:
self.hypothesis: str = hypothesis
self.reason: str = reason
self.concise_reason: str = concise_reason
self.concise_observation: str = concise_observation
self.concise_justification: str = concise_justification
self.concise_justification: str = concise_justification
self.concise_knowledge: str = concise_knowledge

def __str__(self) -> str:
Expand Down
19 changes: 16 additions & 3 deletions rdagent/core/utils.py
Expand Up @@ -3,8 +3,9 @@
import importlib
import json
import multiprocessing as mp
import pickle
from collections.abc import Callable
from typing import Any, ClassVar, cast
from typing import Any, ClassVar, NoReturn, cast

from fuzzywuzzy import fuzz # type: ignore[import-untyped]

Expand All @@ -27,13 +28,25 @@ def __new__(cls, *args: Any, **kwargs: Any) -> Any:
# TODO: this restriction can be solved.
exception_message = "Please only use kwargs in Singleton to avoid misunderstanding."
raise RDAgentException(exception_message)
all_args = [(-1, f"{cls.__module__}.{cls.__name__}")] + [(i, args[i]) for i in args] + list(sorted(kwargs.items()))
class_name = [(-1, f"{cls.__module__}.{cls.__name__}")]
args_l = [(i, arg) for i, arg in enumerate(args)]  # pair each positional arg with its index
kwargs_l = list(sorted(kwargs.items()))
all_args = class_name + args_l + kwargs_l
kwargs_hash = hash(tuple(all_args))
if kwargs_hash not in cls._instance_dict:
cls._instance_dict[kwargs_hash] = super().__new__(cls) # Corrected call
cls._instance_dict[kwargs_hash].__init__(**kwargs) # Ensure __init__ is called
return cls._instance_dict[kwargs_hash]

def __reduce__(self) -> NoReturn:
"""
NOTE:
When loading an object from a pickle, the __new__ method does not receive the `kwargs`
it was initialized with. This makes it difficult to retrieve the correct singleton object.
Therefore, we have made it unpicklable.
"""
msg = f"Instances of {self.__class__.__name__} cannot be pickled"
raise pickle.PicklingError(msg)


def parse_json(response: str) -> Any:
try:
Expand Down
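The `__reduce__` override in the hunk above can be exercised in isolation. The following is a minimal sketch, assuming a simplified kwargs-keyed singleton: `KwargsSingleton`, `Config`, and `can_pickle` are hypothetical names for illustration, not the RDAgent classes, and the instance key is a plain tuple rather than the hash used in `rdagent/core/utils.py`.

```python
import pickle
from typing import Any, ClassVar, NoReturn


class KwargsSingleton:
    """Hypothetical stand-in for a kwargs-keyed singleton base class."""

    _instances: ClassVar[dict] = {}

    def __new__(cls, **kwargs: Any) -> Any:
        # One instance per (qualified class name, sorted kwargs) combination.
        key = (f"{cls.__module__}.{cls.__name__}", tuple(sorted(kwargs.items())))
        if key not in cls._instances:
            cls._instances[key] = super().__new__(cls)
        return cls._instances[key]

    def __reduce__(self) -> NoReturn:
        # pickle consults __reduce__ when serializing; raising here blocks pickling.
        msg = f"Instances of {self.__class__.__name__} cannot be pickled"
        raise pickle.PicklingError(msg)


class Config(KwargsSingleton):
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


def can_pickle(obj: Any) -> bool:
    """Return True if obj serializes, False if pickling is refused."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True
```

In this sketch, `Config(x=3)` returns the same object on every call, `Config(x=2)` returns a different one, and `can_pickle` reports `False` for both. Refusing to pickle avoids the situation described in the docstring, where unpickling would call `__new__` without the original kwargs and resolve to the wrong instance.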
20 changes: 18 additions & 2 deletions rdagent/scenarios/data_mining/experiment/model_template/train.py
Expand Up @@ -75,21 +75,37 @@ def collate_fn(batch):
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = nn.CrossEntropyLoss()


# Train the model
def eval_auc(model):
y_pred = []
for data in test_dataloader:
x, y = data
out = model(x)
y_pred.append(out.cpu().detach().numpy())
return roc_auc_score(y_test, np.concatenate(y_pred))


best = 0.0
best_model = None

for i in range(10):
for i in range(15):
for data in train_dataloader:
x, y = data
out = model(x)
optimizer.zero_grad()
loss = criterion(out.squeeze(), y)
loss.backward()
optimizer.step()
roc = eval_auc(model)
if roc > best:
best = roc
best_model = model

y_pred = []
for data in test_dataloader:
x, y = data
out = model(x)
out = best_model(x)
y_pred.append(out.cpu().detach().numpy())

acc = roc_auc_score(y_test, np.concatenate(y_pred))
Expand Down
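One point worth checking in the training loop above: `best_model = model` binds a second name to the same object, so any later in-place parameter update also changes what `best_model` refers to. The following is a minimal sketch of the difference between aliasing and snapshotting, assuming a hypothetical `TinyModel` in place of a real `torch.nn.Module` and a made-up list of per-epoch scores in place of `eval_auc`.

```python
from __future__ import annotations

import copy


class TinyModel:
    """Hypothetical stand-in for a trainable model with mutable state."""

    def __init__(self) -> None:
        self.weight = 0.0

    def train_step(self) -> None:
        # In-place update, analogous to an optimizer step on real parameters.
        self.weight += 1.0


def train(scores: list[float]) -> tuple[TinyModel, TinyModel, TinyModel]:
    """Run one 'epoch' per score; return (final model, alias, deep-copied snapshot)."""
    model = TinyModel()
    best = float("-inf")
    best_alias = model
    best_snapshot = copy.deepcopy(model)
    for score in scores:
        model.train_step()
        if score > best:
            best = score
            best_alias = model  # same object: keeps changing in later epochs
            best_snapshot = copy.deepcopy(model)  # independent copy of this epoch's state
    return model, best_alias, best_snapshot
```

With scores that peak in the second of three epochs, the alias ends up identical to the final model, while the snapshot keeps the second epoch's state. For a real PyTorch module, deep-copying the module or its `state_dict()` serves the same purpose. Separately, the loop selects the best epoch on `test_dataloader`, so the reported score is likely optimistic; a held-out validation split would avoid that.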
6 changes: 5 additions & 1 deletion rdagent/scenarios/data_mining/experiment/prompts.yaml
Expand Up @@ -34,7 +34,11 @@ dm_model_interface: |-
```
No other parameters will be passed to the model so give other parameters a default value or just make them static.
Remember to permute the input tensor since the input tensor is in the shape of (batch_size, num_features, num_timesteps) and a normal time series model is expecting the input tensor in the shape of (batch_size, num_timesteps, num_features).
The input tensor shape is (batch_size, num_features, num_timesteps) which is different from the normal time series input shape of (batch_size, num_timesteps, num_features). Please write code accordingly.
Note that for nn.Conv1d() layers, please do not permute the input tensor as the in_channel dimension should match the num_feature dimension.
The output shape should be (batch_size, 1) with sigmoid activation since we have binary labels.
Don't write any try-except block in your python code. The user will catch the exception message and provide the feedback to you. Also, don't write main function in your python code. The user will call the forward method in the model_cls to get the output tensor.
Expand Down
2 changes: 1 addition & 1 deletion rdagent/scenarios/qlib/prompts.yaml
Expand Up @@ -12,7 +12,7 @@ hypothesis_and_feedback: |-
hypothesis_output_format: |-
The output should follow JSON format. The schema is as follows:
{
"hypothesis": "The new hypothesis generated based on the information provided.", Note that this should focus on model architecture, not training process or feature engineering or anything else
"hypothesis": "The new hypothesis generated based on the information provided.", # Note that this should focus on model architecture, not training process or feature engineering or anything else
"reason": "The reason why you generate this hypothesis. It should be comprehensive and logical. It should cover the other keys below and extend them.",
"concise_reason": Two line summary. First line focuses on a concise justification for the change. 2nd Line learns from first line and previous experiences (hypothesis & experiments & code & feedbacks) to generalise a knowledge statement (use tend to/because/if/generally/etc.). This is the summary of the three keys below.
"concise_observation": One line summary. It focuses on the observation of the given scenario, data characteristics, or previous experiences (failures & successes).
Expand Down
29 changes: 29 additions & 0 deletions test/utils/test_misc.py
Expand Up @@ -5,16 +5,29 @@

class A(SingletonBaseClass):
def __init__(self, **kwargs):
print(self, "__init__", kwargs) # make sure the __init__ is called only once.
self.kwargs = kwargs

def __str__(self) -> str:
return f"{self.__class__.__name__}.{getattr(self, 'kwargs', None)}"

def __repr__(self) -> str:
return self.__str__()


class MiscTest(unittest.TestCase):
def test_singleton(self):
print("a1=================")
a1 = A()
print("a2=================")
a2 = A()
print("a3=================")
a3 = A(x=3)
print("a4=================")
a4 = A(x=2)
print("a5=================")
a5 = A(b=3)
print("a6=================")
a6 = A(x=3)

# Check that a1 and a2 are the same instance
Expand All @@ -37,6 +50,22 @@ def test_singleton(self):

print(id(a1), id(a2), id(a3), id(a4), id(a5), id(a6))

print("...................... Start testing pickle ......................")

# Test pickle
import pickle

with self.assertRaises(pickle.PicklingError):
with open("a3.pkl", "wb") as f:
pickle.dump(a3, f)
# NOTE: If the pickle feature is not disabled,
# loading a3.pkl will return a1, and a1 will be updated with a3's attributes.
# print(a1.kwargs)
# with open("a3.pkl", "rb") as f:
# a3_pkl = pickle.load(f)
# print(id(a3), id(a3_pkl)) # not the same object
# print(a1.kwargs) # a1 will be changed.


if __name__ == "__main__":
unittest.main()
