Fixing ensemble #508

xisen-w · 2024-12-20T02:38:36Z

Description

Motivation and Context

How Has This Been Tested?

If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

Your own tests:

Types of changes

Fix bugs
Add new feature
Update documentation

📚 Documentation preview 📚: https://RDAgent--508.org.readthedocs.build/en/508/

* refine ds modal for more cases: eval and es
* update model template
* prompts for model and ensemble
* fix a bug
* fix a bug
* init: ds workflow evolvingstrategy
* Adding ensemble (#505)
* Initial Draft
* Updating logic for init
* Revising
* Successful Testing
* Updating to use the latest & right class
* bug: bug-fixing for testing
* data science loop changes
* data science loop base
* ds loop feedback
* fix
* remove measure_time because it's duplicated (in LoopBase)
* add the knowledge query for data_loader & feature
* edit ds workflow evaluator
* data_loader bug fix
* stop evolving when all tasks completed
* llm app change
* fix break all complete strategy
* Adding queried knowledge (#508)
  Co-authored-by: XianBW <[email protected]>
* fix loop bug
* ds workflow evaluator; test; refine prompts
* workflow spec
* fix ci
* feature task changes
* ds loop change
* fix a bug in feat
* add query knowledge for model and workflow
* llm_debug info(for show) using pickle instead of json
* remove NextLoopException
* loop change
* coder raise CoderError when all sub_tasks failed
* rename code_dict to file_dict in FBWorkspace
* add CoSTEER unittest
* now show self.version in Task.get_task_information(), simplify CoSTEER sub tasks definition
* remove some properties in ModelTask, add model_type in it.
* fix llm app bug
* llm web app bug fix
* ds loop bug fix
* fix: give component code to feature&ens eval
* loop catch error bug
* rename load_from_raw_data to load_data
* feat: Add debug data creation functionality for data science scenarios
* support local folder (#511)
* support local folder
* remove unnecessary random
* KaggleScen Subclass
* small fix
* use template for style description
* update default scen to kaggle
* update sample data script
* make sure frac < 1
* fix a bug
* feature spec changes
* fix
* change import order
* clear unnecessary std outputs
* fix a typo
* create sample folder after unzip kaggle data
* feature/model test script update
* Align the data types across modules.
* fix a bug in model eval
* show line number
* move sample entry point to app
* spec & model prompt changes
* Refine the competition specification to address the data type problem and the coherence issue.
* fix some bugs
* add file filter in FBworkspace.code property
* support non-binary prediction
* avoid too much warnings
* fix a bug in ensemble module
* filtered the knowledge query in all modules
* delete RAG in idea proposal
* refine the code in ensemble
* show exp workspace in llm_st
* exp_gen bug fix
* feedback bug fix
* use `feature` instead of `feat01`
* Trace & method of judging if exp is completed change
* fix a bug in package calling and execute ci
* fix code
* bug fix
* bug fix
* fix a bug
* fix some bugs
* fix a bug
* refactor: Enhance error handling and feedback in data science loop
* support different use_azure on chat and embedding models
* multi-model proposal logic
* fix a small syntax error
* loopBase and some changes
* ensemble scores change
* fbworkspace.code -> .all_codes
* use all model codes in workflow coder
* check scores.csv's keys(model_names)
* model name changes
* add a todo in ensemble test
* sota_exp changes
* give model info in exp gen
* add runner time limit
* config using debug data or not in evals
* exp to feedback base
* add feature code when writing model task
* small problem
* copying during sampling
* update
* refactor: Simplify code handling and improve workspace management
* model part output fix
* print model's execution time
* bug fix
* ensemble test fix
* ens small change
* ens_test bug fix
* Refine partial expansion logic to display only a few subfolders when their structure is uniform, improving readability in nested directories.
* several update on prompts
* sample subfolders
* Filter the stdout after code execution to remove irrelevant information e.g. progress bars, whitespace characters, excessive line breaks.
* Add some more prompts and comments
* several update on the first init rounds
* model timeout as error
* fix pattern of getting model codes in workspace
* small bug fix on model prompts
* remove get_code_with_key since we have regex pattern
* fix: Correct tqdm progress bar update logic in LoopBase class
* feat: Add diff generation and enhance feedback mechanism in data science loop
* update some fix to model and workflow prompts
* refine the logic of progress bar filter
* add last_successful_exp in exp_gen
* fix a one line bug
* add a hint in prompt
* fix data sample for bms
* fix data sample for bms
* hypothesis small fix
* crawler readme update
* fix component gen
* fix bug
* annotation change
* load description.md if it exists
* refactor: Simplify SOTA description handling in feedback and prompts
* refactor: Use shared templates for feedback and experiment descriptions
* change webapp for model codes changes
* update proposal
* add timeout message for docker run output
* fix
* refine the code in docker time processing
* use .shape instead of len() when do shape eval
* won't change size during iteration
* support bson sample
* sample support jsonl and bson
* add former_code to coder prompts
* a little speed-up in debug data creating
* filter progress bar when eval ens and main
* avoid costeer makes no change to former code
* fix several log error
* add timeout judge threshold
* fix some bugs in the evaluation of component output shapes
* File structure for supporting litellm (#517)
  Co-authored-by: Young <[email protected]>
* ignore submission and show processing
* ignore submission and show processing
* add efficiency notice
* refactor: Enhance error message with detailed feedback summary
* refactor: Simplify component handling in DSExpGen class
* refactor: Update code structure and add docstring for clarity
* reserve one sample to each label in data sampling
* add Evaluation info
* refine costeer code to avoid giving same code twice
* use raw_description as plain text
* add a prompt hint to avoid same dict key
* model task name bug in first model exp gen
* fix a typo
* add some debug info in costeer tests
* task init change
* enhance data sampling
* refine the code in data_loader
* more reasonable loop
* fix a bug in data folder description
* add error msg & traceback to execution feedback
* fix llm error msg detection
* add task information to costeer eval & add cache to docker run(use zipfile to store the whole workspace)
* fix CI first round
* fix CI second round
* use txt to store test script to avoid pytest
* remove zipfile in requirements
* add azure.identity to requirements
* ignore debug web page
* component test changes
* remove redundant task_desc in model coder
* feat: Add APE module and prompts for automated prompt engineering
* fix: Update .gitignore and improve text formatting in eval.py
* refactor: Update print output and improve code comments and imports
* style: Fix string formatting and import order in ape.py and fmt.py
* exclude ape
* add a data folder notice
* reduce unnecessary output to stdout
* refine the code of describe_data_folder
* fix ci
* style: streamlit style update (#522)
* streamlit style update
* fix import
* fix format
* fix llm_st loop progress bar
* debugapp small change
* fix model str
* refine some prompts
* fix model str
* fix CI
* refine the logic associated with the data_folder
* fix ci
* small change
* set filter_progress_bar as default in execute
* model proposal with workflow
* add submission check in workflow eval
* fix bug
* small change
* fix CI
* fix CI
* refactor: Move generate_diff to utils and update DSExpGen logic
* more reasonable prompt describing metric direction
* fix a minor jinja2 bug
* quick fix exp_gen bugs
* fix the following bug
* fix
* fix some bugs
* remove workflow from model
* add pending_tasks_list in data science to enable coding model and workflow
* refine the code for handling JSON-formatted data descriptions
* assert with information
* ensure correct csv file name
* add logging to help record the output
* log competition
* add log tag for debug llm app
* test: Test ds refactor ll (#523)
* fix bugs to former scenario
* fix a bug because coding in rdloop changed
* fix the bug when feedback gets no hypothesis
* fix trace structure
* change all trace hist when merging hypothesis to experiments
* ignore some error in ruff
* fix kaggle scenario bugs
* refine one line
* another bug
* another small bug
* fix ui bugs
* change kaggle train.py path

---------

Co-authored-by: Xu Yang <[email protected]>

* fix CI
* Update rdagent/app/data_science/loop.py
  Co-authored-by: Copilot <[email protected]>
* add samplecsv into spec prompts
* fix CI

---------

Co-authored-by: TPLin22 <[email protected]>
Co-authored-by: yuanteli <[email protected]>
Co-authored-by: Xisen Wang <[email protected]>
Co-authored-by: Bowen Xian <[email protected]>
Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: XianBW <[email protected]>
Co-authored-by: Tim <[email protected]>
Co-authored-by: 炼金术师华华 <[email protected]>
Co-authored-by: Linlang <[email protected]>
Co-authored-by: Copilot <[email protected]>

Adding queried knowledge

0de692d

xisen-w changed the base branch from main to ds_refactor December 20, 2024 02:38

Merge branch 'ds_refactor' into fixing-ensemble

4d3b4e8

XianBW merged commit 74a2829 into ds_refactor Dec 20, 2024
1 of 4 checks passed

XianBW deleted the fixing-ensemble branch December 20, 2024 05:32
