
The process has been stuck at the retrieval phase for about an hour. Is this normal? #116

Open
Wuyuhang11 opened this issue Sep 9, 2024 · 2 comments

Comments

@Wuyuhang11

(AI_Scientist) root@intern-studio-50102651:~/AI-Scientist# python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT --num-ideas 1
Using GPUs: [0]
Using OpenAI API with model gpt-4o-2024-05-13.

Generating idea 1/1
Iteration 1/3
{'Name': 'mixture_of_experts', 'Title': 'Mixture of Experts in Transformers: Efficiently Scaling Model Capacity', 'Experiment': 'Integrate a mixture of experts (MoE) mechanism into the transformer blocks. Modify the Block class to include multiple experts and a gating network that selects which experts to use for each input. Compare the performance, training speed, and generalization capabilities of the MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 8, 'Feasibility': 5, 'Novelty': 7}
Iteration 2/3
{'Name': 'simplified_moe', 'Title': 'Simplified Mixture of Experts in Transformers: Efficient Dynamic Computation', 'Experiment': 'Modify the Block class to include multiple expert sub-layers within each transformer block. Implement a gating mechanism that selects one of these sub-layers for each input dynamically. Compare the performance, training speed, and generalization capabilities of the simplified MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 7, 'Feasibility': 6, 'Novelty': 6}
Iteration 3/3
{'Name': 'simplified_moe', 'Title': 'Simplified Mixture of Experts in Transformers: Efficient Dynamic Computation', 'Experiment': 'Modify the Block class to include multiple expert sub-layers within each transformer block. Implement a gating mechanism that selects one of these sub-layers for each input dynamically. Compare the performance, training speed, and generalization capabilities of the simplified MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 7, 'Feasibility': 6, 'Novelty': 6}
Idea generation converged after 3 iterations.
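The "simplified MoE" idea the generator converged on amounts to top-1 gating: a small gating function scores each expert sub-layer and routes the input to the single highest-scoring one. A minimal, pure-Python sketch of that routing logic (illustrative only; names and shapes here are assumptions, not code from the nanoGPT template):

```python
# Toy top-1 mixture-of-experts routing: each input vector is sent to the
# single expert whose gate score is highest (Switch-Transformer-style).
# Everything below is a didactic sketch, not the AI-Scientist Block class.

def gate_scores(x, gate_weights):
    """Dot the input feature vector with each expert's gate vector."""
    return [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]

def moe_forward(x, experts, gate_weights):
    """Route x to the argmax expert and return (output, chosen_index)."""
    scores = gate_scores(x, gate_weights)
    best = max(range(len(experts)), key=scores.__getitem__)
    return experts[best](x), best

# Two toy "experts": one doubles the input, one negates it.
experts = [lambda v: [2 * t for t in v], lambda v: [-t for t in v]]
gates = [[1.0, 0.0], [0.0, 1.0]]  # expert 0 keys on feature 0, expert 1 on feature 1

out, chosen = moe_forward([3.0, 1.0], experts, gates)
# scores = [3.0, 1.0], so expert 0 is chosen and out == [6.0, 2.0]
```

In the real experiment this gate would be a learned linear layer over the token representation, and the experts would be the block's MLP sub-layers; the comparison against the baseline GPT then measures whether the extra capacity pays for the routing overhead.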

Checking novelty of idea 0: adaptive_block_size
Response Status Code: 200
Response Content: {"total": 6626, "offset": 0, "next": 10, "data": [{"paperId": "d4b99821ab8c1ee3271a72dc4163feb8d310c8a0", "title": "DBPS: Dynamic Block Size and Precision Scaling for Efficient DNN Training Supported by RISC-V ISA Extensions", "abstract": "Over the past decade, it has been found that deep neural networks (DNNs) perform better on visual perception and language understanding tasks as their size increases. However, this comes at the cost of high energy consumption and large memory requirement to tr
Response Status Code: 200
Response Content: {"total": 7531, "offset": 0, "next": 10, "data": [{"paperId": "5b2c04e082a56c0eb70ed62bc36148919f665e1c", "title": "SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention", "abstract": "Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additiona
Response Status Code: 200
Response Content: {"total": 204, "offset": 0, "next": 10, "data": [{"paperId": "eb9f044682d43f072a15f21822570024b31a7590", "title": "Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections", "abstract": "Transformers have revolutionized various domains of artificial intelligence due to their unique ability to model long-range dependencies in data. However, they lack in nuanced, context-dependent modulation of features and info
Response Status Code: 200
Response Content: {"total": 350, "offset": 0, "next": 10, "data": [{"paperId": "76ad063a928deb97752de17256fd92b63515d4fc", "title": "Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation", "abstract": "Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on out
Response Status Code: 200
Response Content: {"total": 787, "offset": 0, "next": 10, "data": [{"paperId": "de94361c09fa37567acb7c6674f1094828c61f19", "title": "A sustainable Bitcoin blockchain network through introducing dynamic block size adjustment using predictive analytics", "abstract": null, "venue": "Future generations computer systems", "year": 2023, "citationCount": 3, "citationStyles": {"bibtex": "@Article{Monem2023ASB,\n author = {Maruf Monem and Md Tamjid Hossain and Md. Golam Rabiul Alam and M. S. Munir and Md. Mahbubur Rahman

@conglu1997
Collaborator

conglu1997 commented Sep 10, 2024

I am guessing you do not have a Semantic Scholar (S2) API key, or the request timed out or lost its connection. You are welcome to execute ideas without checking for novelty!
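One way to avoid the hang is to make the Semantic Scholar call fail fast and degrade gracefully. A minimal sketch (stdlib only; the `S2_API_KEY` variable and Graph API search endpoint mirror Semantic Scholar's public API, but the exact integration into AI-Scientist is an assumption):

```python
import json
import os
import urllib.parse
import urllib.request

def check_novelty(query: str, timeout: float = 10.0):
    """Query Semantic Scholar paper search; return None ("novelty unknown")
    when there is no API key or the request fails, instead of blocking."""
    api_key = os.environ.get("S2_API_KEY")
    if not api_key:
        return None  # no key configured: skip the check rather than stall
    url = ("https://api.semanticscholar.org/graph/v1/paper/search?"
           + urllib.parse.urlencode({"query": query, "limit": 10}))
    req = urllib.request.Request(url, headers={"x-api-key": api_key})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except Exception:
        return None  # timeout / network error: fall back to "unknown"
```

The key point is the explicit `timeout` and the `None` fallback: a caller can treat `None` as "could not verify novelty, proceed anyway" rather than waiting indefinitely on the retrieval phase.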

@leeJing77

So where exactly should I comment out the code? When I comment it out, I get a lot of errors.
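A likely cause of those errors is that later stages expect each idea to carry a novelty flag, so deleting the check outright leaves that field missing. A lower-risk sketch is to disable only the call and mark every idea novel (the `"novel"` key and the `check_idea_novelty` name are assumptions about the AI-Scientist code; verify against the actual source before applying):

```python
# Hedged sketch: instead of commenting out the whole novelty-check section,
# keep the data shape intact by tagging each generated idea as novel.
ideas = [
    {"Name": "mixture_of_experts"},
    {"Name": "simplified_moe"},
]

# ideas = check_idea_novelty(ideas, ...)  # original call, left disabled
for idea in ideas:
    idea["novel"] = True  # treat everything as novel when no API key is set
```

If the script already exposes a flag for this (check `python launch_scientist.py --help`), prefer the flag over editing the source.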
