(AI_Scientist) root@intern-studio-50102651:~/AI-Scientist# python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT --num-ideas 1
Using GPUs: [0]
Using OpenAI API with model gpt-4o-2024-05-13.
Generating idea 1/1
Iteration 1/3
{'Name': 'mixture_of_experts', 'Title': 'Mixture of Experts in Transformers: Efficiently Scaling Model Capacity', 'Experiment': 'Integrate a mixture of experts (MoE) mechanism into the transformer blocks. Modify the Block class to include multiple experts and a gating network that selects which experts to use for each input. Compare the performance, training speed, and generalization capabilities of the MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 8, 'Feasibility': 5, 'Novelty': 7}
Iteration 2/3
{'Name': 'simplified_moe', 'Title': 'Simplified Mixture of Experts in Transformers: Efficient Dynamic Computation', 'Experiment': 'Modify the Block class to include multiple expert sub-layers within each transformer block. Implement a gating mechanism that selects one of these sub-layers for each input dynamically. Compare the performance, training speed, and generalization capabilities of the simplified MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 7, 'Feasibility': 6, 'Novelty': 6}
Iteration 3/3
{'Name': 'simplified_moe', 'Title': 'Simplified Mixture of Experts in Transformers: Efficient Dynamic Computation', 'Experiment': 'Modify the Block class to include multiple expert sub-layers within each transformer block. Implement a gating mechanism that selects one of these sub-layers for each input dynamically. Compare the performance, training speed, and generalization capabilities of the simplified MoE-GPT model with the baseline GPT model on the provided datasets.', 'Interestingness': 7, 'Feasibility': 6, 'Novelty': 6}
Idea generation converged after 3 iterations.
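The 'simplified_moe' idea above proposes adding several expert sub-layers to the transformer Block and a gating mechanism that selects one of them per input. As a rough illustration of what that modification could look like in PyTorch, here is a minimal sketch; the names (SimplifiedMoEBlock, ExpertMLP, n_experts) are assumptions for illustration and not code from the AI-Scientist or nanoGPT repositories, and the attention sub-layer of a full Block is omitted for brevity.

```python
# Illustrative sketch only: class and parameter names are assumptions,
# not code from the AI-Scientist / nanoGPT repositories.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """One expert sub-layer: a standard transformer feed-forward block."""

    def __init__(self, n_embd):
        super().__init__()
        self.fc = nn.Linear(n_embd, 4 * n_embd)
        self.proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.proj(F.gelu(self.fc(x)))


class SimplifiedMoEBlock(nn.Module):
    """Replaces the single MLP of a transformer block with several experts
    and a gating network that routes each token to one expert (top-1)."""

    def __init__(self, n_embd, n_experts=4):
        super().__init__()
        self.ln = nn.LayerNorm(n_embd)
        self.experts = nn.ModuleList([ExpertMLP(n_embd) for _ in range(n_experts)])
        self.gate = nn.Linear(n_embd, n_experts)

    def forward(self, x):
        h = self.ln(x)
        probs = F.softmax(self.gate(h), dim=-1)          # (B, T, n_experts)
        top_p, top_idx = probs.max(dim=-1)               # top-1 expert per token
        # Compute all experts (simple but not compute-efficient), shape (B, T, C, E).
        expert_out = torch.stack([e(h) for e in self.experts], dim=-1)
        # Pick each token's selected expert and scale by its gate probability
        # so the router still receives gradient (Switch-Transformer-style trick).
        index = top_idx[..., None, None].expand(*h.shape, 1)
        selected = expert_out.gather(-1, index).squeeze(-1)
        return x + top_p.unsqueeze(-1) * selected
```

In a full experiment this block would replace the MLP path inside nanoGPT's Block class while keeping the attention path unchanged, so the MoE-GPT variant stays directly comparable to the baseline.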
Checking novelty of idea 0: adaptive_block_size
Response Status Code: 200
Response Content: {"total": 6626, "offset": 0, "next": 10, "data": [{"paperId": "d4b99821ab8c1ee3271a72dc4163feb8d310c8a0", "title": "DBPS: Dynamic Block Size and Precision Scaling for Efficient DNN Training Supported by RISC-V ISA Extensions", "abstract": "Over the past decade, it has been found that deep neural networks (DNNs) perform better on visual perception and language understanding tasks as their size increases. However, this comes at the cost of high energy consumption and large memory requirement to tr
Response Status Code: 200
Response Content: {"total": 7531, "offset": 0, "next": 10, "data": [{"paperId": "5b2c04e082a56c0eb70ed62bc36148919f665e1c", "title": "SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention", "abstract": "Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additiona
Response Status Code: 200
Response Content: {"total": 204, "offset": 0, "next": 10, "data": [{"paperId": "eb9f044682d43f072a15f21822570024b31a7590", "title": "Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections", "abstract": "Transformers have revolutionized various domains of artificial intelligence due to their unique ability to model long-range dependencies in data. However, they lack in nuanced, context-dependent modulation of features and info
Response Status Code: 200
Response Content: {"total": 350, "offset": 0, "next": 10, "data": [{"paperId": "76ad063a928deb97752de17256fd92b63515d4fc", "title": "Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation", "abstract": "Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on out
Response Status Code: 200
Response Content: {"total": 787, "offset": 0, "next": 10, "data": [{"paperId": "de94361c09fa37567acb7c6674f1094828c61f19", "title": "A sustainable Bitcoin blockchain network through introducing dynamic block size adjustment using predictive analytics", "abstract": null, "venue": "Future generations computer systems", "year": 2023, "citationCount": 3, "citationStyles": {"bibtex": "@Article{Monem2023ASB,\n author = {Maruf Monem and Md Tamjid Hossain and Md. Golam Rabiul Alam and M. S. Munir and Md. Mahbubur Rahman