
ValueError: all the input array dimensions except for the concatenation axis must match exactly #6

Open
Daiiszuki opened this issue Apr 20, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@Daiiszuki

I'm trying to download the training data using 0_dl_trainval_data.py. The data appears to download successfully, but the preprocessing step fails. Specifically, the line price_array = np.hstack([price_array, df[df.tic == tic][['close']].values]) raises the error.

  1. What are some potential causes?
  2. What are some potential solutions?
  3. Wouldn't it be better to isolate the data downloading from the cleaning?
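A quick way to probe the first question is to count candles per ticker before the hstack loop runs. The sketch below assumes the DataFrame has the tic column used in the failing line plus a timestamp column; the name time is an assumption, so pass whatever column the processor actually uses. candle_count_report is a hypothetical helper, not part of FinRL_Crypto.

```python
import pandas as pd


def candle_count_report(df: pd.DataFrame, time_col: str = "time") -> pd.DataFrame:
    """Per-ticker row counts and how many timestamps each ticker lacks.

    Hypothetical diagnostic: assumes columns 'tic' and `time_col` exist.
    """
    all_times = df[time_col].nunique()  # timestamps seen for at least one ticker
    grouped = df.groupby("tic")[time_col]
    report = pd.DataFrame({"rows": grouped.size(), "unique_times": grouped.nunique()})
    # missing_times > 0 means this ticker lacks some timestamps the others have;
    # rows != unique_times would instead point to duplicate rows.
    report["missing_times"] = all_times - report["unique_times"]
    return report.sort_values("missing_times", ascending=False)


# Toy data: ETHUSDT is missing the candle at time 3.
toy = pd.DataFrame({
    "time": [1, 2, 3, 1, 2],
    "tic": ["BTCUSDT"] * 3 + ["ETHUSDT"] * 2,
    "close": [1.0, 2.0, 3.0, 10.0, 20.0],
})
report = candle_count_report(toy)
print(report)
```

Any ticker listed with a nonzero missing_times will produce a shorter column than the others when the close prices are stacked.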
@Daiiszuki
Author

Daiiszuki commented Apr 21, 2023

At first I suspected the problem was that I was requesting data from 2017, so the different tickers had different numbers of records. I tried a more recent start date (2020-01-01), and now I'm getting:

 main()
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/0_dl_trainval_data.py", line 97, in main
    data_from_processor, price_array, tech_array, time_array = process_data()
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/0_dl_trainval_data.py", line 52, in process_data
    data_from_processor, price_array, tech_array, time_array = DataProcessor.run(
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/processor_Binance.py", line 81, in run
    price_array, tech_array, time_array = self.df_to_array(data, if_vix)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/processor_Binance.py", line 186, in df_to_array
    price_array = np.hstack([price_array, df[df.tic == tic][['close']].values])
  File "<__array_function__ internals>", line 200, in hstack
  File "/usr/local/lib/python3.9/dist-packages/numpy/core/shape_base.py", line 370, in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1359835 and the array at index 1 has size 1359834
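The error can be reproduced with toy arrays. For 2-D inputs, np.hstack joins arrays along axis 1, so every column it stacks must have the same number of rows; a single missing candle (1359835 vs 1359834 rows above) is enough to make it fail. try_hstack is a hypothetical helper for illustration.

```python
import numpy as np


def try_hstack(n_rows_a: int, n_rows_b: int):
    """Stack two single-column arrays side by side; return the shape or the error text."""
    a = np.zeros((n_rows_a, 1))  # e.g. close prices of the first ticker
    b = np.zeros((n_rows_b, 1))  # e.g. close prices of the next ticker
    try:
        return np.hstack([a, b]).shape
    except ValueError as err:
        return str(err)


print(try_hstack(5, 5))  # (5, 2)
print(try_hstack(5, 4))  # the same kind of ValueError message as in the trace above
```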

@Daiiszuki
Author

Configured as:

# General Training Settings
#######################################################################################################
#######################################################################################################

trade_start_date = '2023-02-01 00:00:00'
trade_end_date = '2023-04-19 00:00:00'

SEED_CFG = 2390408
TIMEFRAME = '1m'
H_TRIALS = 50
KCV_groups = 5
K_TEST_GROUPS = 2
NUM_PATHS = 4
N_GROUPS = NUM_PATHS + 1
NUMBER_OF_SPLITS = nCr(N_GROUPS, N_GROUPS - K_TEST_GROUPS)

print(NUMBER_OF_SPLITS)

no_candles_for_train = 1111200
no_candles_for_val = 250000
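The nCr helper is not shown in this excerpt. Assuming it is the ordinary binomial coefficient "n choose r", the split count works out as below, with Python's math.comb standing in for it. The result, 10, agrees with the 10 printed at the top of the log and the SPLITS line in a later comment.

```python
from math import comb

# Values copied from the configuration above.
K_TEST_GROUPS = 2
NUM_PATHS = 4
N_GROUPS = NUM_PATHS + 1

# Assumption: the repo's nCr(n, r) is the binomial coefficient, so math.comb replaces it.
NUMBER_OF_SPLITS = comb(N_GROUPS, N_GROUPS - K_TEST_GROUPS)
print(NUMBER_OF_SPLITS)  # 10, i.e. C(5, 3) == C(5, 2)
```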

@Daiiszuki
Author

I've tried using the default configuration and it works with no issue, so I guess no_candles_for_train is causing this.

What are the valid date ranges per timeframe?

I assumed it was January 2020 when using Binance?
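The log in a later comment suggests how candle counts map onto dates: with 1m candles, TRAIN_START_DATE (2021-11-23 23:20:00) is exactly 500000 + 125000 = 625000 minutes before a trade start of 2023-02-01 00:00:00. Assuming the same rule holds in general (train and validation candles counted back from trade_start_date), the sketch below computes the earliest timestamp a configuration requests. implied_train_start and its minutes_per_candle parameter are hypothetical, not part of the repository.

```python
from datetime import datetime, timedelta


def implied_train_start(trade_start: str, n_train: int, n_val: int,
                        minutes_per_candle: int = 1) -> datetime:
    """Hypothetical helper: first candle of a train+val window ending at trade_start.

    Assumes the window is counted back from trade_start in whole candles,
    which matches the 1m log in this thread but is not confirmed by the code.
    """
    start = datetime.strptime(trade_start, "%Y-%m-%d %H:%M:%S")
    return start - timedelta(minutes=(n_train + n_val) * minutes_per_candle)


# Reproduces TRAIN_START_DATE from the log: 2021-11-23 23:20:00
print(implied_train_start("2023-02-01 00:00:00", 500_000, 125_000))
# The configuration above with 1m candles: 2020-06-30 17:20:00
print(implied_train_start("2023-02-01 00:00:00", 1_111_200, 250_000))
```

For other timeframes, minutes_per_candle would be 5 for '5m', 60 for '1h', and so on.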

@YangletLiu added the bug (Something isn't working) label on Apr 22, 2023
@Daiiszuki
Author

So after trying different values for the number of training candles, I found that around 300,000 was the maximum, and the download worked with that. But now I get the following error:

ValueError: No trials are completed yet.

Full trace:


10
TRAIN_START_DATE:  2021-11-23 23:20:00
VAL_END_DATE:  2023-01-31 23:59:00

Starting CPCV optimization with:
drl algorithm:        ppo
name_test:            model
gpu_id:               0 


##### Launched hyperparameter optimization with CPCV  #####

TIMEFRAME                   1m
TRAIN SAMPLES               500000
TRIALS NO.                  50
N                           5
K test groups               2
SPLITS                      10


TRAIN SAMPLES               500000
VAL_SAMPLES                 125000
TRAIN_START_DATE            2021-11-23 23:20:00
TRAIN_END_DATE              2022-11-06 04:39:00
VAL_START_DATE              2022-11-06 04:40:00
VAL_END_DATE                2023-01-31 23:59:00 

TICKER LIST                 ['XRPUSDT', 'BTCUSDT', 'ETHUSDT', 'BNBUSDT', 'HBARUSDT', 'UNIUSDT'] 

/usr/local/lib/python3.9/dist-packages/optuna/samplers/_tpe/sampler.py:282: ExperimentalWarning: ``multivariate`` option is an experimental feature. The interface can change in the future.
  warnings.warn(
[I 2023-04-22 07:46:37,255] A new study created in memory with name: no-name-2f2030fe-f672-4585-95e6-bc15be720c90

LOADING DATA FOLDER:  ./data/1m_625000 

No. Train Samples: 374994 

| Arguments Remove cwd: ./train_results/cwd_tests/model_CPCV_ppo_1m_50H_625k
################################################################################
ID     Step    maxR |    avgR   stdR   avgS  stdS |    expR   objC   etc.
[W 2023-04-22 07:46:53,455] Trial 0 failed with parameters: {'learning_rate': 0.03, 'batch_size': 512, 'gamma': 0.99, 'net_dimension': 1024, 'target_step': 937500, 'eval_time_gap': 60, 'break_step': 45000.0, 'lookback': 1, 'norm_cash': 0.000244140625, 'norm_stocks': 0.00390625, 'norm_tech': 3.0517578125e-05, 'norm_reward': 0.0009765625, 'norm_action': 10000} because of the following error: ValueError('operands could not be broadcast together with shapes (6,) (10,) ').
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/1_optimize_cpcv.py", line 328, in obj_with_argument
    return objective(trial, name_test, model_name, cwd, res_timestamp, gpu_id)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/1_optimize_cpcv.py", line 269, in objective
    sharpe_bot, sharpe_eqw, drl_rets_tmp = train_and_test(trial, price_array, tech_array, train_indices,
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/function_train_test.py", line 33, in train_and_test
    train_agent(price_array,
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/function_train_test.py", line 72, in train_agent
    agent.train_model(model=model,
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/drl_agents/elegantrl_models.py", line 86, in train_model
    train_and_evaluate(model)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/train/run.py", line 40, in train_and_evaluate
    trajectory = agent.explore_env(env, target_step)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/drl_agents/agents/AgentPPO.py", line 74, in explore_one_env
    next_s, reward, done, _ = env.step(get_a_to_e(ten_a)[0].numpy())
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/environment_Alpaca.py", line 113, in step
    for index in np.where(actions < -self.minimum_qty_alpaca)[0]:
ValueError: operands could not be broadcast together with shapes (6,) (10,) 
[W 2023-04-22 07:46:54,924] Trial 0 failed with value None.
Traceback (most recent call last):
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/1_optimize_cpcv.py", line 360, in <module>
    optimize(name_test, name_model, gpu_id)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/1_optimize_cpcv.py", line 341, in optimize
    study.optimize(
  File "/usr/local/lib/python3.9/dist-packages/optuna/study/study.py", line 425, in optimize
    _optimize(
  File "/usr/local/lib/python3.9/dist-packages/optuna/study/_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "/usr/local/lib/python3.9/dist-packages/optuna/study/_optimize.py", line 174, in _optimize_sequential
    callback(study, frozen_trial)
  File "/content/drive/MyDrive/DE-FI DE-GEN/FinRL_Crypto-master/1_optimize_cpcv.py", line 98, in save_best_agent
    if study.best_trial.number != trial.number:
  File "/usr/local/lib/python3.9/dist-packages/optuna/study/study.py", line 159, in best_trial
    return copy.deepcopy(self._storage.get_best_trial(self._study_id))
  File "/usr/local/lib/python3.9/dist-packages/optuna/storages/_in_memory.py", line 250, in get_best_trial
    raise ValueError("No trials are completed yet.")
ValueError: No trials are completed yet.

Any suggestions?

@Daiiszuki
Author

Daiiszuki commented Apr 22, 2023

New evidence suggests that the issue might be the number of ticker symbols. I will try downloading 10 tickers, as in the example configuration, instead of 6. Fingers crossed.

Please, any input at all would be a major help.

@Daiiszuki
Author

It's definitely the number of tickers, but I still think the issue needs attention, so I'll leave it open.
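The (6,) vs (10,) shapes in the earlier trace fit this diagnosis: step() in environment_Alpaca.py compares the 6 actions (one per ticker) element-wise against minimum_qty_alpaca, which had 10 entries, and NumPy cannot broadcast arrays of different lengths. The sketch below reproduces that comparison and adds an explicit length check with a clearer message. sell_indices and the toy arrays are hypothetical; only the comparison itself is taken from the trace.

```python
import numpy as np


def sell_indices(actions: np.ndarray, minimum_qty: np.ndarray) -> np.ndarray:
    """Indices where an action is below minus the per-asset minimum quantity.

    Same comparison as the line in the trace, plus an explicit length check.
    """
    if actions.shape != minimum_qty.shape:
        raise ValueError(
            f"got {actions.shape[0]} actions but {minimum_qty.shape[0]} minimum "
            "quantities; the per-ticker settings must match the ticker list"
        )
    return np.where(actions < -minimum_qty)[0]


actions = np.array([-1.0, 0.5, -0.2, 0.0, 0.3, -0.7])  # 6 tickers
minimum_qty_10 = np.full(10, 0.1)                      # sized for 10 tickers
minimum_qty_6 = np.full(6, 0.1)                        # one entry per ticker

try:
    actions < -minimum_qty_10  # the raw comparison fails to broadcast
except ValueError as err:
    print("numpy:", err)

print(sell_indices(actions, minimum_qty_6))  # [0 2 5]
```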

@mehdicauche

The error you are encountering, ValueError: all the input array dimensions except for the concatenation axis must match exactly, occurs because the arrays you are trying to concatenate with np.hstack() have different sizes along dimension 0 (i.e., different numbers of rows).

In this context, this could mean that different tickers have a different number of data points (candles) in the DataFrame, causing the mismatch during concatenation.
Solution:

You might want to ensure that each ticker has the same number of data points before attempting to concatenate. Here's how you could modify the df_to_array method to handle this.
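The modified df_to_array itself is not included in the comment. One possible approach, sketched independently here rather than reproduced from it, is to pivot the long DataFrame to one column per ticker and keep only timestamps that every ticker has, so the columns line up row by row. It assumes tic and close columns (as in the failing line) and a timestamp column, called time here as an assumption; aligned_close_array is a hypothetical name.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def aligned_close_array(df: pd.DataFrame, tickers: list[str],
                        time_col: str = "time") -> tuple[np.ndarray, pd.Index]:
    """Return an (n_timestamps, n_tickers) close-price array and the kept timestamps.

    Hypothetical replacement for the per-ticker np.hstack loop: rows are matched
    by timestamp instead of assuming every ticker has the same number of candles.
    """
    # One row per timestamp, one column per ticker (raises on duplicate rows).
    wide = df.pivot(index=time_col, columns="tic", values="close")
    # Fix the column order to the ticker list, then drop incomplete timestamps.
    wide = wide[tickers].dropna(how="any")
    return wide.to_numpy(), wide.index


toy = pd.DataFrame({
    "time": [1, 2, 3, 1, 2],
    "tic": ["BTCUSDT"] * 3 + ["ETHUSDT"] * 2,
    "close": [1.0, 2.0, 3.0, 10.0, 20.0],
})
price_array, kept_times = aligned_close_array(toy, ["BTCUSDT", "ETHUSDT"])
print(price_array.shape)  # (2, 2): timestamp 3 is dropped because ETHUSDT lacks it
```

Dropping incomplete timestamps discards some data; forward-filling a missing candle (wide.ffill()) is the other common choice. Either way, tech_array and time_array would need to be built on the same kept_times so all three arrays stay aligned. DataFrame.pivot raises a ValueError if a ticker has two rows for the same timestamp, so duplicates would have to be removed first.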

@mehdicauche

I'd like to update the code, but I am not very familiar with contributing to open-source projects.

Projects
None yet
Development

No branches or pull requests

3 participants