Conversion of globaldatamatrix panel to Dataframe (panel is deprecated) #137

Open
aushaff opened this issue Jan 16, 2021 · 4 comments

Comments


aushaff commented Jan 16, 2021

Hi

As stated in the title, I am converting the panel in the globaldatamatrix function to a multi-index DataFrame, but could do with some clarification/assistance (hopefully you are still watching this repo).

What I have done so far:

In get_global_panel:

L76:

     panel = pd.Panel(items=features, major_axis=coins, minor_axis=time_index, dtype=np.float32)

to

    new_panel = pd.DataFrame(
            index = pd.MultiIndex.from_product([coins, time_index]),
            columns = features
        )

L133:

      panel.loc[feature, coin, serial_data.index] = serial_data.squeeze()

to

      new_panel.loc[(coin, serial_data.index), feature] = serial_data.values

This works, but it is very slow.
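One possible way around the slow per-series `.loc` assignment is to collect the series first and build the frame in a single `pd.concat` call. The following is only a sketch under assumptions: the coin names, feature names and the `load_series` helper are hypothetical stand-ins for whatever the real function reads from its database.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real coins, features and time index.
coins = ["BTC", "ETH"]
features = ["close", "high"]
time_index = pd.date_range("2021-01-01", periods=4, freq="30min")


def load_series(coin, feature):
    """Hypothetical loader: returns one float32 Series per coin/feature."""
    offset = 10 * coins.index(coin) + 100 * features.index(feature)
    values = np.arange(len(time_index), dtype=np.float32) + offset
    return pd.Series(values, index=time_index)


# Build one (time x feature) frame per coin; passing index=time_index
# aligns every Series on the common time index.
frames = []
for coin in coins:
    columns = {feature: load_series(coin, feature) for feature in features}
    frames.append(pd.DataFrame(columns, index=time_index))

# A single concat stacks the per-coin frames under a (coin, time) MultiIndex,
# in the same row order as MultiIndex.from_product([coins, time_index]).
new_panel = pd.concat(frames, keys=coins)
```

The result has the same index and columns as the empty frame created above, but it is filled in one step rather than by repeated label-based assignment.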

After this, 'new_panel' is returned to datamatrices, L48:

     self.__global_data = self.__history_manager.get_global_panel(start, self.__end, period=period, features = type_list)

L58:

    self.__PVM = pd.DataFrame(index=self.__global_data.minor_axis, columns=self.__global_data.major_axis)

to

    self.__PVM = self.__global_data

Is this equivalent? I'm not sure of the structure of the PVM; is it the same as the structure of 'new_panel'?
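For comparison, the original line builds an empty frame with one row per time step (`minor_axis`) and one column per coin (`major_axis`), which is a different shape from the (coin, time) x feature frame. A minimal sketch of recreating that shape from the multi-index frame, assuming the index levels are ordered (coin, time) as in the `from_product` call above; the sample coins and features are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the same layout as 'new_panel'.
coins = ["BTC", "ETH"]
features = ["close", "high"]
time_index = pd.date_range("2021-01-01", periods=4, freq="30min")
global_data = pd.DataFrame(
    np.zeros((len(coins) * len(time_index), len(features)), dtype=np.float32),
    index=pd.MultiIndex.from_product([coins, time_index]),
    columns=features,
)

# Level 0 holds the coins (old major_axis), level 1 the times (old minor_axis).
coin_labels = global_data.index.get_level_values(0).unique()
times = global_data.index.get_level_values(1).unique()

# Empty time x coin frame, matching the shape the Panel-based line produced.
pvm = pd.DataFrame(index=times, columns=coin_labels)
```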

L64:

    self._num_periods = len(self.__global_data.minor_axis)

to

    self._num_periods = len(self.__global_data.index)

Is that correct? My assumption here is that we want the number of 30-minute periods, not the number of rows?
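To illustrate the difference: on a (coin, time) multi-index, `len(index)` counts coins times periods, while the number of distinct values in the time level gives the period count. A small sketch with made-up sample data, assuming the time values sit in index level 1:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 3 coins, 4 half-hour periods, 2 features.
coins = ["BTC", "ETH", "LTC"]
features = ["close", "high"]
time_index = pd.date_range("2021-01-01", periods=4, freq="30min")
global_data = pd.DataFrame(
    np.zeros((len(coins) * len(time_index), len(features)), dtype=np.float32),
    index=pd.MultiIndex.from_product([coins, time_index]),
    columns=features,
)

# Number of rows: one per (coin, time) pair.
num_rows = len(global_data.index)

# Number of periods: distinct timestamps in the time level.
num_periods = global_data.index.get_level_values(1).nunique()
```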

When I run the program with 'mode=download_data' it runs, but as mentioned it takes a long time.

In backtest mode:

I am confused by datamatrices::get_submatrix and datamatrices::__pack_samples. Due to the change of data structure, the indices are wrong (there are too many, for a start).

Can you provide some guidance as to how to approach this please?
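One way to approach the index mismatch is to convert the frame back into the 3-D array that `Panel.values` used to expose, with shape (items, major_axis, minor_axis), i.e. (features, coins, time) here. The sketch below assumes the rows are ordered coin-major as `MultiIndex.from_product([coins, time_index])` produces, with no missing rows; the sample data and the window size are made up, and the final slice only illustrates the kind of time-window indexing a `get_submatrix`-style helper might do.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the (coin, time) x feature layout.
coins = ["BTC", "ETH"]
features = ["close", "high", "low"]
time_index = pd.date_range("2021-01-01", periods=5, freq="30min")
n_coins, n_times, n_features = len(coins), len(time_index), len(features)

global_data = pd.DataFrame(
    np.arange(n_coins * n_times * n_features, dtype=np.float32).reshape(-1, n_features),
    index=pd.MultiIndex.from_product([coins, time_index]),
    columns=features,
)

# Rows are coin-major, so the 2-D values reshape to (coins, time, features);
# transposing yields the Panel-style (features, coins, time) layout.
cube = global_data.to_numpy().reshape(n_coins, n_times, n_features).transpose(2, 0, 1)

# Example window along the time axis (window size is an assumed value).
window_size = 2
start = 1
window = cube[:, :, start:start + window_size + 1]
```

With the array back in the old axis order, code that indexed `panel.values` positionally should see the same number of indices as before.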

I appreciate that this is an old repo and maybe you aren't supporting it any more but any advice would be welcome! Thanks!


aushaff commented Jan 16, 2021

To update:

I have everything working now on a different machine with pandas version 0.24, so I will be able to compare the data structures. Hopefully that will be enough.

@aushaff aushaff closed this as completed Jan 17, 2021
@aushaff aushaff reopened this Jan 17, 2021
@bjrnfrdnnd

I am using xarray.DataArray for that. Relatively minor code changes.
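For readers unfamiliar with xarray, here is a rough sketch of what such a replacement could look like. It is an assumption about the general approach, not the actual changes made by the commenter; the dimension names, coins, features and sample series are all made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-ins for the real coins, features and time index.
coins = ["BTC", "ETH"]
features = ["close", "high"]
time_index = pd.date_range("2021-01-01", periods=4, freq="30min")

# A labelled 3-D array in the same axis order as the old Panel:
# items -> feature, major_axis -> coin, minor_axis -> time.
panel = xr.DataArray(
    np.full((len(features), len(coins), len(time_index)), np.nan, dtype=np.float32),
    dims=("feature", "coin", "time"),
    coords={"feature": features, "coin": coins, "time": time_index},
)

# Made-up series covering only part of the time range.
serial_data = pd.Series([1.0, 2.0], index=time_index[1:3], dtype=np.float32)

# Label-based assignment, analogous to panel.loc[feature, coin, index] = ...
panel.loc[dict(feature="close", coin="BTC", time=serial_data.index)] = serial_data.to_numpy()

# Counterparts of the old attributes used elsewhere in the thread.
num_periods = panel.sizes["time"]   # was len(panel.minor_axis)
values = panel.values               # 3-D NumPy array (feature, coin, time)
```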


aushaff commented Mar 15, 2021

Thanks. It's good to hear that someone has already done it.

If you have time, could you be a bit more specific about what you needed to change, please?

@Nice-Zhang66

I am using xarray.DataArray for that. Relatively minor code changes.

Hello, could you explain how to replace the Panel with xarray.DataArray? I have been having problems since making the change. Thank you very much!
