Memory optimization for large datasets #241
Comments
@JHHatfield Nice to see you are still swimming these waters. It would be good to record where else you see these changes being needed, to aid future work on overhauling the package to use data.table functions. Also, would you like to make a pull request with the changes you have already made?
I have submitted my quick fix for formatOccData, which deals with the size limit hit by the reshape2 version of dcast. The catch is that data.table's dcast requires data.tables rather than data.frames, and the syntax differences mean a full overhaul would need a lot of changes. I got around it here by using setDT and then setDF to go from frame to table and back. I suppose the question is whether the memory usage is a big enough problem to warrant such changes.
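A minimal sketch of the setDT/setDF pattern described above. The column names here are hypothetical illustrations, not the actual columns used inside formatOccData:

```r
library(data.table)

# Hypothetical visit-by-species occurrence records in a plain data.frame
df <- data.frame(
  visit   = c("v1", "v1", "v2"),
  species = c("a",  "b",  "a"),
  present = c(1L,   1L,   1L)
)

setDT(df)  # convert to a data.table by reference, avoiding a copy

# data.table's dcast handles large casts that overflow reshape2::dcast
wide <- dcast(df, visit ~ species, value.var = "present", fill = 0)

setDF(wide)  # convert back to a plain data.frame for the rest of the pipeline
```

Because setDT and setDF modify the object by reference, the frame-to-table round trip adds essentially no memory overhead, which is what makes this viable as a drop-in fix.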
The alternative option would be to use something like the tidyverse equivalent rather than the reshape2 calls. I'm not sure the arrange calls are strictly necessary, but this way the outputs are identical.
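The code blocks this comment referred to did not survive extraction; the following is only a hedged guess at their shape, using tidyr::pivot_wider with dplyr::arrange, with hypothetical column names:

```r
library(dplyr)
library(tidyr)

# Hypothetical occurrence records, deliberately out of order
df <- data.frame(
  visit   = c("v2", "v1", "v1"),
  species = c("a",  "a",  "b"),
  present = c(1L,   1L,   1L)
)

wide <- df %>%
  arrange(visit, species) %>%            # possibly not strictly necessary,
  pivot_wider(names_from  = species,     # but keeps the row/column order
              values_from = present,     # identical to the reshape2 output
              values_fill = 0) %>%
  arrange(visit)
```

pivot_wider does not guarantee the same ordering as reshape2::dcast, which would explain the remark that the arrange calls make the outputs identical.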
Sounds good, I will have a look. The quick fix for formatOccData works pretty well, but when I started to look at occDetFunc it became clear the same approach is not really going to work there. I will look at memory usage when switching occDetFunc over to tidyverse functions. Although, what is the plan for the function going forward if you are bringing NIMBLE in?
I think most of the code in
Some of the functions have relatively high memory requirements (e.g. formatOccData and occDetFunc). From what I can see these are mostly caused by the cast and merge steps. I have replaced some of the reshape2 functions in formatOccData with data.table ones. This reduces the memory requirement and works as a small fix, but doing it comprehensively looks complex.