[feedback] Getting repos analysed #94
Hi @valeriocos, thanks for the feedback. We are well aware of the performance and memory-consumption issues for large projects with some Rascal metrics (mostly the dependency- and API-related ones). I have started to push fixes that improve the situation (usethesource/rascal@78e0b60) and will continue in this direction. I'm hopeful we can make it a lot better, though memory consumption will unavoidably remain an issue: analyzing large projects will always consume a lot of resources. I'll keep you updated on our progress.
Thank you @tdegueul for the quick reply!
Thanks a lot @valeriocos for the feedback and the hints. I've also gathered some information in #338 (submitted in the Scava repo [1]). Regarding performance, I often see a huge load on the ci4 server (I saw a load of 192 the other day; I didn't even know it could go that high! ;-), and the computations take a lot of time. I've set up several workers and that works quite well, but the worker that is stuck (see the issue mentioned above) takes up a lot of resources, and it can't be stopped through the UI. As a result the host (an 8-CPU, 64 GB RAM machine) is always at 100%. Good point: multi-threading works well. Since we have a lot of projects to analyse, this will be problematic; we've decided to use a very short range (starting from the beginning of 2018) for all projects, and we'll probably use your metrics list. Any other hint is welcome.
As suggested by @phkrief , the idea of this issue is to share the experience when analysing projects with CROSSMINER, and trigger discussions to identify possible bugs/limitations on the platform. This issue is related to:
The default docker-compose may drain the resources of a machine (even a powerful one). Thus, in order to get projects analysed, a solution (more details here) consists of:
Nevertheless, in my specific case, sometimes `oss-app` froze (probably due to the limitations of my machine, something @MarcioMateus agreed on) and I had to delete the `oss-db` container. Furthermore, queuing a new analysis task caused the current task to stop, so I waited for each task to finish before adding a new one. I then imported the data into Elasticsearch with the script available at https://github.com/valeriocos/scava/blob/bit/web-dashboards/scava-metrics/scava2es_battery.py (which calls scava2es on a battery of repos).

With the setup described above, I was able to analyse all CHAOSS repos plus puppet-elasticsearch from 01/01/2019 to 30/06/2019 using the following metric providers:
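For readers without access to the linked script, a minimal sketch of such a "battery" importer looks like the following. This is a hypothetical reconstruction, not the actual scava2es_battery.py: the function names, the `scava2es.py` invocation, and the repository list are assumptions for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a batch importer that runs a per-repo import step sequentially."""
import subprocess

# Hypothetical battery of repositories to import (not the actual list used).
REPOS = [
    "https://github.com/chaoss/grimoirelab",
    "https://github.com/voxpupuli/puppet-elasticsearch",
]

def import_repo(repo_url, dry_run=True):
    """Invoke the per-repo import step as a subprocess; return (exit code, output)."""
    cmd = ["python3", "scava2es.py", repo_url]  # assumed CLI, for illustration only
    if dry_run:
        # Avoid touching a real Elasticsearch instance in this sketch.
        return 0, "would run: " + " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout

def run_battery(repos, dry_run=True):
    """Import repos one at a time, so a failure in one never blocks the rest."""
    results = {}
    for repo in repos:
        code, _output = import_repo(repo, dry_run=dry_run)
        results[repo] = code
    return results

if __name__ == "__main__":
    for repo, code in run_battery(REPOS).items():
        print(repo, "->", "ok" if code == 0 else "failed ({})".format(code))
```

Running the repos sequentially mirrors the workaround described above: only one analysis/import is in flight at a time, which keeps resource usage predictable.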
As suggested by @creat89 here, I had a quick look at the metrics. I noticed that the dev-dependency metric providers (e.g. OSGi and Maven) seem to eat up a considerable amount of memory. I checked this by selecting https://github.com/elastic/elasticsearch as the target project (from 01/01/2019 to 30/06/2019) and watching the memory consumption with:
slimbook@slimbook-KATANA:~$ top
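For a repeatable version of that check, a small script can snapshot the most memory-hungry processes instead of eyeballing `top`. A minimal sketch, assuming a Linux host with procps (`ps --sort=-rss`); nothing here is specific to the CROSSMINER deployment.

```python
#!/usr/bin/env python3
"""Snapshot the processes with the largest resident memory, like `top` sorted by RES."""
import subprocess

def top_memory_processes(limit=5):
    """Return the `limit` processes using the most resident memory (RSS)."""
    # `ps` with RSS in KB, sorted largest-first (Linux/procps syntax).
    out = subprocess.run(
        ["ps", "-eo", "pid,rss,comm", "--sort=-rss"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines()[1:limit + 1]:  # skip the header line
        pid, rss_kb, comm = line.split(None, 2)
        rows.append({"pid": int(pid), "rss_mb": int(rss_kb) // 1024, "comm": comm})
    return rows

if __name__ == "__main__":
    for proc in top_memory_processes():
        print("{pid:>8} {rss_mb:>8} MB  {comm}".format(**proc))
```

Run periodically (e.g. from cron) while an analysis task executes, this gives a log of which metric-provider processes are actually consuming the memory.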