


# Report 2022-04-29

Hey Jan! Indeed, there are some developments that I haven’t reported. My main work has been looking into how susceptible different packages are to our technique. As is always the case with “real code”, all sorts of surprises come out of it, though I can’t say many of them are interesting.

The most subtle issue was with the whole approach of processing several packages in a batch: it turned out (in hindsight, not surprising at all) that loading all packages at once in one Julia session leads to a search-space blowup. Many new types get added to the hierarchy, and packages that were processed fine individually become too slow because there are many more types to enumerate.

This is a curious challenge: on the one hand, it gives more ground to explore and, therefore, more reason to believe the result. On the other hand, it becomes far less practical (slower). So, for now I rearranged the code to ensure that every package is tested in a pristine environment, roughly as sketched below.
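A minimal sketch of the idea, with a hypothetical driver script and package list (not the actual names in my code): each package gets its own fresh Julia process, so the type hierarchy it sees is limited to its own dependencies.

```julia
# Sketch only; `analyze_package.jl` and the package list are hypothetical.
const PACKAGES = ["Flux", "Gadfly", "Genie"]

for pkg in PACKAGES
    # A fresh process per package keeps the type hierarchy (and thus the
    # enumeration search space) limited to that package's own dependencies.
    run(`julia --project=. analyze_package.jl $pkg`)
end
```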

Another issue I’ve been dealing with is more minor; I’ve seen it in only one package so far. That package `export`ed a method name that it never defines. This looked like a bug, and I even posted it on their bug tracker, but I haven’t heard back yet. The implementation seems to have been removed a long time ago (according to GitHub), so it’s suspicious that they still keep the name in the export declaration. The reason I posted it on their tracker is that there may be a subtle reason for leaving the export in place, e.g. they may want clients to define it?.. I’m not sure. So far, I’ve managed to work around this.
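For reference, a tiny illustration (my own example, not the actual package) of the pattern: Julia accepts an `export` of a name that the module never defines; the problem only surfaces when a client tries to use the name.

```julia
module M
export mystery_fn   # exported, but no definition of `mystery_fn` below
end

using .M
# mystery_fn(1)     # would throw UndefVarError: `mystery_fn` not defined
```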

One problem that is still pending is the overly large CSV files my engine tries to produce. According to the messages from Julia’s CSV package, it handles up to 4M rows by default, and I tried to store 9M. For anything larger than the default, it seems I need to set some parameters manually.
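I haven’t looked up the exact parameter yet; one possible workaround (an assumption on my part, not necessarily what CSV.jl’s message asks for) is to write the table in chunks, appending to the same file:

```julia
using CSV, DataFrames

# Workaround sketch: write a very large table in chunks instead of one
# 9M-row write. The chunk size is arbitrary here.
function write_chunked(path::AbstractString, df::DataFrame; chunk::Int = 1_000_000)
    for start in 1:chunk:nrow(df)
        stop = min(start + chunk - 1, nrow(df))
        CSV.write(path, df[start:stop, :];
                  append = start != 1)   # only the first chunk writes the header
    end
end
```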

I don’t have many numbers yet, and not too much to analyze, but here is something. One package on the bigger side, Flux, processed okay. Sadly, there is a fair number of methods that we can’t process due to Any or Vararg.

| Module | Methods | Stable | Unstable | Any | Vararg | Generic | TcFail | NoFuel |
|--------|--------:|-------:|---------:|----:|-------:|--------:|-------:|-------:|
| Flux   | 512     | 100    | 1        | 174 | 87     | 150     | 0      | 0      |

Generic is a sizeable chunk that I could treat better: right now I give up on generic methods even when their type variables are bounded, but I could handle them the same way as existentials in individual arguments (see the sketch below). The reason I haven’t done it is that it doesn’t fit very well into the pipeline, unfortunately, but it’s certainly fixable.
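For illustration (my own example, not taken from Flux), this is the kind of method that currently lands in the Generic column: the type variable is bounded, so its concrete instantiations could be enumerated just like an existential in a single argument position.

```julia
# A generic method with a bounded type variable.
f(x::T, y::T) where {T <: Integer} = x + y

# Concrete instantiations an enumeration could check for stability:
#   f(::Int64, ::Int64), f(::UInt8, ::UInt8), f(::BigInt, ::BigInt), ...
```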

For comparison, what we had with the dynamic analysis in the paper is:

| Package | Methods | Instances | Inst/Meth | Varargs | Stable | Grounded |
|---------|--------:|----------:|----------:|--------:|-------:|---------:|
| Flux    | 381     | 4288      | 11.3      | 13%     | 76%    | 70%      |

So, depending on how you treat the numbers for the special cases in the previous table, the static result either looks bad or looks fine in comparison. In particular, if the static tool finds only 100/512 stable methods (~20%), that looks low next to 76% for the dynamic counterpart. But if you subtract all special cases (174 + 87 + 150 = 411) from 512, only 101 methods remain, 100 of which are stable (~99%), which compares well with 76%.

I hope to get numbers for all ten packages and add support for generic methods in the coming week. I was somewhat slow this week because we went through a big reorganization of the apartment (adding and rearranging furniture, moving stuff) to make room for baby things.