Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the parsing of fktables to pineko #17

Closed
wants to merge 5 commits into from

Conversation

scarlehoff
Copy link
Member

This is basically what it would be required to have vp depend on pineko.

Cons:
- It is a bit of a complication for something that is only depending on vp (that is, the format of the dataframe)

Pros:
- Gives us a bit more freedom on how to do anything since the only limitation is that at the end we have to have a dataframe as defined in pineappl_to_fktable. How do we get there is up to us. Vp doesn't need to know how the initial state were define, or what the luminosity means or anything.

@alecandido
Copy link
Member

Just to tell it: I'm not a fan of linearizing dimensions.

We have the luck of to have a simple and powerful framework at hand, that is able to deal with N-dimensional tensors in a fast and expressive way.

FkTables are genuinely 4 dimensional objects (bin, lumi, x1, x2), so we should use 4 dimensional data structures, not try to fit it into a 2 dimensional one.

If you want labels, concatenation and other excel-like swag, giving up on part of the simplicity, than you should consider to use xarray, still based on numpy, but at least it doesn't have the tradeoff on the rank.

@scarlehoff
Copy link
Member Author

scarlehoff commented Mar 30, 2022

And you haven't seen the terrible crap in the other side to get the pandas df and recover exactly the 4rank tensor that I had at the beginning :__

Sadly vp uses the df everywhere (most notably, for the old tables). I can move the tensor to df part to the vp parser so that this side is clean. I think that makes sense.

@alecandido
Copy link
Member

😞

Copy link
Member

@alecandido alecandido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First batch, I still have to review the main API pineappl_to_fktable

src/pineko/parser.py Show resolved Hide resolved
src/pineko/parser.py Outdated Show resolved Hide resolved
src/pineko/parser.py Outdated Show resolved Hide resolved
src/pineko/parser.py Outdated Show resolved Hide resolved
src/pineko/parser.py Outdated Show resolved Hide resolved
src/pineko/parser.py Outdated Show resolved Hide resolved
src/pineko/parser.py Outdated Show resolved Hide resolved
@scarlehoff
Copy link
Member Author

I still have to review the main API pineappl_to_fktable

Wait for that one though, I'm going to remove the df part and move it back to vp.

@scarlehoff
Copy link
Member Author

I will run through the yaml db to standarize how the apfelcomb flags are added since many were added manually anyway. That way I can fix most issues at once without adding checks for it in here.

@scarlehoff
Copy link
Member Author

apfelcomb:
    normalization:
        ATLAS_TTB_8TEV_2L_TTRAP: [2.0]
    repetition_flag:
        CMSTTBARTOT8TEV-TOPDIFF8TEVTOT: True

So that instead of relying on things being by chance in the same order or whatever, the behavior is explicit for each subfktable. This is for ATLAS_TOPDIFF_DILEPT_8TEV_TTRAPNORM for instance.

If you guys agree I'll go ahead. In this case for instance only the ATLAS_TTB_8TEV_2L_TTRAP fktable needs a factor and only the CMSTTBARTOT8TEV-TOPDIFF8TEVTOT needs to set the repetition flag.

@felixhekhorn
Copy link
Contributor

In this case for instance only the ATLAS_TTB_8TEV_2L_TTRAP fktable needs a factor and only the CMSTTBARTOT8TEV-TOPDIFF8TEVTOT needs to set the repetition flag

If things decouple, perfect!

@scarlehoff
Copy link
Member Author

scarlehoff commented Mar 30, 2022

Meh, there's no good solution for this.

The apfelcomb flags basically set up the dataframe for vp so either everything is in vp or everything is in here.
I would prefer to have the apfelcomb flags here so that they actually get removed, if they get into the vp side they will be abused forever just like apfelcomb was.

@alecandido
Copy link
Member

The apfelcomb flags basically set up the dataframe for vp so either everything is in vp or everything is in here. I would prefer to have the apfelcomb flags here so that they actually get removed, if they get into the vp side they will be abused forever just like apfelcomb was.

Fine, then I'd immediately open a separate issue for apfelcomb flags total disruption: as soon as we completed theory uncertainties and we have new commondata, we are stable enough and we drop backward compatibility (the last faints of it).

@felixhekhorn
Copy link
Contributor

I'm still no big fan of this:

  • it adds highly non-trivial functionality beyond the scope of the package: the unique scope in my eyes is the production of FK tables
  • this should instead belong to a CommonDataReader (as also said here: https://github.com/NNPDF/nnpdf/blob/646dd01c2b9e60ebf1ac98cf52b4ed5c6efa6f62/validphys2/src/validphys/pineparser.py#L16) - which most likely is the long-term solution
  • for the short term I suggest to add a condensed, 5-line version of if: read the file and concatenate the content of operands
    • Pros: quick and easy
    • Cons: slight code duplication, though only until CDR gets settled
  • the other unfavoured option is to give up on datasets:
    • Pros: no additional dependency what so ever
    • Cons: you have to know FK table member names, which seems unreasonable to users (but not to scripts): ATLAS_WP_JET_8TEV_PT-atlas-atlas-wjets-arxiv-1711.03296-xsec002
  • the least favoured option is to depend on vp:
    • Pros: well ...
    • Cons: infinite (truly - since I'd like to get rid of non-trivial dependencies)

@scarlehoff
Copy link
Member Author

I'm currently reducing the amount of stuff here but

it adds highly non-trivial functionality beyond the scope of the package: the unique scope in my eyes is the production of FK tables

Indeed

this should instead belong to a CommonDataReader

That's what will happen (but for now there is no such thing)

The (my) idea is that only the apfelcomb flags will remain here. I am very interested on this being the case because that way nobody can rely on them in vp (since they are coming from some external package).

@felixhekhorn
Copy link
Contributor

The (my) idea is that only the apfelcomb flags will remain here. I am very interested on this being the case because that way nobody can rely on them in vp (since they are coming from some external package).

for sure we can intermediate stuff here, since everything is still under heavy development - but I believe the correct place for this should be NNPDF/pinefarm#3 as I consider this under correct grid generation and not FK generation

@scarlehoff
Copy link
Member Author

But this is not only for converted grids, these flags sometimes are necessary because of a mismatch between cfactors / cuts etc and what pineappl grids have.

tbh, it can be anywhere but it needs to be something vp can import and be somewhere the people writing vp cannot touch.

@felixhekhorn
Copy link
Contributor

But this is not only for converted grids, these flags sometimes are necessary because of a mismatch between cfactors / cuts etc and what pineappl grids have.

Okay I take your point - still it should belong to pinecardrunner: as we agreed the pinecards can and in general will be NNPDF dependent (hence dependent on cfactors and what ever) so we can even adjust existing pinecards

@felixhekhorn felixhekhorn marked this pull request as draft July 12, 2022 12:25
@felixhekhorn felixhekhorn added the refactor Refactor code label Jul 12, 2022
@felixhekhorn felixhekhorn deleted the pineko_does_the_parsing branch December 14, 2022 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Refactor code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants