-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The User can now register its determinant into trexio format #80
Conversation
Following discussion from TREX-CoE/trexio_tools#45 (comment) |
pyscf/tools/trexio.py
Outdated
@@ -260,8 +260,6 @@ def det_to_trexio(mcscf, norb, nelec, filename, backend='h5'): | |||
|
|||
det_list = [] | |||
for a, b, coeff in zip(occsa, occsb, ci_values): | |||
occsa_upshifted = [orb + 1 for orb in a] | |||
occsb_upshifted = [orb + 1 for orb in b] | |||
det_tmp = [] | |||
det_tmp += trexio_det.to_determinant_list(occsa_upshifted, int64_num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NastaMauger you forgot to rename occsa_upshifted
to occsa
here (and same for occsb
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad ! It is done now
The code mixes 2-space and 4-space indention. Please use consistent indentions. @q-posev I'm not very familiar with TrexIO. I'd appreciate your feedback on this new feature. Please let me know if you think this code is ready for merging. Thanks! |
@q-posev I realized that the original pyscf-forge code (the part which I have not modified) do not register the ao shell and atomic normalization. |
pyscf/tools/trexio.py
Outdated
for i in range(min(len(selected_occslst), mcscf.ci.shape[0])): | ||
for j in range(min(len(selected_occslst), mcscf.ci.shape[1])): | ||
ci_coeff = mcscf.ci[i, j] | ||
if np.abs(ci_coeff) > 1e-8: # Check if CI coefficient is significant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this threshold for CI coefficients configurable as an argument to this function? I would trust the user to have the full control of the determinant filtering (we can keep your 1e-8 as default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@q-posev I am not sure if setting a CI threshold is a good idea. For example, after a sCI run, my system produces the following (full) list of determinants with CI values using a threshold of 1e-8:
occsa occsb CI coeff
[0 1] [0 1] -0.99779049148461
[0 2] [0 2] 0.03340264667473
[0 2] [1 2] -0.03321881541135
[1 2] [0 2] -0.03321881541135
[1 2] [1 2] 0.03303662972152
Now, let’s say a user wants to keep only values higher than 0.1. In this case, the code would register only [0 1] [0 1] -0.9977904914846 and set the others to 0. However, this corresponds to truncating the wavefunctions, which is a different process, right? This would also imply that the one- and two-electron integrals would need to be renormalized to account for the new threshold.
I chose a threshold of 1e-8, as I believe it’s sufficient to filter out coefficients that are effectively zero in terms of bit precision.
Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NastaMauger There is no problem in truncating the CI coefficients. It is a valid wave function that would potentially need to be renormalized, but trexio doesn't impose that the wave function is normalized.
This does not have any incidence on the integrals because they only depend on the Hamiltonian and not on the wave function. It would have an influence on the 1- and 2-body RDMs though.
If you truncate a wave function that is an eigenfunction of S^2 based on the determinant coefficients, the truncated wave function will not be necessarily be an eigenfunction of S^2 any more (it will be spin-contaminated), and some determinants may be removed of the determinant set. In that case, you will not be able to recover the S^2 eigenstate even when you rediagonalize H. Truncating the CSF coefficients keeps the S^2-eigenstate property for the truncated wavefunction.
I think it is a good idea to let the user enter a threshold as a parameter, with a default of zero (no threshold).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I decided to place the CI threshold argument right after the user-defined nelec
value for consistency. I thought it might be counterintuitive for the user to place this threshold after the trexio_file
argument.
Let me know your thoughts.
pyscf/tools/trexio.py
Outdated
|
||
|
||
def read_det_trexio(filename, backend='h5'): | ||
with trexio.File(filename, 'r', back_end=_mode(backend)) as tf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading, you can safely use back_end=trexio.TREXIO_AUTO
. TREXIO_AUTO allows the library to automatically detect the backend for a given TREXIO file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set back_end=_mode(backend)
to maintain consistency with how it was previously written in the code. For example, the same back_end is called on line 95.
I can certainly change it to back_end=trexio.TREXIO_AUTO
as you advised, but I thought it might be better to stay consistent.
We could either keep it as it is or update all calls to use back_end=trexio.TREXIO_AUTO
. Let me know which option would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would advise to use that but only for reading TREXIO files. The reason is that when one wants to write/create TREXIO file - the I/O back end has to be provided explicitly, but for reading - our library is smart enough and can detect the I/O back end that was used to produce the TREXIO file. So using TREXIO_AUTO
allows you to stay compatible with whatever new back end we will decide to implement in the future.
If @sunqm is OK with that, I would recommend to modify also the line 95 accordingly.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this modification in all functions that use trexio.read
. It’s done in a separate commit so that if @sunqm disagree, I can easily roll it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NastaMauger Sorry, i should have precised that: you need to use back_end=trexio.TREXIO_AUTO in the calls to trexio.File with 'r' mode. And then you can remove the backend argument in the functions with *_from_trexio and read_eri. I am not sure your commit works as expected because _mode function doesn't handle the TREXIO_AUTO case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@q-posev My apologies—I am a bit confused about the backend. I have modified the code based on your last comment. Could you please check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would advise to use that but only for reading TREXIO files. The reason is that when one wants to write/create TREXIO file - the I/O back end has to be provided explicitly, but for reading - our library is smart enough and can detect the I/O back end that was used to produce the TREXIO file. So using
TREXIO_AUTO
allows you to stay compatible with whatever new back end we will decide to implement in the future.If @sunqm is OK with that, I would recommend to modify also the line 95 accordingly.
Thanks!
sure. Please update the module as you like. I implemented the version based on the paper or the documents. Many places are likely sub-optimal.
with trexio.File(filename, 'r', back_end=_mode(backend)) as tf: | ||
offset_file = 0 | ||
|
||
num_det = trexio.read_determinant_num(tf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a suggestion (I do not insist for it to be implemented here): If one day num_det
becomes huge - the user can run out of memory while reading the CI info. This is why we allow to read/write determinants in buffers (or chunks) of fixed size. It might be a good idea to define some relatively big buffer_size (e.g. 100k) and if num_det
is larger than that value - read the data in chunks. Example of writing in chunks (using offset_file
to advance the data pointers) can be found in trexio-tutorials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Let me know if it meets your requirement.
@NastaMauger I left a few comments but overall the code looks great, thank you for your contribution! Does your code work for both selected CI and FCI objects of PySCF? @sunqm Sure, with pleasure! I left a few comments on this PR. I would like to also take this opportunity to give you kudos for all the hard work that you and the team do on PySCF - it's a great software! |
@sunqm if I may: we have been recently discussing a few issues reported by the TREXIO users. I can summarize them as follows:
|
|
Dear @sunqm
No, I do not think the format of the ANO basis set is compatible with the TREXIO format.
Here, the problem is that one exponent is associated with multiple contraction coefficients, Instead, for instance, the ccecp-cc-pv*Z basis set implemented in
because one exponent is associated with one contraction coefficient (i.e., no general contraction). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NastaMauger I left a few comments to facilitate the code modifications. Right now the CI tests fail because of a typo (trexio.File
class does not support backend
but we use back_end
instead).
Sorry for the trouble with the modifications, I would push the necessary changes myself to make things easier but this PR is from your fork and I cannot push there.
pyscf/tools/trexio.py
Outdated
@@ -91,9 +93,10 @@ def _cc_to_trexio(cc_obj, trexio_file): | |||
def _mcscf_to_trexio(cas_obj, trexio_file): | |||
raise NotImplementedError | |||
|
|||
|
|||
def mol_from_trexio(filename, backend='h5'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend='h5'
can be removed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now use TREXIO_AUTO
for reading, that's why the backend
option does not need to be provided here
pyscf/tools/trexio.py
Outdated
def mol_from_trexio(filename, backend='h5'): | ||
mol = gto.Mole() | ||
with trexio.File(filename, 'r', back_end=_mode(backend)) as tf: | ||
with trexio.File(filename, 'r', backend=trexio.TREXIO_AUTO) as tf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
back_end
instead of backend
(tests failed in the CI because of that)
pyscf/tools/trexio.py
Outdated
@@ -132,7 +135,7 @@ def mol_from_trexio(filename, backend='h5'): | |||
|
|||
def scf_from_trexio(filename, backend='h5'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above (backend
option can be removed)
pyscf/tools/trexio.py
Outdated
@@ -174,7 +177,7 @@ def write_eri(eri, filename, backend='h5'): | |||
|
|||
def read_eri(filename, backend='h5'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above (backend
option can be removed)
pyscf/tools/trexio.py
Outdated
else: | ||
n_chunks = 1 | ||
|
||
with trexio.File(filename, 'w', back_end=_mode(backend)) as tf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need unsafe 'u'
mode to delete things from TREXIO file (e.g. determinant
group as here)
pyscf/tools/trexio.py
Outdated
@@ -174,7 +177,7 @@ def write_eri(eri, filename, backend='h5'): | |||
|
|||
def read_eri(filename, backend='h5'): | |||
'''Read ERIs in AO basis, 8-fold symmetry is assumed''' | |||
with trexio.File(filename, 'r', back_end=_mode(backend)) as tf: | |||
with trexio.File(filename, 'r', backend=trexio.TREXIO_AUTO) as tf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
back_end
instead of backend
pyscf/tools/trexio.py
Outdated
|
||
|
||
def read_det_trexio(filename, backend='h5'): | ||
with trexio.File(filename, 'r', backend=trexio.TREXIO_AUTO) as tf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
back_end
instead of backend
I see. Let's complete this PR as it is. I can make a new PR to address the general contraction. |
@q-posev I have updated the code following your recommendations. Let me know your thoughts! |
@NastaMauger thank you, it's perfect now! @sunqm I think we are ready to merge this if the CI is green. |
Hello,
For a specific reason, I needed to register my determinants into TREXIO, so I modified the code to accomplish this.
Since the functions use the MCSCF object and
mcscf_to_trexio
has not been developed yet, it might be beneficial to register the determinants infos into a different TREXIO file compared to the one used for the mf part. Therefore, this final TREXIO file only contains:The user can also set a threshold to register into the TREXIO file only the determinants with coefficients higher than the user-defined value. If no value is provided by the user, the full list of determinants generated by the PySCF object is registered.
As an improvement, one could develop a
mcscf_to_trexio function
(in progress here) as well as a converter from bitfield to a human-readable format. This would allow users to import a TREXIO file from other software and read it easily.The different functions from this PR can be called as follow:
Best
Edit: As recommended here, two files are attached where all new functionalities are working on CASCI/sCI and FCI objects.
casci.txt
fci.txt