Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new character type #391

Merged
merged 45 commits into from
Oct 18, 2024
Merged

Add new character type #391

merged 45 commits into from
Oct 18, 2024

Conversation

josephburkhart
Copy link
Contributor

@josephburkhart josephburkhart commented Sep 16, 2024

Description

These changes make the Python API more approachable by adding support for reading and writing strings to GSD data chunks. My changes add a new character data type on the C side, and modify the python api to automatically decode strings for the user. The new changes are accompanied by a new test, which fully covers the expected use cases.

Motivation and Context

Currently, strings are implicitly encoded as uint8 data, but it is up to the user to figure out how to decode that data upon reading. The Python API is intended to be approachable, allowing users to, for example, create a pandas dataframe from a GSD log in one line. However, this oneliner will break if the log contains varying textual information, because pandas requires uniform dimensions in each column. These changes make it so users don't have to convert decimal unicode to text on their own, allowing the one-line example to work.

This PR resolves #375.

How Has This Been Tested?

I have modified test_fl.py, adding a new test specifically for the string dtype (test_string_dtype). The new test covers both fl and pygsd.

  • Please build the sphinx documentation and check that any changes to documentation display properly

Checklist:

  • I have reviewed the Contributor Guidelines.
  • I agree with the terms of the GSD Contributor Agreement.
  • My name is on the list of contributors (doc/credits.rst) in the pull request source branch.
  • I have added a change log entry to CHANGELOG.rst.

@josephburkhart
Copy link
Contributor Author

pre-commit.ci autofix

@josephburkhart josephburkhart marked this pull request as ready for review September 20, 2024 14:12
Copy link
Member

@joaander joaander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to remove stray print() calls, commented code, and unused imports.

gsd/fl.pyx Outdated Show resolved Hide resolved
gsd/fl.pyx Outdated Show resolved Hide resolved
gsd/fl.pyx Outdated Show resolved Hide resolved
gsd/fl.pyx Outdated Show resolved Hide resolved
gsd/gsd.c Show resolved Hide resolved
gsd/gsd.c Outdated Show resolved Hide resolved
josephburkhart and others added 2 commits September 24, 2024 12:44
Co-authored-by: Joshua A. Anderson <[email protected]>
Co-authored-by: Joshua A. Anderson <[email protected]>
@josephburkhart
Copy link
Contributor Author

Great catches - apologies for the omissions!

Copy link
Member

@joaander joaander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale label Oct 16, 2024
@joaander
Copy link
Member

I tested the original motivation for this feature.

In the test_hoomd.py::read_log test, pandas is indeed able to create a DataFrame with the strings:

    import pandas
    df = pandas.DataFrame(logged_data_dict)
    print()
    print(df)
   configuration/step  log/value/potential_energy  log/value/pressure log/category
0                   0                          10                  -3            A
1                   1                          10                   5          BBB

@joaander joaander merged commit c75aa35 into trunk-patch Oct 18, 2024
41 checks passed
@joaander joaander deleted the new-character-type branch October 18, 2024 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a flag to the Index Block that indicates whether a data block encodes text
2 participants