Allotted length: maximum length of the variable #91
Look in …
Maybe add documentation on how the two sources of length (data-driven and spec-driven) are prioritized?
Users can either specify the length in …
I did some testing by creating xpt files and opening them in SAS.
• When the attribute ‘width’ is missing or width = NA, the maximum length of the variable is used as the length.
• When the length in the attribute ‘width’ is shorter than the maximum length of the variable, haven gives a warning message and the maximum length is used. When a variable has the values Y and NA, haven considers the length of the variable to be 2. Should I add a GitHub issue so that NA is not considered for the length? Should we keep the warning message from haven, or have our own check for this?
• I did a test with a variable containing a value longer than 200 characters: the xpt was created without any issue, and the value was not truncated in SAS. The maximum variable length should be 200, so I suggest adding a check that the length of each variable is <= 200.
• Per the define.xml specification, the length of SDTM date variables (--DTC) should be missing. I suggest either keeping the attribute ‘width’ missing so that the length is handled by haven, or setting the length to 19. What do you think?
• I also added a GitHub issue in the metatools package suggesting a function to check/update metadata taking into consideration the maximum length of each variable in the data: pharmaverse/metatools#53
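A minimal sketch of the checks proposed above, written in Python only to illustrate the logic (xportr itself is R). `check_length` and `XPT_V5_MAX_LENGTH` are hypothetical names, not part of xportr or haven. Assumptions: missing values are represented as `None` and ignored when measuring the data length (so NA is not counted as a 2-character value), length is counted in characters, and an all-missing column gets length 1.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical constant: XPT v5 limit for a character variable's length.
XPT_V5_MAX_LENGTH = 200


def check_length(
    values: Sequence[Optional[str]], width: Optional[int]
) -> Tuple[int, List[str]]:
    """Return (effective length, messages) for one character variable.

    Hypothetical helper, not part of xportr or haven.
    """
    # Ignore missing values (None) so that NA does not count as "NA" (2 chars).
    # An all-missing column falls back to length 1 (an assumption).
    data_length = max((len(v) for v in values if v is not None), default=1)
    messages: List[str] = []

    if width is None:
        # No metadata width: use the maximum length found in the data.
        length = data_length
    elif width < data_length:
        # Metadata is shorter than the data: keep the data length, but warn.
        messages.append(
            f"warning: width {width} < data length {data_length}; using {data_length}"
        )
        length = data_length
    else:
        length = width

    # Flag anything over the XPT v5 limit instead of silently writing it.
    if length > XPT_V5_MAX_LENGTH:
        messages.append(f"error: length {length} exceeds {XPT_V5_MAX_LENGTH}")
    return length, messages
```

For example, `check_length(["Y", None], None)` returns `(1, [])`, whereas a value of 201 characters produces the "exceeds 200" message. Whether such a check should warn or stop the export is a design decision left open in the thread.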
This issue has been fixed: tidyverse/haven#699. It would be great to have the max length warning, as @bms63 and @cpiraux mentioned:
I have tested this with the latest version of haven, and I no longer encounter the issue with NA. Additional suggestion: we could consider adding an option that lets users choose whether or not to impute the length. If the length is not imputed, haven can use the maximum length of the values instead.
That's a good question. The XPT v5 limit for a character variable is 200, which probably means 200 bytes rather than 200 characters. If special (multi-byte) characters are used, we need to take the number of bytes per character into account.
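To illustrate why characters and bytes can differ, here is a small Python sketch. It assumes, as the comment above suggests, that the 200 limit is in bytes, and it assumes UTF-8 as the encoding; the actual encoding depends on the SAS session (for example latin1), so this is an assumption to verify. The helper names are hypothetical.

```python
def char_length(value: str) -> int:
    """Number of characters (code points) in the value."""
    return len(value)


def byte_length(value: str, encoding: str = "utf-8") -> int:
    """Number of bytes the value occupies once encoded."""
    return len(value.encode(encoding))


def fits_xpt_v5(value: str, limit: int = 200, encoding: str = "utf-8") -> bool:
    """True if the encoded value fits within the assumed 200-byte limit."""
    return byte_length(value, encoding) <= limit


# 150 accented characters (U+00E9): 150 characters, but 300 bytes in UTF-8.
example = "\u00e9" * 150
print(char_length(example), byte_length(example), fits_xpt_v5(example))
# prints: 150 300 False
```

The same string fits in a single-byte encoding such as latin-1 (150 bytes), so a length check based on characters alone could pass data that does not fit once encoded.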
@siye6 could you make a reproducible example of this, please? We are in active development, so we can most likely incorporate this into the next release.
Hi @bms63, thanks for asking. Sure, please see the simple example below. xportr_write() calls haven::write_xpt(), so I just use haven::write_xpt() in the example.
From the discussion in a PR, I see that issue #91 is also tagged, but there are other elements regarding length to consider in this issue. We could handle them in another PR or in this one; I am not sure of the best approach, but I would prefer that the issue not be closed until the following points are handled:
Metadata length
• Data length > metadata length: this would cause the data to be truncated. This case is handled by haven: the length from the data is used, and a message is given.
• The FDA requests trimming the variable length across datasets. For example, if the AVISIT data length is 40 in ADLB and 30 in ADVS, the AVISIT length should be 40 in all datasets. In this case, the metadata length will differ from the data length. I don't think we can check this scenario, but maybe we could mention this point in the documentation.
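The cross-dataset rule in the AVISIT example can be sketched as taking, for each variable, the maximum data length over every dataset that contains it. This Python sketch is only an illustration of that rule: `max_length_across` is a hypothetical name, each dataset is assumed to be a mapping from variable name to column values with `None` for missing, and the values below are toy data mirroring the 40 vs 30 example.

```python
from typing import Dict, List, Optional

# variable name -> column values (None marks a missing value)
Dataset = Dict[str, List[Optional[str]]]


def max_length_across(datasets: Dict[str, Dataset]) -> Dict[str, int]:
    """Maximum data length of each variable over every dataset containing it."""
    result: Dict[str, int] = {}
    for columns in datasets.values():
        for var, values in columns.items():
            # Longest non-missing value of this variable in this dataset.
            n = max((len(v) for v in values if v is not None), default=0)
            result[var] = max(result.get(var, 0), n)
    return result


# Toy data: AVISIT is 40 characters long in ADLB and 30 in ADVS.
datasets = {
    "ADLB": {"AVISIT": ["W" * 40, None], "PARAMCD": ["ALT"]},
    "ADVS": {"AVISIT": ["W" * 30], "PARAMCD": ["SYSBP"]},
}
print(max_length_across(datasets))  # prints {'AVISIT': 40, 'PARAMCD': 5}
```

Because this needs every dataset of the submission at once, it fits better as a separate step (or a metadata-update helper) than inside a per-dataset length check.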
I agree with @cpiraux: use the data length when the metadata length is missing. Currently it is imputed as 200.
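The two behaviours being compared (impute 200 versus fall back to the data length), together with the opt-in imputation option suggested earlier, can be sketched as below. This is a Python illustration only; `resolve_length`, its `impute` flag and `IMPUTED_LENGTH` are hypothetical, and an all-missing column defaulting to 1 is an assumption.

```python
from typing import Optional, Sequence

# The thread reports that a missing metadata length is currently imputed as 200.
IMPUTED_LENGTH = 200


def resolve_length(
    metadata_length: Optional[int],
    values: Sequence[Optional[str]],
    impute: bool = False,
) -> int:
    """Pick the length to write for one character variable (hypothetical helper)."""
    if metadata_length is not None:
        return metadata_length  # planned length from the specification
    if impute:
        return IMPUTED_LENGTH  # current behaviour described in the thread
    # Proposed behaviour: fall back to the actual (data) length, ignoring None.
    return max((len(v) for v in values if v is not None), default=1)


print(resolve_length(None, ["Y", "N", None]))               # prints 1
print(resolve_length(None, ["Y", "N", None], impute=True))  # prints 200
print(resolve_length(30, ["Y"]))                            # prints 30
```

Whether the default should be to impute or to use the data length is exactly the open question here; the default chosen in the sketch is arbitrary.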
I have listed below some possibilities to implement the maximum length in xportr:
As an example, the process for length that I have already experienced is:
Closes #91 length attribute from max data length
The length for the submitted data should be set to the maximum length of the variable across all datasets, per the FDA Study Data Technical Conformance Guide.
In the CDISC webinar ‘Define-XML Office Hours’, the presenters also mentioned that the length in the metadata (planned length) might differ from the length in the dataset (actual length).
How could xportr_length() take this into account? See the discussion below.
Definition of Done