Provide conversions as CSVs #2

Open
Enchufa2 opened this issue Oct 17, 2023 · 4 comments

@Enchufa2
Member

We could make separate packages (e.g. substances.clinical, substances.chem...) for specific domains, but those packages would only provide a dataframe of conversions, so I don't think it makes much sense. I would centralize everything here and provide the conversions as CSVs. (I'm assuming that there's no authoritative source that we can query or scrape for this info, so we basically need to compile these by hand.)

@billdenney: If there's no better way, could you please provide your current databases for these conversions (including the standard units) as inst/extdata/clinical.csv, etc., in a PR? If you think this would be better managed externally, we would still need some sample data in this package for development and testing.

@henningte
Contributor

In case this is useful, I have implemented conversions between moles of chemical elements and grams of chemical elements in the 'elco' package. I could reformat this table so that it could be used in the planned package.

@billdenney

I think that a mass/molar conversion for chemical elements would be helpful. I also think it would be helpful to have multiple examples (elements and clinical laboratory measurements) at the beginning so that we can see whether there are differences between domains.

It'll take me a couple of days to pull together some clinical unit conversions. I almost always do them as one-off conversions, so I don't have a .csv available.

@Enchufa2
Member Author

@henningte That would be great, thanks.

@billdenney Sure, no hurry, thanks.

@billdenney

@Enchufa2, @DBartlettHP has made a PR that will help the discussion of clinical lab unit conversion. I realized that the discussion of how to handle the details would probably be better here than in the PR, so I'm going to reiterate and expand on what I said there in this issue:

  1. Clinical lab units often have substance-specific parts and SI unit parts that will need to be separated.
    1. For example, insulin has a conventional unit of "uIU/mL" and an SI unit of "pmol/L".
    2. The "mL" and "L" parts are typical SI volume units that are interpretable as with any other measurement.
    3. The "uIU" part of the conventional unit should be separated into a "u" (micro) prefix and an "IU" (international unit) part. The "IU" part is specific to insulin.
    4. The "pmol" part of the SI unit should be separated into a "p" (pico) prefix and "mol" which is interpretable as the SI unit of moles, but should not be directly converted to moles of anything else.
  2. Some clinical units are directly convertible between conventional and SI units, but should not be converted between substances.
    1. For example, "human chorionic gonadotropin" has conventional units of "mIU/mL" and SI units of "IU/L". The only thing this conversion needs to know is that "IU" is a new base unit. The other important implementation detail for this example is that the "IU" here should never be converted to "IU" for another substance (e.g. do not convert the "human chorionic gonadotropin" substance to the "insulin" substance).
    2. Another example is that glucose only uses units already known to the units library ("mg/dL" and "mmol/L"), but again, the glucose substance "g" and "mol" should not be converted to another substance.
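To make the separation concrete, here is a minimal base-R sketch. The names `split_unit()` and `substance_unit()`, the short prefix table, and the `"IU[insulin]"` tagging scheme are all assumptions made for illustration, not anything that exists in the units library or in this package.

```r
# Hypothetical helper: split a unit symbol such as "uIU" or "pmol" into an
# SI prefix and a base unit. Only the prefixes and base units mentioned in
# this thread are listed; a real implementation would need complete tables.
si_prefixes <- c(p = 1e-12, n = 1e-9, u = 1e-6, m = 1e-3)
base_units <- c("IU", "mol", "g")

split_unit <- function(symbol) {
  for (base in base_units) {
    # No prefix at all, e.g. "IU" or "mol".
    if (identical(symbol, base)) {
      return(list(prefix = "", factor = 1, base = base))
    }
    # A known prefix followed by the base unit, e.g. "uIU" or "pmol".
    if (endsWith(symbol, base)) {
      prefix <- substr(symbol, 1, nchar(symbol) - nchar(base))
      if (prefix %in% names(si_prefixes)) {
        return(list(prefix = prefix, factor = si_prefixes[[prefix]], base = base))
      }
    }
  }
  stop("unrecognised unit symbol: ", symbol)
}

# Assumed tagging scheme: the base unit carries the substance name, so that
# "IU" of insulin and "IU" of hCG are never treated as the same unit.
substance_unit <- function(substance, symbol) {
  parts <- split_unit(symbol)
  parts$base <- paste0(parts$base, "[", substance, "]")
  parts
}

insulin <- substance_unit("insulin", "uIU")
hcg <- substance_unit("hCG", "mIU")
identical(insulin$base, hcg$base)  # FALSE: the two "IU" are distinct units
```

The volume parts ("mL", "L") are left out on purpose, since they can be handled like any other SI unit.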

So, the way that I think it would make sense to register a new substance is something like:

  • Prework (prior to the substance registration stage, i.e. in the data-raw directory):
    • Determine the substance name and the unique substance units that may be available.
    • Determine the conversions between the unique substance units.
    • I think that this would take the form of a data.frame with 4 columns of "substance_name", "unit1", "unit2", and "conversion_factor" where you would take the value with "unit1" and multiply it by the "conversion_factor" to get the value with "unit2". (This is a generalization of what was done for the elements table.)
  • Add the substance to the substance library:
    • Register the substance name with all of its units and conversion factors.
    • Verify that all units are convertible, possibly by multiple steps.
  • Enable creation of a vector of substance-unit-values
    • I think that this would be similar to the set_units() function with an added argument for substance.
    • Possibly, set_substance(x, unit, substance, ...) where x is numeric, unit is the vector of possibly-different units, and substance is the vector of possibly-different substances?
  • Enable use of the substance-unit-value objects:
    • Unit conversion: set_units() and as_units() methods
    • Substance conversion:
      • A simple need would allow forcing one substance to become another. The most straightforward way to do this would likely be to move out and back into the substances library (e.g. as.numeric(x); do the unit conversion; set_substance()). I think that I would keep this as the default for the intermediate future.
      • A more complex need may allow for substances to convert between each other. For example, we may have a "water" substance where 1 mole could convert to 2 "hydrogen" moles. That seems like something to be aware of but not to handle in the initial version of the package.
    • Math ops: Operations would only occur within a substance or between unitless numbers and the substance. This would be equivalent to the math in the units library and may only involve a substance check before passing to the units library.
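As a rough sketch of the table and the multi-step check, in base R only: the helper names `conversion_factor()`, `all_convertible()` and `convert_substance_unit()` are hypothetical, `set_substance()` is simplified here to a single unit and substance per vector, and "example_substance" with units "A", "B", "C" is made up to show a two-step conversion. Only the glucose row uses a real value (its molar mass, about 180.156 g/mol).

```r
# Conversion table in the proposed four-column layout.
conversions <- data.frame(
  substance_name = c("glucose", "example_substance", "example_substance"),
  unit1 = c("g", "A", "B"),
  unit2 = c("mol", "B", "C"),
  conversion_factor = c(1 / 180.156, 2, 5),  # 2 and 5 are made-up factors
  stringsAsFactors = FALSE
)

# Hypothetical helper: factor that converts `from` to `to` within one
# substance, chaining rows in either direction with a breadth-first search.
# Returns NA if the two units are not connected.
conversion_factor <- function(table, substance, from, to) {
  rows <- table[table$substance_name == substance, ]
  # Each row can be used forwards (multiply) or backwards (divide).
  edges <- data.frame(
    from = c(rows$unit1, rows$unit2),
    to = c(rows$unit2, rows$unit1),
    factor = c(rows$conversion_factor, 1 / rows$conversion_factor),
    stringsAsFactors = FALSE
  )
  found <- c(1)  # cumulative factor from `from` to each unit reached so far
  names(found) <- from
  queue <- from
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current == to) return(found[[to]])
    nxt <- edges[edges$from == current & !(edges$to %in% names(found)), ]
    for (i in seq_len(nrow(nxt))) {
      found[nxt$to[i]] <- found[[current]] * nxt$factor[i]
      queue <- c(queue, nxt$to[i])
    }
  }
  NA_real_
}

# Check that every unit listed for a substance can reach every other one.
all_convertible <- function(table, substance) {
  rows <- table[table$substance_name == substance, ]
  units <- unique(c(rows$unit1, rows$unit2))
  all(vapply(units, function(u) {
    !is.na(conversion_factor(table, substance, units[1], u))
  }, logical(1)))
}

# Simplified stand-in for the proposed set_substance(): a numeric vector
# tagged with one unit and one substance.
set_substance <- function(x, unit, substance) {
  structure(x, unit = unit, substance = substance, class = "substance_value")
}

# Conversion stays inside the substance recorded on the object.
convert_substance_unit <- function(x, to, table = conversions) {
  f <- conversion_factor(table, attr(x, "substance"), attr(x, "unit"), to)
  if (is.na(f)) stop("no conversion path to ", to)
  set_substance(as.numeric(x) * f, to, attr(x, "substance"))
}

glucose <- set_substance(90, "g", "glucose")
glucose_mol <- convert_substance_unit(glucose, "mol")  # roughly 0.5 mol
```

Because the lookup is always filtered by substance, asking for a unit that belongs to another substance fails instead of silently converting, which matches the "never convert between substances" rule above.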

How does that sound?
