
Dictionary with information on experimental data #27

Open
TristanHehnen opened this issue Mar 12, 2020 · 30 comments

Comments

@TristanHehnen
Contributor

As mentioned in #26, here is the link to the first iteration of the ExperimentalDataInfo.py. It basically contains the information that is provided via the README.md files and knows the location of the CSV files containing the data from the different experiments. I like this approach because it allows me to access all the information from within Python scripts or Jupyter notebooks. I find that the human-readable keys for accessing the different items reduce errors. Also, dictionaries can easily be transformed into Pandas DataFrames, which allows for nice rendering of tables in Jupyter notebooks. Furthermore, I can easily pass the information on to scripts that build FDS and optimisation input files.
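As a minimal sketch of that dictionary-to-DataFrame step (the keys and values below are made up for illustration, not the actual contents of ExperimentalDataInfo.py):

```python
import pandas as pd

# Hypothetical excerpt of the experiment-info dictionary; labels and values
# are illustrative, not taken from ExperimentalDataInfo.py.
exp_info = {
    "UMET_TGA_N2_10K_1": {"heating_rate_K_min": 10, "initial_mass_mg": 5.2},
    "UMET_TGA_N2_20K_1": {"heating_rate_K_min": 20, "initial_mass_mg": 5.0},
}

# The human-readable keys become the row index, which renders nicely
# as a table in a Jupyter notebook.
df = pd.DataFrame.from_dict(exp_info, orient="index")
print(df)
```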

It could be located in the root directory of the MaCFP Git repo (obviously file paths need to be adjusted).

Since it aims to mirror the structure of the README.md files, it should be relatively simple to set up scripts that automatically screen the repository and add information on new data sets.

If this script is considered a useful addition to the MaCFP project we can add it in.

@rmcdermo
Contributor

@TristanHehnen Thanks. I have been working on building a module for the gas phase group. It can be found here. I suggest you start a similar module for matl-db called "matl.py" and add your classes to this module. Then, this module can live in a Utilities directory of the repo and be called using something similar to this (this is just a temporary script I'm using to build the module).

@TristanHehnen
Contributor Author

I'll have a look at it.

@TristanHehnen
Contributor Author

I would like to introduce the initial prototype for parsing the README files, and to get some feedback so that I do not spend a lot of time creating something that is disliked/thrown away in the end.

For now the prototype consists of a script containing the individual functions and a Jupyter notebook that serves as a brief demo of the functionality. It doesn't follow the module proposal by Randy, i.e. "matl.py", yet, but it could certainly be moved in this direction.

As an example, only the TGA data from UMET is processed in the demo. This is primarily due to the overhead that comes with adjusting the README files. Once we have reached an agreement on what they should look like, it would be easier to adjust the format automatically with the script developed here. That means the README files should be unified, in the sense that different laboratories provided different parameters to describe their experimental campaigns (e.g. different temperature programs, lids on the crucibles or ways to describe the crucibles, etc.). In my view, all these items should be used consistently in all experiment descriptions. Items that are not relevant for the particular experiment in question should contain "None", as they do in the dictionary later on. The goal of having the laboratories fill in "None" consciously is to reduce the chance of data being forgotten.

Furthermore, I suggest having the heating rates and initial sample masses written only in the "Test Condition Summary" table. They do not seem very useful in the text section.

Since Isaac started to unify the data file names, I would like to ask if the label of the individual experiment is supposed to be the same as the data file name in general. This would help reduce the footprint of the summary table, and I don't really see why there should be a difference in naming.

As for the structure of the dictionary, it is ordered by experiment --> institute --> repetition (rep. label/data file name) --> parameters (e.g. heating rate or initial sample mass), see the demo.
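A sketch of that nesting, with made-up labels:

```python
# Illustrative nesting only; institute and test labels are hypothetical.
exp_data_info = {
    "TGA": {                                  # experiment
        "UMET": {                             # institute
            "UMET_TGA_N2_10K_1": {            # repetition label / file name
                "heating_rate_K_min": 10,     # parameters
                "initial_mass_mg": 5.1,
                "data_file": "UMET/TGA/UMET_TGA_N2_10K_1.csv",
            },
        },
    },
}

# Access follows experiment -> institute -> repetition -> parameter.
rate = exp_data_info["TGA"]["UMET"]["UMET_TGA_N2_10K_1"]["heating_rate_K_min"]
print(rate)  # prints 10
```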

As further steps, I would set up functionality that translates the dictionary into the README file format and saves it as README.md, save the dictionary as a human-readable Python script, set up the functionality to process all the other experiment types and finally provide the respective README.md templates, such that new data sets can easily be integrated.
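The dictionary-to-README direction could be sketched like this (helper name and table columns are assumptions, not the final format):

```python
# Hypothetical helper: render one institute's repetitions as a markdown
# "Test Condition Summary" table; the column choice is an assumption.
def summary_table(reps):
    lines = [
        "| Test Label | Heating Rate (K/min) | Initial Mass (mg) |",
        "| --- | --- | --- |",
    ]
    for label, params in sorted(reps.items()):
        lines.append(
            f"| {label} | {params['heating_rate_K_min']} | {params['initial_mass_mg']} |"
        )
    return "\n".join(lines)

reps = {"UMET_TGA_N2_10K_1": {"heating_rate_K_min": 10, "initial_mass_mg": 5.1}}
print(summary_table(reps))
```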

@rmcdermo
Contributor

@TristanHehnen Thanks for all your work on this! I think it is headed in the right direction. But I am not the keeper for matl-db (only a maintainer), so I think we need to get consensus from Isaac (@leventon) and Morgan (@mcb1).

As long as very simple instructions can be put together for the participants, I am in favor. I suggest you use Isaac or Morgan as a test case and see if they can follow the instructions.

@leventon
Contributor

leventon commented Jun 17, 2020 via email

@rmcdermo
Contributor

Guys, what about the idea of using a Google Form (or something) to submit the README data, which would get converted to a csv, and then the scripts would generate the README.md file? That way, you could have dropdowns where you want only specific answers.
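A rough sketch of that pipeline, assuming a hypothetical Forms CSV export (the header names and dropdown answers here are made up):

```python
import csv
import io

# A Forms export is just a CSV with one row per submission; these header
# names and dropdown answers are hypothetical.
form_export = io.StringIO(
    "Institute,Test Label,Heating Rate (K/min),Crucible Lid\n"
    "UMET,UMET_TGA_N2_10K_1,10,No lid\n"
)

submissions = list(csv.DictReader(form_export))

# Dropdown answers map cleanly onto dictionary values, e.g. "No lid" -> False.
lid_map = {"Lid": True, "No lid": False, "Not provided": None}
lid = lid_map[submissions[0]["Crucible Lid"]]
print(submissions[0]["Test Label"], lid)
```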

@leventon
Contributor

That's probably a good idea. We might get a new data set from Chile in the next few months - that'd be a good trial run for the form, otherwise it'll at least be set for our next material. Either way, @TristanHehnen - after we settle on a final format of the readme, do you want to set up a test case with Google forms to work on a script to build a proper README file from there? Forms can output data as .csv files; parsing them should be straightforward but there might be tweaks needed to do that with your script given the different format of those files vs. our current README.md files.

Ideally, we'd also upload measurement data through google forms, but it looks like Forms requires users to have a google account to do that, so I'll likely just include a text notification reminding visitors to [submit data by email as a .zip file to [email protected]] when they submit the form.

As for your script - honestly, I'm out of my element here so we should wait for proper feedback from Morgan. The general concept / flow of what you have here makes sense but I can't really comment on the functionality, writing, or design of the code itself. As for general conceptual comments...

Default options:
I am a little worried about defaulting to "None" for set options. In some cases, 'unknown' or 'not provided' would make more sense. (e.g., we should not default to "None" for crucible type/lid type because, in this case, None means that "it wasn't used" [instead of "this info wasn't provided by the user"])

Test Conditions Table:
For the test conditions table, especially if we automate it as a form, I am not sure how we can switch that programmatically to allow for different field/header types for different experiments (e.g., TGA has certain settings, these are unique from Cone or FPA). Linking O2 concentration to the main carrier gas will require some thought as will having different inert gases and how we link this prescribed value to test label/file name.

Test Label and File Name:
I believe you noted this already, but Test Label and File Name are redundant, so we should definitely collapse them into one field.

Calibration type:
This likely could be expanded to include more default field types. Below is an example just for TGA; we'd likely want further thought to provide options for other test types
Calibration type: (mass, heat flow, temperature)
Calibration Temperature Range:
Number of Calibration materials:
Frequency:

Initial Mass:
For data analysis - it is not uncommon for initial mass to be not-equal-to the first time/line of mass data in .csv files (e.g., due to taring, or buoyancy effects). Currently, we don't have an automated process to converge the two. On a case by case basis, I'd adjust as needed; likely, taring/renormalizing .csv data to the listed (if provided) initial mass.


I really like that we have some ability to visualize data so the Plotting section is great, but I'll hold off on comments there until we can sort through the README first

@rmcdermo
Contributor

I would not worry about the "None". If you use a form, then whatever you have in the dropdown can be converted to None as needed. None is commonly used in Python script arguments, so it is handy that way.

@TristanHehnen I'd say press forward with your processing scripts. We are in a similar situation on the gas phase where really only I know how the scripts work. To some degree, this is unavoidable. The fact that you are taking charge and making things happen means you are in control of this aspect of the project. It is welcome from my point of view.

@leventon I would argue that "ideally" measurement data would come from a pull request to GitHub. In lieu of that, emailing a zip that we push to GitHub is the best option. Usually I have to massage the column headers, etc.

But let's give the form idea a try just amongst ourselves. Create a simple toy form and send it to me and Tristan and we can build from there.

@leventon
Contributor

leventon commented Jun 18, 2020 via email

@TristanHehnen
Contributor Author

Thank you @leventon and @rmcdermo for your time and comments!

@leventon :

  • For the README: I've now created a template file that contains the information for the TGA experiments. New contributors could use this to fill their information in. I would provide similar files for the other experiments, while working through the repo. (Do we know what the group from Chile would like to provide? We could use this to decide which experiment I could process next and then they could test how it all works...)
  • I've started a Wiki with the aim to provide information as to how people can contribute to the repository. Furthermore, some more technical information is to be provided on the scripts, functions and the usage thereof. I've used the "Guidelines for Participation in the 2021 MaCFP Condensed Phase Workshop", version 1.2, to fill in some of the text.
  • On the test condition table: We can leave the temperatures and sample masses etc. in the text. Also, you wouldn't need to go back and change all README files, we could have them built from the dictionary later on.
  • File names and repetition labels: Okay, then I will change the table so that it only contains the repetition label and use that for the file name as well. (As was already started.) I'm also not quite sure about putting the carrier gas into the file name/repetition label. It seemed like a good idea at first, but I now believe it might quickly get out of hand. With file name and repetition label being the same, and further information like gas concentrations in the summary table, the necessary information should be there.
  • As for the "None": As Randy mentioned, it is relatively handy for the scripting. In my mind, for example for the lids in TGA experiments, there would be three options: it was used --> True; it wasn't used --> False; no information provided (for whatever reason) --> None. The "None" values are simply meant for "no information provided".
  • For the different experiments: Since different information is needed for different experiments, I would set up individual README templates, as well as Python functionality to deal with them. The user would be asked to look for the necessary templates, copy their content into a new README.md file and fill in the blanks (None). A description of the templates is to be provided in the Wiki.
  • Calibration types: It might be a good idea to have the suggested items there. In the calibration descriptions I looked at, there also seemed to be some kind of "individual solutions", which I found hard to condense into bullet points. Thus, there are these "Notes" items.
  • Initial mass: I'm unsure what to do here. It is correct that the first value does not necessarily correspond to the initial mass (as I naively thought in the beginning), and there are some noisy artefacts in some data sets. Some may simply contain slight errors, seen by the changes in the residual mass, which varies between 0% and 10% of the original sample mass across different data sets. Yeah, I've no idea what to do here...
  • The Jupyter notebook with the plot was primarily intended as an example of how to use the dictionary. For now the dictionary is created within the notebook, but I would like to have it pre-built and shipped with the matl-db repo. This would then obviously need to be updated with new data sets, but there could be a script in the utilities that could be used for the update. We could then also have notebooks within each of the institute directories, such that plots could be provided. Still, I would keep the README files, since they are relatively easy to read/use without the Python overhead. That is, I would not require people to provide a notebook, just the README. But if they want to provide one, that would be fine, I guess.
  • Submitting of new data sets: I would vote for having them integrated via pull request, instead of a Google form.

For the next steps, I would like to wrap up the TGA experiment processing functionality, using the UMET data set as an example. When we have agreed that this is how it should look, I would propagate the necessary changes to all other TGA experiments within the repo. Afterwards, I would add the functionality for the next experiment with a single example case, e.g. Cone Calorimeter, have this discussed, propagate the changes and so forth.

@leventon
Contributor

leventon commented Jun 24, 2020

I like the Wiki, it’s a good addition to start building/adding reference material there. As for the hope that we’ll get people to submit data through Github PRs vs. email – long term, I hope we get there, but we just haven’t seen any willingness from our participants yet for that. It’s a learning curve / barrier to entry that we likely won’t get past with all participants. I am no longer wholly incompetent with Github but it took a surprising amount of effort to get to this level (just for adding/editing of data / files in our repo). I’m not sure I would (and it looks like most contributors wouldn’t either) want to go through that just to submit files.
Though it may exist elsewhere, writing up the most basic, step-by-step walkthrough of how to do that in Github (including how to set up an account, clone the repo..) and posting it to the wiki would help, especially for new users (even if we just copy it from elsewhere)

As for the TGA/DSC template that’s there, I’m still not sure how well it will work out if we rely on that vs. trying to create a form that needs to be filled out in a certain way. I mention that because we had already included that info in the guidelines that were emailed to everyone AND templates were available on the Repo when most labs submitted data but we still got quite a spread in what was submitted. So long as they can edit fields (e.g., [none]) when they write their own files, we’ll likely get a lot of variation (not all labs, but most).
You mentioned different templates – as I started thinking about a google form for that, I came to the same conclusion: we likely need multiple options to record info from all the test types people are submitting. If you're okay with unique python scripts for each one, we should be okay.

As a trial run for that vs. just following the templates you provide for TGA data – using the Chile group as a test case might be a good idea. I suspect they'll submit cone and TGA data. How would you feel about getting your template up and available on the repo and making a google form as a second option? Hopefully they can provide feedback on which was easier to work with. Til then, fields like calibration types that need (or would benefit from) having suggested items can likely best be defined with a drop-down menu in a form.

TGA naming- I’m right there with you on those file names getting long / out of hand. In fact, heating rate wasn’t even included on some of the earlier data sets. As I started analyzing that data though, it became apparent that heating rate and gaseous environment were needed (or at least very helpful) to include.

TGA initial masses: What’s happening here is likely how the experiment is run. From my experience with the test, you have a range of options for defining that mass (m0). In a number of cases, separately measuring m0 before you start your test gives you the most accurate measurement. The balance is hypersensitive and so you can see shifts in that signal at the start /end of the test. Let’s say true mass is 5.0 mg. It’s not uncommon for the initial steady state TGA mass (at 20C-80C) to read higher or lower (though be stable). In that case, I’d use the time resolved mass loss but renormalize the initial mass to match m0 as measured independently. Steps like that.. they’re clear to the experimentalist (it’s why UMD submitted their own averaging / uncertainty analysis) but it can be hard to automate. *This is something that will require further discussion
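That renormalization step could be sketched as follows — a toy example under the assumptions above, not UMD's actual procedure:

```python
# Toy sketch: rescale the time-resolved TGA mass signal so that its first
# point matches the independently measured initial mass m0. Numbers are
# illustrative only.
def renormalize(mass_signal_mg, m0_mg):
    return [m * m0_mg / mass_signal_mg[0] for m in mass_signal_mg]

raw = [5.2, 5.1, 4.0, 0.5]         # first point drifted high of the true 5.0 mg
fixed = renormalize(raw, m0_mg=5.0)
print(fixed[0])
```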

I think I shared with you (email) a copy of the outline/next steps that was sent to the condensed phase committee a couple weeks ago. In ~2 weeks from now, I’ll need to prepare a summary of data to the participants. That will include preliminary analysis of all test data (hopefully, I’m working through that in MATLAB now). When that report is shared with the committee, and then with participants, we are requesting feedback on how we want to do that analysis (e.g., how to define smoothing, test averaging, uncertainty analysis, key data point identification…) There is not necessarily one best approach; one of our goals was to come to a consensus as a community on how to do that.
Because those requirements will evolve, it may be worth waiting a week or two on your end to write/finalize those analysis scripts in python until we agree on how we want to do that analysis (how we format the data may even need to change; e.g., time/temperature resolution). For now, plotting tools like you have for the TGA data – those are great for visualization and likely won’t have to change as much, so they may be better to focus on in the near term.
When I share the report/summary approach with the committee (a week before it’s widely shared with the community/participants) I would like your feedback though, if you can. The exact code between matlab/python will change, but the functionality in the end will be the same.

*** Of all files to work with for TGA – please avoid UMET for now. That set is messed up. I’m aware of some challenges; I have different notes on what I want to do there, and I’ll edit it eventually when I can but.. just for now, please choose a friendlier set. I think SANDIA TGA data was okay (and that gives you a range of test conditions to play with too). ***

@leventon
Contributor

Oops. Please forgive the formatting of that last message. Larger font is not meant to indicate emphasis, I don't know what happened there.

@rmcdermo
Contributor

Markdown thinks the ----- mean you are formatting a table and it makes the column headings of tables bold. (Part of the learning curve :)

As usual, I disagree with the comment about automation. These things are not difficult. Just get the data into a simple column format and we can do pretty much anything.

@TristanHehnen
Contributor Author

Hello everyone, my apologies for the radio silence recently.

I've now implemented some improvements for the processing of the README files. The individual steps are now better organised into functions. These functions contain inline comments and docstrings, in an effort to make things more accessible for users, or rather developers.

For developers and maintainers:
There is a Jupyter notebook ExpInfoConstruction that details how the README files are processed. This is meant to explain the process to developers that would like to contribute to the utilities. Furthermore, it is used to create the dictionary and save it to a Python file, such that maintainers of the repository could easily update the dictionary when new data comes in.
The creation of the Python file is not meant to be performed often, but only for updates by the maintainers, or possibly contributors, when new data is added.
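For reference, persisting the dictionary as an importable Python file could be as simple as this (file and variable names are assumptions):

```python
import pprint

# Hypothetical excerpt of the dictionary built by the construction notebook.
exp_data_info = {
    "TGA": {"UMET": {"UMET_TGA_N2_10K_1": {"heating_rate_K_min": 10}}}
}

# pprint gives a stable, human-readable literal that users can later import,
# e.g. via: from exp_data_info import EXP_DATA_INFO
source = "EXP_DATA_INFO = " + pprint.pformat(exp_data_info, sort_dicts=True) + "\n"

with open("exp_data_info.py", "w") as f:
    f.write(source)
```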

For users:
For regular use the Python file is to be imported and then all the information is readily accessible. They should ideally not need to deal with the things mentioned above.

Now, the question is whether the layout/format of the README files, at least for TGA experiments, is settled (e.g. my proposal in the UMET README in my fork). Then I would make another pass over the implemented functions to ensure they work with said format, and unify the remaining README files.
Afterwards I would start processing the cone calorimeter data, for example.
I will also update the demonstration of the usage of the dictionary that is meant for the users.

What are your thoughts about this?

@rmcdermo
Contributor

@TristanHehnen I am very much in favor of moving forward with your Python scripts. I have just spent the last few days going through the current Matlab scripts and, while these were necessary to get started, they need an overhaul.

What would be very helpful, and I am not sure how far you are from having this, is if you could create a master Python script that would process the exp data and create all the plots needed for Isaac's document.

Isaac is going to email me his personal copy and then I will push the pdf up to the Releases page. You can then use that document as a basis for your scripts. If that document is not sufficient, then I think it means it needs work. So, this will be an excellent exercise.

@TristanHehnen
Contributor Author

@rmcdermo
The overall goal of the dictionary discussed above is indeed to facilitate the automatic processing of all the information within the matl-db repo. However, said scripts are merely the foundation, aiming to structure all the information and make it more easily accessible (at least for people using Python).
For now the "master Python script" is not really feasible, because the remaining README files need to be adjusted, and so far only the TGA experiments are accessible. These are the next steps I'm working on, as mentioned above.

I can certainly help translate the Matlab functionality into Python scripts. However, if it is not too urgent, I would like to focus first on the foundation - processing all the README files.

For translating the Matlab functionalities, I would open a new issue to keep both tasks clearly separate.

@TristanHehnen
Contributor Author

The translation of the Matlab functionalities to Python now has its own issue, see issue #80.

@TristanHehnen
Contributor Author

@leventon
I will now adjust the TGA/DSC parts of the README files, primarily the test summary tables, so that these tables contain the O2 concentration. I will also remove the file name column from my UMET example, because I believe the consensus was that the file names should be identical to the test names. For cases where TGA and DSC were conducted simultaneously, I will set up a function that adjusts the [...]STA[...] in the test name to the respective test when reading the file names. Even though both tests could be performed simultaneously, I would keep them separate within the dictionary, primarily to deal with cases where only one of each was conducted. Also, this information will not be lost anyway.

@TristanHehnen
Contributor Author

TristanHehnen commented Sep 17, 2020

So, just as a heads-up: the TGA data can now be processed, and the respective information is already stored in the Python file containing the dictionary.
There is a brief demonstration notebook that plots the TGA results from the institutes that submitted test data with a 20 K/min heating rate, just as an example.

I would now proceed to the cone calorimeter.

EDIT: Typo

@rmcdermo
Contributor

Looks great, thanks!

@TristanHehnen
Contributor Author

TristanHehnen commented Oct 2, 2020

Update: DSC data can now be processed. Construction notebook and dictionary are updated accordingly.

EDIT:
DSC README template added as well.

@TristanHehnen
Contributor Author

Hi @leventon and @rmcdermo,

I've now unified most of the README files concerning the cone calorimeter data. Based on this I've created a template.

I would like to ask you to check said template for consistency and completeness. Specifically, look at the sample holder and retainer frame dimensions. Across various README files, values for both were provided and are thus replicated in the template. The main questions here are:

  1. Is this information necessary/useful?
  2. What happens when institutes have a different design, e.g. round sample holders? Is the presented approach sufficient?
  3. Which way for reporting the dimensions is preferred, "sample holder" or "retainer frame"?

Furthermore, there are some significant differences in the volume around the sample and heater (sample chamber). Some apparatuses have some kind of box around them (glass walls at the sides), while others can seal this part off and control the atmosphere. I'm not sure how to deal with this, and I've just provided a relatively basic approach to collect this data. Would this be sufficient, or am I missing something here? Would we need some flow rates here as well, specifically for the controlled-atmosphere ones?

For the backing, my idea is to address each material as an individual layer. The provided lines would need to be copied and the individual entries numbered accordingly.

With the thermocouples, there are different ways their locations are reported. Some are marked "front" and some "back". I'm now thinking that there could be two coordinate systems, one starting at the centre of the front face of the sample and the other on the top face of the backing (back of the sample). Then negative z-values would denote locations within the sample and backing respectively - positive z-values point towards the heater for both systems.
Furthermore, it might be interesting to address the directions from which the individual thermocouples are led to the measurement location. There could be some point like "Lead from: left side", or something.

The summary table could be extended to get a column for the flame out time and a column for the residual mass after the test.

Tristan

@rmcdermo
Contributor

Why not also require a detailed drawing of the system? Modelers usually need this sort of thing.

@leventon
Contributor

leventon commented Oct 29, 2020 via email

@TristanHehnen
Contributor Author

Hi @leventon !

First off: Yes, there are a lot of items in the template. I would like to emphasise that they are all collected from the README files that have been provided by the institutes. I only added two minor details: the ignition time column and the bead diameter.

  1. From the points in the README you linked to, I seem to have most of it covered. The main thing missing in the template is the baseline corrections. I've talked to some of our experimenters and they mentioned "correction curves", specifically for TGA/DSC-type apparatuses. Would this be similar? If we would like to have this data, I suggest providing it as *.csv files. The file could be labelled InstituteLabel_BaselineCorrections.csv. The column labels could be the individual test labels the baseline corrections correspond to.

  2. Sure, we can use the bullet points up top only for the general info of heat flux and initial sample mass.

  3. Okay, I propose then that I reduce the bullet points for the whole sample-holder-thing to this:

  • Sample holder
    • Shape: [None]
    • Edge length / diameter: [None] m
    • Retainer frame: [None]
    • Nominal exposed sample surface area: [None] m²
    • Retaining grid: [None]
    • Note: [None]

I put the rest into the description part at the beginning, to keep the information on the sample holder that was already provided by the institutes.

  4. The data on the backing materials could be stored in another *.csv file (my favourite) or in another table in the README. The file name should then be something like InstituteLabel_GlassWool.csv or InstituteLabel_Backing1.csv.

  5. I'll remove the bead diameter.

  6. I'll change the calibration to:

  • Calibration
    • Type: [None]
    • Frequency: [None]
    • Note: [None]

  7. I'll adjust the apparatus type.

@TristanHehnen
Contributor Author

Regarding 3.: I moved the nominal exposed surface area and the diameter/edge length to the sample itself, which reduces the number of items for the retainer frame even more.

@leventon
Contributor

leventon commented Nov 4, 2020

Hi @TristanHehnen , lots of good work here, thanks for the update.

  1. I wouldn't necessarily include the baseline correction as its own data set. There's not much you can do with it unless you also have the uncorrected data; then you can confirm it was subtracted correctly. This all becomes a mess.

We should confirm each group's calibration procedure (type / frequency / materials) and whether or not results have been corrected for drifts in their baseline (TGA, DSC, MCC, and Cone HRR are all often adjusted in such a way). This calibration and baseline correction should be done by the experimentalist, not the modeler, and the process should be described (not asked to be reproduced).

Are these the same as a "correction curve"? Maybe, but that's ambiguous wording to me. For clarity, I'd refer you to each of the reference texts suggested in the preliminary summary (they discuss the principles/practices needed), but that's not the most supportive of replies. A more friendly, immediately useful response might be to have a ~30 min call where we can go through each of these types of corrections/calibrations and setup/processing steps, rather than trade messages.

  2. Works for me. For shape, simply offering [None/square/round] may provide consistency, though I suspect we're doing all that ourselves / manually anyway.

  3. This is a more interesting question. A standard format for what properties we want / how they should be submitted would be needed. Such info is not necessarily given by our labs; some discussion with them may help. The effort of creating this nice standard template and requesting all the files should be balanced by what info they can and will provide.

@TristanHehnen
Contributor Author

Small clarification: The thermocouple diameter was introduced by Edinburgh, and I changed it to bead diameter.

@leventon
Contributor

leventon commented Nov 6, 2020 via email

@TristanHehnen
Contributor Author

Sure.
