Description
I'm frequently dealing with a rather unusual tabular format called a PCL file. Basically, the first n rows contain various kinds of metadata (the first column of each of these rows is the metadata name), and below them is a sample × feature matrix. So, for example, I might have `my_data.pcl`:
age | 12 | 14 | 18 |
---|---|---|---|
gender | m | f | f |
sampleID | a | b | c |
feature1 | 0.2 | 0.4 | 0.3 |
feature2 | 0.3 | 0.2 | 0.3 |
feature3 | 0.1 | 0.6 | 0.7 |
Since most of the operations happen on the numerical part of the table, and storing it all in a single DataFrame (or similar) would generally lead to columns of type `Any`, what I'd like is load/save functions that produce/consume two iterable tables sharing the `sampleID` row. For example, the two tables would be:
`metadatadf`:
sampleID | a | b | c |
---|---|---|---|
age | 12 | 14 | 18 |
gender | m | f | f |
`featuredf`:
sampleID | a | b | c |
---|---|---|---|
feature1 | 0.2 | 0.4 | 0.3 |
feature2 | 0.3 | 0.2 | 0.3 |
feature3 | 0.1 | 0.6 | 0.7 |
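As an illustration of the split described above, here is a minimal stdlib-only sketch. It assumes the file is tab-delimited, that every row above the `id_row` row is metadata, and that every row below it is a numeric feature row. `read_pcl` is a hypothetical name, not an existing FileIO or DataFrames function. It returns two NamedTuples of column vectors, which Tables.jl treats as column tables, so `DataFrame(meta)` should work once DataFrames.jl is loaded.

```julia
# Hypothetical sketch: split a PCL-style file into a metadata table and a
# feature table that share the ID row as their column names.
# Assumptions: tab-delimited input; rows above `id_row` are metadata;
# rows below it are numeric features.
function read_pcl(io::IO; id_row::AbstractString = "sampleID")
    # eachline strips "\n" / "\r\n"; skip blank lines
    rows = [split(line, '\t') for line in eachline(io) if !isempty(strip(line))]
    idx = findfirst(r -> first(r) == id_row, rows)
    idx === nothing && error("no row starting with \"$id_row\" found")

    names = Tuple(Symbol.(rows[idx]))      # e.g. (:sampleID, :a, :b, :c)
    meta_rows = rows[1:idx-1]
    feat_rows = rows[idx+1:end]

    # Metadata stays as String; feature values are parsed to Float64,
    # so neither table ends up with columns of type Any.
    meta_cols = ntuple(j -> String[r[j] for r in meta_rows], length(names))
    feat_cols = ntuple(length(names)) do j
        j == 1 ? String[r[1] for r in feat_rows] :
                 Float64[parse(Float64, r[j]) for r in feat_rows]
    end
    return NamedTuple{names}(meta_cols), NamedTuple{names}(feat_cols)
end
```

For a file on disk this would be called as `meta, feat = open(io -> read_pcl(io; id_row = "sampleID"), "my_data.pcl")`.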
And I'd like to be able to do something like:
```julia
using FileIO, DataFrames

# desired usage
(x, y) = load("my_data.pcl", id_row="sampleID")
metadatadf = DataFrame(x)
featuredf = DataFrame(y)
save("new_table.pcl", metadatadf, featuredf)
```
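The `save` direction is the inverse: write the metadata rows, then the shared ID row, then the feature rows. Below is a minimal sketch, assuming both tables are NamedTuples of equal-length column vectors with identical column names, where the first column holds the row labels and the output is tab-delimited. `write_pcl` is a hypothetical name, not part of FileIO.

```julia
# Hypothetical sketch: write a metadata table and a feature table back out
# as one tab-delimited PCL-style file. Assumes both are NamedTuples of
# column vectors with the same column names (first column = row labels).
function write_pcl(io::IO, meta::NamedTuple, feat::NamedTuple)
    keys(meta) == keys(feat) || error("tables must share the same sample columns")

    # Emit one tab-joined line per row of a column table
    function writerows(t)
        for i in eachindex(first(t))
            println(io, join((string(col[i]) for col in t), '\t'))
        end
    end

    writerows(meta)                               # metadata rows first
    println(io, join(string.(keys(meta)), '\t'))  # then the shared ID row
    writerows(feat)                               # then the feature matrix
    return io
end
```

Writing to disk would look like `open(io -> write_pcl(io, meta, feat), "new_table.pcl", "w")`. Whether to register these behind FileIO's `load`/`save` is a separate decision; the parsing and writing logic would be the same either way.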
Can I get some guidance on whether this is possible, and whether it makes sense to use the FileIO framework for it?