Save / Load with multiple data inputs / outputs #192

Open
@kescobo

I frequently work with an unusual tabular format called a PCL file. The first n rows hold various kinds of metadata (the first column of each of these rows is the metadata name), and below them is a feature x sample matrix, with one row per feature and one column per sample. For example, my_data.pcl might contain:

age 12 14 18
gender m f f
sampleID a b c
feature1 0.2 0.4 0.3
feature2 0.3 0.2 0.3
feature3 0.1 0.6 0.7

Most operations apply only to the numerical part of the table, and storing everything in a single DataFrame (or similar) would generally lead to columns of type Any. What I'd like instead is load/save functions that return/accept two iterable tables sharing the sampleID row. For example, the two tables would be:

metadatadf:

sampleID a b c
age 12 14 18
gender m f f

featuredf:

sampleID a b c
feature1 0.2 0.4 0.3
feature2 0.3 0.2 0.3
feature3 0.1 0.6 0.7

And I'd like to be able to do something like:

(x, y) = load("my_data.pcl", id_row="sampleID")
metadatadf = DataFrame(x)
featuredf = DataFrame(y)

save("new_table.pcl", metadatadf, featuredf)

Can I get some guidance on whether this is possible, and whether it makes sense to use the FileIO framework for it?
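
To make the request concrete, here is a minimal sketch of the split and the round trip in plain Julia, independent of FileIO. The names `PCLTable`, `read_pcl`, and `write_pcl` are hypothetical and exist in no package. The sketch assumes the file is tab-delimited, that every row above the ID row is metadata, and that every row below it is numeric.

```julia
# Hypothetical container: one table whose columns are the samples.
struct PCLTable{T}
    ids::Vector{String}        # sample IDs, shared by both tables
    rownames::Vector{String}   # metadata names or feature names
    values::Matrix{T}          # one row per name, one column per sample
end

# Read a PCL stream and split it at the ID row into (metadata, features).
# Assumes a tab-delimited layout; pass `delim` to change that.
function read_pcl(io::IO; id_row::AbstractString="sampleID", delim='\t')
    rows = [split(line, delim) for line in eachline(io) if !isempty(strip(line))]
    k = findfirst(r -> first(r) == id_row, rows)
    k === nothing && throw(ArgumentError("no row named \"$id_row\""))
    ids = String.(rows[k][2:end])
    n = length(ids)
    all(r -> length(r) == n + 1, rows) || throw(ArgumentError("rows differ in length"))
    meta_idx = 1:k-1              # rows above the ID row
    feat_idx = k+1:length(rows)   # rows below the ID row
    metadata = PCLTable(ids, String[rows[i][1] for i in meta_idx],
                        String[rows[i][j+1] for i in meta_idx, j in 1:n])
    features = PCLTable(ids, String[rows[i][1] for i in feat_idx],
                        Float64[parse(Float64, rows[i][j+1]) for i in feat_idx, j in 1:n])
    return metadata, features
end

# Write both tables back in the original layout: metadata, ID row, features.
function write_pcl(io::IO, metadata::PCLTable, features::PCLTable;
                   id_row::AbstractString="sampleID", delim='\t')
    metadata.ids == features.ids || throw(ArgumentError("sample IDs do not match"))
    for (name, vals) in zip(metadata.rownames, eachrow(metadata.values))
        println(io, join([name; vals], delim))
    end
    println(io, join([id_row; metadata.ids], delim))
    for (name, vals) in zip(features.rownames, eachrow(features.values))
        println(io, join([name; string.(vals)], delim))
    end
end
```

Usage would look like `metadata, features = open(read_pcl, "my_data.pcl")`. Metadata values stay as strings here, so the numeric matrix keeps a concrete element type instead of Any. Whether a pair of tables like this fits FileIO's load/save interface is exactly the open question.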
