Save / Load with multiple data inputs / outputs #192

Open
kescobo opened this issue Jul 23, 2018 · 2 comments

@kescobo
Member

kescobo commented Jul 23, 2018

I'm frequently dealing with a really weird tabular data structure called a PCL file. The first n rows contain various forms of metadata (the first column of each of these rows is the metadata name), and below that is a feature-by-sample matrix (features in rows, samples in columns). So for example, I might have my_data.pcl:

age 12 14 18
gender m f f
sampleID a b c
feature1 0.2 0.4 0.3
feature2 0.3 0.2 0.3
feature3 0.1 0.6 0.7

Most of the operations occur on the numerical part of the table, and storing everything in a single dataframe (or whatever) would generally lead to columns of type Any. What I'd like is a pair of load/save functions that make/take two iterable tables sharing the sampleID row, e.g. the two tables would be:

metadatadf:

sampleID a b c
age 12 14 18
gender m f f

featuredf:

sampleID a b c
feature1 0.2 0.4 0.3
feature2 0.3 0.2 0.3
feature3 0.1 0.6 0.7

And I'd like to be able to do something like:

(x, y) = load("my_data.pcl", id_row="sampleID")
metadatadf = DataFrame(x)
featuredf = DataFrame(y)

save("new_table.pcl", metadatadf, featuredf)
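
For illustration, the split and re-join described above can be sketched in plain Julia with no package dependencies. This is only a minimal sketch under several assumptions: the file is tab-delimited, the `id_row` row is the last metadata row, and every row has the same number of fields. The names `read_pcl` and `write_pcl` and the NamedTuple layout are hypothetical; they are not part of FileIO or CSVFiles.jl.

```julia
# Hypothetical helpers for illustration; not part of FileIO or CSVFiles.jl.

# Read a PCL-style file into two tables that share the sample IDs.
# Rows above the `id_row` row are kept as string metadata;
# rows below it are parsed as Float64 features.
function read_pcl(path::AbstractString; id_row::AbstractString="sampleID", delim::Char='\t')
    rows = [split(line, delim) for line in eachline(path) if !isempty(strip(line))]
    idx = findfirst(r -> r[1] == id_row, rows)
    idx === nothing && error("no row named \"$id_row\" found in $path")

    ids = String.(rows[idx][2:end])
    meta_rows = rows[1:idx-1]
    feat_rows = rows[idx+1:end]

    metadata = (ids = ids,
                names = String[r[1] for r in meta_rows],
                values = [String.(r[2:end]) for r in meta_rows])
    features = (ids = ids,
                names = String[r[1] for r in feat_rows],
                values = [parse.(Float64, r[2:end]) for r in feat_rows])
    return metadata, features
end

# Write the two tables back out as one PCL-style file:
# metadata rows first, then the ID row, then the feature rows.
function write_pcl(path::AbstractString, metadata, features;
                   id_row::AbstractString="sampleID", delim::Char='\t')
    metadata.ids == features.ids || error("metadata and feature tables have different sample IDs")
    open(path, "w") do io
        for (name, vals) in zip(metadata.names, metadata.values)
            println(io, join([name; vals], delim))
        end
        println(io, join([id_row; metadata.ids], delim))
        for (name, vals) in zip(features.names, features.values)
            println(io, join([name; string.(vals)], delim))
        end
    end
    return path
end
```

With these in place, `metadata, features = read_pcl("my_data.pcl")` followed by `write_pcl("new_table.pcl", metadata, features)` would round-trip the example file above. Each returned table could then be converted into whatever table type is preferred; keeping the metadata as strings avoids guessing column types in this sketch.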

Can I get some guidance on whether this is possible / makes sense to use the FileIO framework for this?

@SimonDanisch
Member

If I'm not mistaken, arbitrary signatures should be supported in FileIO. Just make sure that you accept those signatures in your IO library! If FileIO fails to pass down the keyword args or additional arguments, please open an issue!

@kescobo
Member Author

kescobo commented Jul 24, 2018

Great! An orthogonal question that just occurred to me: should I / can I piggyback off of the functions already present in CSVFiles.jl? This is a special case of a csv/tsv file type, and I'll probably want to make use of all of the keyword stuff available there.
