What's the best route to save the predictions into a CSV file (using Tribuo classes)? #109

Open
neomatrix369 opened this issue Jan 28, 2021 · 9 comments
Labels
question General question

Comments

@neomatrix369
Contributor

Ask the question
What's the best route to save the predictions into a CSV file using Tribuo classes? Say I have a List<Prediction<Regressor>>.

One way would be to iterate through the list and write each item to disk via some FileXxxx() class.

Is your question about a specific Tribuo class?
List<Prediction<Regressor>> and Dataset (one of its concrete subclasses)

@neomatrix369 neomatrix369 added the question General question label Jan 28, 2021
@Craigacp
Member

Craigacp commented Jan 28, 2021

There isn't a helper to write out a csv file of predictions. You can save the dataset back out using CSVSaver, but that won't have the predicted values in it.

It's roughly two lines if you convert the list into a stream.

First write out the dimension headers from the output info inside the model, and then predictions.stream().map(Prediction::getOutput).map(Regressor::getValues).map(Arrays::toString).map(s -> s.substring(1,s.length()-1)).forEach(writer::println). Admittedly that's a little ugly, as it has to strip off the [ and ] that Arrays.toString() adds; a slightly more complex lambda that combines those two operations would be cleaner.
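A minimal sketch of that recipe using the cleaner combined lambda (Arrays.stream plus Collectors.joining, so there are no brackets to strip). To keep it runnable without Tribuo on the classpath, it assumes the dimension names and the per-prediction values have already been pulled out as a String[] and a List<double[]>; in Tribuo these would presumably come from the model's output info (or Regressor.getNames()) and from prediction.getOutput().getValues(). PredictionCsv and its methods are hypothetical names, not Tribuo API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Tribuo.
final class PredictionCsv {
    private PredictionCsv() {}

    // Formats one prediction's values as comma-separated text, with no [ ] to strip off.
    static String toRow(double[] values) {
        return Arrays.stream(values)
                .mapToObj(Double::toString)
                .collect(Collectors.joining(","));
    }

    // Header line of dimension names first, then one line per prediction.
    static List<String> toCsvLines(String[] dimensionNames, List<double[]> predictedValues) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", dimensionNames));
        predictedValues.stream().map(PredictionCsv::toRow).forEach(lines::add);
        return lines;
    }

    // Writes the lines to disk; Files.write(Path, Iterable) encodes as UTF-8.
    static void write(Path path, String[] dimensionNames, List<double[]> predictedValues)
            throws IOException {
        Files.write(path, toCsvLines(dimensionNames, predictedValues));
    }
}
```

Usage would then be something like PredictionCsv.write(Path.of("predictions.csv"), names, values). One design note: Double.toString switches to scientific notation (e.g. 1.0E-4) for very small or very large magnitudes, which most CSV readers accept but which you may prefer to control with String.format.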

@Craigacp
Member

Craigacp commented Jan 28, 2021

Alternatively, depending on exactly how you want the output to look, there is Regressor.getSerializableForm(), which produces a string of the form DIM-0=<value>,...,DIM-N=<value>. This format is easily consumed by RegressionFactory.generateOutput.
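If you take the serialized-form route, note that a DIM-0=<value>,...,DIM-N=<value> string itself contains commas, so a CSV writer has to quote it to keep it in a single column. The sketch below builds a string in that shape from plain names and values and wraps it as a quoted CSV field. It is a stand-in to show the quoting issue, not Tribuo's implementation of getSerializableForm(), and SerializedFormCsv and its methods are hypothetical names.

```java
import java.util.StringJoiner;

// Hypothetical helpers illustrating the quoting issue; not part of Tribuo.
final class SerializedFormCsv {
    private SerializedFormCsv() {}

    // Builds "name=value,name=value,..." in the shape described for the serialized form.
    static String serialize(String[] names, double[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        StringJoiner joiner = new StringJoiner(",");
        for (int i = 0; i < names.length; i++) {
            joiner.add(names[i] + "=" + values[i]);
        }
        return joiner.toString();
    }

    // RFC 4180-style quoting: wrap the field in double quotes and double any embedded quotes.
    static String quoteField(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```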

@neomatrix369
Contributor Author

It would be nice to have a method that does this, because it's something we probably all want to do as part of a pipeline. I can think of many use cases; I'm already in the middle of one.

@Craigacp
Member

Ok. I'm not sure where such a method should live. We have done this in the past when writing out classification outputs for comparison against other systems, but it lives in the main method - https://github.com/oracle/tribuo/blob/main/Classification/Experiments/src/main/java/org/tribuo/classification/experiments/ConfigurableTrainTest.java#L169.

Any suggestions on where it should go? It needs to be specialised to each Output type, so I guess it could be a method on the OutputFactory?
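One hypothetical shape for such a method, sketched without Tribuo types: a small per-output-type formatter (the role a method on OutputFactory might play) plus one generic writer that doesn't care which output type it handles. OutputCsvFormatter, RegressionFormatter, LabelFormatter and GenericCsvWriter are invented names, and double[] and String stand in for Regressor and Label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical per-output-type formatting contract.
interface OutputCsvFormatter<T> {
    List<String> header();
    List<String> row(T output);
}

// Stand-in for regression: each output is a vector of dimension values.
final class RegressionFormatter implements OutputCsvFormatter<double[]> {
    private final List<String> dimensionNames;

    RegressionFormatter(List<String> dimensionNames) {
        this.dimensionNames = List.copyOf(dimensionNames);
    }

    @Override public List<String> header() { return dimensionNames; }

    @Override public List<String> row(double[] output) {
        return Arrays.stream(output).mapToObj(Double::toString).collect(Collectors.toList());
    }
}

// Stand-in for classification: each output is a single predicted label.
final class LabelFormatter implements OutputCsvFormatter<String> {
    @Override public List<String> header() { return List.of("label"); }

    @Override public List<String> row(String output) { return List.of(output); }
}

final class GenericCsvWriter {
    private GenericCsvWriter() {}

    // All type-specific logic lives in the formatter; this method is shared.
    static <T> List<String> toCsvLines(List<T> outputs, OutputCsvFormatter<T> formatter) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", formatter.header()));
        for (T output : outputs) {
            lines.add(String.join(",", formatter.row(output)));
        }
        return lines;
    }
}
```

A real version would also need to quote fields that contain commas, such as labels or serialized outputs.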

@neomatrix369
Contributor Author

neomatrix369 commented Jan 28, 2021

Let me try to work through a workflow from a user's perspective. I think some of the low-level (granular) calls could be wrapped in higher-level functions so we don't have to chain so many x.y.z() calls to get to the results. There is also a bit of cognitive overload when getting from one part of the flow to the next.

@neomatrix369
Contributor Author

neomatrix369 commented Jan 28, 2021

Also, another question sort of related to this one, say I have this block of code:

var mutableValidationDataset =  new MutableDataset(wineSource);
for (var i: mutableValidationDataset.getData()) {
     System.out.println(i); 
}

I'm not able to get hold of each example in mutableValidationDataset. I tried mutableValidationDataset.getData().get(0), but this does not give me any methods I can make use of (I'm referring to https://tribuo.org/learn/4.0/javadoc/org/tribuo/impl/ArrayExample.html). It would be nice to be able to iterate through the features and target fields.

@Craigacp
Member

Craigacp commented Jan 28, 2021

Assuming that's the complete snippet, it's because you forgot the type parameter on MutableDataset (it should probably be MutableDataset<Regressor>, though it might also infer it properly from the source, so MutableDataset<> could work). Because the type is missing, the compiler treats MutableDataset as a raw type and erases all of its generics: members like getData() return a raw List rather than List<Example<T>>, the Dataset itself is a raw Iterable rather than Iterable<Example<T>>, and type inference picks Object as the type of i.

You won't get an ArrayExample back: the contract is for Example, and there aren't many methods that exist only on ArrayExample anyway.
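The raw-type behaviour can be reproduced without Tribuo. In this sketch ToyDataset is a hypothetical stand-in for a Dataset (which is Iterable over its examples): with the diamond the loop variable is inferred as String, while with the raw type it is Object and needs a cast. The same fix applies to the original snippet, i.e. new MutableDataset<>(wineSource) or an explicit MutableDataset<Regressor>.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a generic, iterable dataset.
final class ToyDataset<T> implements Iterable<T> {
    private final List<T> data = new ArrayList<>();

    ToyDataset(List<T> source) { data.addAll(source); }

    @Override public Iterator<T> iterator() { return data.iterator(); }
}

final class RawTypeDemo {
    private RawTypeDemo() {}

    static int typedTotalLength(List<String> source) {
        // The diamond infers ToyDataset<String> from the argument, so `s` is a String.
        var dataset = new ToyDataset<>(source);
        int total = 0;
        for (var s : dataset) {
            total += s.length(); // String methods are available directly
        }
        return total;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static int rawTotalLength(List<String> source) {
        // Raw type: generics are erased at compile time, so `o` is inferred as Object.
        var dataset = new ToyDataset(source);
        int total = 0;
        for (var o : dataset) {
            // o.length() would not compile here; a cast is required.
            total += ((String) o).length();
        }
        return total;
    }
}
```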

@neomatrix369
Contributor Author

I used your tips and some workarounds to get my solution working, but ideally it would be good to have cleaner methods (flows), i.e. class- or instance-level methods, to get at what we need from the input data as well as from the prediction classes.

@Craigacp
Member

What else did you need apart from the regression outputs? The features and ground truth outputs should be simple to access.
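For the original goal of a CSV that carries inputs, ground truth and predictions side by side: in Tribuo an Example is iterable over its features and exposes its ground truth via getOutput(), so each example's row can be joined with its Prediction. The sketch below shows only the joining step with stand-in types (a feature map per example plus two lists of doubles) so that it runs without Tribuo. FeatureTruthPredictionCsv is a hypothetical name, and the sketch assumes the predictions are in the same order as the examples and that a missing (sparse) feature should be written as 0.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: joins input features, ground truth and prediction into CSV lines.
final class FeatureTruthPredictionCsv {
    private FeatureTruthPredictionCsv() {}

    static List<String> toCsvLines(List<String> featureNames,
                                   List<Map<String, Double>> featuresPerExample,
                                   List<Double> groundTruth,
                                   List<Double> predicted) {
        if (featuresPerExample.size() != groundTruth.size()
                || groundTruth.size() != predicted.size()) {
            throw new IllegalArgumentException("examples, ground truth and predictions must align");
        }
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", featureNames) + ",truth,predicted");
        for (int i = 0; i < featuresPerExample.size(); i++) {
            Map<String, Double> features = featuresPerExample.get(i);
            StringBuilder row = new StringBuilder();
            for (String name : featureNames) {
                // Assumption: a feature absent from a sparse example is written as 0.0.
                row.append(features.getOrDefault(name, 0.0)).append(',');
            }
            row.append(groundTruth.get(i)).append(',').append(predicted.get(i));
            lines.add(row.toString());
        }
        return lines;
    }
}
```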
