Plot confusion matrix not from predictions but from actual confusion matrix #105

Open
ricgu8086 opened this issue Dec 4, 2019 · 1 comment

ricgu8086 commented Dec 4, 2019

Hi,

I would like to ask whether there is a way to provide a precomputed confusion matrix and still use scikit-plot functions for visualization. I have a task where I want to plot two types of confusion matrix: one for the number of transactions and one for the amount of each transaction ($). The first case is pretty straightforward: I have ground truth, I have predictions, so just a quick call to plot_confusion_matrix and voilà. The second case is not that easy, as some transactions could be on the order of $1000. If the dataset totals millions of dollars, I would need to create a huge array where each element is a single $, together with its prediction and its ground truth. It is less cumbersome to compute the confusion matrix myself and plot it with seaborn.heatmap, but then the appearance will not be consistent with the other plots.

Is this something that can be done? Or is it maybe an enhancement suggestion?

Thanks
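
For illustration, here is a minimal sketch of the "compute it myself" route described above, assuming only NumPy. Rather than expanding every dollar into its own row, each transaction's amount is added to the cell for its (true class, predicted class) pair. The function name `weighted_confusion_matrix`, the two-class labels, and the sample data are all hypothetical. scikit-learn's `confusion_matrix` also accepts a `sample_weight` argument, which should give the same totals.

```python
import numpy as np

def weighted_confusion_matrix(y_true, y_pred, weights, n_classes):
    """Sum `weights` into the cell indexed by (true class, predicted class)."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    # np.add.at accumulates correctly even when the same cell appears many times
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)),
              np.asarray(weights, dtype=float))
    return cm

# Hypothetical data: class labels 0 and 1, amounts in dollars
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0]
amounts = [120.0, 1000.0, 250.0, 40.0, 15.5]

# A weight of 1 per transaction gives the usual count matrix
counts = weighted_confusion_matrix(y_true, y_pred, np.ones(len(y_true)), 2)
# Weighting by amount gives the dollar matrix without building a huge array
dollars = weighted_confusion_matrix(y_true, y_pred, amounts, 2)

print(counts)
print(dollars)
```

This only produces the matrix. Drawing it in a style that matches the other scikit-plot figures is still the open part of the question.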

jake-mason commented Jan 24, 2020

What prompts you to represent a continuous outcome/prediction ($ amount) in terms of a confusion matrix (meant for binary or categorical modeling tasks)? It seems to me the output of a confusion matrix with even tens of different categories represented would be difficult to understand, let alone potentially thousands of categories.

I assume you're trying to understand your model's performance across the entire dollar range, to see where there may be gaps. Have you tried a residual plot (i.e. plotting the predicted $ amount on the x-axis and the error on the y-axis)?

I suppose you could try binning your $ amounts to reduce the cardinality in the predictions/actual outcomes, but that seems arbitrary and roundabout.
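
A rough sketch of what that binning could look like, assuming NumPy and treating the dollar amount itself as the predicted quantity. The bin edges, labels, and sample values are made up for illustration, and the choice of edges is exactly the arbitrary part.

```python
import numpy as np

# Hypothetical bin edges in dollars: [0, 100), [100, 1000), [1000, inf)
edges = np.array([100.0, 1000.0])
labels = ["<100", "100-999", ">=1000"]

# Hypothetical actual and predicted dollar amounts
actual = np.array([50.0, 250.0, 1200.0, 80.0, 999.0])
predicted = np.array([70.0, 90.0, 1500.0, 300.0, 1000.0])

# np.digitize maps each amount to the index of the bin it falls into
true_bins = np.digitize(actual, edges)
pred_bins = np.digitize(predicted, edges)

# Count how often each (actual bin, predicted bin) pair occurs
cm = np.zeros((len(labels), len(labels)), dtype=int)
np.add.at(cm, (true_bins, pred_bins), 1)

print(labels)
print(cm)
```

The binned labels in `true_bins` and `pred_bins` are ordinary categorical arrays, so they could also be passed to any confusion matrix routine that takes ground truth and predictions.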
