Fritz Obermeyer edited this page Jul 21, 2017 · 4 revisions

Here are some example public data sets that would be interesting to analyze with TreeCat:

| Rows | Columns | Type | Dataset |
|-----:|--------:|------|---------|
| 6030 | 93 | mixed | GitHub Open Source Survey 2017 |
| 5400 | 12 | categorical | bninfo example dataset |
| 1043 | 660918 | ? | Human Genome Diversity Project |
| 517K | varies | categorical, sparse binary | Enron Emails |
| 2.45M | 68 | categorical | US Census (1990) |
| 191K | 481 | categorical, ordinal | KDD Cup 1998 |

Questions

  • Are there appropriately sized genetic datasets? How should we deal with alignment?