Information may be stored multiple times in the database; this came to light in openml/openml-python#1289 (comment). We should avoid storing duplicate information in the database, because it can easily lead to multiple sources of truth. This issue can be used to keep track of all duplicate data, with the intention of refactoring our database in the future to avoid these pitfalls:
Feature attribute information (e.g., ignore_attributes) is duplicated between the expdb.dataset table and the expdb.data_features table.
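To illustrate how the two copies could drift apart, here is a minimal sketch of a consistency check. It runs against an in-memory SQLite database, not the real one. The table and column names (`dataset.ignore_attribute` as a comma-separated string, `data_feature.is_ignore` as a per-feature flag) are assumptions made for illustration and may not match the actual expdb schema.

```python
import sqlite3


def find_ignore_mismatches(conn):
    """Return {did: (names per dataset table, names per feature table)} where the two disagree.

    Assumes a hypothetical schema; adjust table and column names to the real one.
    """
    # Copy 1: ignored attributes stored as a comma-separated string per dataset.
    from_dataset = {}
    for did, ignore in conn.execute("SELECT did, ignore_attribute FROM dataset"):
        from_dataset[did] = {n.strip() for n in (ignore or "").split(",") if n.strip()}

    # Copy 2: ignored attributes stored as a boolean flag per feature row.
    from_features = {did: set() for did in from_dataset}
    query = "SELECT did, name FROM data_feature WHERE is_ignore = 1"
    for did, name in conn.execute(query):
        from_features.setdefault(did, set()).add(name)

    return {
        did: (from_dataset.get(did, set()), from_features[did])
        for did in from_features
        if from_dataset.get(did, set()) != from_features[did]
    }


def build_demo_db():
    """Create a toy database in which dataset 2 has inconsistent copies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (did INTEGER PRIMARY KEY, ignore_attribute TEXT);
        CREATE TABLE data_feature (did INTEGER, name TEXT, is_ignore INTEGER);
        INSERT INTO dataset VALUES (1, 'id'), (2, 'id,timestamp');
        INSERT INTO data_feature VALUES
            (1, 'id', 1), (1, 'x', 0),
            (2, 'id', 1), (2, 'timestamp', 0);
        """
    )
    return conn


if __name__ == "__main__":
    print(find_ignore_mismatches(build_demo_db()))
```

In the toy data, dataset 2 lists `timestamp` as ignored in one table but not in the other, which is the kind of "multiple truths" situation described above. A check like this could help size the problem before any refactoring.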
I wasn't involved with the database design, so I can't comment on why the duplication exists. I hope to discuss this with Jan later, but changes to the database likely won't happen in the next few months, as we are focusing on a (mostly) faithful reimplementation of the PHP REST API first. While this issue doesn't specifically mention it, potential changes to the database will be benchmarked and put into context with usage statistics, which will help us evaluate the alternatives. But in principle, the change outlined is something that should be looked at.