Implement OD verifier algorithm #420

vano105 · 2024-05-27T21:20:09Z

Implement a novel algorithm for validating canonical ODs (order dependencies). The algorithm takes the left and right column indices, a context, and a flag that selects the dependency order (ascending or descending). It outputs the rows that violate the OD through swaps or splits.
Add unit tests for OD verification.

For more information about canonical ODs: http://www.vldb.org/pvldb/vol10/p721-szlichta.pdf

This PR introduces a novel algorithm for validating canonical order dependencies. It takes the left and right column indices, a context, and a flag indicating whether the dependency is ascending or descending, and it reports the rows where the dependency is violated by swaps or splits. Two new options, the context and the ascending flag, were added as inputs for the algorithm. In addition, the members of algos::fastod::ComplexStrippedPartition were changed from private to protected: I added a new class that inherits from algos::fastod::ComplexStrippedPartition and needs access to those previously private fields.
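For readers unfamiliar with the terms, here is a minimal sketch of what a split and a swap mean for the two canonical OD forms from the linked paper, `{X}: [] -> A` and `{X}: A ~ B`. It is a naive quadratic check over a hypothetical in-memory `Table`, not the partition-based implementation from this PR; the names `Table`, `SameContext`, `HasSplit`, and `HasSwap` are invented for illustration, and only the ascending case is shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory table: table[row][column], integer-valued for brevity.
using Table = std::vector<std::vector<int>>;

// True if rows r1 and r2 agree on every context column,
// i.e. they fall into the same equivalence class of the context.
bool SameContext(Table const& t, std::vector<std::size_t> const& context,
                 std::size_t r1, std::size_t r2) {
    for (std::size_t col : context) {
        if (t[r1][col] != t[r2][col]) return false;
    }
    return true;
}

// Split for {X}: [] -> A: two rows in the same context class differ on A.
bool HasSplit(Table const& t, std::vector<std::size_t> const& context, std::size_t a) {
    for (std::size_t i = 0; i < t.size(); ++i) {
        for (std::size_t j = i + 1; j < t.size(); ++j) {
            if (SameContext(t, context, i, j) && t[i][a] != t[j][a]) return true;
        }
    }
    return false;
}

// Swap for {X}: A ~ B (ascending): two rows in the same context class
// are ordered one way on A and the opposite way on B.
bool HasSwap(Table const& t, std::vector<std::size_t> const& context,
             std::size_t a, std::size_t b) {
    for (std::size_t i = 0; i < t.size(); ++i) {
        for (std::size_t j = 0; j < t.size(); ++j) {
            if (i == j || !SameContext(t, context, i, j)) continue;
            if (t[i][a] < t[j][a] && t[i][b] > t[j][b]) return true;
        }
    }
    return false;
}
```

A descending check would flip the comparison on `b` in `HasSwap`. A real verifier would group rows by context once instead of comparing every pair.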

yakovypg · 2024-05-28T10:25:26Z

src/core/algorithms/od/mining_algorithms.h

@@ -2,3 +2,4 @@

 #include "algorithms/od/fastod/fastod.h"
 #include "algorithms/od/order/order.h"
+#include "algorithms/od/od_verifier/od_verifier.h"


od_verifier should be mentioned in verification_algorithms.h

yakovypg · 2024-05-28T10:36:14Z

src/core/algorithms/od/od_verifier/od_verifier.cpp

+void ODVerifier::LoadDataInternal() {
+    relation_ = ColumnLayoutRelationData::CreateFrom(*input_table_, is_null_equal_null_);
+
+    if (relation_->GetColumnData().empty()) {


Keep the brace style consistent.

yakovypg · 2024-05-28T10:46:01Z

src/core/algorithms/od/od_verifier/od_verifier.cpp

+}
+
+unsigned long long ODVerifier::ExecuteInternal() {
+    auto start_time = std::chrono::system_clock::now();


It is better to use existing methods. Use util::TimedInvoke (example).
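To illustrate the idea behind this suggestion, below is a minimal sketch of a timing wrapper that replaces manual `start_time` bookkeeping. `TimedInvokeSketch` is a hypothetical stand-in written for this example; the signature and return type of the project's own util::TimedInvoke are not shown in this thread and may differ.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for a project timing helper: runs the callable
// with the given arguments and returns the elapsed time in milliseconds.
template <typename Callable, typename... Args>
std::size_t TimedInvokeSketch(Callable&& callable, Args&&... args) {
    auto const start = std::chrono::steady_clock::now();
    std::invoke(std::forward<Callable>(callable), std::forward<Args>(args)...);
    auto const elapsed = std::chrono::steady_clock::now() - start;
    return static_cast<std::size_t>(
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count());
}
```

With such a helper, an execute method can shrink to a single `return` that wraps the verification call. The sketch uses `std::chrono::steady_clock` because it is monotonic, which suits interval measurement better than `system_clock`.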

yakovypg · 2024-05-28T10:48:36Z

src/core/algorithms/algorithm_types.h

-    split
+    split,
+/* Canonical OD verifier algorithm */
+    od_verifier


I think that OD verifier algorithms should be mentioned after OD mining algorithms.

yakovypg · 2024-05-28T10:49:09Z

src/core/algorithms/algorithm_types.h

@@ -11,7 +11,7 @@ using AlgorithmTypes =
                   Apriori, metric::MetricVerifier, DataStats, fd_verifier::FDVerifier, HyUCC,
                   PyroUCC, cfd::FDFirstAlgorithm, ACAlgorithm, UCCVerifier, Faida, Spider, Mind,
                   Fastod, GfdValidation, EGfdValidation, NaiveGfdValidation, order::Order,
-                   dd::Split>;
+                   dd::Split, od_verifier::ODVerifier>;


I think OD verifier algorithms should be mentioned after OD mining algorithms.

yakovypg · 2024-05-28T11:09:11Z

src/core/algorithms/od/od_verifier/od_verifier.h

+
+    // Returns the number of rows that violate the OD by split
+    size_t GetNumRowsViolateBySplit() const {
+        return row_violate_ods_by_split_.size();


It is better to implement methods in .cpp.

yakovypg · 2024-05-28T11:09:17Z

src/core/algorithms/od/od_verifier/od_verifier.h

+
+    // Returns the number of rows that violate the OD by swap
+    size_t GetNumRowsViolateBySwap() const {
+        return row_violate_ods_by_swap_.size();


It is better to implement methods in .cpp.
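A minimal sketch of the split the reviewer asks for in the two comments above, shown in one listing for brevity. `VerifierSketch` and `AddSplitViolation` are hypothetical names; only the getter name and the member name come from the diff.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// --- Would live in the header (.h): declarations only. ---
class VerifierSketch {
public:
    // Hypothetical helper so the example has data to count.
    void AddSplitViolation(int column, int row);

    // Returns the number of rows that violate the OD by split.
    std::size_t GetNumRowsViolateBySplit() const;

private:
    std::vector<std::pair<int, int>> row_violate_ods_by_split_;
};

// --- Would live in the source file (.cpp): definitions. ---
void VerifierSketch::AddSplitViolation(int column, int row) {
    row_violate_ods_by_split_.emplace_back(column, row);
}

std::size_t VerifierSketch::GetNumRowsViolateBySplit() const {
    return row_violate_ods_by_split_.size();
}
```

Keeping the bodies out of the header means a change to a getter recompiles only one translation unit.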

yakovypg · 2024-05-28T11:11:12Z

src/core/algorithms/od/od_verifier/partition.cpp

@@ -0,0 +1,57 @@
+#include "partition.h"
+
+#include <strings.h>


Is this include necessary? If not, remove it.

yakovypg · 2024-05-28T11:20:37Z

src/core/algorithms/od/od_verifier/partition.cpp

+
+        for (size_t i = group_begin + 1; i < group_end; i++) {
+            if (data_->GetValue((*sp_indexes_)[i], right) != group_value) {
+                violates.emplace_back(std::pair<int, int>(right, (*sp_indexes_)[i]));


We can omit creating std::pair due to using emplace_back.
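A small sketch of the difference, using a hypothetical free function `CollectSketch` and made-up values: `emplace_back` forwards its arguments to the element's constructor, so the pair can be built directly inside the vector.

```cpp
#include <utility>
#include <vector>

// Hypothetical example function; the values are made up.
std::vector<std::pair<int, int>> CollectSketch() {
    std::vector<std::pair<int, int>> violates;

    // Builds a temporary pair first, then moves it into the vector.
    violates.emplace_back(std::pair<int, int>(3, 7));

    // Forwards the two ints to the pair constructor; no explicit temporary.
    violates.emplace_back(3, 8);

    return violates;
}
```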

yakovypg · 2024-05-28T11:24:59Z

src/core/algorithms/od/od_verifier/partition.cpp

+
+namespace algos::od_verifier {
+
+std::vector<std::pair<int, int>> Partition::CommonViolationBySplit(model::ColumnIndex right) const {


It is better to have using X = std::pair<int, int> for clarity of what std::pair<int, int> is.
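One way this could look, as a sketch: the alias name `ColumnRowPair` and the function `MakeViolationsSketch` are hypothetical. The meaning of the two fields is taken from the `emplace_back(right, row)` calls in the diff, which store a column index followed by a row index.

```cpp
#include <utility>
#include <vector>

// Hypothetical alias: first = column index, second = index of the violating row.
using ColumnRowPair = std::pair<int, int>;

// The return type now says what the pairs mean.
std::vector<ColumnRowPair> MakeViolationsSketch() {
    std::vector<ColumnRowPair> violates;
    violates.emplace_back(1, 4);  // column 1, row 4 (made-up values)
    return violates;
}
```

A small struct with named fields would go one step further than an alias, at the cost of a slightly larger change.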

yakovypg · 2024-05-28T17:29:34Z

src/core/algorithms/algorithm_types.h

@@ -76,7 +76,9 @@ BETTER_ENUM(AlgorithmType, char,
    order,


I think that OD mining algorithms should be mentioned next to each other, so you can move 'order'.

yakovypg · 2024-05-28T18:16:48Z

src/core/config/ascending_od/option.h

+#include "config/common_option.h"
+
+namespace config {
+extern CommonOption<AscendingODFlagType> const kAscendingODOpt;


The 'AscendingOD' option is useful only for the OD verifier and won't be needed by other algorithms, so I think it should be removed. Instead, you can use Option as shown here.

yakovypg · 2024-05-28T18:18:08Z

src/core/config/indices/od_context.h

+#include "config/indices/type.h"
+
+namespace config {
+extern CommonOption<IndicesType> const kODContextOpt;


The 'ODContext' option is useful only for the OD verifier and won't be needed by other algorithms, so I think it should be removed. Instead, you can use Option as shown here.

yakovypg · 2024-05-28T18:38:14Z

src/core/algorithms/od/od_verifier/od_verifier.h

+
+    // checks whether OD is violated and finds the rows where it is violated
+    template <bool Ascending>
+    void VerifyOD();


Template methods should be implemented in the header file.
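A minimal sketch of the pattern, with hypothetical names `IsOrderedSketch` and `IsOrderedDispatch`: the template body sits where every caller can see it, as it would in a header, and a runtime flag is mapped onto the compile-time `Ascending` parameter.

```cpp
#include <cstddef>
#include <vector>

// Header-style template: the definition is visible at the point of
// instantiation, so any translation unit can instantiate it.
template <bool Ascending>
bool IsOrderedSketch(std::vector<int> const& values) {
    for (std::size_t i = 1; i < values.size(); ++i) {
        if constexpr (Ascending) {
            if (values[i - 1] > values[i]) return false;
        } else {
            if (values[i - 1] < values[i]) return false;
        }
    }
    return true;
}

// Maps a runtime flag onto the compile-time template parameter.
inline bool IsOrderedDispatch(std::vector<int> const& values, bool ascending) {
    return ascending ? IsOrderedSketch<true>(values) : IsOrderedSketch<false>(values);
}
```

If the template were only declared in the header and defined in a .cpp file, other translation units would fail to link unless that .cpp file explicitly instantiated both the `true` and `false` variants.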

yakovypg

General comments about tests:

You should test not only dependencies of the form {X}: A ~ B, but also {X}: [] -> A.
You should test dependencies that contain several attributes in context.

yakovypg · 2024-05-28T19:01:00Z

src/tests/test_od_verifier.cpp

+
+struct ODVerifyingParams {
+    algos::StdParamsMap params;
+    size_t const row_violate_ods_by_split = 0;


Did I understand correctly that this field stores the number of rows that violate the order dependency by split? If so, the field should be renamed.

yakovypg · 2024-05-28T19:01:12Z

src/tests/test_od_verifier.cpp

+struct ODVerifyingParams {
+    algos::StdParamsMap params;
+    size_t const row_violate_ods_by_split = 0;
+    size_t const row_violate_ods_by_swap = 0;


Did I understand correctly that this field stores the number of rows that violate the order dependency by swap? If so, the field should be renamed.

yakovypg · 2024-05-28T19:35:30Z

src/core/algorithms/od/od_verifier/od_verifier.h

+    // input data
+    config::InputTable input_table_;
+    config::EqNullsType is_null_equal_null_;
+    IndicesType lhs_indices_;


Canonical ODs contain at most one attribute on the left side (one or none). Why is the type of lhs_indices_ std::vector?

yakovypg · 2024-05-28T19:35:51Z

src/core/algorithms/od/od_verifier/od_verifier.h

+    config::InputTable input_table_;
+    config::EqNullsType is_null_equal_null_;
+    IndicesType lhs_indices_;
+    IndicesType rhs_indices_;


Canonical ODs contain only one attribute on the right side. Why is the type of rhs_indices_ std::vector?

yakovypg · 2024-05-31T09:03:48Z

src/core/algorithms/od/od_verifier/partition.h

+
+namespace algos::od_verifier {
+
+class Partition : protected algos::fastod::ComplexStrippedPartition {


The name of this class should be more specific.

yakovypg · 2024-05-31T09:07:07Z

src/core/algorithms/od/od_verifier/partition.h

+                }
+
+                if (!is_first_group && values[prev_group_max_index].second > second) {
+                    violates.push_back(std::pair<int, int>(right, row_pos[i]));


Use emplace_back.

vano105 added 4 commits May 27, 2024 20:43

Add test for OD verification

ac434f3

Fix clang-format

eff9615

Fix clang-format 2

33202df

yakovypg reviewed May 28, 2024

yakovypg reviewed May 31, 2024

Fix issues in pull request

d0221d4
