AEC_exp #397

GuyPerets106 · 2024-08-23T07:06:53Z

Added AEC capabilities
Added AEC stats
Fixed calculate_size function in nerl_tools
Added experiments json files
Modified dataset indexes in the Hugging Face JSON file (IMPORTANT FOR EXISTING NOTEBOOKS)

Tests:

Passed full flow test locally
Passed NIF test locally
Passed Distributed & Federated experiments (as presented Aug 20)

leondavi · 2024-08-23T21:01:18Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

@@ -1,5 +1,7 @@
 #include "nerlWorkerOpenNN.h"
 #include "ae_red.h" 
+#include <fstream>


We don't need fstream and iostream.
I guess they were added for debugging.
Please remove them.

leondavi · 2024-08-23T21:04:36Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

@@ -28,7 +30,7 @@ namespace nerlnet
    void NerlWorkerOpenNN::perform_training()
    {
        this->_training_strategy_ptr->set_data_set_pointer(this->_data_set.get());
-
+        this->_training_strategy_ptr->get_loss_index_pointer()->set_regularization_method(LossIndex::RegularizationMethod::L2); // ! ADDED NOW


No need. Please validate that this branch is rebased on top of master.
Master already has an implementation for regularization.

NErlNet/src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

Line 175 in c01ac0b

_training_strategy_ptr->get_loss_index_pointer()->set_regularization_method(parse_regularization_loss_args(_loss_args_str));

leondavi · 2024-08-23T21:06:16Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

+                int num_of_samples = _aec_data_set->dimension(0);
+                loss_val_tensor = std::make_shared<fTensor2D>(1, 1);
+                (*loss_val_tensor)(0, 0) = static_cast<float>(_last_loss); 
+                (*loss_val_tensor)(1, 0) = _ae_red_ptr->_ema_event; // Mask the following lines to get reduction in data tranfers sizes, or Unmask to enable AEC stats


Add a TODO here: this should be wrapped in an if statement, controlled by a model argument that we pass in.
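A minimal sketch of what that guard could look like, using `std::vector<float>` as a stand-in for the loss tensor. The function name `build_loss_report` and the flag `aec_stats_enabled` are hypothetical and do not exist in the PR; the real code would read the flag from the worker's model arguments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loss tensor returned to the caller.
// Row 0 always holds the last loss; row 1 holds the optional AEC statistic.
std::vector<float> build_loss_report(float last_loss,
                                     float ema_event,
                                     bool aec_stats_enabled) // hypothetical model arg
{
    // Allocate only the rows that will actually be written.
    const std::size_t rows = aec_stats_enabled ? 2 : 1;
    std::vector<float> report(rows);
    report[0] = last_loss;
    if (aec_stats_enabled)
    {
        report[1] = ema_event; // transferred only when stats are requested
    }
    return report;
}
```

With the flag off, only the loss value is transferred, which is the reduced-size case described in the code comment above; with it on, the report grows by one row per statistic.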

leondavi · 2024-08-23T21:07:25Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

+                // Add _aec_all_loss_values to loss_val_tensor
+                for (int i = 0; i < num_of_samples; i++)
+                {
+                    (*loss_val_tensor)(3 + i, 0) = (*_aec_all_loss_values)(i, 0);


It looks wrong to go through all the samples and copy them one by one.
Add a TODO for optimization here.
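One possible shape for that optimization is a single bulk copy instead of an element-by-element loop. The sketch below assumes contiguous storage and uses `std::vector<float>` in place of the tensors from the PR; `append_sample_losses` is a hypothetical helper, not an existing function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies all per-sample losses into `report`, starting at row `offset`,
// with one std::copy call instead of a hand-written element loop.
// std::vector stands in for the tensors used in the PR (an assumption).
void append_sample_losses(std::vector<float>& report,
                          std::size_t offset,
                          const std::vector<float>& sample_losses)
{
    assert(offset <= report.size());
    // Resize once so the report holds exactly offset + N values
    // and the copy below never writes out of bounds.
    report.resize(offset + sample_losses.size());
    std::copy(sample_losses.begin(), sample_losses.end(),
              report.begin() + static_cast<std::ptrdiff_t>(offset));
}
```

Resizing before the copy also keeps the destination size consistent with the number of rows written.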

leondavi · 2024-08-23T21:08:07Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

-                break;
+                *loss_values_return = mse2D;
+                _aec_all_loss_values = loss_values_return;
+                // cout << "MSE Loss: " << mse_loss << endl;


Remove the commented-out cout.

leondavi · 2024-08-23T21:08:15Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

+                _aec_all_loss_values = loss_values_return;
+                // cout << "MSE Loss: " << mse_loss << endl;
+                _ae_red_ptr->update_batch(loss_values_mse);
+                // cout << "AE_RED RESULT VECTOR:" << endl << *res << endl;


Remove the commented-out cout.

leondavi · 2024-08-23T21:08:48Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

                *loss_values_mse = mse2D;
-                result_ptr = _ae_red_ptr->update_batch(loss_values_mse);
+                // string filename = "/tmp/nerlnet/predict_errors.csv";


Remove the commented-out code block, or move it into a function of ae_red.

leondavi · 2024-08-23T21:10:02Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

@@ -569,6 +598,11 @@ namespace nerlnet
            }  
            curr_layer = curr_layer->get_next_layer_ptr();
        }
+        // Write the model parameters to file


Add a dedicated function that saves the model parameters.
We should add this capability to the NIF and the API-Server at some point.

Either way, remove this comment from here.
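As a rough illustration of such a dedicated function, the sketch below writes parameters to any `std::ostream`, so the same routine could target a file or an in-memory buffer. The name `save_model_parameters`, the flat `std::vector<float>` layout, and the one-value-per-line format are all assumptions, not part of the PR or of OpenNN.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Writes one parameter per line to any output stream (file, string buffer, ...).
// Returns the number of values written.
std::size_t save_model_parameters(const std::vector<float>& params, std::ostream& out)
{
    std::size_t written = 0;
    for (float p : params)
    {
        out << p << '\n';
        ++written;
    }
    return written;
}
```

Passing a `std::ofstream` would write to disk, while a `std::ostringstream` keeps the output in memory, which is convenient for testing.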

leondavi · 2024-08-23T21:12:56Z

src_cpp/opennn

This may be the reason for the crash.
We should merge your changes to opennn carefully. I think Ori made changes there as well.

leondavi · 2024-08-23T21:23:32Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

                fTensor2D diff = (*calculate_res - *_aec_data_set);
                fTensor2D squared_diff = diff.pow(2);
                fTensor1D sum_squared_diff = squared_diff.sum(Eigen::array<int, 1>({1}));
                fTensor1D mse1D = (1.0 / static_cast<float>(_aec_data_set->dimension(0))) * sum_squared_diff;
-                fTensor2D mse2D = mse1D.reshape(Eigen::array<int, 2>({num_of_samples, 1}));
+                fTensor2D mse2D = mse1D.reshape(Eigen::array<int, 2>({(int)num_of_samples, 1}));
+                // cout << "MSE2D: " << mse2D << endl;
                *loss_values_mse = mse2D;


This can cause a memory issue. You are trying to put a tensor that will be destroyed when this scope ends into a shared pointer. The shared pointer cannot take ownership of that memory block. This is incorrect usage of shared pointers.
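A minimal sketch of the ownership pattern at stake, with `std::vector<float>` standing in for `fTensor2D` (an assumption). `compute_losses` is a hypothetical function; it shows a local result handed to `std::make_shared`, so the shared pointer owns its own heap storage rather than referring to memory that dies with the scope.

```cpp
#include <memory>
#include <utility>
#include <vector>

using Tensor = std::vector<float>; // stand-in for fTensor2D (assumption)

// Returns a shared pointer that owns its own heap storage for the result,
// so the data stays valid after the local variable goes out of scope.
std::shared_ptr<Tensor> compute_losses()
{
    Tensor local = {0.25f, 0.5f}; // lives only inside this scope
    // Risky alternative (do not do this): std::shared_ptr<Tensor>(&local)
    // would try to delete stack memory once the last owner is released.
    return std::make_shared<Tensor>(std::move(local)); // moved into owned storage
}
```

Note that assigning through a dereferenced pointer (`*ptr = value;`) copies the value into whatever object the pointer already owns, so it is only valid when the pointer is non-null and points to a live object.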

leondavi · 2024-08-23T21:23:55Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

-                _ae_red_ptr->update_batch(loss_values_mse); // Update thresholds
-
-                break;
+                *loss_values_return = mse2D;


The same issue applies to this line.

leondavi · 2024-08-23T21:25:10Z

src_cpp/opennnBridge/nerlWorkerOpenNN.cpp

-
-                break;
+                *loss_values_return = mse2D;
+                _aec_all_loss_values = loss_values_return;


I don't understand the purpose of this line. It switches between pointers, but at the very least it needs an explanation here.
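For reference, assigning one `std::shared_ptr` to another copies no data: both pointers end up sharing ownership of the same object. The sketch below illustrates that behavior with a hypothetical `Worker` struct and `std::vector<float>` in place of the tensor type; the comment inside shows the kind of explanation the line could carry.

```cpp
#include <memory>
#include <vector>

using Tensor = std::vector<float>; // stand-in for fTensor2D (assumption)

struct Worker // hypothetical, for illustration only
{
    std::shared_ptr<Tensor> all_loss_values; // plays the role of _aec_all_loss_values

    // Keeps the most recent batch losses alive for later reporting.
    // This shares ownership with the caller; no tensor data is copied.
    void remember_losses(const std::shared_ptr<Tensor>& batch_losses)
    {
        all_loss_values = batch_losses;
    }
};
```

Because both pointers refer to the same tensor, a later write through either one is visible through the other, which is worth stating next to the assignment.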

GuyPerets106 added 30 commits June 24, 2024 20:05

14 Devices Distributed Experiment Added

899cb46

Merge branch 'master' of github.com:leondavi/NErlNet

751557a

Merge branch 'master' of github.com:leondavi/NErlNet

94ad928

[AEC] Exp working

dc1a39d

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

82c96ce

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

c8432e6

[AEC_exp] Small changes

19ebdda

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

b3f7a32

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

763a7fe

[AEC_exp] Changes

3bbb2da

[AEC_exp] Latest changes

e85565c

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

88f67a7

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

4217701

[AEC_Exp] WIP

33db2e1

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

1ab46bf

[AEC] Added EMA Only Mode

5ea5646

[AEC_Exp] Exp Notebook WIP

e6baaa7

[AEC_Exp] WIP

c70252c

[AEC_Exp] WIP

2993be1

[AEC_Exp] WIP RR

f891f2d

[AEC_Exp] Added source policy check in stats

24c7cf0

[AEC_Exp] RR Fixes

6903794

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

5805829

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

fe850d1

[AEC_Exp] WIP

3d99d22

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

e4ed4bd

[AEC_EXP] WIP

0a8f0e0

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

dc4251e

[AEC_Exp] JSON Files update

4c3dacd

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

b61937a

GuyPerets106 added 20 commits August 15, 2024 20:42

[AEC] WIP

4bda261

[AEC] WIP

82532ec

[AEC] WIP

f590d71

[AEC] WIP

87a14df

[AEC] WIP

a48cfe4

[AEC] WIP

8477898

[AEC] WIP

2dcf9d9

[AEC] WIP

bf2c5a3

[AEC] WIP

cf0d390

[AEC] WIP

3e20266

removed loss_values tensor

f58e8ef

Added Size Print

eb5e633

Size Print

45a196d

Print Size

7ea8715

Removed Size Print

797c619

Changed Calculate Size

48adb71

Print Size

f8ce3e3

Removed Prints

26e41ab

Merge branch 'master' of github.com:leondavi/NErlNet into AEC_exp

730387c

[AEC_Exp] Last commit - remove prints

765e459

GuyPerets106 requested review from leondavi, ohad123, NoaShapira8 and Orisadek as code owners August 23, 2024 07:06

leondavi requested changes Aug 23, 2024

GuyPerets106 added 3 commits August 27, 2024 20:57

OpenNN ScalingLayer Fix

dec2550

Shared Pointers Fix

549ddfa

Fixed loss_val_tensor size

81d50d6

leondavi merged commit 0916ba0 into master Aug 27, 2024
1 check passed
