
Rnn Relu refactoring #2541

Open · wants to merge 32 commits into develop

Conversation

kikimych (Contributor)

Extracted the ReLU/Tanh activation-function code from RNNForwardTrainingPackedTensors into RNNForwardTrainingTanhRelu, and from RNNBackwardDataPackedTensors into RNNBackwardDataPackedTensorsRelu.

@kikimych kikimych marked this pull request as draft November 20, 2023 18:26
@junliume junliume requested a review from shurale-nkn December 8, 2023 20:24
junliume (Contributor) commented Dec 8, 2023

@shurale-nkn @JehandadKhan please review

@junliume junliume requested a review from JehandadKhan December 8, 2023 20:24
shurale-nkn (Contributor) left a comment


Partial review

shurale-nkn (Contributor)

@junliume please add @atamazov to reviewers

atamazov (Contributor)

@kikimych Is it time to review? I see this is still a draft.

@kikimych kikimych marked this pull request as ready for review December 12, 2023 20:21
const int m = direction == rnn_direction::Forward ? fbatches.at(time)
: time == 0 ? rbatches.at(0)
: rbatches.at(time - 1);
const int cur_time = direction == rnn_direction::Forward ? time : seq_len - 1 - time;
Contributor

@kikimych
I remember you said that you would make a structure for direct and reverse access to elements.

Contributor Author

Readability
auto batch_id_abs = time == seq_len - 1 ? bacc_per_time[directed_time.cur_time()] : bacc_per_time[directed_time.cur_time()] + batches.at(directed_time.next_time());
vs
auto batch_id_abs = time == seq_len - 1 ? bacc_per_time[cur_time] : bacc_per_time[cur_time] + batches.at(next_time);
for example

Contributor

The question was about all the code where bacc_per_time or batches is used.
Why not a class with an API like bacc_per_time.at(time, direction)?

Contributor Author

auto batch_id_abs = time == seq_len - 1 ? bacc_per_time[cur_time] : bacc_per_time[cur_time] + batches.at(next_time)
vs
auto batch_id_abs = time == seq_len - 1 ? bacc_per_time(time, direction) : bacc_per_time(time,direction) + batches.at(time+1, direction)

Contributor

So what conclusion should we draw from this single line?
How this line would look was already quite obvious.

Contributor Author

After 2 weeks, neither version is obvious at all.


RNNTensorPaddingConverter::ConvertTensorData(
handle, dyDesc[0], in_n, dy, packedDYIn, true);
auto propagate_dhx = [*this, seqLen, &propagate_dhx_prev, &dhx](int layer) {
shurale-nkn (Contributor) commented Dec 18, 2023

*this
A copy-capture of the current class? Really?

&dhx
A reference to a pointer?
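
For illustration only, a minimal self-contained sketch of the two capture issues raised here; RNNDescriptorToy, its members, and the lambda body are hypothetical stand-ins rather than the actual MIOpen code:

#include <vector>

struct RNNDescriptorToy
{
    std::vector<int> heavy_state; // stands in for the descriptor's data members

    void BackwardData(float* dhx, int seqLen) const
    {
        // [*this] copies the whole descriptor into the closure; [this] captures
        // only the pointer, which is usually what is intended here.
        // dhx is already a pointer, so capturing it by value is enough;
        // [&dhx] would store a reference to the local pointer variable instead.
        auto propagate_dhx = [this, seqLen, dhx](int layer) {
            if(dhx != nullptr && layer < seqLen)
                dhx[layer] += static_cast<float>(heavy_state.size());
        };
        propagate_dhx(0);
    }
};

int main()
{
    float grad[4] = {};
    RNNDescriptorToy desc{std::vector<int>(8)};
    desc.BackwardData(grad, 4);
}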

Contributor Author

fixed

reserveSpace,
&activDesc,
propagate_hidden_output,
propagate_hidden_prev](int layer, int time, int direction) {
Contributor

const int m = direction == rnn_direction::Forward ? fbatches.at(time)
: time == 0 ? rbatches.at(0)
: rbatches.at(time - 1);
const int cur_time = direction == rnn_direction::Forward ? time : seq_len - 1 - time;
Contributor

The question was about all the code where bacc_per_time or batches is used.
Why not a class with an API like bacc_per_time.at(time, direction)?

reserveSpace,
&activDesc,
propagate_hidden_output,
propagate_hidden_prev](int layer, int time, int direction) {
Contributor

So there is no copy of the lambda's captured members? ❔
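
As a side note, a tiny stand-alone illustration of the semantics in question (names borrowed from the capture list above, types made up for the example): a name listed without & is copied into the closure when the lambda is created.

#include <cassert>
#include <vector>

int main()
{
    std::vector<float> propagate_hidden_prev(4, 1.0f);

    // By-value capture: the closure gets its own copy at creation time.
    auto propagate_hidden = [propagate_hidden_prev](int layer) {
        return propagate_hidden_prev.at(layer);
    };

    propagate_hidden_prev.assign(4, 2.0f); // modify the original afterwards
    assert(propagate_hidden(0) == 1.0f);   // the closure still sees the old copy
}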

shurale-nkn (Contributor) left a comment

format failed

@@ -2528,6 +3104,34 @@ void RNNDescriptor::RNNForwardTrainingPackedTensors(
}
return;
}

if((rnnMode == miopenRNNRELU || rnnMode == miopenRNNTANH) && !use_dropout &&
inputMode != miopenRNNskip && !(miopen::IsDisabled(ENV(MIOPEN_RNNFWD_exp))))
Contributor

why not miopenRNNskip?

int first_layer_offset() const { return (in_vec_sz + h_vec_sz) * weight_stride; }
};

struct RNNOffsets
Contributor

not fixed

const int m = direction == rnn_direction::Forward ? fbatches.at(time)
: time == 0 ? rbatches.at(0)
: rbatches.at(time - 1);
const int cur_time = direction == rnn_direction::Forward ? time : seq_len - 1 - time;
Contributor

So what conclusion should we draw from this single line?
How this line would look was already quite obvious.

Comment on lines 38 to 42
enum rnn_direction
{
Forward = 0,
Backward = 1
};
Contributor

It's mostly a question for @atamazov: do we use enum class here, or do we have plans to switch to enum class everywhere?

Contributor

Let's prefer enum class. But switching to it everywhere is impossible IIRC.
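
For reference, a minimal sketch of the enum class variant being discussed, keeping the enumerator names and values from the PR's plain enum above (illustrative only):

// Scoped enumeration: the enumerators do not leak into the enclosing
// namespace and there is no implicit conversion to int.
enum class rnn_direction : int
{
    Forward  = 0,
    Backward = 1
};

// Already-qualified uses such as
//     direction == rnn_direction::Forward
// keep compiling unchanged; unqualified uses and implicit int conversions would not.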

Backward = 1
};

struct RnnBatches
Contributor

struct X
{
    int at(int time, RnnDirection direction) const
    {
        return direction == RnnDirection::Forward ? batches[time]
                                                  : batches[(batches.size() - 1) - time];
    }
    // ...
    std::vector<int> batches;
};

This is true for either direction:
.next(time, direction) == .at(time+1, direction)
.prev(time, direction) == .at(time-1, direction)

next_time() and prev_time() are redundant functions.

Explanation:

FWD next_time:
cur_time(time, direction) + 1 => (time+1)

BWD next_time:
cur_time(time, direction) - 1 => (batches.size() - time - 1) - 1 => (batches.size() - (time+1) - 1)

BWD prev_time:
cur_time(time, direction) + 1 => (batches.size() - time - 1) + 1 => (batches.size() - (time-1) - 1)

next() and prev() can be kept as syntactic sugar if you want, but then remove the mixed use of +1 and next() in the code.
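
For concreteness, a stand-alone sketch of that sugar, assuming an RnnBatches-style wrapper like the one sketched above (RnnBatchesSketch and the sample data are made up; the assertions just restate the index arithmetic from the explanation):

#include <cassert>
#include <vector>

enum class RnnDirection { Forward = 0, Backward = 1 };

struct RnnBatchesSketch
{
    int at(int time, RnnDirection direction) const
    {
        return direction == RnnDirection::Forward
                   ? batches[time]
                   : batches[(batches.size() - 1) - time];
    }
    // next()/prev() are pure syntactic sugar over at().
    int next(int time, RnnDirection direction) const { return at(time + 1, direction); }
    int prev(int time, RnnDirection direction) const { return at(time - 1, direction); }

    std::vector<int> batches;
};

int main()
{
    RnnBatchesSketch b{{5, 4, 3, 2}};
    // FWD next: index time + 1.
    assert(b.next(1, RnnDirection::Forward) == b.batches[2]);
    // BWD next: index (size - 1) - (time + 1), i.e. one step lower.
    assert(b.next(1, RnnDirection::Backward) == b.batches[(b.batches.size() - 1) - 2]);
    // BWD prev: index (size - 1) - (time - 1), i.e. one step higher.
    assert(b.prev(2, RnnDirection::Backward) == b.batches[(b.batches.size() - 1) - 1]);
}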

shurale-nkn (Contributor)

CI reports a merge conflict in rnnocl.cpp that cannot be resolved automatically.
