Fix memory leak in InmemReader #254

Open
wants to merge 2 commits into
base: master

@m2ng m2ng commented May 2, 2019

xLearn::InmemReader leaks memory, which degrades training and prediction performance on large datasets.

@@ -247,7 +247,8 @@ void InmemReader::init_from_txt() {
     std::string bin_file = filename_ + ".bin";
     data_buf_.Serialize(bin_file);
   }
-  delete [] block_;
+  free(block_);
Owner

Hi, is there any difference between delete [] and free()?

Author

I am new to C++, so please point out anything I got wrong. The difference is explained here. Since block_ is a char pointer, I think free() would be the better choice, because a char has neither a constructor nor a destructor.
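
On the delete [] versus free() question: in standard C++ the choice depends on how the buffer was allocated, not on the element type. Memory obtained from new[] must be released with delete[], and memory obtained from malloc must be released with free(); mixing the two is undefined behavior even for char. How block_ is allocated is not shown in this thread, so the following is only a minimal sketch with hypothetical helper names (make_new_buffer, make_malloc_buffer, FreeDeleter) that are not part of xLearn. It pairs each allocation with its matching deallocator through std::unique_ptr:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical helper: the buffer comes from new[], so it must be
// released with delete[]. std::unique_ptr<char[]> does that automatically.
std::unique_ptr<char[]> make_new_buffer(std::size_t n) {
  std::unique_ptr<char[]> buf(new char[n]);
  std::memset(buf.get(), 0, n);
  return buf;
}

// Custom deleter so that malloc'ed memory is always released with free().
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: the buffer comes from malloc, so it must be
// released with free(). Returns an empty pointer if malloc fails.
std::unique_ptr<char, FreeDeleter> make_malloc_buffer(std::size_t n) {
  char* raw = static_cast<char*>(std::malloc(n));
  if (raw != nullptr) std::memset(raw, 0, n);
  return std::unique_ptr<char, FreeDeleter>(raw);
}
```

With either helper the buffer is released when the smart pointer goes out of scope, so no explicit delete [] or free() call is needed at the end of a function such as init_from_txt().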

@@ -195,9 +197,8 @@ class InmemReader : public Reader {
   // Free the memory of data matrix.
   virtual void Clear() {
     data_buf_.Reset();
-    data_samples_.Reset();
Owner

data_samples_.Reset() frees some memory. Why remove this line of code? Thanks.

Author

While stepping through the code in a debugger, I found that the Row variable in both data_buf_ and data_samples_ stores pointers to the same memory.

Moreover, calling the Reset method on one DMatrix does not reset the Row variable in the other DMatrix, so that variable keeps holding SparseRow pointers to memory that has already been freed.

Calling the Reset method on data_samples_ afterwards therefore causes a segfault. So I removed data_samples_.Reset(); because this line is unnecessary.
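
The failure mode described in this comment can be sketched in isolation. This is a simplified model, not xLearn's actual DMatrix or SparseRow: the Row and Matrix types and the reset_both function below are hypothetical. With raw pointers, resetting both matrices would delete the same rows twice. The sketch uses std::shared_ptr instead, which is one possible alternative to removing the second Reset call, so that each row is destroyed exactly once no matter which matrix is reset last:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a sparse row; counts destructions for illustration.
struct Row {
  static int destroyed;
  ~Row() { ++destroyed; }
};
int Row::destroyed = 0;

// Hypothetical stand-in for a matrix that holds pointers to rows.
struct Matrix {
  std::vector<std::shared_ptr<Row>> rows;
  // Drops this matrix's references; a row is freed only when no matrix uses it.
  void Reset() { rows.clear(); }
};

// Builds two matrices that share the same rows, resets both, and
// returns how many Row objects were destroyed in total.
int reset_both(int n) {
  Row::destroyed = 0;
  Matrix buf, samples;
  for (int i = 0; i < n; ++i) {
    buf.rows.push_back(std::make_shared<Row>());
  }
  samples.rows = buf.rows;  // Both matrices now point at the same rows.
  buf.Reset();              // Rows stay alive: samples still references them.
  int after_first = Row::destroyed;
  samples.Reset();          // Last reference dropped: each row is freed once.
  assert(after_first == 0);
  return Row::destroyed;
}
```

Calling reset_both(3) destroys three rows in total, not six, because the second Reset releases the last reference instead of freeing already-freed memory.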

Collaborator

@etveritas etveritas May 7, 2019

Is that because Clear was added to the destructor of InmemReader? I found that InmemReader used the default destructor before. In other words, the Clear member function of the InmemReader class (and maybe FromDMReader as well) was never called in the program.
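
If cleanup is reachable only through a Clear method that nothing calls, the buffers are never released. One common pattern is to have the destructor invoke the cleanup and to make the cleanup safe to call more than once. The following is a sketch with a hypothetical BufferReader class, not xLearn's InmemReader:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reader that owns one heap buffer.
class BufferReader {
 public:
  explicit BufferReader(std::size_t n) : block_(new char[n]), size_(n) {}
  // Copying is disabled so two readers never own the same buffer.
  BufferReader(const BufferReader&) = delete;
  BufferReader& operator=(const BufferReader&) = delete;
  // The destructor releases the buffer even if Clear() was never called.
  ~BufferReader() { Clear(); }

  // Idempotent: after the first call block_ is null, and delete[] on
  // a null pointer is a no-op.
  void Clear() {
    delete[] block_;
    block_ = nullptr;
    size_ = 0;
  }

  std::size_t size() const { return size_; }
  bool empty() const { return block_ == nullptr; }

 private:
  char* block_;
  std::size_t size_;
};
```

Clear is virtual in the diff above. A virtual call made from a destructor runs the version belonging to the class being destroyed, not a derived override, so each class in a hierarchy needs to release its own resources.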
