Compression: Huffman Coding #1513
base: master
Conversation
The other main point is string handling: the code consistently builds strings by repeated concatenation, e.g. for building the prefix strings and for encoding/decoding. The prefix building is mostly a non-concern, since even with slow string concatenation it stays linearithmic, but the encoding at least could be rewritten to use an array without making it any less readable.
Arguably all of this is fine in practice, as JS engines optimize string concatenation. I think it needs some comments alluding to this, though; I'm thinking something along the lines of "the string concatenation in a loop here is fine despite strings being immutable, as modern JS engines optimize it to amortized constant time by delaying the concatenation (using ropes and similar techniques)".
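For example, the encoding loop could collect the code words in an array and join once at the end. A minimal sketch, assuming a `codes` object that maps each character to its bit string (the names `encode` and `codes` are illustrative, not necessarily what Huffman.js uses):

```js
// Sketch only: `codes` is assumed to map characters to Huffman code words,
// e.g. { a: '0', b: '10', c: '11' }.
function encode (text, codes) {
  const parts = []
  for (const char of text) {
    parts.push(codes[char]) // collect code words instead of concatenating strings
  }
  return parts.join('') // one join at the end, linear in the output size
}
```

Either version is fine in practice; the array variant just makes the linear-time intent explicit without relying on engine-level string optimizations.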
I'm sorry about the unrelated changes due to formatting; that's due to a mistake on our part (see #1515).
Keep up the good work!
Compression/Huffman.js
Outdated
```js
while (nodes.length > 1) {
  nodes.sort((a, b) => a.freq - b.freq)
```
Why do you sort in every iteration?
Because when a new node is pushed, the order may change.
Wouldn't it be more appropriate to use a min heap here (our existing implementations should work; you just need to import and use them), given that you always extract the two least frequent nodes and only push new ones?
I think it would add a lot more complexity, wouldn't it?
I think the API of our heap should be pretty straightforward to use. A heap is pretty much the data structure for this use case; it would significantly help the time complexity. If you want me to, I can make the necessary changes.
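For concreteness, a heap-based construction could look roughly like the sketch below. The `MinHeap` class here is only a self-contained stand-in so the example runs on its own; in the PR you would import the repository's existing heap implementation instead (whose class and method names may differ), and `buildHuffmanTree` is likewise an illustrative name.

```js
// Self-contained stand-in min-heap; in the PR, import the repository's heap instead.
class MinHeap {
  constructor (compare) {
    this.items = []
    this.compare = compare
  }

  get size () {
    return this.items.length
  }

  push (item) {
    // Append, then sift up until the parent is no larger.
    this.items.push(item)
    let i = this.items.length - 1
    while (i > 0) {
      const parent = (i - 1) >> 1
      if (this.compare(this.items[i], this.items[parent]) >= 0) break
      ;[this.items[i], this.items[parent]] = [this.items[parent], this.items[i]]
      i = parent
    }
  }

  pop () {
    // Remove the smallest element, move the last one to the root and sift down.
    const top = this.items[0]
    const last = this.items.pop()
    if (this.items.length > 0) {
      this.items[0] = last
      let i = 0
      while (true) {
        const left = 2 * i + 1
        const right = 2 * i + 2
        let smallest = i
        if (left < this.items.length && this.compare(this.items[left], this.items[smallest]) < 0) smallest = left
        if (right < this.items.length && this.compare(this.items[right], this.items[smallest]) < 0) smallest = right
        if (smallest === i) break
        ;[this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]]
        i = smallest
      }
    }
    return top
  }
}

// Build the Huffman tree by repeatedly merging the two least frequent nodes.
// Each push/pop is O(log n), so the loop is O(n log n) overall, rather than
// re-sorting the whole array every iteration, which is O(n^2 log n) in the worst case.
function buildHuffmanTree (nodes) {
  const heap = new MinHeap((a, b) => a.freq - b.freq)
  nodes.forEach(node => heap.push(node))
  while (heap.size > 1) {
    const left = heap.pop()
    const right = heap.pop()
    heap.push({ freq: left.freq + right.freq, left, right })
  }
  return heap.pop()
}
```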
I'll try to implement it myself; then I'll need your valuable feedback and guidance.
Why have you duplicated the code? I think there should only be a heap-based implementation; it is strictly better (more efficient) than the array-based one and of similar code complexity.
Adding the hacktoberfest-accepted label since we're close to the end of October, and this PR is pretty close to being accepted and merged later on.
Describe your change:

Checklist:
- Example: UserProfile.js is allowed, but userprofile.js, Userprofile.js, user-Profile.js, userProfile.js are not.
- Fixes: #{$ISSUE_NO}.