-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RIFF/WAVE support, using ID3v2 #321
Conversation
mutagen/wave.py
Outdated
except KeyError as e: | ||
raise error(str(e)) | ||
|
||
data = waveFormatChunk.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if 16 bytes are read so the unpack() below doesn't fail for truncated files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔
mutagen/wave.py
Outdated
self.bitrate = self.channels * block_align * self.sample_rate | ||
|
||
try: | ||
waveDataChunk = waveFile[u'data'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does checking for "data" add anything useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not checking, it is accessing the data chunk length to calculate the duration (length in seconds) of the audio. I put comment on top of it; removed throwing exception in following line with assigning 0 to number_of_samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, right, I missed that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔
waveDataChunk = waveFile[u'data'] | ||
except KeyError as e: | ||
raise error(str(e)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check that block_align and self.sample_rate > 0 or we get a division by zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔
mutagen/wave.py
Outdated
"""Completely removes the ID3 chunk from the AIFF file""" | ||
|
||
try: | ||
del RiffFile(filething.fileobj)[u'ID3'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "ID3" correct here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ID3v2 chunk should be considered as a multi purpose chunk (like RIFF and LIST) it should be in capitals everywhere. But that is a different discussion. ✔
mutagen/_riff.py
Outdated
raise KeyError("First chunk should be a RIFF chunk.") | ||
|
||
# Read the RIFF file Type | ||
self.fileType = fileobj.read(4).decode('ascii') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decode can raise
mutagen/_riff.py
Outdated
|
||
# Calculate the location of the next chunk, | ||
# considering the pad byte | ||
# self.__next_offset += self.__next_offset % 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this about?
mutagen/wave.py
Outdated
# looks like this is failing if python is not started with -bb in TravisCI: | ||
# assert isinstance(id, text_type) | ||
|
||
return ((len(id) <= 4) and (min(id) >= u' ') and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!=4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That has been derived from aiff.py
If the chunks are indexed including the space padding, the key should be 4 bytes long indeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any idea why assert isinstance(id, text_type)
is failing in some circumstances?
What does option -bb
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-bb
makes text_type(bytes())
fail (and comparisons between bytes/text iirc)
mutagen/wave.py
Outdated
|
||
def assert_valid_chunk_id(id): | ||
if not is_valid_chunk_id(id): | ||
raise KeyError("RIFF/WAVE-chunk-Id must be four ASCII characters.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I'd prefer ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔ moved to: _riff.py
mutagen/_riff.py
Outdated
def is_valid_chunk_id(id): | ||
assert isinstance(id, text_type) | ||
|
||
return ((len(id) <= 4) and (min(id) >= u' ') and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!= 4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire check is odd, why only the first character for the lower limit and the last char for the upper limit.
if len(id) != 4:
return False
for i in range(0, 3):
if id[i] < u' ' or id[i] > u'~':
return False
return True
✔
mutagen/wave.py
Outdated
self.tags.filename = self.filename | ||
|
||
fileobj.seek(0, 0) | ||
self.info = WaveStreamInfo(fileobj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we load WaveStreamInfo() before the ID3 tags? Might make it fail earlier in case of a wrong format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. ✔
mutagen/wave.py
Outdated
|
||
if 'id3 ' in waveFile: | ||
waveFile['id3 '].delete() | ||
if 'ID3 ' in waveFile: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that needed with the lower() stuff in WaveFile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔
self._update_size(new_data_size) | ||
|
||
|
||
class WaveFile(RiffFile): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this inherit from RiffFile if it reimplements everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is messy.
I started of with a copy of aiff.py because I knew it had a pretty simular structure.
The idea is to move everything what is part of the RIFF standard into _riff.py, and WAVE specific stuff into wave.py. Which was easier said then done, because is was not always clear to me what belonged to core RIFF and what to MS-WAVE. Because the sub-chunks used a little endian size, in contradiction to the RIFF chuck using big endian, I figured the sub-chunk definition belonged to WAV spec. Looks like I was wrong. I found somewhere that chunks with capital tag-id's are common purpose chunks and lower case are RIFF-type specific.
Anyway, I will do my best do clean up the mess.
Ref: - quodlibet#321 (comment) - a5c3785ed8d6a35868bc169f07e40e889087fd2e
Something is going wrong in those generic unit tests, I have difficulties understanding how this testing works. I see the constructor is called multiple times with one of the test files, and at a certain point it does not seem to recognize the file anymore as a RIFF/WAVE.
|
Yeah, it gets called with variations of truncated files and expects a subclass of MutagenError to be raised or to succeed. This for uncovering wrong/missing error handling in the parsing code. In your example traceback: load() should not raise ValueError for invalid files. |
If I have this failing test case:
How can I execute just this test case? It is stating |
@lazka Can you please can give some guidance? |
Ref: - quodlibet#321 (comment) - a5c3785ed8d6a35868bc169f07e40e889087fd2e
@Borewit I tested your code, rebased it against master and fixed the tests at https://github.com/phw/mutagen/commits/feature/riff-wave I also have a local patch for Picard to use this and tested this with Picard, seems to work just fine. You can either update your PR here or I can open a new PR, whatever you prefer. regarding the code itself I'm not so much into mutagen's internals. What worries me a bit is how the code in |
@phw thank you so much for picking up this work! It would be fantastic if the functionality ends up in Picard. Due to a lack of experience programming in Python and knowledge of Mutagen I got stuck and had to give up. AIFF and RIFF are somewhat simular. For example the size in AIFF chunk is big endian, in RIFF the size is encoded in little endian. I will include your changes in my PR. I created PR on my own PR. I haved added you as collaberator, which should give you the freedom to push changes on top of my PR. You don't hurt my feelings if you PR my changes directly (I have done that one time, and folks became pissed of, as if I comitted some plagiarism. I never understood that). |
Thanks for including my changes. Looks like we still have some issues with Python 2, I had not tested this locally. I will see whether I can provide fixes for this later this week, and if so will directly push it to this PR.
Yes, I also don't mind if people put their corrections on top of my PRs (as long as they leave my authorship intact and don't claim it to be their own work). But I know people sometimes take issue on this hence I rather ask before doing anything :) |
This was easier than expected, just a few instances where for Python 2 a proper unicode string is expected. Builds are running and so far it is looking good :) |
I don't mind the duplication.
ok |
Thanks @lazka. @phw, regarding:
I do as well, online or offline, have to dig it up. Can we allign the changes for a PR on Picard? |
Yes please :) I have my proposed changes linked in the Picard ticket. |
mutagen/_riff.py
Outdated
raise ValueError("Invalid RIFF-chunk-ID.") | ||
|
||
|
||
class RiffChunkHeader(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inherit from object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
mutagen/aiff.py
Outdated
@@ -44,9 +44,6 @@ def is_valid_chunk_id(id): | |||
|
|||
|
|||
def assert_valid_chunk_id(id): | |||
|
|||
assert isinstance(id, text_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was readded
mutagen/wave.py
Outdated
try: | ||
waveDataChunk = waveFile[u'data'] | ||
self.number_of_samples = waveDataChunk.data_size / block_align | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZeroDivisionError
mutagen/wave.py
Outdated
sample_size (`int`): The audio sample size | ||
""" | ||
|
||
length = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.0
mutagen/wave.py
Outdated
|
||
try: | ||
self.info = WaveStreamInfo(fileobj) | ||
except ValueError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is ValueError raised? Is this needed?
"""WAVE stream parsing errors.""" | ||
|
||
|
||
class WaveFile(RiffFile): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this to "_WaveFile"?
mutagen/_riff.py
Outdated
raise KeyError( | ||
"%r has no %r chunk" % (self._fileobj, id_)) | ||
|
||
def __delitem__(self, id_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know aiff has the same code, but can you change this to delete_chunk(). It's confusing that del foo['id']
writes to the file and can raise IOError.
mutagen/wave.py
Outdated
"""Completely removes the ID3 chunk from the RIFF file""" | ||
|
||
try: | ||
del RiffFile(filething.fileobj)[u'id3'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to delete_chunk(), see above
mutagen/wave.py
Outdated
RiffFile.__init__(self, fileobj) | ||
|
||
if self.fileType != u'WAVE': | ||
raise MutagenError("Expected RIFF/WAVE.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MutagenError -> error
Ref: - quodlibet#321 (comment) - a5c3785ed8d6a35868bc169f07e40e889087fd2e
5b5f636
to
6ff97bf
Compare
…should be a text type
* fix wrong abstract attribute reference * fix reference to root ('RIFF') chuck
65cb514
to
c4b0b38
Compare
Thank you so much for pushing this effort forward @phw! |
Copy that, I will not overwrite it. |
For consistency with other formats.
c4b0b38
to
52274d7
Compare
I looked at the issue reported in #392 in detail, and examined the file structure exactly. Two results:
@lazka I think this is good for a final review. If you want I can cleanup the git history a bit and combine some commits (but make sure to maintain authorship). |
New PR with updated code at #408 |
See #408 |
Initial implementation of RIFF/WAVE format.
Currently using ID3v2 tag chunk to store metadata.
Storing metadata in the RIFF/INFO tag is still something which needs to be done.
Related issues: