Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added format check for method read_data in rawread #70

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Charliechen1
Copy link

The python audioop.lin2lin will complain if the length of data can not be divided by old_width, and it's not that convenient to check the length of the audio before using the model, especially when a large batches of audio files are used in some machine learning tasks. Therefore, I have made some patch for the input audio data if the length is not to the satisfaction. Thank you for taking my suggestion into consideration, and the project is truly intensive for me. 👍

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Thank you for the contribution—this looks promising.

Is it possible to provide a sample audio file where this fix is necessary? It would be great to have that documented here so we can come back to it if something goes wrong in the future.

I'd also love to hear more detail about why padding with the byte 0xFF is the right thing. I admit I'm not too knowledgable about the sample format, but I would have guessed that 0x00 (null) would have been the right padding byte.

Finally, I made some nitpicky Python style comments inline.

Thanks again!


remainder = len(data) % old_width
if remainder != 0 :
data = data + PATCH_BYTE*(old_width-remainder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some very low-level style (i.e., PEP8) comments:

  • Please remove the whitespace on the blank line.
  • Please remove the space between the 0 and the : in the if statement.
  • Please add spaces around the binary operators * and -.

@@ -23,6 +23,7 @@

# Produce two-byte (16-bit) output samples.
TARGET_WIDTH = 2
PATCH_BYTE = b'\xff'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above doesn't apply to this constant. So please add a blank line above it and, ideally, write a brief sentence explaining what this is useful for.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really thanks to your advice, I am a fresh graduate, so there must be lots of things to learn lol. The suggestions are quite helpful to me.

@Charliechen1
Copy link
Author

Here for your reference:
I print the data and get:
b'\xff\xff\xff\xff\xfe\xff\xfc\xff\xfc\xff\xfc'
And I figured out that it's due to a broken download.
Therefore, would it be better to raise a warning under this circumstance?

@sampsyo
Copy link
Member

sampsyo commented Jul 12, 2018

Hmm; perhaps! But on the other hand, another reasonable (silent) fix might be to round down instead of up—that is, to drop the last (partial) sample if it exists. Would that make sense to you?

@Charliechen1
Copy link
Author

It should works.

@sampsyo
Copy link
Member

sampsyo commented Jul 13, 2018

OK, great! Want to give it a try and see if it works on the file you have?

@Charliechen1
Copy link
Author

Sure~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants