-
Notifications
You must be signed in to change notification settings - Fork 614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Audio 7/?] Extract sequences to assembly #2119
Conversation
Co-authored-by: MNGoldenEagle <[email protected]> Co-authored-by: zelda2774 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't mark them, but there are a bunch of commented out print statements (probably debugging stuff right). My preference would be to clean them up, but I'm not going to push it too much this time.
#found_sectype = None | ||
#for offset,_,_,section_type in self.tables: | ||
# if offset == base_pos: | ||
# found_sectype = section_type | ||
# break | ||
#else: | ||
# assert False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO WIP or just old dead cod?
Same question for the outfile writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool, I had no idea sequences were this complex, I thought it would be more like MIDI
|
||
# Disassemble to text | ||
|
||
with ThreadPool(processes=os.cpu_count()) as pool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a ThreadPool doesn't actually make this any faster since the extraction code is pure Python, which is single-threaded due to the GIL. In fact it actually makes it slower for me because of the extra overhead (7.6s with the ThreadPool vs 7.0s without). I'd suggest either using a multiprocessing.Pool
(which is very jank but we have some examples in the repo already) or adding a TODO to parallelize this later.
(sorry, I didn't notice earlier during samplebank extraction, same thing probably applies there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Samplebank extraction is definitely sped up from multiprocessing in this way. Before it would take me a minute to extract all the samples instead of ~6 seconds. I didn't see the same improvements for sequences though, likely for the reasons you mentioned. I'll try to adjust it to actually work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to get it to work with multiprocessing.Pool
since it complains about being unable to pickle memoryview
. I've decided to just not multiprocess sequences for now, it's not that slow anyway.
tools/audio_extraction.py
Outdated
# Tables have no clear start and end in a sequence. Mark the locations of all tables that appear in sequences. | ||
seq_disas_hacks = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is indeed all tables, I wonder if we should promote this from a "hack" and call it "seq_disas_tables" or something
if ret & 0x80: | ||
ret = ((ret << 8) & 0x7F00) | disas.read_u8() | ||
if ret < 128 and disas.insn_begin not in disas.force_long: | ||
print(f"Unnecessary use of long immediate encoding @ 0x{disas.insn_begin:X}: {ret}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this output, is this an MM-only thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In non-handwritten sequences this only happens in MM. In particular MM has 3 warnings in non-handwritten sequences:
[Sequence_90] Unnecessary use of long immediate encoding @ 0x38: 15
[Sequence_116] Invalid instrument sourced from Soundfont_36: 15
[Sequence_127] Invalid instrument sourced from Soundfont_37: 15
Both OoT and MM have unnecessary use of long encodings in their handwritten sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I wonder if that indicates that there were indeed separate mnemonics for this originally, or if the encoding should be part of the mnemonic in some way (like the .s in sqrt.s I guess). Maybe it would be better to do that too instead of having the reassembler guess an encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to do it as it is now since it's more user-friendly to let the assembler decide and just handle the special cases as-needed, but I think you're right and they probably didn't have it set up this way originally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are users going to write these by hand anyway? I've been imagining that they use some kind of compiler/converter from some other format like MIDI.
Or, I suppose there could be variants like
notedv.l ; long encoding
notedv.s ; short encoding
notedv ; psuedo-instruction that expands to either
which is not too different from how it is now, but there'd be 1 less hack and it's clearer that it's not a real instruction when the assembler decides for notedv
.
(I'm not insisting on changing this btw, I don't think it's worth it at this point. Just spitballing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that most won't be writing these by hand, and converters from MIDI would be used mostly. However there are cases where you do edit these by hand such as the handwritten sequences, particularly sequence 0 that implements all the sound effects. The assembler should be able to provide convenience features for these cases.
I don't mind changing the syntax to something like this, although the .s variant would never be used since the disassembler should just output the pseudo-instruction everywhere it can (so that readers of the extracted sequences don't get into bad habits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I'm not sure it's clearly better, and I think it's a pretty big change with a fair amount of design work (which instructions need these variants? is .s
really the right suffix or does that make it seem like a MIPS float?) which tbh I don't really want to be responsible for, given that I've been exposed to sequence assembly for about 1 hour now
extract_sequences(audioseq_seg, extracted_dir, version_info, write_xml, sequence_table, sequence_font_table, | ||
sequence_xmls, soundfonts) | ||
|
||
print("Done") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe print(f"Sequence extraction took {dt:.3f}s")
(instead of print("Done")
? The "Done" seems a bit out of place)
SqSection.ENVELOPE : "ENVELOPE", | ||
SqSection.FILTER : "FILTER", | ||
SqSection.UNKNOWN : "UNK", | ||
}[target_section] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi SqSection.SEQ.name == "SEQ"
and so on, could scrap this thing (well unk has a different name though, intentional? if so... ?)
<!-- This file is only for extraction of vanilla data. --> | ||
<Sequence Name="Sequence_10" Index="10"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe baserom instead of vanilla, but meh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works on my machine
This PR updates the audio extraction tools to allow extracting all sequence files besides the "handwritten" sequences 0, 1, 2 and 109 to a textual assembly format. Names for the sequence instructions are lifted from prior work (SM64 Decomp, #1236 and #1509) and should not be considered as final, see the table in
disassemble_sequence.py
for the full set. The remaining sequence files and the build process for them will come next.