Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Audio 7/?] Extract sequences to assembly #2119

Merged
merged 3 commits into from
Sep 4, 2024

Conversation

Thar0
Copy link
Contributor

@Thar0 Thar0 commented Sep 1, 2024

This PR updates the audio extraction tools to allow extracting all sequence files besides the "handwritten" sequences 0, 1, 2 and 109 to a textual assembly format. Names for the sequence instructions are lifted from prior work (SM64 Decomp, #1236 and #1509) and should not be considered as final, see the table in disassemble_sequence.py for the full set. The remaining sequence files and the build process for them will come next.

Co-authored-by: MNGoldenEagle <[email protected]>
Co-authored-by: zelda2774 <[email protected]>
Copy link
Contributor

@hensldm hensldm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't mark them, but there are a bunch of commented out print statements (probably debugging stuff right). My preference would be to clean them up, but I'm not going to push it too much this time.

Comment on lines 1176 to 1182
#found_sectype = None
#for offset,_,_,section_type in self.tables:
# if offset == base_pos:
# found_sectype = section_type
# break
#else:
# assert False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO WIP or just old dead cod?
Same question for the outfile writes.

Copy link
Contributor

@cadmic cadmic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty cool, I had no idea sequences were this complex, I thought it would be more like MIDI


# Disassemble to text

with ThreadPool(processes=os.cpu_count()) as pool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a ThreadPool doesn't actually make this any faster since the extraction code is pure Python, which is single-threaded due to the GIL. In fact it actually makes it slower for me because of the extra overhead (7.6s with the ThreadPool vs 7.0s without). I'd suggest either using a multiprocessing.Pool (which is very jank but we have some examples in the repo already) or adding a TODO to parallelize this later.

(sorry, I didn't notice earlier during samplebank extraction, same thing probably applies there)

Copy link
Contributor Author

@Thar0 Thar0 Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samplebank extraction is definitely sped up from multiprocessing in this way. Before it would take me a minute to extract all the samples instead of ~6 seconds. I didn't see the same improvements for sequences though, likely for the reasons you mentioned. I'll try to adjust it to actually work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to get it to work with multiprocessing.Pool since it complains about being unable to pickle memoryview. I've decided to just not multiprocess sequences for now, it's not that slow anyway.

Comment on lines 157 to 158
# Tables have no clear start and end in a sequence. Mark the locations of all tables that appear in sequences.
seq_disas_hacks = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is indeed all tables, I wonder if we should promote this from a "hack" and call it "seq_disas_tables" or something

if ret & 0x80:
ret = ((ret << 8) & 0x7F00) | disas.read_u8()
if ret < 128 and disas.insn_begin not in disas.force_long:
print(f"Unnecessary use of long immediate encoding @ 0x{disas.insn_begin:X}: {ret}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this output, is this an MM-only thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In non-handwritten sequences this only happens in MM. In particular MM has 3 warnings in non-handwritten sequences:

[Sequence_90] Unnecessary use of long immediate encoding @ 0x38: 15
[Sequence_116] Invalid instrument sourced from Soundfont_36: 15
[Sequence_127] Invalid instrument sourced from Soundfont_37: 15

Both OoT and MM have unnecessary use of long encodings in their handwritten sequences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I wonder if that indicates that there were indeed separate mnemonics for this originally, or if the encoding should be part of the mnemonic in some way (like the .s in sqrt.s I guess). Maybe it would be better to do that too instead of having the reassembler guess an encoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to do it as it is now since it's more user-friendly to let the assembler decide and just handle the special cases as-needed, but I think you're right and they probably didn't have it set up this way originally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are users going to write these by hand anyway? I've been imagining that they use some kind of compiler/converter from some other format like MIDI.

Or, I suppose there could be variants like

notedv.l  ; long encoding
notedv.s  ; short encoding
notedv    ; psuedo-instruction that expands to either

which is not too different from how it is now, but there'd be 1 less hack and it's clearer that it's not a real instruction when the assembler decides for notedv.

(I'm not insisting on changing this btw, I don't think it's worth it at this point. Just spitballing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that most won't be writing these by hand, and converters from MIDI would be used mostly. However there are cases where you do edit these by hand such as the handwritten sequences, particularly sequence 0 that implements all the sound effects. The assembler should be able to provide convenience features for these cases.

I don't mind changing the syntax to something like this, although the .s variant would never be used since the disassembler should just output the pseudo-instruction everywhere it can (so that readers of the extracted sequences don't get into bad habits)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm not sure it's clearly better, and I think it's a pretty big change with a fair amount of design work (which instructions need these variants? is .s really the right suffix or does that make it seem like a MIPS float?) which tbh I don't really want to be responsible for, given that I've been exposed to sequence assembly for about 1 hour now

extract_sequences(audioseq_seg, extracted_dir, version_info, write_xml, sequence_table, sequence_font_table,
sequence_xmls, soundfonts)

print("Done")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe print(f"Sequence extraction took {dt:.3f}s") (instead of print("Done")? The "Done" seems a bit out of place)

SqSection.ENVELOPE : "ENVELOPE",
SqSection.FILTER : "FILTER",
SqSection.UNKNOWN : "UNK",
}[target_section]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi SqSection.SEQ.name == "SEQ" and so on, could scrap this thing (well unk has a different name though, intentional? if so... ?)

@fig02 fig02 added the wait Wait to merge label Sep 3, 2024
Comment on lines +1 to +2
<!-- This file is only for extraction of vanilla data. -->
<Sequence Name="Sequence_10" Index="10"/>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe baserom instead of vanilla, but meh

Copy link
Collaborator

@fig02 fig02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works on my machine

@fig02 fig02 enabled auto-merge (squash) September 4, 2024 17:54
@fig02 fig02 merged commit f1911cd into zeldaret:main Sep 4, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants