[Audio 7/?] Extract sequences to assembly #2119

Thar0 · 2024-09-01T23:04:07Z

This PR updates the audio extraction tools to allow extracting all sequence files except the "handwritten" sequences 0, 1, 2 and 109 to a textual assembly format. Names for the sequence instructions are lifted from prior work (SM64 Decomp, #1236 and #1509) and should not be considered final; see the table in disassemble_sequence.py for the full set. The remaining sequence files and the build process for them will come next.

Co-authored-by: MNGoldenEagle <[email protected]> Co-authored-by: zelda2774 <[email protected]>

hensldm

I didn't mark them, but there are a bunch of commented-out print statements (probably debugging stuff, right?). My preference would be to clean them up, but I'm not going to push it too much this time.

hensldm · 2024-09-02T17:02:39Z

tools/audio/extraction/disassemble_sequence.py

+            #found_sectype = None
+            #for offset,_,_,section_type in self.tables:
+            #    if offset == base_pos:
+            #        found_sectype = section_type
+            #        break
+            #else:
+            #    assert False


Is this a TODO/WIP or just old dead code?
Same question for the outfile writes.

cadmic

Pretty cool, I had no idea sequences were this complex, I thought it would be more like MIDI

cadmic · 2024-09-02T18:23:29Z

tools/audio/extraction/audio_extract.py

+
+    # Disassemble to text
+
+    with ThreadPool(processes=os.cpu_count()) as pool:


Using a ThreadPool doesn't actually make this any faster since the extraction code is pure Python, which is single-threaded due to the GIL. In fact it actually makes it slower for me because of the extra overhead (7.6s with the ThreadPool vs 7.0s without). I'd suggest either using a multiprocessing.Pool (which is very jank but we have some examples in the repo already) or adding a TODO to parallelize this later.
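To make the point above concrete, here is a minimal sketch (not code from this PR) that times the same CPU-bound, pure-Python job list serially and through a ThreadPool. `cpu_bound_work` is a hypothetical stand-in for disassembling one sequence, and the job sizes are arbitrary. On a standard CPython build with the GIL, the threaded run should not be expected to be faster; the actual numbers depend on the machine.

```python
import time
from multiprocessing.pool import ThreadPool


def cpu_bound_work(n: int) -> int:
    # Hypothetical stand-in for disassembling one sequence:
    # pure-Python arithmetic that holds the GIL the whole time.
    total = 0
    for i in range(n):
        total = (total + i * i) & 0xFFFFFFFF
    return total


def run_serial(jobs):
    # Process every job on the calling thread.
    return [cpu_bound_work(n) for n in jobs]


def run_threaded(jobs, workers=4):
    # Same jobs, spread over worker threads. The threads take turns
    # holding the GIL, so pure-Python work does not run in parallel.
    with ThreadPool(processes=workers) as pool:
        return pool.map(cpu_bound_work, jobs)


def timed(fn, *args):
    # Return (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [200_000] * 16
    serial_result, serial_dt = timed(run_serial, jobs)
    threaded_result, threaded_dt = timed(run_threaded, jobs)
    assert serial_result == threaded_result
    print(f"serial:   {serial_dt:.3f}s")
    print(f"threaded: {threaded_dt:.3f}s")
```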

(sorry, I didn't notice earlier during samplebank extraction, same thing probably applies there)

Samplebank extraction is definitely sped up from multiprocessing in this way. Before it would take me a minute to extract all the samples instead of ~6 seconds. I didn't see the same improvements for sequences though, likely for the reasons you mentioned. I'll try to adjust it to actually work.

I wasn't able to get it to work with multiprocessing.Pool since it complains about being unable to pickle memoryview. I've decided to just not multiprocess sequences for now, it's not that slow anyway.
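For reference, the pickling failure can be reproduced in isolation. The sketch below is an illustration, not what this PR does (the PR simply keeps sequence extraction single-process). It shows that pickle rejects a memoryview, and that one possible workaround is to hand each worker a bytes copy of just the slice it needs. `to_picklable` is a hypothetical helper name.

```python
import pickle


def can_pickle(obj) -> bool:
    # True if pickle can serialize obj. multiprocessing.Pool has to
    # pickle every argument it sends to a worker process.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def to_picklable(segment: memoryview, start: int, end: int) -> bytes:
    # Hypothetical helper: copy one slice of a larger buffer into bytes
    # so it can cross a process boundary.
    return bytes(segment[start:end])


data = bytes(range(16))
view = memoryview(data)[4:8]

print(can_pickle(view))         # a memoryview cannot be pickled
print(can_pickle(bytes(view)))  # a bytes copy of the same data can
```

The cost of this workaround is one copy per job, which may or may not be acceptable depending on how large the slices are.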

cadmic · 2024-09-02T18:24:41Z

tools/audio_extraction.py

+    # Tables have no clear start and end in a sequence. Mark the locations of all tables that appear in sequences.
+    seq_disas_hacks = {


if this is indeed all tables, I wonder if we should promote this from a "hack" and call it "seq_disas_tables" or something

cadmic · 2024-09-02T18:50:55Z

tools/audio/extraction/disassemble_sequence.py

+        if ret & 0x80:
+            ret = ((ret << 8) & 0x7F00) | disas.read_u8()
+            if ret < 128 and disas.insn_begin not in disas.force_long:
+                print(f"Unnecessary use of long immediate encoding @ 0x{disas.insn_begin:X}: {ret}")


I don't see this output, is this an MM-only thing?

In non-handwritten sequences this only happens in MM. In particular MM has 3 warnings in non-handwritten sequences:

[Sequence_90] Unnecessary use of long immediate encoding @ 0x38: 15
[Sequence_116] Invalid instrument sourced from Soundfont_36: 15
[Sequence_127] Invalid instrument sourced from Soundfont_37: 15

Both OoT and MM have unnecessary use of long encodings in their handwritten sequences.
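As a standalone illustration of the check in the quoted diff, the sketch below decodes a variable-length immediate from a byte string and flags the "unnecessary long" case. The masking expression is taken from the diff; reading the first byte before the `& 0x80` test is an assumption (that line is not shown above), and the function names are hypothetical rather than the ones in disassemble_sequence.py.

```python
def decode_var_immediate(data: bytes, pos: int) -> tuple[int, int, bool]:
    # Returns (value, next position, whether the long form was used).
    first = data[pos]  # assumption: the value starts with a single byte
    pos += 1
    if first & 0x80:
        # Long form: the low 7 bits of the first byte are the high bits,
        # followed by one more byte (same expression as in the diff).
        value = ((first << 8) & 0x7F00) | data[pos]
        return value, pos + 1, True
    # Short form: the byte is the value itself (0..127).
    return first, pos, False


def is_unnecessary_long(value: int, used_long: bool) -> bool:
    # A value below 128 would have fit in the short form.
    return used_long and value < 128


value, next_pos, used_long = decode_var_immediate(bytes([0x80, 0x0F]), 0)
print(value, next_pos, used_long, is_unnecessary_long(value, used_long))
```

The bytes `0x80 0x0F` decode to 15 using the long form, which is the same value reported in the Sequence_90 warning above.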

Interesting, I wonder if that indicates that there were indeed separate mnemonics for this originally, or if the encoding should be part of the mnemonic in some way (like the .s in sqrt.s I guess). Maybe it would be better to do that too instead of having the reassembler guess an encoding?

I think it's better to do it as it is now since it's more user-friendly to let the assembler decide and just handle the special cases as-needed, but I think you're right and they probably didn't have it set up this way originally.

Are users going to write these by hand anyway? I've been imagining that they use some kind of compiler/converter from some other format like MIDI.

Or, I suppose there could be variants like

notedv.l ; long encoding
notedv.s ; short encoding
notedv   ; pseudo-instruction that expands to either

which is not too different from how it is now, but there'd be 1 less hack and it's clearer that it's not a real instruction when the assembler decides for notedv.
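A rough sketch of how an assembler could resolve such variants, purely to illustrate the idea floated here; none of it is implemented in this PR. The byte layout is inferred from the decode expression in the quoted diff, the `.l`/`.s` suffixes are the ones proposed in this comment, and `encode_var_immediate` and `split_mnemonic` are hypothetical names.

```python
ENCODING_BY_SUFFIX = {"": "auto", ".s": "short", ".l": "long"}


def split_mnemonic(mnemonic: str) -> tuple[str, str]:
    # "notedv.l" -> ("notedv", "long"); "notedv" -> ("notedv", "auto")
    base, dot, suffix = mnemonic.partition(".")
    return base, ENCODING_BY_SUFFIX[dot + suffix]


def encode_var_immediate(value: int, encoding: str = "auto") -> bytes:
    if not 0 <= value <= 0x7FFF:
        raise ValueError(f"immediate out of range: {value}")
    if encoding == "auto":
        # Pseudo-instruction: pick the shortest form that fits.
        encoding = "short" if value < 0x80 else "long"
    if encoding == "short":
        if value >= 0x80:
            raise ValueError(f"{value} does not fit the short encoding")
        return bytes([value])
    if encoding == "long":
        # High bit marks the long form, then 7 high bits and 8 low bits.
        return bytes([0x80 | (value >> 8), value & 0xFF])
    raise ValueError(f"unknown encoding: {encoding}")


for mnemonic in ("notedv", "notedv.s", "notedv.l"):
    base, encoding = split_mnemonic(mnemonic)
    print(mnemonic, encode_var_immediate(15, encoding).hex())
```

With this shape, a forced long encoding is expressed in the source text itself, so the assembler would not need a separate list of offsets to special-case.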

(I'm not insisting on changing this btw, I don't think it's worth it at this point. Just spitballing)

It's true that most won't be writing these by hand, and converters from MIDI would be used mostly. However there are cases where you do edit these by hand such as the handwritten sequences, particularly sequence 0 that implements all the sound effects. The assembler should be able to provide convenience features for these cases.

I don't mind changing the syntax to something like this, although the .s variant would never be used since the disassembler should just output the pseudo-instruction everywhere it can (so that readers of the extracted sequences don't get into bad habits)

Hm I'm not sure it's clearly better, and I think it's a pretty big change with a fair amount of design work (which instructions need these variants? is .s really the right suffix or does that make it seem like a MIPS float?) which tbh I don't really want to be responsible for, given that I've been exposed to sequence assembly for about 1 hour now

cadmic · 2024-09-02T19:23:17Z

tools/audio/extraction/audio_extract.py

+    extract_sequences(audioseq_seg, extracted_dir, version_info, write_xml, sequence_table, sequence_font_table,
+                      sequence_xmls, soundfonts)
+
+    print("Done")


Maybe print(f"Sequence extraction took {dt:.3f}s") instead of print("Done")? The "Done" seems a bit out of place.
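A small sketch of what that could look like, assuming `dt` is measured with `time.perf_counter()`. `timed_call` is a hypothetical wrapper, not something that exists in audio_extract.py.

```python
import time


def timed_call(label, fn, *args, **kwargs):
    # Run fn, print how long it took, and pass its result through.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    dt = time.perf_counter() - start
    print(f"{label} took {dt:.3f}s")
    return result


# Usage with a stand-in workload in place of extract_sequences(...):
total = timed_call("Sequence extraction", sum, range(1_000_000))
```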

krm01 · 2024-09-02T20:52:36Z

tools/audio/extraction/disassemble_sequence.py

+            SqSection.ENVELOPE : "ENVELOPE",
+            SqSection.FILTER   : "FILTER",
+            SqSection.UNKNOWN  : "UNK",
+        }[target_section]


fyi SqSection.SEQ.name == "SEQ" and so on, so you could scrap this thing (well, UNKNOWN has a different name here though; intentional? if so... ?)
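A minimal sketch of that suggestion, assuming SqSection is a plain `enum.Enum`. Only the four members named in this thread are listed, so the real enum presumably has more; `section_label` and `NAME_OVERRIDES` are hypothetical names.

```python
from enum import Enum, auto


class SqSection(Enum):
    # Only the members mentioned in this thread; not the full enum.
    SEQ = auto()
    ENVELOPE = auto()
    FILTER = auto()
    UNKNOWN = auto()


# Keep an explicit entry only where the printed name differs
# from the member name.
NAME_OVERRIDES = {SqSection.UNKNOWN: "UNK"}


def section_label(section: SqSection) -> str:
    # Fall back to the member's own name, e.g. SqSection.SEQ.name == "SEQ".
    return NAME_OVERRIDES.get(section, section.name)


print([section_label(s) for s in SqSection])
```

If the "UNK" spelling is not intentional, the override table can go away entirely and callers can use `.name` directly.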


fig02 · 2024-09-03T02:02:28Z

assets/xml/audio/sequences/seq_10.xml

+<!-- This file is only for extraction of vanilla data. -->
+<Sequence Name="Sequence_10" Index="10"/>


maybe baserom instead of vanilla, but meh

fig02

works on my machine

[Audio 7/?] Extract sequences to assembly

a1710af


hensldm reviewed Sep 2, 2024


cadmic reviewed Sep 2, 2024


krm01 approved these changes Sep 2, 2024


Suggested changes, some extra sequence disassembler cleanup

5e447f6

krm01 approved these changes Sep 2, 2024


Remove unused multiprocessing import and regen assets/xml/audio/sequences

cce8ddb

fig02 added the Wait to merge label Sep 3, 2024

fig02 reviewed Sep 3, 2024


fig02 approved these changes Sep 3, 2024


fig02 added the Met Review Requirements label Sep 3, 2024

cadmic approved these changes Sep 3, 2024


fig02 enabled auto-merge (squash) September 4, 2024 17:54

fig02 merged commit f1911cd into zeldaret:main Sep 4, 2024
1 check passed
