Improve Calc README and FLOPs Script #1

Open · wants to merge 14 commits into main

Conversation

Quentin-Anthony (Contributor):

  • Improves the calc README: both the overall writing and some inconsistencies in how we present FLOPs/params.
  • Completely reworks the Mamba FLOPs calculation script; some errors were introduced in commits after the initial push.

@Quentin-Anthony changed the title from "Update calc readme and script" to "Improve Calc README and FLOPs Script" on Dec 16, 2024.

# State updates
mamba2_block_flops += 2 * args.batch_size * args.sequence_length * d_inner * args.state_size
# Output projections
mamba2_block_flops += 2 * args.batch_size * args.sequence_length * d_inner * args.state_size * args.hidden_size

Contributor:

Suggested change
- mamba2_block_flops += 2 * args.batch_size * args.sequence_length * d_inner * args.state_size * args.hidden_size
+ mamba2_block_flops += 2 * args.batch_size * args.sequence_length * d_inner * args.hidden_size
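
For context on the arithmetic behind this suggestion: a bias-free dense projection applied to every token follows the standard 2 x (tokens) x (in features) x (out features) rule, and the Mamba2 output projection maps d_inner to hidden_size, so state_size does not belong in its cost. A minimal sketch (the helper name is illustrative, not from the script):

```python
# Illustrative helper, not part of the PR: FLOPs of a bias-free dense projection
# applied to batch_size * sequence_length tokens (one multiply + one add per weight).
def linear_flops(batch_size, sequence_length, in_features, out_features):
    return 2 * batch_size * sequence_length * in_features * out_features

# The Mamba2 output projection maps d_inner -> hidden_size, so its cost is
# linear_flops(args.batch_size, args.sequence_length, d_inner, args.hidden_size),
# with no state_size factor, which is what the suggested change encodes.
```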

# Output projections
mamba2_block_flops += 2 * args.batch_size * args.sequence_length * d_inner * args.state_size * args.hidden_size
# Final gating
mamba2_block_flops += args.batch_size * args.sequence_length * args.hidden_size

Contributor:

Suggested change
- mamba2_block_flops += args.batch_size * args.sequence_length * args.hidden_size

Deleted the last two lines because gating is now included before the output projections.
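
A short sketch of the accounting this refers to (illustrative only; exactly where the script folds the gate in is an assumption based on the comment above): the elementwise gate acts on the (batch, sequence, d_inner) activations just before the output projection, so it costs one multiply per element there and should not be double-counted as a separate hidden_size-sized term.

```python
# Illustrative sketch, not the script's code: cost of an elementwise gate applied
# to the pre-output-projection activations of shape (batch, sequence, d_inner).
def gating_flops(batch_size, sequence_length, d_inner):
    return batch_size * sequence_length * d_inner  # one multiply per element
```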

total_attention_flops += shared_attention_flops
total_flops += shared_ffn_flops
total_ffn_flops = shared_ffn_flops
total_ffn_flops += shared_ffn_flops
args.hidden_size = original_hidden_size
# final downprojector matrix
total_flops += 4 * args.batch_size * args.sequence_length * args.hidden_size * args.hidden_size

Contributor:

Suggested change
- total_flops += 4 * args.batch_size * args.sequence_length * args.hidden_size * args.hidden_size
+ total_flops += 2 * args.batch_size * args.sequence_length * args.hidden_size * args.hidden_size * iter_factor

This is a square matrix; the downprojection happens inside attention, as commented above.
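
A sketch of the accounting behind this suggestion (what iter_factor stands for is an assumption; the script's own definition governs): the projection is a hidden_size x hidden_size matmul applied to every token, and iter_factor scales the forward-only count up to a full training step.

```python
# Sketch only: FLOPs of the square (hidden_size -> hidden_size) projection over a batch.
def square_proj_flops(batch_size, sequence_length, hidden_size, iter_factor):
    forward = 2 * batch_size * sequence_length * hidden_size * hidden_size
    # iter_factor is assumed here to be the forward/backward multiplier used by the
    # script (commonly 3, since the backward pass costs roughly 2x the forward pass).
    return forward * iter_factor
```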


mamba_flops = compute_mamba_flops(args)
# Calculate component FLOPs
mamba1_flops = compute_mamba1_flops(args, iter_factor)
mamba2_flops = compute_mamba2_flops(args)

Contributor:

Suggested change
- mamba2_flops = compute_mamba2_flops(args)
+ mamba2_flops = iter_factor * compute_mamba2_flops(args)
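
The point here is consistency: compute_mamba1_flops is already called with iter_factor, while compute_mamba2_flops appears to return forward-only FLOPs, so the scaling has to happen at the call site. A hypothetical illustration of the two equivalent conventions (names and values below are made up, not from the script):

```python
# Hypothetical illustration of the two scaling conventions; names are not from the script.
def block_forward_flops(tokens, hidden_size):
    return 2 * tokens * hidden_size * hidden_size        # stand-in for any block's forward cost

def flops_scaled_inside(tokens, hidden_size, iter_factor):
    return block_forward_flops(tokens, hidden_size) * iter_factor   # mamba1-style: scaled in the helper

def flops_scaled_by_caller(tokens, hidden_size):
    return block_forward_flops(tokens, hidden_size)                  # mamba2-style: caller multiplies

iter_factor = 3   # assumed forward+backward multiplier
assert flops_scaled_inside(4096, 1024, iter_factor) == iter_factor * flops_scaled_by_caller(4096, 1024)
```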

# final downprojector matrix
total_flops += 4 * args.batch_size * args.sequence_length * args.hidden_size * args.hidden_size
total_flops += 4 * args.hidden_size * args.hidden_size

Contributor:

Suggested change
- total_flops += 4 * args.hidden_size * args.hidden_size
+ total_flops += 2 * args.hidden_size * args.hidden_size

This layer maps hidden_size -> hidden_size.
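
The factor follows from the layer shape: a linear map from hidden_size to hidden_size has hidden_size**2 weights, i.e. 2 * hidden_size**2 FLOPs per token (one multiply and one add per weight), not 4 * hidden_size**2. A minimal sketch (whether batch and sequence factors are applied elsewhere in the script is not shown here):

```python
# Sketch only: per-token cost of a square (hidden_size -> hidden_size) linear layer.
def square_layer_flops_per_token(hidden_size):
    return 2 * hidden_size * hidden_size   # one multiply + one add per weight
```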

# final downprojector matrix
total_flops += 4 * args.batch_size * args.sequence_length * args.hidden_size * args.hidden_size
total_flops += 4 * args.hidden_size * args.hidden_size
else:

Contributor:

Suggested change
- else:
+ total_flops += mamba2_flops
+ total_mamba2_flops += mamba2_flops
+ else:

Layer type 'g' contains a Mamba2 layer after the transformer.
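
A hypothetical sketch of how that hybrid accounting could look (the layer-spec encoding and the handling of other layer types are assumptions, not the script's actual logic):

```python
# Hypothetical sketch; the real layer-type handling lives in the script.
def hybrid_layer_flops(layer_spec, transformer_flops, mamba2_flops):
    total = 0
    for layer_type in layer_spec:
        if layer_type == 'g':
            # type 'g': transformer block followed by a Mamba2 layer,
            # so both contributions are accumulated for this layer
            total += transformer_flops + mamba2_flops
        else:
            # other layer types are handled by the real script; treated
            # here as plain Mamba2 layers purely for illustration
            total += mamba2_flops
    return total

# e.g. hybrid_layer_flops("mgmg", transformer_flops=1.2e12, mamba2_flops=8.0e11)
```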

Comment on lines +142 to +143
mamba2_block_flops += 2 * d_inner * args.state_size * args.hidden_size
mamba2_block_flops += args.hidden_size

Contributor:

Suggested change
- mamba2_block_flops += 2 * d_inner * args.state_size * args.hidden_size
- mamba2_block_flops += args.hidden_size
+ mamba2_block_flops += 2 * d_inner * args.hidden_size

There is no bias in the out projector.
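
For completeness, the per-token arithmetic these last two suggestions rely on (a sketch; the with_bias branch is only there to show what the removed term would have been):

```python
# Sketch only: per-token cost of the Mamba2 out projection, d_inner -> hidden_size.
def out_proj_flops_per_token(d_inner, hidden_size, with_bias=False):
    flops = 2 * d_inner * hidden_size   # weight multiply-adds; no state_size factor
    if with_bias:
        flops += hidden_size            # a bias would add this, but the out projector has none
    return flops
```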
