This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

Add rowwise scaling to Float8Inference module #305

Open: wants to merge 4 commits into gh/drisspg/4/base

Commits on Jul 3, 2024

  1. Add rowwise scaling to Float8Inference module

    [ghstack-poisoned]
    drisspg committed Jul 3, 2024
    ce3baaf
  2. Update on "Add rowwise scaling to Float8Inference module"

    [ghstack-poisoned]
    drisspg committed Jul 3, 2024
    39e0ad1

Commits on Jul 4, 2024

  1. Update on "Add rowwise scaling to Float8Inference module"

    # Summary
    
    # Performance
    - Need to investigate the rowwise (AxisWise) dynamic case: I would expect it to be faster than TensorWise dynamic, but the two currently benchmark almost identically (1510.82 μs vs 1512.96 μs).
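To make the TensorWise vs rowwise distinction concrete, here is a minimal pure-Python sketch of how a dynamic FP8 (e4m3) scale could be computed once per tensor versus once per row. It is not the `Float8Inference` implementation: the helper names are hypothetical, the sketch only computes and applies scales (there is no actual FP8 cast), and it assumes the common convention `scale = 448 / amax`, where 448 is the largest finite `float8_e4m3fn` value.

```python
# Hypothetical helpers for illustration only; not the Float8Inference API.
# Assumed convention: scale = FP8_MAX / amax, so x * scale fits the FP8 range.
E4M3_MAX = 448.0  # largest finite float8_e4m3fn value
EPS = 1e-12       # avoids division by zero for all-zero inputs


def tensorwise_scale(x):
    """One scale for the whole 2-D input (one global amax reduction)."""
    amax = max(abs(v) for row in x for v in row)
    return E4M3_MAX / max(amax, EPS)


def rowwise_scales(x):
    """One scale per row (each amax only reduces over its own row)."""
    return [E4M3_MAX / max(max(abs(v) for v in row), EPS) for row in x]


def scale_and_saturate(row, scale):
    """Multiply into the FP8 range and clamp; a real kernel would then cast to FP8."""
    return [min(max(v * scale, -E4M3_MAX), E4M3_MAX) for v in row]


# A small-magnitude row next to a large-magnitude row.
x = [[0.01, -0.02, 0.005],
     [10.0, -4.0, 2.0]]

t = tensorwise_scale(x)  # driven by the 10.0 in row 1
r = rowwise_scales(x)    # row 0 gets its own, much larger scale

# With one shared scale, row 0 only reaches about 0.9 of the +/-448 range;
# with rowwise scales, each row's largest element maps to 448.
print(max(abs(v) for v in scale_and_saturate(x[0], t)))
print(max(abs(v) for v in scale_and_saturate(x[0], r[0])))
```

One plausible reason to expect the rowwise dynamic path to be faster: the TensorWise scale needs a single reduction over every element before any element can be scaled, while each rowwise scale depends only on its own row, so the amax and the cast can in principle be fused per row. The benchmark below does not (yet) show that gain.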
    ```Shell
    Benchmark Results:
    +--------------------------+-------------+
    | Variant                  |   Time (μs) |
    +==========================+=============+
    | BF16                     |     2540.56 |
    +--------------------------+-------------+
    | FP8 Dynamic              |     1512.96 |
    +--------------------------+-------------+
    | FP8 Static               |     1363.75 |
    +--------------------------+-------------+
    | FP8 Weight Only          |     2774.22 |
    +--------------------------+-------------+
    | FP8 Dynamic AxisWise     |     1510.82 |
    +--------------------------+-------------+
    | FP8 Static AxisWise      |     1438.92 |
    +--------------------------+-------------+
    | FP8 Weight Only AxisWise |     2762.88 |
    +--------------------------+-------------+
    
    Comparison Results:
    +--------------------------+-------------+-------------------+---------------+
    | Variant                  |   Time (μs) | Speedup vs BF16   |   MAE vs BF16 |
    +==========================+=============+===================+===============+
    | BF16                     |     2540.56 | 1.00x             |    0          |
    +--------------------------+-------------+-------------------+---------------+
    | FP8 Dynamic              |     1512.96 | 1.68x             |    0.00543213 |
    +--------------------------+-------------+-------------------+---------------+
    | FP8 Static               |     1363.75 | 1.86x             |    0.00546265 |
    +--------------------------+-------------+-------------------+---------------+
    | FP8 Weight Only          |     2774.22 | 0.92x             |    0.00379944 |
    +--------------------------+-------------+-------------------+---------------+
    | FP8 Dynamic AxisWise     |     1510.82 | 1.68x             |    0.00543213 |
    +--------------------------+-------------+-------------------+---------------+
    | FP8 Static AxisWise      |     1438.92 | 1.77x             |    0.00546265 |
    +--------------------------+-------------+-------------------+---------------+
    | FP8 Weight Only AxisWise |     2762.88 | 0.92x             |    0.00379944 |
    +--------------------------+-------------+-------------------+---------------+
    ```
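The "Speedup vs BF16" column is the BF16 time divided by each variant's time. This short Python check, with the times copied from the table above, reproduces that column and also measures how close the AxisWise and TensorWise dynamic variants are.

```python
# Times in microseconds, copied from the benchmark table above.
times_us = {
    "BF16": 2540.56,
    "FP8 Dynamic": 1512.96,
    "FP8 Static": 1363.75,
    "FP8 Weight Only": 2774.22,
    "FP8 Dynamic AxisWise": 1510.82,
    "FP8 Static AxisWise": 1438.92,
    "FP8 Weight Only AxisWise": 2762.88,
}


def speedup_vs_bf16(variant):
    """Baseline time / variant time: above 1.00x means faster than BF16."""
    return times_us["BF16"] / times_us[variant]


for name in times_us:
    print(f"{name:<26} {speedup_vs_bf16(name):.2f}x")

# Relative gap between the two dynamic variants (TensorWise vs AxisWise).
gap = (times_us["FP8 Dynamic"] - times_us["FP8 Dynamic AxisWise"]) / times_us["FP8 Dynamic"]
print(f"Dynamic AxisWise vs TensorWise: {gap:.2%} faster")
```

The dynamic variants differ by roughly 0.14%, i.e. within noise. By the same arithmetic, Static AxisWise (1.77x) is slower than Static TensorWise (1.86x), and both Weight Only variants are slower than BF16 (0.92x).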
    
    # Numerics
    
    Evaluated using pytorch/ao#446.
    TensorWise Dynamic scaling:
    
    ```Shell
    +------------+--------------------------------------------+
    | Task       | Metrics                                    |
    +============+============================================+
    | winogrande | +-----------------+----------+             |
    |            | | acc,none        | 0.735596 |             |
    |            | +-----------------+----------+             |
    |            | | acc_stderr,none | 0.012395 |             |
    |            | +-----------------+----------+             |
    +------------+--------------------------------------------+
    | wikitext   | +-----------------------------+----------+ |
    |            | | bits_per_byte,none          | 0.538637 | |
    |            | +-----------------------------+----------+ |
    |            | | bits_per_byte_stderr,none   | N/A      | |
    |            | +-----------------------------+----------+ |
    |            | | byte_perplexity,none        | 1.452600 | |
    |            | +-----------------------------+----------+ |
    |            | | byte_perplexity_stderr,none | N/A      | |
    |            | +-----------------------------+----------+ |
    |            | | word_perplexity,none        | 7.363215 | |
    |            | +-----------------------------+----------+ |
    |            | | word_perplexity_stderr,none | N/A      | |
    |            | +-----------------------------+----------+ |
    +------------+--------------------------------------------+
    ```
    
    AxisWise (rowwise) Dynamic scaling:
    
    ```Shell
    +------------+--------------------------------------------+
    | Task       | Metrics                                    |
    +============+============================================+
    | winogrande | +-----------------+----------+             |
    |            | | acc,none        | 0.735596 |             |
    |            | +-----------------+----------+             |
    |            | | acc_stderr,none | 0.012395 |             |
    |            | +-----------------+----------+             |
    +------------+--------------------------------------------+
    | wikitext   | +-----------------------------+----------+ |
    |            | | bits_per_byte,none          | 0.538637 | |
    |            | +-----------------------------+----------+ |
    |            | | bits_per_byte_stderr,none   | N/A      | |
    |            | +-----------------------------+----------+ |
    |            | | byte_perplexity,none        | 1.452600 | |
    |            | +-----------------------------+----------+ |
    |            | | byte_perplexity_stderr,none | N/A      | |
    |            | +-----------------------------+----------+ |
    |            | | word_perplexity,none        | 7.363215 | |
    |            | +-----------------------------+----------+ |
    |            | | word_perplexity_stderr,none | N/A      | |
    |            | +-----------------------------+----------+ |
    +------------+--------------------------------------------+
    ```
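The wikitext metrics are not independent of each other. Assuming lm-evaluation-harness's definitions (bits per byte = total negative log-likelihood in nats / (bytes × ln 2); byte perplexity = exp(NLL / bytes)), byte perplexity should equal `2 ** bits_per_byte`. This check uses the reported values; since the TensorWise and AxisWise tables are identical, it covers both.

```python
import math

# Values reported in both wikitext tables above (TensorWise and AxisWise).
bits_per_byte = 0.538637
byte_perplexity = 1.452600

# Assuming BPB = NLL / (n_bytes * ln 2) and byte PPL = exp(NLL / n_bytes),
# byte PPL = exp(BPB * ln 2) = 2 ** BPB.
derived = 2 ** bits_per_byte
print(f"2 ** bits_per_byte = {derived:.6f}")

# Consistency check: the two reported columns agree to the printed precision.
assert math.isclose(derived, byte_perplexity, abs_tol=1e-5)
```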
    
    
    
    [ghstack-poisoned]
    drisspg committed Jul 4, 2024
    0ec7ada

Commits on Jul 17, 2024

  1. Update on "Add rowwise scaling to Float8Inference module"

    [ghstack-poisoned]
    drisspg committed Jul 17, 2024
    aced22f