diff --git a/Training-models-in-SageMaker-notebooks.html b/Training-models-in-SageMaker-notebooks.html
index b3e6040..05cf1de 100644
--- a/Training-models-in-SageMaker-notebooks.html
+++ b/Training-models-in-SageMaker-notebooks.html
@@ -1357,14 +1357,13 @@

Cost of distributed computing

tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
    • Single instance vs. multiple instance wall-clock
@@ -1372,18 +1371,17 @@

      Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -1391,27 +1389,22 @@

        Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      • Practical cost estimation:
        • Suppose a single instance takes T hours to train and costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/aio.html b/aio.html
index b035a11..5ddc786 100644
--- a/aio.html
+++ b/aio.html
@@ -3396,16 +3396,15 @@

            Cost of distributed computing

            tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
  2. Single instance vs. multiple instance wall-clock
@@ -3415,20 +3414,19 @@

              Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -3438,14 +3436,12 @@

            Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      •
@@ -3455,13 +3451,10 @@

            Cost of distributed computing

        costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication
diff --git a/instructor/Training-models-in-SageMaker-notebooks.html b/instructor/Training-models-in-SageMaker-notebooks.html
index 49492c1..2e42524 100644
--- a/instructor/Training-models-in-SageMaker-notebooks.html
+++ b/instructor/Training-models-in-SageMaker-notebooks.html
@@ -1359,14 +1359,13 @@

            Cost of distributed computing

            tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
    • Single instance vs. multiple instance wall-clock
@@ -1374,18 +1373,17 @@

                Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -1393,27 +1391,22 @@

                  Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      • Practical cost estimation:
        • Suppose a single instance takes T hours to train and costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/instructor/aio.html b/instructor/aio.html
index 678a853..c666506 100644
--- a/instructor/aio.html
+++ b/instructor/aio.html
@@ -3404,16 +3404,15 @@

                      Cost of distributed computing

                      tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
  2. Single instance vs. multiple instance wall-clock
@@ -3423,20 +3422,19 @@

                        Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -3446,14 +3444,12 @@

                      Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      •
@@ -3463,13 +3459,10 @@

                      Cost of distributed computing

        costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/md5sum.txt b/md5sum.txt
index 6288e72..ee48418 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -9,7 +9,7 @@
 "episodes/SageMaker-notebooks-as-controllers.md" "7b44f533d49559aa691b8ab2574b4e81" "site/built/SageMaker-notebooks-as-controllers.md" "2024-11-06"
 "episodes/Accessing-S3-via-SageMaker-notebooks.md" "65e591a493b3bba8fdcfa29a7d00dd13" "site/built/Accessing-S3-via-SageMaker-notebooks.md" "2024-11-14"
 "episodes/Interacting-with-code-repo.md" "105dace64e3a1ea6570d314e4b3ccfff" "site/built/Interacting-with-code-repo.md" "2024-11-06"
-"episodes/Training-models-in-SageMaker-notebooks.md" "6fec4e57fac474e83f4732f3ac1706bf" "site/built/Training-models-in-SageMaker-notebooks.md" "2025-01-09"
+"episodes/Training-models-in-SageMaker-notebooks.md" "29cc9de0af426d24af5d7245bc46fe51" "site/built/Training-models-in-SageMaker-notebooks.md" "2025-01-09"
 "episodes/Training-models-in-SageMaker-notebooks-part2.md" "35107ac2e6cb99307714b0f25b2576c4" "site/built/Training-models-in-SageMaker-notebooks-part2.md" "2024-11-07"
 "episodes/Hyperparameter-tuning.md" "c9fe9c20d437dc2f88315438ac6460db" "site/built/Hyperparameter-tuning.md" "2024-11-07"
 "episodes/Resource-management-cleanup.md" "bb9671676d8d86679b598531c2e294b0" "site/built/Resource-management-cleanup.md" "2024-11-08"
diff --git a/pkgdown.yml b/pkgdown.yml
index db888d2..10f4311 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -2,4 +2,4 @@ pandoc: 3.1.11
 pkgdown: 2.1.1
 pkgdown_sha: ~
 articles: {}
-last_built: 2025-01-09T15:45Z
+last_built: 2025-01-09T15:48Z
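
The hunks above only re-wrap the lesson's cost comparison: a single instance costs roughly T * $C, while 10 instances cost roughly (T / k) * (10 * $C) with speedup k < 10. A minimal Python sketch of that arithmetic, using made-up values for T, C, and k rather than real SageMaker pricing:

# Worked example of the cost comparison described in the re-wrapped text.
# All numbers are illustrative assumptions, not real SageMaker prices.

def training_cost(hours_on_one_instance, price_per_instance_hour,
                  n_instances=1, speedup=1.0):
    """Wall-clock time shrinks by `speedup`, but every instance is billed."""
    wall_clock_hours = hours_on_one_instance / speedup
    return wall_clock_hours * n_instances * price_per_instance_hour

T = 10.0  # hours to train on a single instance (assumed)
C = 1.5   # dollars per instance-hour (assumed)
k = 5.0   # realized speedup on 10 instances, < 10 due to communication overhead (assumed)

single_cost = training_cost(T, C)                               # T * $C       = $15.00
parallel_cost = training_cost(T, C, n_instances=10, speedup=k)  # (T/k) * 10*$C = $30.00

print(f"1 instance  : ${single_cost:.2f} over {T:.1f} h")
print(f"10 instances: ${parallel_cost:.2f} over {T / k:.1f} h")
# With k = 5 the parallel run costs ~2x more but finishes ~5x sooner;
# only a near-perfect speedup (k close to 10) would make the costs comparable.

This is the same trade-off as the tl;dr in the changed text: stay on one instance unless the hours saved are worth paying for the extra instance-hours.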