diff --git a/Training-models-in-SageMaker-notebooks.html b/Training-models-in-SageMaker-notebooks.html
index b3e6040..05cf1de 100644
--- a/Training-models-in-SageMaker-notebooks.html
+++ b/Training-models-in-SageMaker-notebooks.html
@@ -1357,14 +1357,13 @@
tl;dr Use 1 instance unless you are finding that
you’re waiting hours for the training/tuning to complete.
-
Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+
Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:
-
Instance cost per hour:
-
- SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+- SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
-
Single instance vs. multiple instance wall-clock
@@ -1372,18 +1371,17 @@
Cost of distributed computing
- When using a single instance, training will take significantly
longer, especially if your data is large. However, the wall-clock time
difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
- For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
-
Scaling efficiency:
- Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
- For example, doubling instances from 1 to 2 may reduce training time
by close to 50%, but going from 8 to 16 instances may only reduce
training time by around 20-30%, depending on the model and batch
@@ -1391,27 +1389,22 @@
Cost of distributed computing
-
Typical recommendation:
-
- For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-- For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+- For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+- For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
-
Practical cost estimation:
- Suppose a single instance takes
T
hours to train and
costs $C
per hour. For a 10-instance setup, the cost would
be approximately:
--
-Single instance:
T * $C
+- Single instance:
T * $C
--
-10 instances (parallel):
-
(T / k) * (10 * $C)
, where k
is the speedup
-factor (<10 due to overhead).
+- 10 instances (parallel):
(T / k) * (10 * $C)
, where
+k
is the speedup factor (<10 due to overhead).
- If the speedup is only about 5x instead of 10x due to communication
overhead, then the cost difference may be minimal, with a slight edge to
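As a quick sanity check of the cost estimate in the passage above, here is a minimal Python sketch. The values of T, C, and k are illustrative assumptions (not real SageMaker prices or measured speedups); the arithmetic simply follows the single-instance T * $C and 10-instance (T / k) * (10 * $C) formulas from the text.

    # Illustrative cost comparison: single instance vs. 10 instances.
    # T, C, and k are assumed example values, not SageMaker pricing.
    T = 10.0   # hours to train on a single instance
    C = 1.50   # assumed on-demand price per instance-hour, in dollars
    k = 5.0    # assumed speedup with 10 instances (< 10 due to communication overhead)

    single_cost = T * C                  # single instance: T * $C
    multi_time  = T / k                  # wall-clock hours with 10 instances
    multi_cost  = multi_time * (10 * C)  # 10 instances: (T / k) * (10 * $C)

    print(f"Single instance: {T:.1f} h, ${single_cost:.2f}")
    print(f"10 instances:    {multi_time:.1f} h, ${multi_cost:.2f}")

Swapping in your own T, C, and measured k shows how the wall-clock savings trade off against the extra instance-hours billed.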
diff --git a/aio.html b/aio.html
index b035a11..5ddc786 100644
--- a/aio.html
+++ b/aio.html
@@ -3396,16 +3396,15 @@
Cost of distributed computing
tl;dr Use 1 instance unless you are finding that
you’re waiting hours for the training/tuning to complete.
-
Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+
Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:
-
Instance cost per hour:
-- SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+- SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
-
@@ -3415,20 +3414,19 @@
Cost of distributed computing
- When using a single instance, training will take significantly
longer, especially if your data is large. However, the wall-clock time
difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
- For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
-
Scaling efficiency:
- Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
- For example, doubling instances from 1 to 2 may reduce training time
by close to 50%, but going from 8 to 16 instances may only reduce
training time by around 20-30%, depending on the model and batch
@@ -3438,14 +3436,12 @@
Cost of distributed computing
-
Typical recommendation:
-- For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-- For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+- For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+- For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
-
@@ -3455,13 +3451,10 @@
Cost of distributed computing
costs $C
per hour. For a 10-instance setup, the cost would
be approximately:
--
-Single instance:
T * $C
+ - Single instance:
T * $C
--
-10 instances (parallel):
-
(T / k) * (10 * $C)
, where k
is the speedup
-factor (<10 due to overhead).
+- 10 instances (parallel):
(T / k) * (10 * $C)
, where
+k
is the speedup factor (<10 due to overhead).
- If the speedup is only about 5x instead of 10x due to communication
diff --git a/instructor/Training-models-in-SageMaker-notebooks.html b/instructor/Training-models-in-SageMaker-notebooks.html
index 49492c1..2e42524 100644
--- a/instructor/Training-models-in-SageMaker-notebooks.html
+++ b/instructor/Training-models-in-SageMaker-notebooks.html
@@ -1359,14 +1359,13 @@
Cost of distributed computing
tl;dr Use 1 instance unless you are finding that
you’re waiting hours for the training/tuning to complete.
-
Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+
Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:
-
Instance cost per hour:
-
- SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+- SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
-
Single instance vs. multiple instance wall-clock
@@ -1374,18 +1373,17 @@
Cost of distributed computing
- When using a single instance, training will take significantly
longer, especially if your data is large. However, the wall-clock time
difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
- For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
-
Scaling efficiency:
- Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
- For example, doubling instances from 1 to 2 may reduce training time
by close to 50%, but going from 8 to 16 instances may only reduce
training time by around 20-30%, depending on the model and batch
@@ -1393,27 +1391,22 @@
Cost of distributed computing
-
Typical recommendation:
-
- For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-- For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+- For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+- For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
-
Practical cost estimation:
- Suppose a single instance takes
T
hours to train and
costs $C
per hour. For a 10-instance setup, the cost would
be approximately:
--
-Single instance:
T * $C
+- Single instance:
T * $C
--
-10 instances (parallel):
-
(T / k) * (10 * $C)
, where k
is the speedup
-factor (<10 due to overhead).
+- 10 instances (parallel):
(T / k) * (10 * $C)
, where
+k
is the speedup factor (<10 due to overhead).
- If the speedup is only about 5x instead of 10x due to communication
overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/instructor/aio.html b/instructor/aio.html
index 678a853..c666506 100644
--- a/instructor/aio.html
+++ b/instructor/aio.html
@@ -3404,16 +3404,15 @@
Cost of distributed computing
tl;dr Use 1 instance unless you are finding that
you’re waiting hours for the training/tuning to complete.
-
Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+
Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:
-
Instance cost per hour:
-- SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+- SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
-
@@ -3423,20 +3422,19 @@
Cost of distributed computing
- When using a single instance, training will take significantly
longer, especially if your data is large. However, the wall-clock time
difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
- For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
-
Scaling efficiency:
- Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
- For example, doubling instances from 1 to 2 may reduce training time
by close to 50%, but going from 8 to 16 instances may only reduce
training time by around 20-30%, depending on the model and batch
@@ -3446,14 +3444,12 @@
Cost of distributed computing
-
Typical recommendation:
-- For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-- For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+- For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+- For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
-
@@ -3463,13 +3459,10 @@
Cost of distributed computing
costs $C
per hour. For a 10-instance setup, the cost would
be approximately:
--
-Single instance:
T * $C
+ - Single instance:
T * $C
--
-10 instances (parallel):
-
(T / k) * (10 * $C)
, where k
is the speedup
-factor (<10 due to overhead).
+- 10 instances (parallel):
(T / k) * (10 * $C)
, where
+k
is the speedup factor (<10 due to overhead).
- If the speedup is only about 5x instead of 10x due to communication
diff --git a/md5sum.txt b/md5sum.txt
index 6288e72..ee48418 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -9,7 +9,7 @@
"episodes/SageMaker-notebooks-as-controllers.md" "7b44f533d49559aa691b8ab2574b4e81" "site/built/SageMaker-notebooks-as-controllers.md" "2024-11-06"
"episodes/Accessing-S3-via-SageMaker-notebooks.md" "65e591a493b3bba8fdcfa29a7d00dd13" "site/built/Accessing-S3-via-SageMaker-notebooks.md" "2024-11-14"
"episodes/Interacting-with-code-repo.md" "105dace64e3a1ea6570d314e4b3ccfff" "site/built/Interacting-with-code-repo.md" "2024-11-06"
-"episodes/Training-models-in-SageMaker-notebooks.md" "6fec4e57fac474e83f4732f3ac1706bf" "site/built/Training-models-in-SageMaker-notebooks.md" "2025-01-09"
+"episodes/Training-models-in-SageMaker-notebooks.md" "29cc9de0af426d24af5d7245bc46fe51" "site/built/Training-models-in-SageMaker-notebooks.md" "2025-01-09"
"episodes/Training-models-in-SageMaker-notebooks-part2.md" "35107ac2e6cb99307714b0f25b2576c4" "site/built/Training-models-in-SageMaker-notebooks-part2.md" "2024-11-07"
"episodes/Hyperparameter-tuning.md" "c9fe9c20d437dc2f88315438ac6460db" "site/built/Hyperparameter-tuning.md" "2024-11-07"
"episodes/Resource-management-cleanup.md" "bb9671676d8d86679b598531c2e294b0" "site/built/Resource-management-cleanup.md" "2024-11-08"
diff --git a/pkgdown.yml b/pkgdown.yml
index db888d2..10f4311 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -2,4 +2,4 @@ pandoc: 3.1.11
pkgdown: 2.1.1
pkgdown_sha: ~
articles: {}
-last_built: 2025-01-09T15:45Z
+last_built: 2025-01-09T15:48Z