diff --git a/Training-models-in-SageMaker-notebooks.html b/Training-models-in-SageMaker-notebooks.html
index b3e6040..05cf1de 100644
--- a/Training-models-in-SageMaker-notebooks.html
+++ b/Training-models-in-SageMaker-notebooks.html
@@ -1357,14 +1357,13 @@

Cost of distributed computing

tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
    • Single instance vs. multiple instance wall-clock
@@ -1372,18 +1371,17 @@

      Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -1391,27 +1389,22 @@

        Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      • Practical cost estimation:
        • Suppose a single instance takes T hours to train and costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/aio.html b/aio.html
index b035a11..5ddc786 100644
--- a/aio.html
+++ b/aio.html
@@ -3396,16 +3396,15 @@

            Cost of distributed computing

            tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
  2. Single instance vs. multiple instance wall-clock
@@ -3415,20 +3414,19 @@

              Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -3438,14 +3436,12 @@

            Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      •
@@ -3455,13 +3451,10 @@

            Cost of distributed computing

        costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication
diff --git a/instructor/Training-models-in-SageMaker-notebooks.html b/instructor/Training-models-in-SageMaker-notebooks.html
index 49492c1..2e42524 100644
--- a/instructor/Training-models-in-SageMaker-notebooks.html
+++ b/instructor/Training-models-in-SageMaker-notebooks.html
@@ -1359,14 +1359,13 @@

            Cost of distributed computing

            tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
    • Single instance vs. multiple instance wall-clock
@@ -1374,18 +1373,17 @@

                Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -1393,27 +1391,22 @@

                  Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      • Practical cost estimation:
        • Suppose a single instance takes T hours to train and costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/instructor/aio.html b/instructor/aio.html
index 678a853..c666506 100644
--- a/instructor/aio.html
+++ b/instructor/aio.html
@@ -3404,16 +3404,15 @@

                      Cost of distributed computing

                      tl;dr Use 1 instance unless you are finding that you’re waiting hours for the training/tuning to complete.

-Let’s break down some key points for deciding between 1
-instance vs. multiple instances from a cost perspective:
+Let’s break down some key points for deciding between 1 instance
+vs. multiple instances from a cost perspective:

  1. Instance cost per hour:
-    • SageMaker charges per instance-hour. Running multiple
-instances in parallel can finish training faster, reducing
-wall-clock time, but the cost per hour will increase
-with each added instance.
+    • SageMaker charges per instance-hour. Running multiple instances in
+parallel can finish training faster, reducing wall-clock time, but the
+cost per hour will increase with each added instance.
  2. Single instance vs. multiple instance wall-clock
@@ -3423,20 +3422,19 @@

                        Cost of distributed computing

      • When using a single instance, training will take significantly longer, especially if your data is large. However, the wall-clock time difference between 1 instance and 10 instances may not translate to a
-direct 10x speedup when using multiple instances due to
-communication overheads.
+direct 10x speedup when using multiple instances due to communication
+overheads.
      • For example, with data-parallel training, instances need to
-synchronize gradients between batches, which introduces
-communication costs and may slow down training on
-larger clusters.
+synchronize gradients between batches, which introduces communication
+costs and may slow down training on larger clusters.
    • Scaling efficiency:
      • Parallelizing training does not scale perfectly due to those
-overheads. Adding instances generally provides diminishing
-returns on training time reduction.
+overheads. Adding instances generally provides diminishing returns on
+training time reduction.
      • For example, doubling instances from 1 to 2 may reduce training time by close to 50%, but going from 8 to 16 instances may only reduce training time by around 20-30%, depending on the model and batch
@@ -3446,14 +3444,12 @@

                      Cost of distributed computing

    • Typical recommendation:
-      • For small-to-moderate datasets or cases where
-training time isn’t a critical factor, a single
-instance may be more cost-effective, as it avoids parallel
-processing overheads.
-      • For large datasets or where training speed is a
-high priority (e.g., tuning complex deep learning models), using
-multiple instances can be beneficial despite the cost
-increase due to time savings.
+      • For small-to-moderate datasets or cases where training time isn’t a
+critical factor, a single instance may be more cost-effective, as it
+avoids parallel processing overheads.
+      • For large datasets or where training speed is a high priority (e.g.,
+tuning complex deep learning models), using multiple instances can be
+beneficial despite the cost increase due to time savings.
      •
@@ -3463,13 +3459,10 @@

                      Cost of distributed computing

        costs $C per hour. For a 10-instance setup, the cost would be approximately:
-          • Single instance: T * $C
-          • 10 instances (parallel):
-(T / k) * (10 * $C), where k is the speedup
-factor (<10 due to overhead).
+          • Single instance: T * $C
+          • 10 instances (parallel): (T / k) * (10 * $C), where
+k is the speedup factor (<10 due to overhead).
        • If the speedup is only about 5x instead of 10x due to communication overhead, then the cost difference may be minimal, with a slight edge to
diff --git a/md5sum.txt b/md5sum.txt
index 6288e72..ee48418 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -9,7 +9,7 @@
 "episodes/SageMaker-notebooks-as-controllers.md" "7b44f533d49559aa691b8ab2574b4e81" "site/built/SageMaker-notebooks-as-controllers.md" "2024-11-06"
 "episodes/Accessing-S3-via-SageMaker-notebooks.md" "65e591a493b3bba8fdcfa29a7d00dd13" "site/built/Accessing-S3-via-SageMaker-notebooks.md" "2024-11-14"
 "episodes/Interacting-with-code-repo.md" "105dace64e3a1ea6570d314e4b3ccfff" "site/built/Interacting-with-code-repo.md" "2024-11-06"
-"episodes/Training-models-in-SageMaker-notebooks.md" "6fec4e57fac474e83f4732f3ac1706bf" "site/built/Training-models-in-SageMaker-notebooks.md" "2025-01-09"
+"episodes/Training-models-in-SageMaker-notebooks.md" "29cc9de0af426d24af5d7245bc46fe51" "site/built/Training-models-in-SageMaker-notebooks.md" "2025-01-09"
 "episodes/Training-models-in-SageMaker-notebooks-part2.md" "35107ac2e6cb99307714b0f25b2576c4" "site/built/Training-models-in-SageMaker-notebooks-part2.md" "2024-11-07"
 "episodes/Hyperparameter-tuning.md" "c9fe9c20d437dc2f88315438ac6460db" "site/built/Hyperparameter-tuning.md" "2024-11-07"
 "episodes/Resource-management-cleanup.md" "bb9671676d8d86679b598531c2e294b0" "site/built/Resource-management-cleanup.md" "2024-11-08"
diff --git a/pkgdown.yml b/pkgdown.yml
index db888d2..10f4311 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -2,4 +2,4 @@ pandoc: 3.1.11
 pkgdown: 2.1.1
 pkgdown_sha: ~
 articles: {}
-last_built: 2025-01-09T15:45Z
+last_built: 2025-01-09T15:48Z
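
The hunks above only re-wrap the lesson's cost comparison: a single instance costs roughly T * $C, while 10 instances cost roughly (T / k) * (10 * $C) with speedup k < 10. A minimal Python sketch of that arithmetic, using made-up values for T, C, and k rather than real SageMaker pricing:

# Worked example of the cost comparison described in the re-wrapped text.
# All numbers are illustrative assumptions, not real SageMaker prices.

def training_cost(hours_on_one_instance, price_per_instance_hour,
                  n_instances=1, speedup=1.0):
    """Wall-clock time shrinks by `speedup`, but every instance is billed."""
    wall_clock_hours = hours_on_one_instance / speedup
    return wall_clock_hours * n_instances * price_per_instance_hour

T = 10.0  # hours to train on a single instance (assumed)
C = 1.5   # dollars per instance-hour (assumed)
k = 5.0   # realized speedup on 10 instances, < 10 due to communication overhead (assumed)

single_cost = training_cost(T, C)                               # T * $C       = $15.00
parallel_cost = training_cost(T, C, n_instances=10, speedup=k)  # (T/k) * 10*$C = $30.00

print(f"1 instance  : ${single_cost:.2f} over {T:.1f} h")
print(f"10 instances: ${parallel_cost:.2f} over {T / k:.1f} h")
# With k = 5 the parallel run costs ~2x more but finishes ~5x sooner;
# only a near-perfect speedup (k close to 10) would make the costs comparable.

This is the same trade-off as the tl;dr in the changed text: stay on one instance unless the hours saved are worth paying for the extra instance-hours.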