diff --git a/tutorials/W2D3_Microlearning/W2D3_Tutorial1.ipynb b/tutorials/W2D3_Microlearning/W2D3_Tutorial1.ipynb index d2bbfd759..a3d29d393 100644 --- a/tutorials/W2D3_Microlearning/W2D3_Tutorial1.ipynb +++ b/tutorials/W2D3_Microlearning/W2D3_Tutorial1.ipynb @@ -924,7 +924,9 @@ "execution": {} }, "source": [ - "Below, we provide an implementation of the node perturbation algorithm, so that you will be able to compare it to the weight perturbation algorithm in subsequent sections. Running this code will take around 9 minutes--you can move on to subsequent sections while you wait!" + "Below, we provide an implementation of the node perturbation algorithm, so that you will be able to compare it to the weight perturbation algorithm in subsequent sections. Running this code will take around 9 minutes--you can move on to subsequent sections while you wait!\n", + "\n", + "One important detail: there are two different notions of efficiency we could consider here: 1) sample efficiency and 2) runtime efficiency. Node perturbation is more sample efficient: in general it brings the loss lower with fewer samples than weight perturbation. However, our particular implementation of node perturbation runs a little slower than weight perturbation, so you could argue that it has worse runtime efficiency. This is simply because these algorithms were implemented by different people, and the author of the node perturbation implementation exploited Python's parallel computation a little less effectively." 
] }, { diff --git a/tutorials/W2D3_Microlearning/instructor/W2D3_Tutorial1.ipynb b/tutorials/W2D3_Microlearning/instructor/W2D3_Tutorial1.ipynb index c77096b96..4eb807a37 100644 --- a/tutorials/W2D3_Microlearning/instructor/W2D3_Tutorial1.ipynb +++ b/tutorials/W2D3_Microlearning/instructor/W2D3_Tutorial1.ipynb @@ -926,7 +926,9 @@ "execution": {} }, "source": [ - "Below, we provide an implementation of the node perturbation algorithm, so that you will be able to compare it to the weight perturbation algorithm in subsequent sections. Running this code will take around 9 minutes--you can move on to subsequent sections while you wait!" + "Below, we provide an implementation of the node perturbation algorithm, so that you will be able to compare it to the weight perturbation algorithm in subsequent sections. Running this code will take around 9 minutes--you can move on to subsequent sections while you wait!\n", + "\n", + "One important detail: there are two different notions of efficiency we could consider here: 1) sample efficiency and 2) runtime efficiency. Node perturbation is more sample efficient: in general it brings the loss lower with fewer samples than weight perturbation. However, our particular implementation of node perturbation runs a little slower than weight perturbation, so you could argue that it has worse runtime efficiency. This is simply because these algorithms were implemented by different people, and the author of the node perturbation implementation exploited Python's parallel computation a little less effectively." 
] }, { diff --git a/tutorials/W2D3_Microlearning/student/W2D3_Tutorial1.ipynb b/tutorials/W2D3_Microlearning/student/W2D3_Tutorial1.ipynb index 61a551e82..9ae4736e5 100644 --- a/tutorials/W2D3_Microlearning/student/W2D3_Tutorial1.ipynb +++ b/tutorials/W2D3_Microlearning/student/W2D3_Tutorial1.ipynb @@ -900,7 +900,9 @@ "execution": {} }, "source": [ - "Below, we provide an implementation of the node perturbation algorithm, so that you will be able to compare it to the weight perturbation algorithm in subsequent sections. Running this code will take around 9 minutes--you can move on to subsequent sections while you wait!" + "Below, we provide an implementation of the node perturbation algorithm, so that you will be able to compare it to the weight perturbation algorithm in subsequent sections. Running this code will take around 9 minutes--you can move on to subsequent sections while you wait!\n", + "\n", + "One important detail: there are two different notions of efficiency we could consider here: 1) sample efficiency and 2) runtime efficiency. Node perturbation is more sample efficient: in general it brings the loss lower with fewer samples than weight perturbation. However, our particular implementation of node perturbation runs a little slower than weight perturbation, so you could argue that it has worse runtime efficiency. This is simply because these algorithms were implemented by different people, and the author of the node perturbation implementation exploited Python's parallel computation a little less effectively." ] }, {