diff --git a/doc/containers.md b/doc/containers.md
index cecf782c0..06c1b29ad 100755
--- a/doc/containers.md
+++ b/doc/containers.md
@@ -152,7 +152,7 @@ pred = mlp:forward(torch.randn(10,3)) -- 2D Tensor of size 10x3 goes through the
  -- Each Linear+Reshape module receives a slice of dimension 1
  -- which corresponds to a 1D Tensor of size 3.
  -- Eventually all the Linear+Reshape modules' outputs of size 2x1
- -- are concatenated alond the 2nd dimension (column space)
+ -- are concatenated along the 2nd dimension (column space)
  -- to form pred, a 2D Tensor of size 2x10.
 
 > pred
diff --git a/doc/criterion.md b/doc/criterion.md
index 06d97dc25..1c3381843 100644
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -657,7 +657,7 @@ prl:add(p1_mlp)
 prl:add(p2_mlp)
 
 -- now we define our top level network that takes this parallel table
--- and computes the pairwise distance betweem the pair of outputs
+-- and computes the pairwise distance between the pair of outputs
 mlp = nn.Sequential()
 mlp:add(prl)
 mlp:add(nn.PairwiseDistance(1))
diff --git a/doc/simple.md b/doc/simple.md
index e18e15d59..bc309e5e4 100755
--- a/doc/simple.md
+++ b/doc/simple.md
@@ -170,7 +170,7 @@ Applies the following transformation to the incoming (optionally) normalized spa
 - `b_i` is a per-feature bias,
 - `x_i_max` is the maximum absolute value seen so far during training for feature `i`.
 
-The normalization of input features is very useful to avoid explosions during training if sparse input values are really high. It also helps ditinguish between the presence and the absence of a given feature.
+The normalization of input features is very useful to avoid explosions during training if sparse input values are really high. It also helps distinguish between the presence and the absence of a given feature.
 
 #### Parameters ####
 - `inputSize` is the maximum number of features.