@@ -55,36 +55,36 @@ Once this example makes sense to you, make a **copy** of [this spreadsheet](http
Now for the cosine similarity of each document, we take the dot product of the normalized vector for the query and the normalized vector for that document. We get the following:

-`Doc1: (0.577 x 0.750) = 0.433`
+`Doc1: (1 x 0.750) = 0.750`

-`Doc2: (0.577 x 0) = 0`
+`Doc2: (1 x 0) = 0`

-The highest score belongs to Doc 1, so this is the returned document with a cosine of 0.433.
+The highest score belongs to Doc 1, so this is the returned document with a cosine of 0.750.

Now, imagine the IR system has been tracking and logging your previous queries. The last query you searched was “new phone”. In a simplified version of personalized search, the IR system adds “phone” to your one-word query under the hood, so that the final query used is “apple phone”.

2. Which document is returned for the two-word query, “apple phone”, and what is the cosine?
```
Written solution below:
```
-Google sheet with the solutions is [here](https://docs.google.com/spreadsheets/d/1bGQbz7Ojwa4_h6Nga310u3zl916KjE2-3IsBPoK9ksI/edit?usp=sharing).
+Google sheet with the solutions is [here](https://docs.google.com/spreadsheets/d/1bGQbz7Ojwa4_h6Nga310u3zl916KjE2-3IsBPoK9ksI/edit?gid=1913608939#gid=1913608939).

Our new query is "apple phone", so we only need to change one cell in our sheet: the count of phone in the query.
Keeping the rest of the sheet the same, we again take the dot product of the normalized query vector and the normalized
document vector. We get the following:

-`Doc1: (0.577 x 0.750) + (0.577 x 0) = 0.433`
+`Doc1: (0.7071 x 0.750) + (0.7071 x 0) + (0 x 0.661) = 0.531`

-`Doc2: (0.577 x 0) + (0.577 x 1) = 0.577`
+`Doc2: (0.7071 x 0) + (0.7071 x 1) + (0 x 0) = 0.7071`

-Now, the highest score belongs to Doc 2, so this is the returned document with a cosine of 1. This cosine value should make sense to you, given the document!
+Now, the highest score belongs to Doc 2, so this is the returned document with a cosine of 0.7071. This cosine value should make sense to you, given the document!

We will now go back to the whole class and discuss group answers for Part 1 in a plenary session.
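
As a quick check on the Part 1 arithmetic, here is a minimal Python sketch that recomputes both rankings. It assumes the normalized document vectors are the rounded values that appear in the solution lines (Doc1 as 0.750, 0, 0.661 and Doc2 as 0, 1, 0), ordered as apple, phone, and one remaining term. The names `normalize`, `dot`, `rank`, and `DOCS` are illustrative only and are not part of the exercise or the sheet.

```python
import math

# Assumed normalized document vectors, copied from the rounded values in the
# solution lines. Term order (an assumption): apple, phone, one other term.
DOCS = {
    "Doc1": [0.750, 0.0, 0.661],
    "Doc2": [0.0, 1.0, 0.0],
}


def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(query_counts):
    """Cosine of the query against each document (documents already normalized)."""
    q = normalize(query_counts)
    return {name: dot(q, vec) for name, vec in DOCS.items()}


one_word = rank([1, 0, 0])  # query "apple"
two_word = rank([1, 1, 0])  # query "apple phone"

for label, scores in (("apple", one_word), ("apple phone", two_word)):
    best = max(scores, key=scores.get)
    print(label, {k: round(v, 4) for k, v in scores.items()}, "->", best)
```

With these rounded inputs, the one-word query scores Doc1 at 0.750 and Doc2 at 0, while the two-word query scores Doc2 at about 0.7071 and Doc1 at about 0.530. Any small difference from the sheet's figures would come from rounding of the inputs.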