Commit 8025f82

Update Lab3_InformationRetrieval_Solutions.md
1 parent 93c5e51 commit 8025f82

1 file changed: +10 -10 lines changed

Lab3_InformationRetrieval_Solutions.md

@@ -35,7 +35,7 @@ Once this example makes sense to you, make a **copy** of [this spreadsheet](http

| term | tf-raw | tf-wt | df | idf | tf-idf | normalized |
|--------|--------|--------------------|----|------------------|-----------------------|-----------------------------------------------|
38-
| apple | 1 | 1 + log(1) = 1 | 1 | log(2/1) = 0.301 | 1 x 0.301 = 0.301 | 0.301 / sqrt(0.301^2 + 0.301^2 + 0.301^2) = 0.577 |
38+
| apple | 1 | 1 + log(1) = 1 | 1 | log(2/1) = 0.301 | 1 x 0.301 = 0.301 | 0.301 / sqrt(0.301^2 + 0^2 + 0^2) = 1 |
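If you want to check the query row outside the spreadsheet, here is a minimal Python sketch of the same weighting: log-scaled tf, idf = log(N/df), then length normalization. It assumes base-10 logarithms (which is what gives log(2/1) = 0.301), N = 2 documents, and df = 1 for apple and phone as in the tables. `query_vector`, `VOCAB`, and the third-term name `other` are hypothetical names introduced here, since the section only shows that term's Doc 1 weight.

```python
import math

N = 2  # number of documents in the collection
DF = {"apple": 1, "phone": 1}  # document frequencies from the tables
VOCAB = ["apple", "phone", "other"]  # "other" is a hypothetical name for the third term


def tf_weight(tf: int) -> float:
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def query_vector(counts: dict[str, int]) -> list[float]:
    """Length-normalized tf-idf weights for the query, in VOCAB order."""
    raw = []
    for term in VOCAB:
        tf = counts.get(term, 0)
        # idf = log10(N / df); terms missing from the query get weight 0,
        # so their df is never looked up
        weight = tf_weight(tf) * math.log10(N / DF[term]) if tf > 0 else 0.0
        raw.append(weight)
    length = math.sqrt(sum(w * w for w in raw))
    return [w / length for w in raw]


print([round(w, 4) for w in query_vector({"apple": 1})])  # [1.0, 0.0, 0.0]
```

Passing `{"apple": 1, "phone": 1}` instead gives 0.7071 for both terms, which matches the second query in this lab.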

Now let's do our calculations for Doc 1.

@@ -55,36 +55,36 @@ Once this example makes sense to you, make a **copy** of [this spreadsheet](http

Now for the cosine similarity of each document, we take the dot product of the normalized vector for the query and the normalized vector for that document. We get the following:

58-
`Doc1: (0.577 x 0.750) = 0.433`
58+
`Doc1: (1 x 0.750) = 0.750`


61-
`Doc2: (0.577 x 0) = 0`
61+
`Doc2: (1 x 0) = 0`

63-
The highest score belongs to Doc 1, so this is the returned document with a cosine of 0.433.
63+
The highest score belongs to Doc 1, so this is the returned document with a cosine of 0.750.
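The two dot products for the query "apple" can be reproduced with a short sketch. It assumes the normalized vectors are listed in the order (apple, phone, other) and uses the three-decimal document weights that appear in this section (0.750 and 0.661 for Doc 1, 1 for phone in Doc 2); `other` and `cosine` are hypothetical names. Because both vectors already have unit length, their dot product is the cosine.

```python
# Normalized vectors in the order (apple, phone, other); "other" is a
# hypothetical label for the third term, whose Doc 1 weight is 0.661.
query = [1.0, 0.0, 0.0]
docs = {
    "Doc1": [0.750, 0.0, 0.661],
    "Doc2": [0.0, 1.0, 0.0],
}


def cosine(q: list[float], d: list[float]) -> float:
    """Cosine of two unit-length vectors, which is simply their dot product."""
    return sum(qi * di for qi, di in zip(q, d))


scores = {name: cosine(query, vec) for name, vec in docs.items()}
best = max(scores, key=scores.get)  # document with the highest cosine
print(scores)  # {'Doc1': 0.75, 'Doc2': 0.0}
print(best)    # Doc1
```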

Now, imagine the IR system has been tracking and logging your previous queries. The last query you searched was “new phone”. In a simplified version of personalized search, the IR system adds “phone” to your one-word query under the hood, so that the final query used is “apple phone”.

2. Which document is returned for the two-word query, “apple phone”, and what is the cosine?
```
Written solution below:
```
71-
Google sheet with the solutions is [here](https://docs.google.com/spreadsheets/d/1bGQbz7Ojwa4_h6Nga310u3zl916KjE2-3IsBPoK9ksI/edit?usp=sharing).
71+
Google sheet with the solutions is [here](https://docs.google.com/spreadsheets/d/1bGQbz7Ojwa4_h6Nga310u3zl916KjE2-3IsBPoK9ksI/edit?gid=1913608939#gid=1913608939).

Our new query is "apple phone", so we only need to change one cell in our sheet: the count of phone in the query.
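To see that only one cell changes, the tf-raw column can be built directly from the query string. This is a minimal sketch, assuming whitespace tokenization and lowercasing (the lab does not specify a tokenizer) and the same hypothetical `other` placeholder for the third term; `raw_tf` is a hypothetical helper name.

```python
from collections import Counter

VOCAB = ["apple", "phone", "other"]  # "other": hypothetical name for the third term


def raw_tf(query: str) -> list[int]:
    """tf-raw column for the query: how often each vocabulary term occurs."""
    counts = Counter(query.lower().split())  # Counter returns 0 for unseen terms
    return [counts[term] for term in VOCAB]


before = raw_tf("apple")
after = raw_tf("apple phone")
# Terms whose tf-raw cell differs between the two queries
changed = [t for t, b, a in zip(VOCAB, before, after) if b != a]
print(before, after, changed)  # [1, 0, 0] [1, 1, 0] ['phone']
```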

| term | tf-raw | tf-wt | df | idf | tf-idf | normalized |
|--------|--------|--------------------|----|------------------|-----------------------|-----------------------------------------------|
77-
| apple | 1 | 1 + log(1) = 1 | 1 | log(2/1) = 0.301 | 1 x 0.301 = 0.301 | 0.301 / sqrt(0.301^2 + 0.301^2 + 0.301^2) = 0.577 |
78-
| phone | 1 | 1 + log(1) = 1 | 1 | log(2/1) = 0.301 | 1 x 0.301 = 0.301 | 0.301 / sqrt(0.301^2 + 0.301^2 + 0.301^2) = 0.577 |
77+
| apple | 1 | 1 + log(1) = 1 | 1 | log(2/1) = 0.301 | 1 x 0.301 = 0.301 | 0.301 / sqrt(0.301^2 + 0.301^2 + 0^2) = 0.7071 |
78+
| phone | 1 | 1 + log(1) = 1 | 1 | log(2/1) = 0.301 | 1 x 0.301 = 0.301 | 0.301 / sqrt(0.301^2 + 0.301^2 + 0^2) = 0.7071 |
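Why 0.7071? Both query terms get the same tf-idf weight, so the 0.301 cancels out of the normalization. Writing $\hat{q}_t$ for the normalized query weight of term $t$ (notation introduced here):

```math
\hat{q}_{\text{apple}} = \hat{q}_{\text{phone}} = \frac{0.301}{\sqrt{0.301^2 + 0.301^2 + 0^2}} = \frac{0.301}{0.301\sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.7071
```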

Keeping the rest of the sheet the same, we again take the dot product of the normalized query vector and the normalized
document vector. We get the following:

83-
`Doc1: (0.577 x 0.750) + (0.577 x 0) = 0.433`
83+
`Doc1: (0.7071 x 0.750) + (0.7071 x 0) + (0 x 0.661) = 0.531`

85-
`Doc2: (0.577 x 0) + (0.577 x 1) = 0.577`
85+
`Doc2: (0.7071 x 0) + (0.7071 x 1) + (0 x 0) = 0.7071`

87-
Now, the highest score belongs to Doc 2, so this is the returned document with a cosine of 1. This cosine value should make sense to you, given the document!
87+
Now, the highest score belongs to Doc 2, so this is the returned document with a cosine of 0.7071. This cosine value should make sense to you, given the document!
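Here is a sketch of the same ranking for "apple phone", again assuming (apple, phone, other) ordering, a hypothetical `other` label for the third term, and the rounded document weights shown in this section. With those rounded inputs, Doc 1 comes out at about 0.530 rather than 0.531; the small gap is presumably because the spreadsheet works with unrounded values, and it does not change which document is returned.

```python
import math

# Normalized document vectors in (apple, phone, other) order, copied from the text;
# "other" is a hypothetical label for the third term.
docs = {
    "Doc1": [0.750, 0.0, 0.661],
    "Doc2": [0.0, 1.0, 0.0],
}

# Query "apple phone": equal tf-idf weights on apple and phone, then unit length.
raw = [0.301, 0.301, 0.0]
length = math.sqrt(sum(w * w for w in raw))
query = [w / length for w in raw]

# Cosine = dot product of the two unit vectors; rank documents by it.
scores = {name: sum(q * d for q, d in zip(query, vec)) for name, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 4))
# Doc2 0.7071
# Doc1 0.5303
```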

We will now go back to the whole class and discuss group answers for Part 1 in a plenary session.
