This repository was archived by the owner on Oct 17, 2024. It is now read-only.

Commit af8ed1d

committed
Initial commit
0 parents  commit af8ed1d

24 files changed

+3104
-0
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
solutions/*
2+
.yardoc

.ruby-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
3.1.2

.tool-versions

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
ruby 3.1.2

README.md

Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
1+
# Image Cache
2+
3+
## Problem Summary
4+
5+
This problem involves writing Ruby code to implement a filesystem-based image cache.
6+
7+
### Expected time
8+
9+
45 minutes.
10+
11+
### Competencies
12+
13+
- Basic Ruby coding skills
14+
- Ability to synthesize requirements and turn those into code
15+
- Understanding of files & IO
16+
- Error handling
17+
- Deterministic hashing and / or uuid generation
18+
- Cache eviction strategies
19+
20+
### Environment
21+
22+
- IDE and Ruby 3 environment. VSCode, IntelliJ or RubyMine free community versions work fine.
23+
- [VSCode](https://code.visualstudio.com/)
24+
- [IntelliJ](https://www.jetbrains.com/idea/)
25+
- [RubyMine](https://www.jetbrains.com/ruby/)
26+
27+
### Resources for candidate
28+
29+
Some starting code is provided.
30+
31+
### Background knowledge
32+
33+
> Kaleido/Canva is a very visual product with lots of images involved. Many of our backend services
34+
> need to download these images and do something with them. For example, when downloading a design,
35+
> one of our backend services will download the image and store it on the filesystem while
36+
> processing the download. We’d like to avoid downloading the same image again and again; one
37+
> approach to this is to cache them.
38+
>
39+
> Through the course of the interview, we'd like you to write some Ruby code that caches images
40+
> on the filesystem. The images are uniquely identified by urls.
41+
42+
### Likely clarifying questions:
43+
44+
Q: How much disk space is there?
45+
46+
> A:
47+
> The instances have very large disks. For the purposes of this interview, you can consider the
48+
> instances to have infinite disk capacity. If there is time at the end, we can chat about how we
49+
> might implement a cache that can take disk capacity into consideration.
50+
51+
Q: Does every url have a unique image? Are there duplicate images?
52+
53+
> A:
54+
> No, we can consider a unique url to represent a unique image. We are not concerned with the binary
55+
> content of individual images. We are also not concerned if two different urls reference identical
56+
> image content; in this case we will still save two images on the filesystem. If there is time at
57+
> the end, we can chat about how we might implement a cache that can take duplicate images into
58+
> consideration.
59+
60+
Q: Is there any authentication on the urls?
61+
62+
> A:
63+
> For the purposes of the interview, we can assume that we are able to access all image urls.
64+
65+
Q: What sort of cache-eviction strategy should be used?
66+
67+
> A:
68+
> We will take a look at the code in a minute where the requirements are described in the comments.
69+
> We will initially use a simple lease / release mechanism, but if there is time at the end, we can
70+
> discuss different cache-eviction strategies.
71+
72+
## Session outline
73+
74+
Step 1: Brief walk through the code
75+
Step 2: Implement lease
76+
Step 3: Implement release
77+
78+
All candidates are expected to complete all steps. Better candidates might finish quickly and then
79+
there may be time for additional questions.
80+
81+
### Step 1: Brief walk through the code
82+
83+
Let the candidate open up their IDE and get oriented with the code.
84+
85+
> Open up your IDE and take a look at caches.py. There is an ImageCache class with a lease and
86+
> release method. This is the class that we would like you to implement during the interview.
87+
>
88+
> Take a few minutes to read through the comments. Let me know if you have any questions or if
89+
> something isn't clear.
90+
91+
Answer any clarifying questions the candidate might have, but don't let them get too stuck on the
92+
details just yet. Opening up the tests can often help clarify the requirements.
93+
94+
> Let's take a look at the tests inside tests.py. There are 3 tests. Let's run them quickly. They
95+
> should all fail. If you can get all 3 tests passing, it is a good sign that you are on the right
96+
> track.
97+
>
98+
> Let’s just take a look at the first test. You can see that we are leasing the same image twice.
99+
> And we are asserting that we only need to download the image once, because it is being cached.
100+
>
101+
> Take a look at the other 2 tests to confirm your understanding of the problem.
102+
103+
Again, answer any clarifying questions the candidate may have before showing them files.py.
104+
105+
> Take a look at files.py. This is only here for your convenience. There are a few helpful
106+
> utilities for working with files. You don't need to use them.
107+
108+
#### Levelling
109+
110+
- B1: there is no expectation for B1 to ask any clarifying questions.
111+
- B2: disk space and some understanding of constraints on the exercise; how do we store the cache?
112+
- B3: question around understanding constraints, some knowledge about how a cache works
113+
114+
### Step 2: Implement lease
115+
116+
> OK, let's get started on the implementation! Feel free to use Google and ask more clarifying
117+
> questions as you progress.
118+
119+
The candidate should now begin their implementation.
120+
121+
Some things to note:
122+
123+
- If the candidate asks about ImageClient (or image_client) explain that they don't need to
124+
implement any HTTP requests to fetch images. The ImageClient is provided for explanatory purposes
125+
and is mocked out in the tests. Hopefully, they are familiar with IoC or dependency injection.
126+
- Using a hard-coded location such as "/tmp" to store images is acceptable and makes it easy to
127+
debug issues. Instead of a hardcoded location, the base path for downloaded images can also be
128+
injected into the constructor.
129+
- The candidate should realise that they need a unique name for each file that is downloaded. They
130+
should hopefully also realise that the image’s URL is a good candidate for a unique name if it
131+
weren’t for the special characters used. Ideally, a candidate would encode or hash the url so it is
132+
safe to use as a filename. Other solutions might include mapping each URL to a unique id.
133+
- A good candidate should recognise that exceptions can occur in the implementation. If a candidate
134+
provides a solution without error handling, ask them where they think exceptions can happen and to
135+
add some error handling. ImageCacheException is a provided exception class for the candidate to use.
136+
- If the candidate runs the tests after implementing lease, the first two tests should probably be
137+
passing.
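
One way the unique-filename point above might look in practice is sketched below. This is only an illustration, assuming SHA-256 as the hash; `cache_path_for` and its `base_dir` parameter are hypothetical names and are not part of the starter code.

```ruby
require "digest"

# Hypothetical helper (not in the starter code): maps a URL to a
# deterministic, filesystem-safe path under base_dir.
# The same URL always yields the same path; the hex digest contains
# only [0-9a-f], so no special characters from the URL survive.
def cache_path_for(url, base_dir: "/tmp")
  File.join(base_dir, Digest::SHA256.hexdigest(url))
end
```

Hashing is one-way, so a candidate who picks this approach may still need a separate url-to-path mapping if they later want to recover the URL from a file.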
138+
139+
Once lease is implemented and you are satisfied, move on to the implementation of release.
140+
141+
#### Levelling
142+
143+
- B1: no additional expectations apart from passing the interview.
144+
- B2: unique id generation and url/mapping, interface usage -> being familiar enough with dependency injection, error handling (not implementing exceptions in particular, but having a discussion around errors)
145+
- B3: edge cases, performance considerations
146+
- C1: same expectations as B3
147+
- C2: same expectations as B3
148+
149+
### Step 3: Implement release
150+
151+
> Now let's implement release!
152+
153+
Implementing release is normally straightforward after implementing lease, even if the candidate
154+
didn't realise they need to keep a counter of some sort for the leases to know when to delete the
155+
images.
156+
157+
Some things to note:
158+
159+
- If the candidate recognises that it might be a poor implementation to delete files synchronously
160+
during the release (it might be better in reality to delete them asynchronously), agree, but explain
161+
that we want the naive implementation for the purposes of the interview.
162+
- The candidate should realise that well-behaved clients should not call release before lease and
163+
that it indicates an error condition. If not, raise this topic and ask them to add some error
164+
handling. A good candidate might also add a test for this.
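
For reference, a lease counter along the lines described above might be sketched as follows. This is a minimal sketch and not the expected solution: the class name `RefCountedImageCache`, the `base_dir` parameter, and the `download(url, path)` method on the image client are all assumptions, and `ImageCacheException` is redefined here only as a stand-in for the class the starter code provides.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Stand-in for the exception class provided by the starter code.
class ImageCacheException < StandardError; end

# Sketch only. Assumes image_client responds to download(url, path);
# the real client interface may differ.
class RefCountedImageCache
  def initialize(image_client, base_dir: Dir.tmpdir)
    @image_client = image_client
    @base_dir = base_dir
    @leases = Hash.new(0) # url => number of outstanding leases
  end

  # Downloads the image on the first lease only; returns its local path.
  def lease(url)
    path = path_for(url)
    if @leases[url].zero?
      begin
        @image_client.download(url, path)
      rescue StandardError => e
        raise ImageCacheException, "could not download #{url}: #{e.message}"
      end
    end
    @leases[url] += 1
    path
  end

  # Deletes the file synchronously once the last lease is released.
  def release(url)
    raise ImageCacheException, "release without lease: #{url}" unless @leases[url].positive?

    @leases[url] -= 1
    return unless @leases[url].zero?

    @leases.delete(url)
    FileUtils.rm_f(path_for(url))
  end

  private

  def path_for(url)
    File.join(@base_dir, Digest::SHA256.hexdigest(url))
  end
end
```

The sketch is not thread-safe; a candidate who raises concurrency could be asked where a `Mutex` would go.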
165+
166+
#### Levelling
167+
168+
- B1: no additional expectations apart from passing the interview.
169+
- B2: after changing release they should change lease if necessary without additional prompting
170+
- B3: tests are running, notices the need for some sort of counter (multiple clients), adds more tests
171+
- C1: same expectations as B3
172+
- C2: same expectations as B3
173+
174+
### Bonus discussion questions
175+
176+
These are some deeper discussion questions to get further signals:
177+
178+
- The current implementation of the cache is a bit naive and some images are leased again shortly
179+
after they have been released. This causes extra downloads to occur. Is there a way we can keep an
180+
image in the cache, even if no one holds a lease? Note: Now is a good time to discuss [cache
181+
eviction strategies](https://en.wikipedia.org/wiki/Cache_replacement_policies).
182+
- If it didn't come up during the implementation, point out the image_client in the constructor of
183+
ImageCache and ask whether the candidate recognises the pattern (dependency injection). Discuss why
184+
it is useful.
185+
- What can we do to ensure that the cache is warm after a restart occurs? The simple solution is
186+
to traverse the directory where cached files are stored and load the cache up with those files.
187+
How might one know which file maps to which url?
188+
- How might we design the cache if the file system has limited space? What implications does this
189+
have? This should raise questions like "What should we do when there is no more space left to
190+
download?". Throwing an exception is fine here.
191+
- How might we design the system if we want to optimise for filesystem space by de-duplicating
192+
images? Possibly content-address the cache by hashing the file contents. What are the pros vs cons?
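
For the first discussion question, one possible direction is to keep released images around and evict the least recently used ones once a limit is reached. The sketch below shows only the bookkeeping, assuming a fixed entry count as the limit; `LruIndex` and its methods are hypothetical names, and a real cache would delete the files for the keys that `touch` returns.

```ruby
# Hypothetical LRU index (not part of the starter code).
# Relies on Ruby hashes preserving insertion order: the first key
# is always the least recently used one.
class LruIndex
  def initialize(capacity)
    @capacity = capacity
    @entries = {}
  end

  # Marks key as most recently used and returns the keys evicted
  # to stay within capacity (oldest first).
  def touch(key)
    @entries.delete(key)
    @entries[key] = true
    evicted = []
    evicted << @entries.shift.first while @entries.size > @capacity
    evicted
  end

  # Keys ordered from least to most recently used.
  def keys
    @entries.keys
  end
end
```

Other policies (LFU, TTL-based expiry, size-aware eviction) fit the same interface and can be compared in discussion.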
