|
| 1 | +# Image Cache |
| 2 | + |
| 3 | +## Problem Summary |
| 4 | + |
| 5 | +This problem involves writing some Ruby code to implement a filesystem-based image cache. |
| 6 | + |
| 7 | +### Expected time |
| 8 | + |
| 9 | +45 minutes. |
| 10 | + |
| 11 | +### Competencies |
| 12 | + |
| 13 | +- Basic Ruby coding skills |
| 14 | +- Ability to synthesize requirements and turn those into code |
| 15 | +- Understanding of files & IO |
| 16 | +- Error handling |
| 17 | +- Deterministic hashing and / or uuid generation |
| 18 | +- Cache eviction strategies |
| 19 | + |
| 20 | +### Environment |
| 21 | + |
| 22 | +- IDE and Ruby 3 environment. VSCode, IntelliJ or RubyMine free community versions work fine. |
| 23 | +- [VSCode](https://code.visualstudio.com/) |
| 24 | +- [IntelliJ](https://www.jetbrains.com/idea/) |
| 25 | +- [RubyMine](https://www.jetbrains.com/ruby/) |
| 26 | + |
| 27 | +### Resources for candidate |
| 28 | + |
| 29 | +Some starting code is provided. |
| 30 | + |
| 31 | +### Background knowledge |
| 32 | + |
| 33 | +> Kaleido/Canva is a very visual product with lots of images involved. Many of our backend services |
| 34 | +> need to download these images and do something with them. For example, when downloading a design, |
| 35 | +> one of our backend services will download the image and store it on the filesystem while |
| 36 | +> processing the download. We’d like to avoid downloading the same image again and again. One |
| 37 | +> approach to this is to cache them. |
| 38 | +> |
| 39 | +> Through the course of the interview, we'd like you to write some Ruby code that caches images |
| 40 | +> on the filesystem. The images are uniquely identified by urls. |
| 41 | +
|
| 42 | +### Likely clarifying questions: |
| 43 | + |
| 44 | +Q: How much disk space is there? |
| 45 | + |
| 46 | +> A: |
| 47 | +> The instances have very large disks. For the purposes of this interview, you can consider the |
| 48 | +> instances to have infinite disk capacity. If there is time at the end, we can chat about how we |
| 49 | +> might implement a cache that can take disk capacity into consideration. |
| 50 | +
|
| 51 | +Q: Does every url have a unique image? Are there duplicate images? |
| 52 | + |
| 53 | +> A: |
| 54 | +> No, we can consider a unique url to represent a unique image. We are not concerned with the binary |
| 55 | +> content of individual images. We are also not concerned if two different urls reference identical |
| 56 | +> image content; in this case we will still save two images on the filesystem. If there is time at |
| 57 | +> the end, we can chat about how we might implement a cache that can take duplicate images into |
| 58 | +> consideration. |
| 59 | +
|
| 60 | +Q: Is there any authentication on the urls? |
| 61 | + |
| 62 | +> A: |
| 63 | +> For the purposes of the interview, we can assume that we are able to access all image urls. |
| 64 | +
|
| 65 | +Q: What sort of cache-eviction strategy should be used? |
| 66 | + |
| 67 | +> A: |
| 68 | +> We will take a look at the code in a minute where the requirements are described in the comments. |
| 69 | +> We will initially use a simple lease / release mechanism, but if there is time at the end, we can |
| 70 | +> discuss different cache-eviction strategies. |
| 71 | +
|
| 72 | +## Session outline |
| 73 | + |
| 74 | +- Step 1: Brief walk through the code |
| 75 | +- Step 2: Implement lease |
| 76 | +- Step 3: Implement release |
| 77 | + |
| 78 | +All candidates are expected to complete all steps. Better candidates might finish quickly and then |
| 79 | +there may be time for additional questions. |
| 80 | + |
| 81 | +### Step 1: Brief walk through the code |
| 82 | + |
| 83 | +Let the candidate open up their IDE and get oriented with the code. |
| 84 | + |
| 85 | +> Open up your IDE and take a look at caches.py. There is an ImageCache class with lease and |
| 86 | +> release methods. This is the class that we would like you to implement during the interview. |
| 87 | +> |
| 88 | +> Take a few minutes to read through the comments. Let me know if you have any questions or if |
| 89 | +> something isn't clear. |
| 90 | +
|
| 91 | +Answer any clarifying questions the candidate might have, but don't let them get too stuck on the |
| 92 | +details just yet. Opening up the tests can often help clarify the requirements. |
| 93 | + |
| 94 | +> Let's take a look at the tests inside tests.py. There are 3 tests. Let's run them quickly. They |
| 95 | +> should all fail. If you can get all 3 tests passing, it is a good sign that you are on the right |
| 96 | +> track. |
| 97 | +> |
| 98 | +> Let’s just take a look at the first test. You can see that we are leasing the same image twice. |
| 99 | +> And we are asserting that we only need to download the image once, because it is being cached. |
| 100 | +> |
| 101 | +> Take a look at the other 2 tests to confirm your understanding of the problem. |
| 102 | +
|
| 103 | +Again, answer any clarifying questions the candidate may have before showing them files.py. |
| 104 | + |
| 105 | +> Take a look at files.py. This is only here for your convenience. There are a few helpful |
| 106 | +> utilities for working with files. You don't need to use them. |
| 107 | +
|
| 108 | +#### Levelling |
| 109 | + |
| 110 | +- B1: there is no expectation for B1 to ask any clarifying question. |
| 111 | +- B2: disk space and some understanding of constraints on the exercise, how do we store the cache? |
| 112 | +- B3: question around understanding constraints, some knowledge about how a cache works |
| 113 | + |
| 114 | +### Step 2: Implement lease |
| 115 | + |
| 116 | +> Ok, let's get started on the implementation! Feel free to use Google and ask more clarifying |
| 117 | +> questions as you progress. |
| 118 | +
|
| 119 | +The candidate should now begin their implementation. |
| 120 | + |
| 121 | +Some things to note: |
| 122 | + |
| 123 | +- If the candidate asks about ImageClient (or image_client), explain that they don't need to |
| 124 | + implement any HTTP requests to fetch images. The ImageClient is provided for explanatory purposes |
| 125 | + and is mocked out in the tests. Hopefully, they are familiar with IoC or dependency injection. |
| 126 | +- Using a hard-coded location such as "/tmp" to store images is acceptable and makes it easy to |
| 127 | + debug issues. Instead of a hardcoded location, the base path for downloaded images can also be |
| 128 | + injected into the constructor. |
| 129 | +- The candidate should realise that they need a unique name for each file that is downloaded. They |
| 130 | + should hopefully also realise that the image’s URL is a good candidate for a unique name if it |
| 131 | + wasn’t for the special characters used. Ideally, a candidate would encode or hash the url so it is |
| 132 | + safe to use as a filename. Other solutions might include mapping URL to some unique id. |
| 133 | +- A good candidate should recognise that exceptions can occur in the implementation. If a candidate |
| 134 | + provides a solution without error handling, ask them where they think exceptions can happen and to |
| 135 | + add some error handling. ImageCacheException is provided as an exception for the candidate to use. |
| 136 | +- If the candidate runs the tests after implementing lease, the first two tests should probably be |
| 137 | + passing. |
| 138 | + |
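The notes above (hashed file names, an injected client, error handling) can be sketched roughly as follows. This is only an illustration under assumptions: `FakeImageClient`, its `download(url, path)` method, `SketchImageCache` and the constructor arguments are hypothetical names, and the `ImageCacheException` defined here merely stands in for the one in the starting code, whose real definition may differ.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Stand-in for the provided exception class (assumed to be a StandardError).
class ImageCacheException < StandardError; end

# Hypothetical stand-in for the injected image client; the real interface may differ.
class FakeImageClient
  attr_reader :downloads

  def initialize
    @downloads = 0
  end

  # Pretends to download `url` by writing placeholder bytes to `path`.
  def download(url, path)
    @downloads += 1
    File.binwrite(path, "bytes for #{url}")
  end
end

class SketchImageCache
  # The client and the base directory are both injected (assumed signature).
  def initialize(image_client, base_dir)
    @image_client = image_client
    @base_dir = base_dir
    FileUtils.mkdir_p(@base_dir)
  end

  # Returns the local path for `url`, downloading only on a cache miss.
  def lease(url)
    path = path_for(url)
    @image_client.download(url, path) unless File.exist?(path)
    path
  rescue SystemCallError, IOError => e
    raise ImageCacheException, "could not cache #{url}: #{e.message}"
  end

  private

  # Hashing the URL gives a deterministic name with no special characters.
  def path_for(url)
    File.join(@base_dir, Digest::SHA256.hexdigest(url))
  end
end

client = FakeImageClient.new
cache = SketchImageCache.new(client, Dir.mktmpdir("image-cache"))
first = cache.lease("https://example.com/a.png?size=large")
second = cache.lease("https://example.com/a.png?size=large")
puts "same path: #{first == second}, downloads: #{client.downloads}"
```

Hashing is one option among several; encoding the URL or mapping it to a generated id would also satisfy the "unique, filesystem-safe name" requirement.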
| 139 | +Once lease is implemented and you are satisfied, move on to the implementation of release. |
| 140 | + |
| 141 | +#### Levelling |
| 142 | + |
| 143 | +- B1: no additional expectations apart from passing the interview. |
| 144 | +- B2: unique id generation and url/mapping, interface usage -> being familiar enough with dependency injection, error handling (not implementing exceptions in particular, but having a discussion around errors) |
| 145 | +- B3: edge cases, performance considerations |
| 146 | +- C1: same expectations as B3 |
| 147 | +- C2: same expectations as B3 |
| 148 | + |
| 149 | +### Step 3: Implement release |
| 150 | + |
| 151 | +> Now let's implement release! |
| 152 | +
|
| 153 | +Implementing release is normally straightforward after implementing lease, even if the candidate |
| 154 | +didn't initially realise that they need to keep a counter of some sort for the leases to know when |
| 155 | +to delete the images. |
| 156 | + |
| 157 | +Some things to note: |
| 158 | + |
| 159 | +- If the candidate recognises that it might be a poor implementation to delete files synchronously |
| 160 | + during the release (it might be better in reality to delete them asynchronously), agree, but explain |
| 161 | + that we want the naive implementation for the purposes of the interview. |
| 162 | +- The candidate should realise that well-behaved clients should not call release before lease and |
| 163 | + that it indicates an error condition. If not, raise this topic and ask them to add some error |
| 164 | + handling. A good candidate might also add a test for this. |
| 165 | + |
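The lease counter and the "release before lease" error condition described above might look something like this. It is a minimal sketch, not the expected solution: `LeaseCounter`, `acquire` and the boolean return value of `release` are hypothetical, and `ImageCacheException` is again a stand-in for the class in the starting code.

```ruby
# Stand-in for the provided exception class (assumed to be a StandardError).
class ImageCacheException < StandardError; end

# Hypothetical helper that tracks how many leases are outstanding per URL.
class LeaseCounter
  def initialize
    @counts = Hash.new(0)
  end

  # Records one more lease and returns the new count.
  def acquire(url)
    @counts[url] += 1
  end

  # Records one release. Returns true when the last lease is gone,
  # meaning the cached file can be deleted; false otherwise.
  def release(url)
    raise ImageCacheException, "release called without a lease: #{url}" unless @counts.key?(url)

    @counts[url] -= 1
    return false if @counts[url].positive?

    @counts.delete(url)
    true
  end
end

leases = LeaseCounter.new
leases.acquire("https://example.com/a.png")
leases.acquire("https://example.com/a.png")
leases.release("https://example.com/a.png") # false: one lease remains
leases.release("https://example.com/a.png") # true: safe to delete the file
```

In a full implementation, `release` on the cache would delete the file synchronously when the counter reports that the last lease is gone, which matches the naive behaviour the interview asks for.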
| 166 | +#### Levelling |
| 167 | + |
| 168 | +- B1: no additional expectations apart from passing the interview. |
| 169 | +- B2: after changing release they should change lease if necessary without additional prompting |
| 170 | +- B3: tests are running, notice that you need some sort of counter (multiple clients), add more tests |
| 171 | +- C1: same expectations as B3 |
| 172 | +- C2: same expectations as B3 |
| 173 | + |
| 174 | +### Bonus discussion questions |
| 175 | + |
| 176 | +These are some deeper discussion questions to get further signals: |
| 177 | + |
| 178 | +- The current implementation of the cache is a bit naive and some images are leased again shortly |
| 179 | + after they have been released. This causes extra downloads to occur. Is there a way we can keep an |
| 180 | + image in the cache, even if no one holds a lease? Note: Now is a good time to discuss [cache |
| 181 | + eviction strategies](https://en.wikipedia.org/wiki/Cache_replacement_policies). |
| 182 | +- If it didn't come up during the implementation, point out the image_client in the constructor of |
| 183 | + ImageCache and ask the candidate if they recognise the pattern (dependency injection). Discuss why |
| 184 | + it is useful. |
| 185 | +- What can we do to ensure that the cache is warm after a restart occurs? The simple solution is |
| 186 | + to traverse the directory where cached files are stored and load the cache up with those files. |
| 187 | + How might one know which file maps to which url? |
| 188 | +- How might we design the cache if the file system has limited space? What implications does this |
| 189 | + have? This should raise questions like "What should we do when there is no more space left to |
| 190 | + download?". Throwing an exception is fine here. |
| 191 | +- How might we design the system if we want to optimise for filesystem space by de-duplicating |
| 192 | + images? Possibly content-address the cache by hashing the file contents. What are the pros vs cons? |
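For the first bonus question, one possible talking point is a least-recently-used (LRU) policy that keeps released images around until a capacity is exceeded. The sketch below is an assumption-laden illustration: `LruIndex`, `touch` and the entry-count capacity are hypothetical, and a real cache would also need to skip images that still have an active lease.

```ruby
# Hypothetical LRU index over cached URLs; capacity is a number of entries.
class LruIndex
  def initialize(capacity)
    @capacity = capacity
    @entries = {} # Ruby hashes preserve insertion order
  end

  # Marks `url` as most recently used and returns the paths evicted to make room.
  def touch(url, path)
    @entries.delete(url) # re-inserting moves the entry to the "newest" end
    @entries[url] = path
    evicted = []
    while @entries.size > @capacity
      _, old_path = @entries.shift # removes the least recently used entry
      evicted << old_path
    end
    evicted
  end
end

index = LruIndex.new(2)
index.touch("https://example.com/a.png", "/tmp/a")
index.touch("https://example.com/b.png", "/tmp/b")
index.touch("https://example.com/a.png", "/tmp/a")
evicted = index.touch("https://example.com/c.png", "/tmp/c")
puts evicted.inspect # ["/tmp/b"]
```

The caller would delete the evicted paths from disk. Capacity could equally be measured in bytes, which ties in with the limited-disk-space question.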