Skip to content
Peter Alvaro edited this page Jan 24, 2013 · 5 revisions

The CS194-017 Entrance Exercise

In this exercise we're going to jump right into distributed programming by writing a replicated auction service. Replicated, because we run multiple copies of the same program to tolerate the failure of individual machines; a service, because it listens for and handles connections from clients; and auction because, well, that's what it's going to do.
Although the basic functionality of the auction service is dead simple, we'll see that replication makes things tricky.

You may implement the server in any language you choose. But no matter how it is implemented, it must present a uniform interface and be correct. The uniform interface -- using the RESTful HTTP style -- will allow any client that can speak HTTP (including our testing scripts) to interact with it. We'll specify in detail below what it means for your auction server to be correct.

RESTful interface

In order for your service to present a convenient interface to client software, it will have to embed a (little) webserver. We'll identify the server's interfaces (specifically, starting auctions, making bids and checking who won) with HTTP resources via URIs. The clients will use HTTP GET requests to these resources when they want to read state at the server, and HTTP POST requests when they want to modify state.

See discussion below for information on libraries that make it easy to embed such a webserver.

Our interface is simple. We want each of the "Resources" in the table below to be accessible via the corresponding HTTP "Method" at a URI corresponding to the Resource name.
For example, if a client on your machine issues an HTTP GET request to http://localhost:8080/winner?name=foo, it should get an HTTP response containing the identifier of the client having the highest bid for auction "foo", if such an auction exists and has ended, or "UNKNOWN" otherwise.

For the resources that take query attributes, the URI would need them appended in the usual manner. E.g. http://localhost:8080/bid?name=foo&client=1234&bid=100.

Resource Method Query Attributes Function Returns
start_auction POST name=, end_time= Start an auction named name, expiring at end_time -
bid POST name=, client=, bid= Place a bid (bid) for client client in active auction name -
status GET name= Check the current winning bidder in auction name The highest bidder in name, or "UNKNOWN" if no bids have been placed
winner GET name= Check the winning bidder for completed auction name "UNKNOWN", or the winning bidder in name
rst POST - Reinitialize server state, clearing all auctions and bids -

Your implementation must accept these HTTP calls and provide HTTP responses as described in the next section. A well-behaved REST server would be expected to support GET calls to the start_auction and bid resources as well, but you don't need to worry about that.

Correctness

We'll specify the correct behavior of the auction server first by saying what it must do (this part is somewhat obvious), then by saying what it must NOT do.

What the auction server MUST do (the obvious stuff)

  • Highest bid wins: after an auction closes, calls to winner return the client identifier with the highest valued bid for the specified auction. Until an auction closes, calls to winner should return "UNKNOWN".

What the auction server must NOT do (the constraints)

During the course of the class, we will learn about many different ways in which things can go wrong in a distributed system, and schemes for handling them.

For the first part of this exercise, we will postpone the issue of replication: you will implement a single server, and we will assume it does not fail on its own. However, we still need to focus on client behavior. Some client actions make sense (they are "legal") and some don't (illegal). We could choose many schemes to handle illegal client requests. For example, we could shut down the server whenever a client sends an illegal action, but this would result in a very fragile service. Instead, the uniform handling mechanism for illegal client requests in this exercise will be to simply ignore them. Thus, although your HTTP server must always return a valid HTTP response, we will not use HTTP error codes to distinguish between legal and illegal requests: all calls should return HTTP 200 (OK).

Illegal client requests

The legal clients requests are listed in the table above. Here are important illegal client requests you need to handle in this exercise:

  1. A start_auction request is illegal if the given name already exists, or if the end_time is in the past.
  2. A bid request is illegal if it refers to a nonexistent auction ( name did not occur in a previous start_auction request), it refers to an auction that is over, or if its bid amount is less than or equal to an existing bid.

Example

Consider a series of interactions taking place over time between a set of clients (bidders) and the auction service.

request number receipt time Resource Method Query Response
0 1357761374 start_auction POST name=foo,end_time=1357761377
1 1357761374 status GET name=foo UNKNOWN
2 1357761374 bid POST name=foo,client=1,bid=100
3 1357761374 bid POST name=foo,client=2,bid=300
4 1357761374 bid POST name=foo,client=1,bid=400
5 1357761374 bid POST name=foo,client=3,bid=100
6 1357761374 status GET name=foo 1
7 1357761374 winner GET name=foo UNKNOWN
8 1357761379 winner GET name=foo 1
9 1357761379 start_auction POST name=foo,end_time=1357761384
10 1357761379 winner GET name=foo 1

Note the following about the time-steps above:

  • Time 1. The auction status for "foo" is unknown before anyone bids.
  • Time 5. Agent 3's bid is ignored because it is lower than existing bids.
  • Time 6. After a period of bidding, a call to status returns "1" because client 1 has the winning bid ($400).
  • Time 7. However, the auction is still running, so calls to winner return "UNKNOWN".
  • Time 8. When the auction closes, calls to winner return "1".
  • Time 9. Someone tries to start auction "foo", but this request is ignored because "foo" already exists.
  • Time 10. Henceforth calls to winner for closed auction "foo" always return "1".

Part II: Replicated auction server

Hopefully you completed the first part of the assignment painlessly. For the second part of this assignment, we will replicate the auction server across three distinct servers ("servers"). Replicating the service makes it more robust, since it can keep running even after the failure of an individual server. Unfortunately, replication can add significant complexity to a service. Updates made at an individual server must propagate to other replicas, giving rise to new illegal behaviors.

During the bidding process, clients can connect to any server for each request. Each server independently processes new bids, accepts them if they are the new maximum for that server, and propagates them to the other replicas. We will make the (rather strong) assumption of synchronized clocks. This means that all servers will close the bidding simultaneously.

A correct replicated auction service behaves like a correct single-site auction service, with the additional requirement that all replicas converge -- that is, agree about the winner when an auction is over. A valid response from any replica must be correct (they return the name of someone who made a maximum bid submitted to any server before closing) and unique (only one name is credited as the Winner, even in the case of "ties" -- two or more names submitted a bid with the maximum value). This property must hold regardless of the server that receives the winner request.

In sum, Part II adds an additional correctness requirement:

  • Agreement: eventually, all replicas return the same responses to calls to winner.
request number server receipt time Resource Method Query Response
0 localhost:8080 1357793915 start_auction POST name=bar,end_time=1357793925
1 localhost:8080 1357793915 bid POST name=bar,client=1,bid=100
2 localhost:8080 1357793915 bid POST name=bar,client=2,bid=300
3 localhost:8080 1357793915 bid POST name=bar,client=1,bid=400
4 localhost:8080 1357793915 bid POST name=bar,client=3,bid=100
5 localhost:8081 1357793915 status GET name=bar 1
6 localhost:8082 1357793915 status GET name=bar UNKNOWN
7 localhost:8081 1357793920 bid POST name=bar,client=3,bid=700
8 localhost:8082 1357793920 bid POST name=bar,client=4,bid=900
9 localhost:8080 1357793930 winner GET name=bar 4
10 localhost:8080 1357793930 winner GET name=bar 4
11 localhost:8080 1357793930 winner GET name=bar 4
  • Time 5. Bids make to the service running at port 8080 have quickly propagated to the service at port 8081
  • Time 6. ...but they are slow propagating to port 8082. This is OK -- the agreement constraint only pertains to calls to winner.
  • Time 9-11. After auction close, all servers agree about the winner.

Fault-Tolerance

Your replicated auction service should be able to tolerate the failure of individual machines.

To simulate failures, our testing scripts will perform a series of bids, sleep for a few seconds to account for propagation delays, and submit a POST request to /rst to reset a random replica.

After this simulated failure, the other replicas should continue to process requests as if the failure had not occurred; auction outcomes should not be affected by the failure.

Unit Tests

We are providing you with some unit tests that verify that a given implementation is (somewhat) correct. They provide a starting point to make sure your code is in good shape, but we'll be subjecting your submission to stronger tests. You don't have to run these unit tests, but (a) they'll help you get confidence in your code, and (b) even if you don't run these, your code will have to work with our final grading tests that use similar infrastructure.

To get the unit tests working in your development environment on AWS, you'll need to first install a recent version of Ruby (I used 1.9.2).

Now that you've got Ruby installed and are in a working directory, you can grab the unit test script

git clone git://github.com/programthecloud/ptcrepo.git
cp ptcrepo/entrance_exam/inquisitor.rb .

Once you are ready to run the unit test, start your server running (listening at port 8080), and then run:

ruby inquisitor.rb ip1 ip2 ip3

You may want to inspect the inquisitor.rb script and enhance it with your own tests.

Logistics

The homework is due Monday, 1/28 at noon. You will not "turn in" your solution; you will simply leave it running on your AWS instances, and we will test it via REST calls. It's your responsibility to make sure the server is running happily on Monday, and to send us the public DNS hostname and port of each of the service instances, as we describe below.

The exam has two parts:

  1. A single-site auction service. Set up a single server implementing the auction REST API.
  2. A replicated auction service. Set up three servers implementing the service and replicating auction information to uphold the agreement property.

To submit your assignment, send an email with the subject 'cs194-017 entrance exam' to [email protected]. The body of the email should contain exactly four lines of text separated by newlines. Each line is a publicly-accessible hostname:port pair (as below). The first line should contain the address of the single-site auction server. The following three contain the addresses of the three instances of the replicated auction service. For example:

Edited by palvaro on Wed, 1/23 at 4pm

To submit the assignment, fill out this form:

https://docs.google.com/spreadsheet/viewform?formkey=dDExd0swa1J6T2pYU2hzbjJ1S1hFeUE6MQ

After filling in your name and email address, provide the addresses of the auction services you have implemented and deployed, in the form

host:port

as in:

ec2-107-22-39-203.compute-1.amazonaws.com:8080

The first entry ("single-site auction server") should contain the address of the single-site auction server. The following three contain the addresses of the three instances of the replicated auction service. Again, it is up to you to ensure that the services are running from submission time until at least a day later.

Using EC2

You are responsible for hosting your own auction service, and may do so in whatever way you prefer. Amazon's EC2 service provides free access to new users--we recommend that you try it:

http://aws.amazon.com/free/

To use EC2's free tier, you'll need to create an Amazon account. Be careful to follow the free tier rules (e.g., make sure to create only ``micro'' instances) to avoid being charged.

Assistance

Online Discussion

We have set up a googlegroup called programthecloud where you can ask and answer each other's questions. The instructors will not patrol this group aggressively during the course of the assignment. But you should feel free as a community to share ideas and answer each others' questions; just please don't share code.

Advice on source languages

You may implement the auction server in your favorite language, but we make the following recommendation:

Choose a sufficiently high-level language to allow you to focus on getting the auction server logic right without being distracted by fussy details like sockets and header parsing. Modern scripting languages like Ruby and Python make it easy to create small servers that respond to web requests.

As a point of reference, we implemented a solution in Ruby using Webrick, and our implementation was under 100 lines of code. We suspect that Python's web.py should allow for an equally simple implementation. Whatever language you choose, it should allow you to implement the exercise with similar code size.