notes/socket_experiments.txt

(This file discusses some of the modules in the Socket sample that are
not explained in samples/Socket/README.  You should read that README
before reading this.)

The stepper_d module

(The stepper_d module was the first socket stepper we wrote and was
originally named just stepper, but has been renamed to stepper_d to
indicate that it is no longer the preferred stepper for this sample.
The module formerly named stepper_s has been renamed to stepper and is now
the preferred stepper recommended for this sample.)

Here is how the stepper_d module executes and checks test runs:  For
each action, the test runner pmt calls the action in the model (for
example recv_call) and collects the result (the return value, msg)
computed by the model.  Then pmt passes the action and the result to
stepper_d.  Then stepper_d calls the corresponding message (s.recv) in
the implementation, and collects the result (return value) computed by
the implementation (named data here to distinguish it from msg, from
the model).  Then stepper_d compares the two results.  If they are
equal (checked by the Python == operator), stepper_d returns None to
pmt to indicate that the test should continue; if not, stepper_d
constructs a string describing the failure and returns that string to
pmt to indicate that the test failed.  This stepper_d deals with the
split actions calling the implementation only on the _return actions.
This stepper_d handles the _call actions by simply storing the action
arguments locally until it is time to invoke the corresponding _return
action.

Here stepper_d itself determines whether each step in the test passes
or fails, and, if it fails, constructs the message describing the
failure.

This stepper_d organization is straightforward but it results in
serious limitations.  This stepper_d cannot handle nondeterminism nor
asynchrony.  This stepper_d expects that the connection will always
accept the entire message passed to send and then return immediately.
It expects that the next call to recv at the other end of the
connection will return the entire message immediately, and this
returned message is an exact copy of the message that was sent.  Any
other behaviors are considered test failures.  These are serious
limitations but stepper_d often works well enough, because real
sockets usually behave this way when when small messages are sent
across a fast network.  

The style of stepper_d follows a pattern that works well for
deterministic, synchronous sytems -- which sockets are not.  This
stepper_d is included to here to show how this pattern can be used
-- with some limitations -- even with nondeterministic, asynchronous
systems.  However, this style is not recommended for such systems.
Below we discuss stepper_o, stepper_a, and stepper, which all use a
different approach that works better here, and supports all model
behaviors correctly.

The stepper_util test harness utilities

The stepper_util module defines constants and methods used by all
steppers, including methods to open and close sockets and to make a
connection.  The stepper modules only handle the send and recv
methods --- those are the only operations where the steppers differ.


The test_stepper_d script

The test_stepper_d module is a test script with three test cases that
reveal some of the limitations of this stepper.  To run these tests,
type the command: trun test_stepper_d.  None of the test
cases in this script uses the -s option to set the random seed, so
repeating this command will result in different test runs.

The first test case runs this command: pmt -n 10 -c 6 msocket -i
stepper_d.  In this test case model behavior is not restricted by any
scenario machine, so _call actions are not always followed immediately
by _return actions.  However, the test does not block because
stepper_d does not call the socket implementation until the _return
action appears in the trace.  This test case does not use any
configuration modules to limit nondeterminism, so recv often returns
only part of the message provided by send.  But with messages this
small, the implementation usually returns the entire message - so the
behaviors of the model and the implementation differ, even though both
are correct.  This is the essence of nondeterminism: different
behaviors can all be correct.  But stepper_d does not handle
nondeterminism, so any differences between the behaviors of the model
and the implementation are reported as test failures.  So usually this
test case fails, due to limitations in stepper_d, not errors in the
model or the implementation.  The next two test cases attempt to remedy
this by limiting the behavior of the model.

The second test case runs this command: pmt -n 10 -c 6 msocket
synchronous -i stepper_d.  This command composes the msocket model with
the synchronous scenario, so it generates test runs where _call is
always immediately followed by _return.  However it does not limit
nondeterminism so usually this test case fails due to limitations
of stepper_d.

The third and final test case runs this command: pmt -n 10 -c 6
msocket deterministic synchronous -i stepper_d.  This limits the model
to behaviors that are synchronous (_call is always immediately
followed by _return) and deterministic (the entire message sent is
always immediately received).  Now the model satisfies all the
assumptions of stepper_d.  Moreover, with messages this small (they
are defined by the domains in the mstepper and deterministic modules),
the implementation is also synchronous and deterministic.  Therefore
the model and implementation behave exactly the same, so these test runs 
succeed.  


The msgsizes test suite

The msgsizes module is a test suite that investigates how socket
behavior depends on message size.

Recall that stepper_d assumes that socket behavior is synchronous
and deterministic.  We observe that this assumption is usually
satisfied when sending small messages.  How large a message does it
take to break this assumption?  The msgsizes module contains a test
suite (a collection of test runs) with increasingly larger messages,
to find how large a message must be so it is no longer completely
transmitted by one send.  When that happens, the test will fail.  The
message sizes range from 8192 characters (8K) up to 1024K (1M).

The first two test runs in this suite are intended to fail - the sent
message and the received message differ by one character (as if that
character had been corrupted during transmission).  This behavior is
not permitted by the model.  The last test run in this suite is
expected to fail with stepper_d, because only part of the sent
message is received.  


The test_analyze_msgsize script

The test_analyze_msgsize module is a test script that uses the PyModel
analyzer pma to check the test suite in the msgsizes module.  It
checks that the first two runs violate the msocket model, and all the
rest satisfy it.  Composing a test suite with a model is the usual way
to check that that the runs in the test suite are allowed by the model.
This can be used to validate either the test suite or the model
(depending on which is more trusted).  Here we are using it to
validate the msgsizes test suite.

This script contains one test case, that runs this command:

 pmv -o msgsizesFSM msgsizes msocket all_observables -l name -xy

Recall that the PyModel viewer pmv invokes the analyzer pma and the
graphics programs pmg and dot.  This command forms the composition of
msgsizes and msocket.  The all_observables module declares that all
actions in the model are observable actions.  In this context, that
means that the analyzer pma uses the action arguments in the test
suite and ignores the arguments defined in the domains in the model
program.  (The -l and -xy options reduce clutter in the generated
graphics.)

Running the test script (type: trun test_analyze msgsize) generates
msgsizesFSM.svg, which you can display in a browser.  It shows a graph
of every test run in the suite.  As intended, the first two runs are
not permitted by the model.  The graph of each of these two runs does
not include the final recv_return action, which is forbidden because
its message argument violates the model --- each contains one
erroneous character that differs from the message passed to send_call.
The final state in these runs is colored yellow because it is not an
accepting state (because it does not reach the final state of the run
in the test suite).  As intended, all of the other runs are permitted
by the model.  The final state in all runs (but the last) is colored
green, because it is an accepting state (it reaches the final state of
the run in the test suite).  The final state in the last run is
colored yellow.  It reaches the final state in the test suite, but it
is not an accepting state in the model (because part of the message is
still in transit - but this is intended here).


The test_msgsize script

Now that we have validated the test suite, we can execute it.  The
test_msgsize script is a test script that uses the PyModel tester pmt
to execute the runs in the msgsizes test suite with the Python
standard library socket module, with the aid of the stepper_d test
harness.

Execute the test suite: trun test_msgsize.  The first two runs fail,
as they should, because the test suite claims that the receive message
will contain an erroneous character, but the message actually received
by the real socket is a perfect copy of the sent message.  The next
runs, with larger and larger messages, all succeed.  Finally the
message becomes too large to send all of it in a single call and that
test case fails.  On my system (MacBook Pro, Mac OS X 10.7.5) it fails
on run number 9 (the tenth run), where the message size reaches 1
megabyte! (The failure is a timeout, indicating that the real socket
send call blocked).

The final call also fails, because the run in the test suite receives
only part of the message that was sent.  This behavior is permitted by
the model, but the real socket receives the entire message.  This
stepper_d considers any difference between the test suite and the
implementation to be a test failure.


The stepper_o observable stepper module

The stepper_o module is a stepper (test harness or adapter) that
supports nondeterminism but not asynchrony.  This stepper
distinguishes between controllable and observable actions (The _o
suffix is for "observable").  Controllable actions can be invoked in
the implementation by the test runner pmt (via stepper_o).  Here
send_call and recv_call are the controllable actions.  Observable
actions cannot be invoked by the test runner; they are generated by
the implementation and passed back to pmt (via stepper_o).  Here
send_return and recv_return are the observable actions.  The
observable actions must be identified in the model program module or a
configuration module; here they are indentified in the configuration
module observables.py.

(The following three paragraphs appear in the README.txt section "The
stepper module", in a slightly different form)

This stepper_o executes and checks test runs in this way: For each
controllable action (for example recv_call), the test runner pmt calls
the action in the msocket model (which may update the model state).
Then it passes the action to stepper_o, which calls the corresponding
method (s.recv) in the implementation.  Then stepper_o collects the
result (data) from the implementation and constructs the corresponding
observable action (recv_return) where the result appears as an
argument.  Then stepper_o appends this action to observation_queue,
where pmt can retrieve it.  Then pmt finds that action in the
observation queue, removes it, and checks it (including the msg
argument) with its enabling condition (recv_return_enabled).  If the
enabling condition returns False, the test fails.  If the enabling
condition returns True, pmt executes the observable action in the
model (possibly updating the model state) and the test continues.  If
the test run reaches the end and none of the actions have failed, the
test passes.

Notice that stepper_o does *not* determine whether each step in the
test passes or fails, and it does not construct the message describing
the failure.  Instead, stepper_o merely collects the response,
constructs and observable action, and passes it back to pmt.  It is
the test runner pmt that determines whether the test passes or fails,
and writes the message describing the failure.  This is a very
important difference between stepper_o and stepper_d.  This
organization delegates more work to the test runner pmt, where it need
not be rewritten for each implementation.  It is also more general,
since it handles both nondeterminism and concurrency in the enabling
conditions for the observable actions, so it is recommended for
systems that exhibit nondeterminism and concurrency.  This style does
require the model to be written with split actions, which might be a
disadvantage for simple systems.

This stepper_o (or any other stepper that uses observable actions)
cannot be driven from a test suite (which specifies every action in
the test run), but can only be used for on-the-fly testing (where the
implemention generates the observable actions).


The test_stepper_o script

The test_stepper_o script demonstrates stepper_o.  It is similar to
test_stepper, but with the revisions needed to work with stepper_o.
The first test case runs this command:

 pmt -n 10 -c 6 -t 5 msocket observables -i stepper_o

Here -t 5 specifies a 5-second timeout if a call blocks (otherwise the
test could hang indefinitely).  The observables configuration module specifies
that send_return and recv_return are observable actions, so pmt will only
invoke send_call and recv_call.

The first test case does not use the synchronous scenario so it often
blocks, times out, and fails.  The second test case uses the
synchronous scenario so it never blocks and always succeeds.  With our
test_stepper, the second test case often failed because the results
were nondeterministically chosen by the model, but here it always
succeeds because the return values are chosen by the implementation
rather than the model, and the test outcomes are determined by
checking those return values with the enabling conditions in the
model.  The third case always succeeds for the same reason.  The third
test case names the deterministic configuration, but it is not
necessary because here the return values are chosen by the
implemenation, not the model.


The stepper_a asynchronous stepper module

The stepper_a module supports nondeterminism and asynchrony.  It is
similar to stepper_o, but here each call to the implementation runs in
its own thread, so the implementation can block at that call without
blocking the whole test run.  This enables pmt to proceed and make
another call that results in unblocking the earlier call.


The test_stepper_a script

The test_stepper_a module is a test script that contains a single test
case that invokes the msocket model with the stepper_a asynchronous
stepper module.  With stepper_a, it is not necessary to compose
msocket with the sychronous scenario to force send (at one end of the
connection) to always precede recv (at the other end), or to force
each call to return (at one end) before making a another call (at the
other end).  Here stepper_a can handle any ordering of send and recv,
call and return.

The test case does not use the -s option to set the random seed, so repeating
the test will result in different traces.  Here is the beginning of one trace.
Notice the interleaving of send and recv, call and return:

Jonathans-MacBook-Pro:Socket jon$ trun test_stepper_a
...
Server accepts connection from  ('127.0.0.1', 51559)
send_call('a',)
recv_call(4,)
send_return(1,)
recv_return('a',)
...

Sometimes this test fails!  For example

Jonathans-MacBook-Pro:Socket jon$ trun test_stepper_a
...
Server accepts connection from  ('127.0.0.1', 51555)
recv_call(4,)
send_call('a',)
  0. Failure at step 2, recv_return('a',), action not enabled

Here stepper_a detects that in the implementation, recv returned the
message 'a'.  Here pmt calls this a failure because the model forbids
this behavior (see msocketFSM.svg for the behaviors it allows).  In
particular, the model only allows recv to return (part of) a message
*after* the send for that message returns, but here recv returns the
'a' message before the send that sent it returns.  It is not clear
whether the socket connection actually behaves that way.  Recall that
pmt actually sees the behavior of the implementation *as reported by
the stepper*.  Here the stepper starts two threads to monitor the
socket connection (one for send and one for recv), which are scheduled
(to run one at a time) by the Python run-time system.  It seems
possible that, at the socket connection itself, send actually returns
before recv, but the scheduler runs the thread that reports when recv
returns before the thread that reports when send returns.

The stepper asynchronous stepper module

(The stepper module was originally called stepper_s, but was renamed
to just stepper to indicate that it is now the preferred stepper
module for the socket sample.)

(Material that was here was removed to the new README.txt, beginning
with a section titled 'The stepper module'.  We only the following
paragraphs here.)

The stepper module and test_stepper script are similar to
stepper_a and test_stepper_a -- they are all designed to handle runs
where calls to recv at one end of the connection can precede calls to
send at the other end, and where calls and returns at the two ends can
interleave.   

However, with stepper and test_stepper we do *not* see any test
failures like we do with stepper_a and test_stepper_a, where stepper_a
reports interleaving in an order that the model forbids, where
recv_return returns a message before the originating send returns.
Since stepper never reports this, this supports our hypothesis that
the forbidden interleaving is an artifact of the way the threads in
stepper_a are scheduled.  It seems that in this sample, select is a better
way to support asynchrony than threads.


Revised Mar 2013