pcli
is a terminal that allows users to run distributed POETS applications
and manage a pstack
deployment. This manual provides an overview of pcli
and some usage examples.
- Usage
- The Basics
- Quick Start
- Processes
- The Process Viewer
- Starting a Process
- Process Management
- Distributed Processes
- The Job Queue
POETS Client (pcli) v0.1
Usage:
pcli.py [options]
Options:
-n --nocolor Disable terminal colors.
-q --quiet Suppress printing banner.
pcli
is built on top of a Python interpreter; commands are implemented as
Python functions and are called using Python's syntax. All of Python, its
standard library modules and package ecosystem are available as well. For
example ...
$ pcli --quiet
pcli> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
pcli> from datetime import datetime
pcli> datetime.now()
datetime.datetime(2018, 11, 26, 14, 21, 57, 46882)
pcli>
In other words, pcli
is essentially the same as the off-the-shelve Python
interpreter available on most systems but with built-in pstack
functions
(that are in fact imported from the module interactive.py
).
To improve user experience, pcli
features syntax colouring, fish-like autosuggestions and
persistent context. Variables of basic data types and command history are retained between
sessions.
You can run a simulation on pcli
using the run
command as follows:
pcli> run("tests/ring-oscillator-01.xml")
{'states': {u'n0': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n1': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n2': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n3': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}}, 'metrics': {u'Exit code': 0, u'Delivered messages': 40}}
(here loading one of the files from the repo's test
directory)
Notice that the output is in the machine-friendly format of
psim
, specifically a JSON object containing
simulation metrics and final device states. The above is therefore very
similar to invoking psim
(with the --quiet
and --result
switches) but
with the crucial difference that it's been executed by one of the engines
connected to the pstack
deployment in use.
One of the design goals of pstack
is to provide a desktop-like POETS
environment. To this end, some concepts are burrowed from POSIX including the
notion of a process as an instance of an executable binary. In pstack
, a
process is an in-memory instance of a POETS XML application.
Processes are in one of two states:
-
Waiting: the process is held in memory but awaits the availability of a sufficient number of suitable engines to start.
-
Running: the process has started; its regions are all being simulated by suitable engines.
Before delving into more commands and usage scenarios, it's probably a good
idea to introduce one of the main features of pcli
, the process viewer, a
full-screen real-time process and resource monitor in the spirit of top
.
Here's what it looks like ...
This tool can be started by typing top
in pcli
. It shows available engines
(three in this case), their resource utilization plus a process table. Changes
in the process table or engine availability/utilization are updated in
real-time.
Having a separate instance of pcli
with the process viewer running may be
helpful to visualize what happens when following the remaining steps in this
guide.
As shown in Quick Start, the command run
is used to start
processes. Here are more ways to use run
...
With no additional arguments other than the XML file, run
will block until
the process terminates then return a simulation result object. This behavior
can be changed by passing async=True
which will make run
return
immediately with a pending computation result called a future (read this
article for background
if you're unfamiliar with futures). The future object can be blocked on using
the function block
to return the simulation result.
Here's an example ...
pcli> future = run("tests/ring-oscillator-01.xml", async=True)
pcli> 1 + 2
3
pcli> block(future)
{'states': {u'n0': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n1': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n2': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n3': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}}, 'metrics': {u'Exit code': 0, u'Delivered messages': 40}}
This mechanism allows users to start processes and move on to do other things
while the process executes (in the unimaginative example above the user
evaluates 1 + 2
).
Application and pd
log messages can be
printed during execution by passing verbose=True
to run
. Here's an example
...
pcli> run("tests/ring-oscillator-01.xml", verbose=True)
[byron.cl.cam.ac.uk] Starting 6177 (region 0) ...
[byron.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 1
[byron.cl.cam.ac.uk] Region 0 -> Log [device n1, level 1]: counter = 1
[byron.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 1
[byron.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 1
(log messages truncated for brevity)
[byron.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 10
[byron.cl.cam.ac.uk] Region 0 -> Log [device n1, level 1]: counter = 10
[byron.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 10
[byron.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 10
[byron.cl.cam.ac.uk] Region 0 -> Exit [device n0]: handler_exit(0) called
[byron.cl.cam.ac.uk] Region 0 -> Metric [Delivered messages]: 40
[byron.cl.cam.ac.uk] Region 0 -> Metric [Exit code]: 0
[byron.cl.cam.ac.uk] Region 0 -> State [n0]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Region 0 -> State [n1]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Region 0 -> State [n2]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Region 0 -> State [n3]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Finished 6177 (region 0) ...
{'states': {u'n0': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n1': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n2': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n3': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}}, 'metrics': {u'Exit code': 0, u'Delivered messages': 40}}
Here, the simulation has been picked up by engine byron.cl.cam.ac.uk
. The
lines prefixed with
[byron.cl.cam.ac.uk] Region 0 ->
are intermediate engine outputs (in this case, the human-friendly output of
psim
) while other lines such as
[byron.cl.cam.ac.uk] Starting 6177 (region 0) ...
[byron.cl.cam.ac.uk] Finished 6177 (region 0) ...
are pd
outputs.
The maximum log level for applications can be specified by passing the level
argument to run
(the default log level is 1). For example ...
pcli> run("tests/ring-oscillator-01.xml", verbose=True, level=0)
[byron.cl.cam.ac.uk] Starting 4 (region 0) ...
[byron.cl.cam.ac.uk] Region 0 -> Exit [device n0]: handler_exit(0) called
[byron.cl.cam.ac.uk] Region 0 -> Metric [Delivered messages]: 40
[byron.cl.cam.ac.uk] Region 0 -> Metric [Exit code]: 0
[byron.cl.cam.ac.uk] Region 0 -> State [n0]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Region 0 -> State [n1]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Region 0 -> State [n2]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Region 0 -> State [n3]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Finished 4 (region 0) ...
{'states': {u'n0': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n1': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n2': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n3': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}}, 'metrics': {u'Exit code': 0, u'Delivered messages': 40}}
The above example shows a useful trick whereby setting level=0
suppresses
application log messages, leaving only engine and pd
outputs. Notice that
specifying level
without using verbose=True
is meaningless (since log
messages won't be collected in the first place).
Note on performance: log messages are relayed through a high-level
slow-performance interface and will therefore slow processes considerably.
It's therefore recommended that you pass verbose=True
only when debugging.
At the moment, pstack
does not support many ways to interact with a process
after it has been started (Application
Engines are a powerful mechanism to
manipulate processes but they must be hooked in during process creation).
Processes can be killed using the command kill
which takes the process
identifier (PID) as a single argument. The PID can be retrieved from the
future object, here's how ...
pcli> future = run("tests/ring-oscillator-01.xml", async=True)
pcli> kill(future.pid)
Explicit PIDs can be passed to kill
too (for example by looking up processes
in the process viewer, or through the command ps
which returns a list of current process PIDs). However, this is discouraged as
it may affect other users.
The examples above described various ways to start processes using run
, all
of which relied on using a single engine per process. pstack
supports
distributed simulations through simulation
regions; device subsets that get mapped to
different engines for distributed execution.
Distributed processes are started by passing the argument rmap
(region map)
to run
and specifying device-region mappings. Here's an example, again using
the ring oscillator application from the tests suite ...
pcli> run("tests/ring-oscillator-01.xml", rmap={"n1": 1}, verbose=True)
[aesop.cl.cam.ac.uk] Starting 4 (region 0) ...
[byron.cl.cam.ac.uk] Starting 4 (region 1) ...
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 1
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 1
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 1
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 1
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 2
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 2
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 3
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 2
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 2
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 4
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 3
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 3
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 5
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 3
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 4
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 6
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 4
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 7
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 4
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 5
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 8
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 5
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 5
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 9
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 6
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 6
[byron.cl.cam.ac.uk] Region 1 -> Log [device n1, level 1]: counter = 10
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 6
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 7
[byron.cl.cam.ac.uk] Region 1 -> Received external shutdown command
[byron.cl.cam.ac.uk] Region 1 -> Metric [Delivered messages]: 10
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 7
[byron.cl.cam.ac.uk] Region 1 -> Metric [Exit code]: 0
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 7
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 8
[byron.cl.cam.ac.uk] Region 1 -> State [n1]: state = 0, counter = 10, toggle_buffer_ptr = 0
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 8
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 8
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 9
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 9
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 9
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n0, level 1]: counter = 10
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n2, level 1]: counter = 10
[aesop.cl.cam.ac.uk] Region 0 -> Log [device n3, level 1]: counter = 10
[aesop.cl.cam.ac.uk] Region 0 -> Exit [device n0]: handler_exit(0) called
[aesop.cl.cam.ac.uk] Region 0 -> Metric [Delivered messages]: 30
[aesop.cl.cam.ac.uk] Region 0 -> Metric [Exit code]: 0
[aesop.cl.cam.ac.uk] Region 0 -> State [n0]: state = 0, counter = 10, toggle_buffer_ptr = 0
[aesop.cl.cam.ac.uk] Region 0 -> State [n2]: state = 0, counter = 10, toggle_buffer_ptr = 0
[aesop.cl.cam.ac.uk] Region 0 -> State [n3]: state = 0, counter = 10, toggle_buffer_ptr = 0
[byron.cl.cam.ac.uk] Finished 4 (region 1) ...
[aesop.cl.cam.ac.uk] Finished 4 (region 0) ...
{'states': {u'n0': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n1': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n2': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}, u'n3': {u'state': 0, u'counter': 10, u'toggle_buffer_ptr': 0}}, 'metrics': {u'Exit code': 0, u'Delivered messages': 40}}
This output is very similar to the one produced by a single-engine simulation of the same application in an earlier subsection. Crucially, the machine-friendly output (i.e. the JSON object at the bottom) is identical.
Where the single and distributed versions differ are the human-friendly output; application log and in-simulation messages. You may have already noticed that there are now two distinct prefixes for in-simulation outputs:
[aesop.cl.cam.ac.uk] Region 0 ->
[byron.cl.cam.ac.uk] Region 1 ->
Here indicating that the log messages were produced by two engines, each
simulating one of the regions 0
and 1
.
As explained in simulation regions, regions
are identified by non-zero integers and the default region is 0
. In the
above example, the region map was {"n1": 1}
which placed device n1
in
region 1
while leaving all others in the default region. The process
therefore consisted of two regions and was picked up by two of the available
engines.
It is common for in-simulation log messages to appear out of order in
distributed simulations. Also notice that each engine produced its own
simulation metrics; exit code and number of (locally) delivered messages.
These are combined by pcli
to produce the final simulation result.
By default, available engines take turns in accepting simulation jobs. This
behavior can be constrained by specifying region-engine mapping using the
argument rcon
(region constraints) of the run
command. Here's how ...
run("tests/ring-oscillator-01.xml", rmap={"n1": 1}, rcon={1: "byron.cl.cam.ac.uk"})
Here mandating that region 1
is mapped to the engine byron.cl.cam.ac.uk
.
If the named engine is busy or unavailable, the process will start in a
waiting state until the engine becomes available.
Created processes are pushed to a job queue on Redis in a waiting state. If suitable engines are unavailable (either offline or occupied with other simulations) then the pushed process will wait and start executing whenever engines become available.
Job queuing is handled transparently to the user. The command run
will
behave in the same way whether the process is queued or not (i.e. it will
block until the process finishes or return a future object if async=True
is
passed).