© Barcelona Supercomputing Center 2016 - 2018
The TANGO Programming Model and Runtime Abstraction Layer combines BSC's COMPSs and OmpSs task-based programming models: COMPSs deals with coarse-grain tasks and platform-level heterogeneity, while OmpSs deals with fine-grain tasks and node-level heterogeneity.
- Supported platforms running Linux (i386, x86-64, ARM, PowerPC or IA64)
- Apache Maven 3.0 or better
- Java SDK 8.0 or better
- GNU C/C++ compiler versions 4.4 or better
- GNU GCC Fortran
- autotools (libtool, automake, autoreconf, make)
- boost-devel
- python-devel 2.7 or better
- GNU bison 2.4.1 or better.
- GNU flex 2.5.4 or better (avoid versions 2.5.31 and 2.5.34, which are known to fail; use at least 2.5.33).
- GNU gperf 3.0.0 or better.
- SQLite 3.6.16 or better.
- xdg-utils package
- graphviz package
- libxml2-devel 2.5.0 or better
- gcc-fortran
- papi-devel (suggested)
To install the whole framework, check out the code and run the following commands:
$ git clone --recursive https://github.com/TANGO-Project/programming-model-and-runtime.git
$ cd programming-model-and-runtime/
$ ./install.sh <Installation_Prefix> [options]
# Examples
# User local installation
$ ./install.sh $HOME/TANGO --no-monitor --no-tracing
# System installation
$ sudo -E ./install.sh /opt/TANGO
If you want to re-install only one part (COMPSs or OmpSs), you have to define the PM_BUILD environment variable before running the script. Valid values for the variable are ONLY_COMPSS, ONLY_OMPSS or ALL. Not defining the variable has the same effect as ALL.
# User local installation
$ PM_BUILD=ONLY_COMPSS ./install.sh $HOME/TANGO --no-monitor --no-tracing
# System installation
$ PM_BUILD=ONLY_COMPSS sudo -E ./install.sh /opt/TANGO
To develop an application with the TANGO programming model, developers have to implement at least three files: the application's main workflow in appName.c/cc, the definition of the functions that become coarse-grain tasks in appName.idl, and the implementation of those functions in appName-functions.cc. Other application files can be included in a src folder, providing the build configuration in a Makefile. A minimal sketch of these files is shown after the list below.
- appName.c/cc -> Contains the main coarse-grain task workflow
- appName.idl -> Coarse-grain task definition
- appName-functions.c/cc -> Implementation of the coarse grain tasks
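The following sketch illustrates how these files could look for a toy application. It is only an illustration: appName, Block and scale_block are hypothetical names, and the exact IDL syntax and supported parameter types are described in the COMPSs User Manual linked below.

// appName.idl - declaration of the coarse-grain tasks
interface appName
{
    void scale_block(inout Block b, in double factor);
};

// appName.cc - main coarse-grain task workflow
#include "appName.h"               // hypothetical header with the task prototypes

int main(int argc, char **argv)
{
    compss_on();                   // start the COMPSs runtime
    Block *b = new Block(1024);    // hypothetical user-defined object
    scale_block(b, 2.0);           // executed asynchronously as a coarse-grain task
    compss_off();                  // wait for pending tasks and stop the runtime
    return 0;
}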
To define a coarse-grain task which contains fine-grain tasks, developers have to annotate the coarse-grain functions with the OmpSs pragmas.
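A minimal sketch of such an annotated function is shown below. It assumes the hypothetical Block class from the previous example has rows, cols and data members, and that dependencies are expressed with the OmpSs array-section syntax; the Mercurium compiler (enabled with --ompss at build time) turns the pragmas into fine-grain tasks.

// appName-functions.cc - coarse-grain task spawning OmpSs fine-grain tasks
#include "appName.h"

void scale_block(Block *b, double factor)
{
    for (int i = 0; i < b->rows; i++) {
        // each row update becomes an OmpSs fine-grain task
        #pragma omp task inout(b->data[i*b->cols ; b->cols])
        {
            for (int j = 0; j < b->cols; j++)
                b->data[i*b->cols + j] *= factor;
        }
    }
    #pragma omp taskwait   // fine-grain tasks must finish before the coarse-grain task ends
}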
More information about how to define coarse-grain tasks and other considerations when implementing a coarse-grain task workflow can be found at http://compss.bsc.es/releases/compss/latest/docs/COMPSs_User_Manual_App_Development.pdf
More information about how to define fine-grain tasks and other considerations when implementing a fine-grain task workflow can be found at https://pm.bsc.es/ompss-docs/specs/
To compile the application, use the compss_build_app script. The usage of this command is the following:
$ compss_build_app --help
Usage: /opt/COMPSs/Runtime/scripts/user/compss_build_app [options] application_name application_arguments
* Options:
General:
--help, -h Print this help message
--opts Show available options
--version, -v Print COMPSs version
General Options:
--only-master Builds only the master part
Default: Disabled
--only-worker Builds only the worker part
Default: Disabled
Tools enablers:
--ompss Enables worker compilation with OmpSs Mercurium compiler
Default: Disabled
--cuda Enables worker compilation with CUDA compilation flags
Default: Disabled
--opencl Enables worker compilation with openCL compilation flags
Default: Disabled
Specific compiler and linker flags:
--CXX=<C++ compiler> Defines a specific C++ compiler (cross-compiling)
Default:
--CC=<C compiler> Defines a specific C compiler (cross-compiling)
Default:
--CFLAGS="-cFlag_1 ... -cFlag_N" Defines C compiler flags
Default:
--CXXFLAGS="-cxxflag_1 ... -cxxflag_N" Defines C++ compiler flags
Default:
--CPPFLAGS="-cppflag_1 ... -cppflag_N" Defines C pre-processor flags
Default:
--LDFLAGS="-ldflag_1 ... -ldflag_N" Defines Linker flags
Default:
--LIBS="-L<libPath> -l<lib> <stLib.a>" Define libraries in the compilation
Default: -lpthread
Specific tools flags:
--MCC="Mercurium C compiler" Specifies the mercurium C compiler profile (cross-compiling OmpSs)
Default: mcc
--MCXX="Mercurium C++ compiler" Specifies the mercurium compiler profile (cross-compiling OmpSs)
Default: mcxx
--with_ompss=<ompss_installation_path> Enables worker compilation with OmpSs Mercurium compiler installed in a certain location
Default: Disabled
--mercurium_flags="flags" Specifies extra flags to pass to the mercurium compiler
Default: Empty
--with_cuda=<cuda_installation_path> Enables worker compilation with CUDA installed in a certain location
Default: Disabled
--with_opencl=<ocl_installation_path> Enables worker compilation with openCL installed in a certain location
Default: Disabled
--opencl_libs="libs" Specifies extra opencl libraries locations
Default: Empty
To compile an application with only COMPSs (coarse-grain tasks), use the following command:
$ compss_build_app appName
To compile a COMPSs application whose fine-grain tasks are managed by OmpSs, use the following command:
$ compss_build_app --ompss appName
If there are fine-grain tasks that can run on CUDA devices, use the following command:
$ compss_build_app --ompss --cuda appName
An application implemented with the TANGO programming model can be easily executed by using the COMPSs execution scripts. They automatically start the Runtime Abstraction Layer and transparently execute both coarse-grain and fine-grain tasks on the selected resources.
Users can use the runcompss script to run the application on interactive nodes.
Usage: runcompss [options] application_name application_arguments
An example of running the application on localhost (useful for initial debugging):
$ runcompss --lang=c appName appArgs...
To run an application on a preconfigured grid of computers, you have to provide the resource description in a resources.xml file and the application configuration for these resources in a project.xml file. Information about how to define these files can be found at http://compss.bsc.es/releases/compss/latest/docs/COMPSs_User_Manual_App_Exec.pdf
$ runcompss --lang=c --project=/path/to/project.xml --resources=/path/to/resources.xml appName app_args
More information about other possible arguments can be found by executing
$ runcompss --help
To queue an application in a cluster managed by the SLURM resource manager, users have to use the enqueue_compss command.
Usage: enqueue_compss [queue_system_options] [runcompss_options] application_name application_arguments
The following command shows how to queue the application by requesting 3 nodes with at least 12 cores, 2 GPUs and 32 GB of memory (approx.) each:
$ enqueue_compss --num_nodes=3 --cpus-per-node=12 --gpus-per-node=2 --node-memory=32000 --lang=c appName appArgs
Other options available for enqueue_compss can be found by executing
$ enqueue_compss --help
To run applications distributed in Singularity containers, users have to specify the container image with the --container_image flag when submitting the application with the enqueue_compss command. The following lines show an example of this command:
$ enqueue_compss --num_nodes=3 --cpus-per-node=12 --gpus-per-node=2 \
--node-memory=32000 --container_image=/path/to/container.img \
--lang=c appName appArgs
When submitting an execution, users must indicate the number of fixed nodes requested in the enqueue_compss command. To enable node elasticity, users should indicate, with the --elasticity flag, the number of extra nodes that the application can request according to its load. An example of this option is shown below.
$ enqueue_compss --num_nodes=3 --cpus-per-node=12 --gpus-per-node=2 \
--node-memory=32000 --elasticity=2 --lang=c appName appArgs
In this case, the runtime will automatically calculate the optimal number of nodes, between 3 and 5. This option can also be combined with Singularity by setting both the elasticity and container_image options.
$ enqueue_compss --num_nodes=3 --cpus-per-node=12 --gpus-per-node=2 \
--node-memory=32000 --elasticity=2 \
--container_image=/path/to/container.img \
--lang=c appName appArgs
If users want to delegate the responsibility of scaling resources up and down to another component, such as the Self-Adaptation Manager, the application must be submitted with the --enable_external_adaptation option of the enqueue_compss command, as indicated in the example below.
$ enqueue_compss --num_nodes=3 --cpus-per-node=12 --gpus-per-node=2 \
--node-memory=32000 --elasticity=2 --enable_external_adaptation=true \
--container_image=/path/to/container.img --lang=c appName appArgs
Then, while the application is running, the adaptation of the nodes can be performed by means of the adapt_compss_resources command. The usage of this command depends on the underlying infrastructure manager.
In the case where the computing infrastructure is managed by SLURM, adapt_compss_resources must be invoked in the following way:
$ adapt_compss_resources <master_node> <master_job_id> CREATE SLURM-Cluster default <singularity_image>
This command will submit another job requesting a new resource of type "default" (the same as requested in the enqueue_compss command) running the COMPSs worker of the given Singularity image. A node can be removed from the reservation with the following command:
$ adapt_compss_resources <master_node> <master_job_id> REMOVE SLURM-Cluster <node_to_delete>
The same command can be used even without SLURM. In this case, the application is started with runcompss with the flag --enable_external_adaptation=true and with project and resources files where a Direct cloud connector is defined. This connector provides the capability of starting and stopping worker nodes according to adapt_compss_resources invocations with the following arguments:
$ adapt_compss_resources <master_node> <AppName_execNum> CREATE Direct <node_to_start> default
The same changes apply to the remove command:
$ adapt_compss_resources <master_node> <AppName_execNum> REMOVE Direct <node_to_delete>
The following lines show an example of the Direct connector configuration in the resources.xml, where we describe all the resources available in the infrastructure.
<CloudProvider Name="Direct">
    <Endpoint>
        <Server></Server>
        <ConnectorJar>direct-conn.jar</ConnectorJar>
        <ConnectorClass>es.bsc.conn.direct.Direct</ConnectorClass>
    </Endpoint>
    <Images>
        <Image Name="default">
            <CreationTime>10</CreationTime>
            <Software>
                <Application>JAVA</Application>
                <Application>PYTHON</Application>
                <Application>COMPSS</Application>
            </Software>
            <Adaptors>
                <Adaptor Name="es.bsc.compss.nio.master.NIOAdaptor">
                    <SubmissionSystem>
                        <Interactive/>
                    </SubmissionSystem>
                    <Ports>
                        <MinPort>43001</MinPort>
                        <MaxPort>43002</MaxPort>
                    </Ports>
                </Adaptor>
            </Adaptors>
            <SharedDisks>
                <AttachedDisk Name="gpfs">
                    <MountPoint>/</MountPoint>
                </AttachedDisk>
            </SharedDisks>
        </Image>
    </Images>
    <InstanceTypes>
        <InstanceType Name="ns51">
            <Processor Name="MainProcessor">
                <ComputingUnits>12</ComputingUnits>
                <Architecture>Intel</Architecture>
                <Speed>2.6</Speed>
            </Processor>
            <Processor Name="GPU">
                <Type>GPU</Type>
                <ComputingUnits>2</ComputingUnits>
                <Architecture>k80</Architecture>
                <Speed>2.6</Speed>
            </Processor>
            <Memory>
                <Size>96</Size>
            </Memory>
        </InstanceType>
        ...
    </InstanceTypes>
</CloudProvider>
In the project.xml file, we specify the resources that can be added or removed in each execution. An example of project.xml is shown in the following lines:
<Cloud>
    <InitialVMs>1</InitialVMs>
    <MinimumVMs>1</MinimumVMs>
    <MaximumVMs>4</MaximumVMs>
    <CloudProvider Name="Direct">
        <LimitOfVMs>4</LimitOfVMs>
        <Properties>
            <Property>
                <Name>estimated-creation-time</Name>
                <Value>10</Value>
            </Property>
        </Properties>
        <Images>
            <Image Name="default">
                <InstallDir>/home_nfs/home_ejarquej/installations/2.3.1/COMPSs/</InstallDir>
                <WorkingDir>/tmp/compss_worker/</WorkingDir>
                <Application>
                    <AppDir>/home_nfs/home_ejarquej/emulate_remote_processing</AppDir>
                </Application>
            </Image>
        </Images>
        <InstanceTypes>
            <InstanceType Name="ns51"/>
            <InstanceType Name="ns55"/>
            <InstanceType Name="ns56"/>
            <InstanceType Name="ns57"/>
        </InstanceTypes>
    </CloudProvider>
</Cloud>
During Y3, we have introduced a set of scripts to automate the build and execution of applications distributed across multiple architectures. To build the application for multiple architectures, we have created the compss_build_app_multi_arch command. In this command, the user specifies the architecture for the different application components as indicated below.
Usage: compss_build_app_multi_arch [options] application_name
* Options:
General:
--help, -h Prints this help message
--opts Shows available options
--supported_arch Shows supported architectures
--master=arch1 Specifies the target architecture for which
the master is going to be built.
Default: x86_64-suse-linux
--worker=<arch1,arch2,...> Specifies the target architectures for which
the worker is going to be built. Each
architecture separated by ",".
Default: x86_64-suse-linux
--only-master Compiles only the master for the specified
architectures.
--only-worker Compiles only the worker for the specified
architectures.
--cfg=<path> Specifies the location of the configuration
file that contains the environment to execute
when cross-compiling the application.
In this BASH script functions named as the
target architecture are needed to set up the
environment.
Default: /opt/COMPSs/Bindings/c/cfgs/compssrc
The following lines show an example where the master is compiled for an Intel architecture and the worker is compiled for Intel and ARM 32-bit architectures.
$> compss_build_app_multi_arch --master=x86_64-linux-gnu --worker=x86_64-linux-gnu,arm-linux-gnueabihf Matmul
More architectures can be managed by specifying a configuration file where different environment variables are defined to configure the cross-compilation for the different architectures. This configuration file is specified with the --cfg flag. The following lines show an example of how to define the cross-compilation and how to indicate the configuration flag in the build command.
$> cat arm_cc.cfg
arm-linux-gnueabihf () {
export CC=arm-linux-gnueabihf-gcc
export CXX=arm-linux-gnueabihf-g++
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-armhf
export BOOST_LIB=/opt/install-arm/libboost
export TARGET_HOST=arm-linux-gnueabihf
}
$> compss_build_app_multi_arch --master=x86_64-linux-gnu --worker=x86_64-linux-gnu,arm-linux-gnueabihf --cfg=arm_cc.cfg Matmul
A similar extension has been added to the enqueue_compss command to enable an execution with a heterogeneous reservation. To define it, we have to run the enqueue_compss command with the --heterogeneous flag, indicating the type of resource required for the master and the workers. An example of how to use this extension is shown below.
$ cat types.cfg
type_1(){
cpus_per_node=16
gpus_per_node=2
node_memory=96
}
type_2(){
cpus_per_node=48
gpus_per_node=0
node_memory=128
}
$ enqueue_compss --heterogeneous --master=type_1 --workers=type_1:2,type_2:2 --types_cfg=types.cfg --lang=c appName appArgs
C++ objects declared as arguments of coarse-grain tasks must be passed to the task methods as object pointers in order to have proper dependency management. We are evaluating how to support other possibilities.
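For instance, in a sketch using the hypothetical Block class from the examples above, both the task prototype and its invocation in the main workflow use a pointer, so the runtime can track the dependency on the object:

void scale_block(Block *b, double factor);   // task declared with an object pointer

Block *b = new Block(1024);                  // hypothetical user-defined object
scale_block(b, 2.0);                         // the runtime tracks the dependency on *b through the pointer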
The TANGO Programming Model and the Runtime Abstraction Layer components must be used together as explained above. They can be combined with other TANGO components to achieve more complex features:
- ALDE - The TANGO Programming Model and Runtime will use ALDE to submit the code for compilation and packaging. It could also interact with ALDE to submit the application directly to a TANGO-compatible device supervisor.
- Device Supervisor - The TANGO Programming Model Runtime can interact directly with SLURM which is one TANGO Device Supervisor. Other device supervisors will be supported by means of ALDE.
- Self-Adaptation Manager - The TANGO Programming Model Runtime can interact with the Self-Adaptation Manager to optimize application execution in a TANGO compatible testbed.
- Monitor Infrastructure - The TANGO Programming Model Runtime can interact with the TANGO Monitor Infrastructure to obtain energy consumption metrics in order to make runtime optimization decisions.