UCX is a framework (a collection of libraries and interfaces) that provides an efficient and relatively easy way to construct widely used HPC protocols: MPI tag matching, RMA operations, rendezvous protocols, stream, fragmentation, remote atomic operations, etc.
Please join our mailing list: https://elist.ornl.gov/mailman/listinfo/ucx-group
- UCT is a transport layer that abstracts the differences across various hardware architectures and provides a low-level API that enables the implementation of communication protocols. The primary goal of the layer is to provide direct and efficient access to hardware network resources with minimal software overhead. For this purpose, UCT relies on low-level drivers provided by vendors, such as InfiniBand Verbs, Cray's uGNI, libfabrics, etc. In addition, the layer provides constructs for communication context management (thread-based and application level), and for allocation and management of device-specific memories, including those found in accelerators. In terms of communication APIs, UCT defines interfaces for immediate (short), buffered copy-and-send (bcopy), and zero-copy (zcopy) communication operations. The short operations are optimized for small messages that can be posted and completed in place. The bcopy operations are optimized for medium-size messages that are typically sent through a so-called bouncing buffer. Finally, the zcopy operations expose zero-copy memory-to-memory communication semantics.
- UCP implements higher-level protocols that are typically used by message passing (MPI) and PGAS programming models by using lower-level capabilities exposed through the UCT layer. UCP is responsible for the following functionality: initialization of the library, selection of transports for communication, message fragmentation, and multi-rail communication. Currently, the API has the following classes of interfaces: Initialization, Remote Memory Access (RMA) communication, Atomic Memory Operations (AMO), Active Message, Tag-Matching, and Collectives.
- UCS is a service layer that provides the necessary functionality for implementing portable and efficient utilities.
- Open source framework supported by vendors
  The UCX framework is maintained and supported by hardware vendors in addition to the open source community. Every pull request is tested on multiple hardware platforms supported by the vendor community.
- Performance, performance, performance…
  The framework design, data structures, and components are designed to provide highly optimized access to the network hardware.
- High-level API for a broad range of HPC programming models
  UCX provides a high-level API, implemented in software (UCP), to fill in the gaps across interconnects. A library can therefore use a single set of APIs to target multiple interconnects, which reduces the complexity of implementing libraries such as Open MPI or OpenSHMEM. As a result, UCX is performance portable: a single implementation (in Open MPI or OpenSHMEM) works efficiently on multiple interconnects (e.g. uGNI, Verbs, libfabrics).
- GPU support
- Support for interaction between multiple transports (or providers) to deliver messages
  For example, UCX has the logic (in UCP) to make GPUDirect, IB, and shared memory work together efficiently to deliver the data where it is needed, without the user having to deal with this.
- Cross-transport multi-rail capabilities
UCP implements RMA put/get, send/receive with tag matching, active messages, and atomic operations. In the near future we plan to add support for commonly used collective operations.
No, UCX is not a replacement for GASNET. GASNET exposes a high-level API for PGAS programming that provides symmetric memory management capabilities and built-in runtime environments. These capabilities are out of scope for the UCX project. Instead, GASNET can leverage the UCX framework for a fast and efficient implementation of GASNET on the network technologies supported by UCX.
The UCX framework does not provide drivers; instead, it relies on the drivers provided by vendors. Currently we use: OFA Verbs, Cray's uGNI, NVIDIA CUDA.
UCX is a middleware communication layer that relies on vendor-provided user-level drivers, including OFA Verbs and libfabrics (or any other drivers provided by other communities or vendors), to implement high-level protocols. These protocols close functionality gaps between the various vendor drivers, including the various libfabrics providers: coordination across drivers, multi-rail capabilities, and software-based RMA, AMOs, and tag matching for transports and drivers that do not support such capabilities natively.
No, UCX is not a user-level driver. Typically, drivers aim to expose fine-grained access to features specific to a network architecture. UCX abstracts the differences across various drivers and fills in the gaps, using software protocols, for architectures that do not provide hardware-level support for all operations.
UCX does not depend on an external runtime environment.
ucx_perftest (a UCX-based application/benchmark) can be linked with an external runtime environment that can be used for remote ucx_perftest launch, but this is an optional configuration which is only used for environments that do not provide direct access to compute nodes. By default this option is disabled.
See How to install UCX and OpenMPI
Yes, UCX supports multi-rail. To enable it, set the following environment variables:
- UCX_MAX_EAGER_RAILS=[number of rails to use for eager protocol]
- UCX_MAX_RNDV_RAILS=[number of rails to use for rendezvous protocol]
- UCX_NET_DEVICES=[comma-separated list of network devices to use, syntax device:port]
Example:
mpirun -x UCX_MAX_EAGER_RAILS=2 -x UCX_MAX_RNDV_RAILS=3 -x UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1 ...
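The same settings can also be exported in the shell before launching, which may be convenient for single-node runs or for launchers that forward the environment. The following is a minimal sketch, not a recommended configuration: the device names are reused from the example above and are assumptions about the local system, and the rail counts of 2 are chosen only to match the two listed devices.

```shell
# Sketch: export the multi-rail settings once instead of repeating -x flags.
# Device names are reused from the example above and are assumptions about
# the local system; the rail counts are illustrative.
export UCX_MAX_EAGER_RAILS=2               # rails used by the eager protocol
export UCX_MAX_RNDV_RAILS=2                # rails used by the rendezvous protocol
export UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1   # comma-separated device:port pairs

# Print the UCX settings that processes started from this shell will inherit.
env | grep '^UCX_' | sort
```

With Open MPI's mpirun, a variable that is already exported can be forwarded to remote nodes by name, for example -x UCX_NET_DEVICES without a value.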
- Fork the repository
- Fix a bug or implement a new feature
- Open a pull request