nimplex

Static Badge License: MIT Nimble Package Arxiv npj Unconventional Computing


NIM simPLEX: A concise high-performance scientific Nim library (with CLI and Python bindings) providing samplings, uniform grids, traversal graphs, and more in compositional (simplex) spaces, where traditional methods designed for Euclidean spaces fail or otherwise become impractical.

Such spaces are considered when an entity can be split into a set of distinct components (a composition), and they play a critical role in many disciplines of science, engineering, and mathematics. For instance, in materials science, chemical composition refers to the way a material (or, more generally, matter) is split into distinct components, such as chemical elements, based on considerations such as fraction of atoms, occupied volume, or contributed mass. And in economics, portfolio composition may refer to how finite capital is split across assets, such as cash, equity instruments, real estate, and commodities, based on their monetary value.

Quick Start

If you have a GitHub account, you can get started with nimplex very quickly by clicking the button below to launch a Codespaces environment with everything installed (per the instructions in the Reproducible Installation section) and ready to go! From there, you can either use the CLI tool (as explained in the CLI section) or import the library in Python (as explained in the Usage in Python section) and start using it right away. Of course, it also comes with a full Nim compiler and VS Code IDE extensions for Nim, so you can effortlessly modify/extend the source code and re-compile it if you wish.

Open in GitHub Codespaces

Installation

There are several easy ways to quickly get nimplex up and running on your system. The choice depends primarily on your preferred way of interacting with the library (CLI, Nim, or Python) and your system configuration.

Reproducible Installation (recommended)

The recommended way is compiling the library yourself, which may sound scary but is fairly easy, and the whole process should not take more than a couple of minutes.

Nim (compiler)

First, you need to install the Nim language compiler, which on most Unix (Linux/MacOS) systems is very straightforward.

  • On MacOS, assuming you have Homebrew installed, simply:

    brew install nim
  • Using conda, miniconda, mamba, or micromamba cross-platform package manager:

    conda install -c conda-forge nim
  • On most Linux distributions, you should also be able to use your built-in package manager like pacman, apt, yum, or rpm; however, the default channel/repository, especially on enterprise systems, may have an unsupported version (nim<2.0). While we do test nimplex with 1.6 versions too, your experience may be degraded, so you may want to update it or go with another option.

  • You can, of course, also build it yourself from nim source code! It is relatively straightforward and fast compared to many other languages.

On Windows, you may consider using WSL, i.e., Windows Subsystem for Linux, which is strongly recommended, interplays well with VS Code, and will let you act as if you were on Linux. If you need to use Windows directly, you can follow these installation instructions.

nimplex

Then, you can use the bundled Nimble tool (package manager for Nim, similar to Rust's cargo or Python's pip) to install two top-level Nim dependencies:

  • arraymancer, which is a powerful N-dimensional array library
  • nimpy which helps with the Python bindings.

It's a single command:

nimble install --depsOnly

or, explicitly:

nimble install -y arraymancer nimpy

Finally, you can clone the repository and compile nimplex with:

git clone https://github.com/amkrajewski/nimplex
cd nimplex
nim c -r -d:release nimplex.nim --benchmark

which will compile the library and run a few benchmarks to make sure everything runs smoothly. You should then see a compiled binary file nimplex in the current directory which exposes the CLI tool.

If you want to use the Python bindings, you can now compile the library with slightly different flags (depending on your system configuration) like so for Linux/MacOS:

nim c --d:release --threads:on --app:lib --out:nimplex.so nimplex

and you should see a compiled library file nimplex.so in the current directory which can be immediately imported and used in Python as explained later. For Windows and other platforms, consult nimpy documentation on what flags and formats should be used.

Pre-Compiled Binaries (quick but not recommended)

If you happen to be on one of the common systems (for which we auto-compile the binaries) and you do not need to modify anything in the source code, there is a good chance you can simply download the latest release from the nimplex GitHub repository and run the executable (nimplex / nimplex.exe) or Python library (nimplex.so / nimplex.pyd) directly just by placing it in your working directory and using it as:

  1. An interactive command line interface (CLI) tool, which will guide you through how to use it if you run it without any arguments like so (on Linux/MacOS):
    ./nimplex   
    or with a concise configuration defining the task type and parameters (explained later in CLI):
    ./nimplex -c IFP 3 10
  2. A compiled Python library for Unix, which you can import and use in your Python code like so:
    import nimplex
    and immediately use the functions provided by the library, as described in Usage in Python:
    nimplex.simplex_internal_grid_fractional_py(dim=3, ndiv=10)

Capabilities

Note: Full technical discussion of methods and motivations is provided in the manuscript. The sections below are meant to provide a concise overview of the library's capabilities.

The library provides a growing number of methods specific to compositional (simplex) spaces:

  1. Monte Carlo sampling is conceptually the simplest method, where points are randomly sampled from a simplex. In low-dimensional cases, this can be accomplished by sampling from a uniform distribution in (d-1)-Cartesian space and then rejecting points outside the simplex (left panel below). However, the inefficiency of this approach grows factorially with the dimensionality of the simplex space. Instead, some try to sample from a uniform distribution in (d)-Cartesian space and normalize the points to sum to 1; however, this leads to over-sampling in the center of each simplex dimension (middle panel below).

    One can, however, fairly easily sample from a special case of Dirichlet distribution, as explained in the manuscript, which leads to uniform sampling in the simplex space (right panel below). Nimplex can sample around 10M points per second in 9-dimensional space on a modern CPU.

    Monte Carlo Sampling
  2. Simplex / Compositional Grids are a more structured approach to sampling, where all possible compositions quantized to a given resolution, like 1% for 100 divisions per dimension, are generated. This is useful, for example, when one wants to map a function over the simplex space. In total, N_S(d, n_d) = \binom{d-1+n_d}{d-1} = \binom{d-1+n_d}{n_d} points are generated, where d is the dimensionality of the simplex space and n_d is the number of divisions per dimension. Nimplex uses a modified version of the NEXCOM algorithm to do that procedurally (see manuscript for details) and can generate around 5M points per second in 9-dimensional space on a modern CPU. A choice is given between generating the grid as a list of integer numbers of quantum units (left panel below) or as a list of fractional positions (right panel below).

    Integer and Fractional Simplex Grids in Ternary Space

  3. Internal Simplex / Compositional Grids are a modification of the above method, where only points inside the simplex, i.e., those where all components are present, are generated. This is useful in cases where one cannot discard any component entirely, for instance, because a manufacturing setup has a minimum feed rate (leakage). Nimplex introduces a new algorithm to generate these points procedurally (see manuscript for details) based on a further modification of the NEXCOM algorithm.

    In total, N_I(d, n_d) = \binom{n_d-1}{d-1} points are generated, critically without any performance penalty compared to the full grid, which can reach orders of magnitude when d approaches n_d. Similar to the full grid, a choice is given between generating the grid as a list of integer numbers of quantum units or as a list of fractional positions.

  4. Simplex / Compositional Graphs generation is the most critical capability, first introduced in the nimplex manuscript. They are created by using combinatorics and discovered patterns to assign edges between all neighboring nodes during the simplex grid (graph nodes) generation process. Effectively, a traversal graph is generated, spanning all possible compositions (given a resolution) and creating an extremely efficient representation of the problem space, which allows deployment of numerous graph algorithms.

    Simplex Graph for Ternary Space

    Critically, unlike the O(N^2) distance-based graph generation methods, this approach scales linearly with the resulting number of nodes. Because of that, it is extremely efficient even in high-dimensional spaces, where the number of edges goes into trillions and beyond. Nimplex can both generate and find neighbors for around 2M points per second in 9-dimensional space on a modern CPU.

    As explored in the manuscript, such representations, even of different dimensions, can then be used to efficiently encode complex problem spaces where some prior assumptions and knowledge are available. In Example #2 from the manuscript, inspired by the problem of joining titanium with stainless steel in 10.1016/j.addma.2022.102649 using 3-component spaces, one encodes 3 separate paths where some components are shared in a predetermined fashion. This allows one to efficiently encode the problem space in the form of a structure graph (left panel below) and then use it to construct a single simplex graph complex (right panel below) as a single consistent structure.

    Simplex Graph Complex

    With such a graph representation, one can very easily deploy any scientific library for graph exploration, constrained and biased by models operating in the elemental space mapping nimplex provides. A neat and concise demonstration of this is provided in 02.AdditiveManufacturingPathPlanning.ipynb under the examples directory. There, thermodynamic phase stability models constrain a 4-component (tetrahedral) design space existing in a 7-component chemical space, and a property model related to yield strength (RMSAD) is used to bias designed paths towards objectives like property maximization or gradient minimization, with extremely concise code simply modifying the weights on unidirectional edges in the graph. For instance, the figure below (approximately) depicts the shortest path through a subset of the tetrahedron formed by solid solution phases, later stretched in space proportionally to the RMSAD gradient magnitude.

    Gradient Magnitude Stretched Graph with Shortest Path
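The neighbor relation behind the traversal graph can be illustrated with a brute-force sketch in plain Python. It assumes (this is not stated above) that two integer grid points are neighbors when one quantum unit is moved from one component to another. The function names `integer_grid` and `neighbor_lists` are hypothetical, and the dictionary lookup is only a stand-in for the combinatorial procedure nimplex uses, not a reproduction of it.

```python
from itertools import combinations


def integer_grid(dim, ndiv):
    """Enumerate all ways to split ndiv quantum units into dim parts (stars and bars)."""
    points = []
    slots = ndiv + dim - 1
    for bars in combinations(range(slots), dim - 1):
        edges = (-1,) + bars + (slots,)
        # Each part is the number of "stars" between two consecutive "bars".
        points.append(tuple(edges[i + 1] - edges[i] - 1 for i in range(dim)))
    return points


def neighbor_lists(points):
    """For every node, list indices of nodes reachable by moving one unit (assumed rule)."""
    index = {p: i for i, p in enumerate(points)}
    dim = len(points[0])
    neighbors = []
    for p in points:
        found = []
        for src in range(dim):
            if p[src] == 0:
                continue  # nothing to take from an empty component
            for dst in range(dim):
                if dst == src:
                    continue
                q = list(p)
                q[src] -= 1
                q[dst] += 1
                found.append(index[tuple(q)])
        neighbors.append(found)
    return neighbors


nodes = integer_grid(3, 3)        # ternary space, 3 divisions per dimension
edges = neighbor_lists(nodes)
center = nodes.index((1, 1, 1))
print(len(nodes), len(edges[center]))  # 10 nodes; the center has 6 neighbors
```

Each node performs at most dim·(dim-1) lookups, so for a fixed dimension the work in this sketch grows linearly with the number of nodes, in contrast to comparing all pairs of points by distance.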

Several other methods are in testing and will likely be added in future releases. If you have any suggestions, please open an issue on GitHub as we are always soliciting new ideas and use cases based on real-world problems in the scientific computing community.
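Returning to the Monte Carlo sampling item in the list above, the sketch below shows one standard way to draw uniformly from a simplex using only the Python standard library: normalizing independent unit-rate exponential draws, which yields a Dirichlet distribution with all concentration parameters equal to 1. That this is the special case meant above is an assumption, and the function names are hypothetical. The second function quantifies the rejection-sampling cost, using the fact that the simplex occupies 1/(d-1)! of the unit (d-1)-cube.

```python
import random
from math import factorial


def sample_simplex_uniform(dim, n, seed=None):
    """Draw n points uniformly from the dim-component simplex (flat Dirichlet)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Independent Exp(1) draws, normalized to sum to 1.
        draws = [rng.expovariate(1.0) for _ in range(dim)]
        total = sum(draws)
        samples.append([x / total for x in draws])
    return samples


def expected_rejection_trials(dim):
    """Expected uniform draws in the (dim-1)-cube per accepted simplex point."""
    return factorial(dim - 1)


points = sample_simplex_uniform(dim=9, n=1000, seed=0)
print(len(points), expected_rejection_trials(9))  # 1000 points; 40320 trials per point
```

For a 9-component space, rejection sampling would need about 40,320 candidate draws per accepted point, while the normalized-exponential approach uses exactly one set of draws per point.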

Usage in Nim

Usage within Nim is fairly straightforward. You can install it using Nimble as explained earlier, or install it directly from GitHub, making sure to use the slightly modified @#nimble branch:

nimble install -y https://github.com/amkrajewski/nimplex@#nimble

or, if you wish to modify the source code, you can simply download the core file nimplex.nim and place it in your own code, as long as you have the dependencies installed, since it is standalone. Then simply follow the API documentation (amkrajewski.github.io/nimplex) which goes over all core functions and extra utilities like nimplex/utils/plotting and nimplex/utils/stitching.

Usage in Python

To use the library in Python, you can interact with it just like any other Python library. All input/output types are native Python types, so no additional conversion is necessary. Once you have the library installed and imported, simply follow the API documentation, with the exception that you need to add _py to the function names. If you happen to forget to add _py, the Python interpreter will throw an error with a suggestion to do so. A couple of additional convenience functions are listed under nimplex/#usage-in-python.

CLI

Interactive

Using Nimplex through the CLI relies on the same core library, but provides a simple interface for users who do not want to write any code. It can be used interactively, where the user is guided through the configuration process by just running the executable without any arguments:

./nimplex

Configured

Or it can be run with a concise configuration defining the task type and parameters. The configuration is a 3-letter string and 2-3 additional parameters, as explained below.

  • 3-letter configuration:
    1. Grid type or uniform random sampling:
      • F: Full grid (including the simplex boundary)
      • I: Internal grid (only points inside the simplex)
      • R: Random/Monte Carlo uniform sampling over simplex.
      • G: Graph (list of grid nodes and list of their neighbors)
    2. Fractional or Integer positions:
      • F: Fractional grid/graph (points are normalized to fractions of 1)
      • I: Integer grid/graph (points are integers)
    3. Print full result, its shape, or persist in a file:
      • P: Print (presents full result as a table)
      • S: Shape (only the shape / size information)
      • N: Persist to NumPy array file ("nimplex_.npy" or optionally a custom path as an additional argument)
  • Simplex Dimensions / N of Components: An integer number of components in the simplex space.
  • N Divisions per Dimension / N of Samples: An integer number of either:
    1. Divisions per each simplex dimension for grid or graph tasks (F/I/G__)
    2. Number of samples for random sampling tasks (R__)
  • (optional) NumPy Array Output Filename: A custom path to the output NumPy array file (only for __N tasks).
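To make the 3-letter scheme above concrete, here is a small Python sketch that decodes a configuration string into a readable description. `describe_config` is a hypothetical helper written for illustration; it is not part of nimplex, and the validation it performs is an assumption rather than a description of how the CLI itself parses arguments.

```python
# Letter meanings taken from the list above; the helper itself is hypothetical.
TASKS = {"F": "full grid", "I": "internal grid", "R": "random sampling", "G": "graph"}
POSITIONS = {"F": "fractional", "I": "integer"}
OUTPUTS = {"P": "print", "S": "shape", "N": "NumPy file"}


def describe_config(config):
    """Decode a 3-letter configuration such as 'IFN' into its three choices."""
    if len(config) != 3:
        raise ValueError("configuration must have exactly 3 letters")
    task, positions, output = config.upper()
    if task not in TASKS or positions not in POSITIONS or output not in OUTPUTS:
        raise ValueError(f"unrecognized configuration: {config!r}")
    return {
        "task": TASKS[task],
        "positions": POSITIONS[positions],
        "output": OUTPUTS[output],
    }


print(describe_config("IFN"))
```

Under this reading, `IFN` denotes an internal grid with fractional positions persisted to a NumPy array file, and `GIS` denotes a graph with integer positions reporting only its shape.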

For instance, to generate a 3-dimensional internal fractional grid with 10 divisions per dimension and persist it to a NumPy array file, you can run:

./nimplex -c IFN 3 10

and the output will be saved to nimplex_IF_3_10.npy in the current directory. If you want to save it to a different path, you can provide it as an additional argument:

./nimplex -c IFN 3 10 path/to/outfile.npy

Or if you want to print the full result to the console, allowing you to pipe it to virtually any other language or tool as plain text, you can run:

./nimplex -c IFP 3 10
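When consuming such output from another tool, it can help to know what to expect. The sketch below is a brute-force reference in plain Python for the same task (3 components, 10 divisions, internal, fractional); `internal_fractional_grid` is a hypothetical helper, not the nimplex algorithm, and the row order it produces may differ from what nimplex prints. The row count, however, should agree with N_I(3, 10) = \binom{9}{2} from the Capabilities section.

```python
from itertools import product
from math import comb


def internal_fractional_grid(dim, ndiv):
    """Brute-force list of compositions where every component is at least 1/ndiv."""
    grid = []
    # Choose the first dim-1 components; the last one is fixed by the sum constraint.
    for head in product(range(1, ndiv), repeat=dim - 1):
        last = ndiv - sum(head)
        if last >= 1:
            grid.append([k / ndiv for k in head + (last,)])
    return grid


grid = internal_fractional_grid(3, 10)
print(len(grid), comb(10 - 1, 3 - 1))  # both are 36
```

This enumeration is exponential in the number of components and is only meant as a small-scale cross-check, not as a substitute for the procedural generation nimplex performs.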

Auxiliary Flags

You can also utilize the following auxiliary flags:

  • --help or -h --> Show help.
  • --benchmark or -b --> Run a set of tasks to benchmark performance (simplex_grid(9, 12), simplex_internal_grid(9, 12), simplex_sampling_mc(9, 1_000_000), simplex_graph(9, 12)) and compare performance across implementations (simplex_graph(3, 1000) vs simplex_graph_3C(1000)).
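To get a feel for the size of the benchmark tasks, the grid-size formulas from the Capabilities section can be evaluated directly. This Python sketch uses hypothetical helper names and assumes that a graph has the same number of nodes as the full grid with the same parameters, consistent with the grid points serving as graph nodes.

```python
from math import comb


def full_grid_size(dim, ndiv):
    """N_S(d, n_d) = C(d - 1 + n_d, d - 1): points in the full simplex grid."""
    return comb(dim - 1 + ndiv, dim - 1)


def internal_grid_size(dim, ndiv):
    """N_I(d, n_d) = C(n_d - 1, d - 1): points strictly inside the simplex."""
    return comb(ndiv - 1, dim - 1)


print(full_grid_size(9, 12))      # 125970 points for the (9, 12) full grid
print(internal_grid_size(9, 12))  # 165 points for the (9, 12) internal grid
print(full_grid_size(3, 1000))    # 501501 nodes for the (3, 1000) graph
```

The gap between 125,970 and 165 points for the same parameters illustrates how much smaller the internal grid becomes when the number of components approaches the number of divisions.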