Metadata-Version: 2.1
Name: cascading-shell
Version: 1.1
Home-page: https://github.com/janoliver/cash
Author: Jan Oliver Oelerich
Author-email: janoliver@oelerich.org
License: GPL
Keywords: shell,hpc,computer cluster
Classifier: Environment :: Console
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: System :: Shells
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
Provides-Extra: colored_output
Requires-Dist: termcolor ; extra == 'colored_output'

# CASH - Cascading Shell

CASH is a utility for administrators of large computer clusters to quickly run shell commands 
on all or a subset of the cluster nodes. CASH generates a cascading, or tree-like topology of the nodes,
and is therefore much faster than other tools that simply iterate the nodes or try to access many nodes in parallel.

CASH is supposed to be run from the administrator's machine, but may also be run from one of the cluster nodes. In
the first case, all communication between the computer cluster and the admin machine is channelled over a gateway host.

Please see below for the execution/communication model.

## Requirements

CASH has the following requirements:

- Python 3.6 or newer on each node
- password-less SSH access to and between all nodes

## Setup

Run `pip install cascading-shell` on the admin machine; this provides the `cash` command-line tool. Then configure your
cluster(s). Nodes and node groups are configured in `~/.cash.topo.json` like this:

```json
{
  "nodes": {
    "group1": "clus1node001,clus1node002,clus1node003",
    "all": {
      "site1": {
        "cluster1": {
          "rack1": "clus1node[001-020]",
          "rack2": "clus1node[021-040]",
          "rack3": "clus1node[041-060]"
        },
        "cluster2": {
          "rack1": "clus2node[001-020]",
          "rack2": "clus2node[021-040]",
          "rack3": "clus2node[041-060]"
        }
      },
      "site2": {
        "cluster3": {
          "rack1": "clus3node[001-020]",
          "rack2": "clus3node[021-040]",
          "rack3": "clus3node[041-060]"
        },
        "cluster4": {
          "rack1": "clus4node[001-020]",
          "rack2": "clus4node[021-040]",
          "rack3": "clus4node[041-060]"
        },
        "cluster5": "clus5node[001-020]"
      }
    }
  }
}
```

The config file has the following rules:

- Right now, everything lives under the `nodes` object. 
- The file format is standard JSON, where each key is a group name and each value is either a comma-separated list of
  nodes or, for nested groups, another object.
- Nodes with sequential numbers can be shortened using square brackets, e.g., `node[001-003]` resolves
  to `node001,node002,node003`. Be careful with leading zeros here! You may also use a comma here, such as: 
  `node[001-003,005]` -> `node001,node002,node003,node005`. You can also use multiple bracket instances: 
  `clus[1-3]node[001-003]` -> `clus1node001,clus1node002,clus1node003,clus2node001,clus2node002,clus2node003,clus3node001,clus3node002,clus3node003` and so on.
- Groups can be nested. The topology of the node tree is specified in the mandatory `all` group. It is wise to reflect 
  network latency/bandwidth in the tree; for instance, as in the above example, you may divide your HPC into groups of 
  site, cluster, rack if applicable. 
- Aside from `all`, you can specify as many groups as you wish and nest them to your liking. 
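
The square bracket rules above can be illustrated with a short Python sketch. This is not CASH's own parser; the function names `expand_range_list` and `expand_nodes` are hypothetical, and the sketch assumes that numbers are zero-padded to the width of the range's start value.

```python
import itertools
import re


def expand_range_list(spec):
    """Expand the inside of one bracket pair, e.g. '001-003,005'."""
    items = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            width = len(start)  # assumption: pad to the width of the start value
            items.extend(str(i).zfill(width) for i in range(int(start), int(end) + 1))
        else:
            items.append(part)
    return items


def expand_nodes(pattern):
    """Expand every [...] instance in a single node pattern."""
    # With a capture group, re.split alternates literal text and bracket contents.
    pieces = re.split(r"\[([^\]]*)\]", pattern)
    choices = [
        expand_range_list(piece) if index % 2 else [piece]
        for index, piece in enumerate(pieces)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_nodes("node[001-003,005]"))
print(expand_nodes("clus[1-3]node[001-003]"))
```

The leading-zero caveat shows up directly here: under this sketch's assumption, `node[1-3]` yields `node1,node2,node3`, while `node[001-003]` yields `node001,node002,node003`.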

## Cascading communication model

CASH communicates with each node in a cascading fashion, where CASH itself on each node acts as a proxy for its 
immediate children and forwards all messages from the children to its parent and vice versa. Let's try to understand 
this with an example. Imagine the following topology configuration:

```json
{
  "nodes": {
    "all": {
      "site1": {
        "cluster1": {
          "rack1": "clus1node[1-3]",
          "rack2": "clus1node[4-6]"
        },
        "cluster2": {
          "rack1": "clus2node[1-3]",
          "rack2": "clus2node[4-6]"
        }
      },
      "site2": {
        "cluster3": {
          "rack1": "clus3node[1-3]",
          "rack2": "clus3node[4-6]"
        },
        "cluster4": {
          "rack1": "clus4node[1-3]",
          "rack2": "clus4node[4-6]"
        }
      }
    }
  }
}
```

We have a total of four clusters in two geographical sites, and each cluster has two racks with three nodes each. We now
want to execute a command on all nodes using CASH. First, CASH spawns an instance of itself on the gateway host, which
can be specified via the `DEFAULT_JUMP_HOST` variable or via the command line parameter `--jumphost`. From the gateway,
a connection to the first host of `site1` and the first host of `site2` is established, i.e., `clus1node1` and
`clus3node1`. From each of those two nodes, CASH hops to the first node of each cluster (e.g., `clus1node2` for
`cluster1`, as `clus1node1` was already used, and `clus2node1` for `cluster2`), from there to the first
node of each rack, and then to the remaining nodes.

For example, `clus4node5` is reached in the following way: 
`ADMIN_MACHINE -> gateway -> clus3node1 (site) -> clus4node1 (cluster) -> clus4node4 (rack) -> clus4node5 (node)`. This 
tiered or cascading execution model of course makes sense only for a larger number of nodes than in this example. You
can tell CASH to use a flat instead of cascading connection model with the `--flatten` parameter. 
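
The hop sequence described above can be reproduced with a small Python sketch. It is one possible reading of the model, not CASH's actual implementation: the helpers `flatten`, `build`, `path_to`, and `rack` are hypothetical, node lists are written out already expanded, and the sketch assumes that each group is headed by its first node that is not yet used higher up in the tree.

```python
def flatten(group):
    """Return all node names below a group, in configuration order."""
    if isinstance(group, list):
        return list(group)
    return [node for sub in group.values() for node in flatten(sub)]


def build(group, used):
    """Build {head: subtree} for one group, skipping nodes already used as heads."""
    free = [node for node in flatten(group) if node not in used]
    if not free:
        return {}
    head = free[0]  # assumption: the first unused node heads the group
    used.add(head)
    subtree = {}
    if isinstance(group, dict):
        for sub in group.values():
            subtree.update(build(sub, used))
    else:
        for node in free[1:]:
            used.add(node)
            subtree[node] = {}
    return {head: subtree}


def path_to(tree, target):
    """Return the list of hops from the top of the tree down to target."""
    for node, sub in tree.items():
        if node == target:
            return [node]
        rest = path_to(sub, target)
        if rest:
            return [node] + rest
    return []


def rack(cluster, first, last):
    """Write out the nodes of one rack, e.g. clus1node1 to clus1node3."""
    return [f"clus{cluster}node{i}" for i in range(first, last + 1)]


topology = {
    "site1": {
        "cluster1": {"rack1": rack(1, 1, 3), "rack2": rack(1, 4, 6)},
        "cluster2": {"rack1": rack(2, 1, 3), "rack2": rack(2, 4, 6)},
    },
    "site2": {
        "cluster3": {"rack1": rack(3, 1, 3), "rack2": rack(3, 4, 6)},
        "cluster4": {"rack1": rack(4, 1, 3), "rack2": rack(4, 4, 6)},
    },
}

used = set()
tree = {}
for site in topology.values():  # the gateway connects to one head per site
    tree.update(build(site, used))

print(" -> ".join(["gateway"] + path_to(tree, "clus4node5")))
# gateway -> clus3node1 -> clus4node1 -> clus4node4 -> clus4node5
```

To inspect the tree that CASH itself builds for a given configuration, use the `plan` sub command.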

The number of parallel connections on each node is limited by the `--fansize` parameter (environment variable
`DEFAULT_FANSIZE`, default 50). When more than FANSIZE nodes are direct children of one node, they are grouped by
FANSIZE and an additional layer is formed.
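
As a rough illustration of the fan size limit, the following Python sketch splits an oversized list of children into chunks of at most FANSIZE nodes. How CASH picks the relay node for each chunk is not documented here, so the sketch simply assumes that the first node of each chunk relays to the rest; `chunk` and `add_layer` are hypothetical names, and only a single extra layer is modelled.

```python
def chunk(children, fansize):
    """Split a list of children into consecutive chunks of at most fansize."""
    return [children[i:i + fansize] for i in range(0, len(children), fansize)]


def add_layer(children, fansize):
    """Return {node: subtree}, inserting one extra layer if there are too many children."""
    if len(children) <= fansize:
        return {child: {} for child in children}
    # Assumption: the first node of each chunk relays to the remaining ones.
    return {
        group[0]: {child: {} for child in group[1:]}
        for group in chunk(children, fansize)
    }


children = [f"node{i:03d}" for i in range(120)]
layer = add_layer(children, 50)
print(len(layer), [len(sub) for sub in layer.values()])
```

With 120 children and a fan size of 50, the parent in this sketch talks to three relay nodes instead of 120 direct children.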

Every node that is part of the tree receives and forwards messages from/to its parent and its children, and also 
executes the desired shell command locally. 

## Usage

Here is a copy of `cash --help`:

```
usage: cash [-h] [-n NODES] [--jumphost JUMPHOST] [--ssh-timeout SSH_TIMEOUT]
            [-s FANSIZE] [--flatten] [-p] [--json | --shell | --quiet]
            {run,plan} ...

positional arguments:
  {run,plan}            Please use one of the following sub commands
    run                 Run command
    plan                Print tree as json to stdout (view with, e.g.,
                        firefox)

optional arguments:
  -h, --help            show this help message and exit
  -n NODES, --nodes NODES
                        Node or node groups.
  --jumphost JUMPHOST   Gateway host to cluster.
  --ssh-timeout SSH_TIMEOUT
                        Define a timeout for SSH sessions. 0 = no timeout
  -s FANSIZE, --fansize FANSIZE
                        Maximum number of parallel SSH sessions.
  --flatten             Disable tree mode.
  -p, --progress        Show progress of received answers
  --json                JSON output format
  --shell               Shell friendly output format
  --quiet               No output
```

- Node groups can be specified with `@group_name` in the `--nodes` parameter. 
- You can exclude hosts by using `-n "@group,-node01"`.
- You can use the square bracket syntax here, too: `-n "node[1-9]"`.
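
The selection rules for `--nodes` can be sketched in Python as follows. This is an illustration under assumptions, not CASH's parser: `split_top_level` and `select_nodes` are hypothetical helpers, `groups` is assumed to map group names to already expanded node lists, and bracket patterns are passed through unexpanded.

```python
def split_top_level(spec):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, ""
    for char in spec:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += char
    if current:
        parts.append(current)
    return parts


def select_nodes(spec, groups):
    """Resolve '@group' references and '-node' exclusions in a node string."""
    included, excluded = [], set()
    for token in split_top_level(spec):
        token = token.strip()
        if token.startswith("-"):
            excluded.add(token[1:])
        elif token.startswith("@"):
            included.extend(groups[token[1:]])
        else:
            included.append(token)  # plain node name or unexpanded bracket pattern
    return [node for node in included if node not in excluded]


groups = {"group": ["node01", "node02", "node03"]}
print(select_nodes("@group,-node01", groups))
```

Splitting only on top-level commas keeps a pattern such as `node[1-3,5]` in one piece, so it can be expanded afterwards.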

You can specify the defaults of the CLI parameters via the following environment variables:

```python
DEFAULT_SSH_TIMEOUT = 30
DEFAULT_FANSIZE = 50
DEFAULT_NODES_STRING = "@all"
DEFAULT_OUT_FORMAT = "text"
DEFAULT_JUMP_HOST = "jumphost"
DEFAULT_FLATTEN = False
DEFAULT_RUN_SHELL = True
```
