Metadata-Version: 2.1
Name: ayeaye-fossa
Version: 0.0.1
Summary: Execution engine for Aye-Aye ETL models
Home-page: https://github.com/Aye-Aye-Dev/Fossa
Author: Si Parker
Author-email: ayeaye@plogic.co.uk
Project-URL: Bug Tracker, https://github.com/Aye-Aye-Dev/Fossa/issues
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# fossa
Execution engine for Aye-Aye ETL models

## Overview

Fossa runs Aye-Aye models and their subtasks across a distributed environment.

Aye-Aye models can be run without Fossa (as they are just Python code), but when a model grows too big to execute on a single computer the task needs to be spread across multiple computers. A 'distributed environment' is one with multiple compute nodes that are networked so messages can be passed between nodes.

An instance of Fossa runs on each compute node where it facilitates the communication of messages between nodes.

A node could be a Docker container or a full computer instance.


## Getting Started

Ensure your working directory is the same directory as this README file.

Then install dependencies and run the tests-

```shell
cp local_env_example .env
pipenv shell
pipenv install --dev
python -m unittest discover tests
```

The `.env` file is used by pipenv.

For all Python commands below you will need to be in this pipenv shell.


## Running Fossa locally

In a distributed environment one instance of Fossa would run on each compute node. To experiment with Fossa just run one or more instances on a local computer.

Fossa runs a small web-server app which can be used to submit tasks and query the progress and status of tasks. In production, jobs are more likely to be fetched from a message queue.

Copy the example config file into your own personal config, then sym-link it to `local_config.py` so the `run_local_app()` function in `fossa.app` can find it.

e.g.

```shell
cd fossa/settings
# replace xxxxx with your name or a more useful identifier for your environment
cp local_config_example.py local_config_xxxxx.py
# have a look in your config file. Is there anything you'd like to change to fit with your system?
ln -s local_config_xxxxx.py local_config.py
```

In the virtual env (provided by pipenv) from above and with the current working directory being the project's root directory-

```shell
python fossa/app.py
```

You'll now have a locally running web app. It will output the IP addresses it is accepting connections from. Typically just point a browser at `http://0.0.0.0:2345/`.


### Posting a task

In a production environment tasks are more likely to arrive through a message queue. But it's also possible to use an HTTP POST to submit a task.

If you used the `local_config_example.py` file as a starting point for your local config it will have a single model already in the `ACCEPTED_MODEL_CLASSES` parameter. This is a tiny example ETL model.

Fossa will only run models that have been pre-defined before start-up. A simple way to supply these models is through the config file but there are other ways too.

POST to your local instance of Fossa a task specification. This example runs the complete `SimpleExampleEtl` model-

```shell
curl --header "Content-Type: application/json" \
     --data '{"model_class":"SimpleExampleEtl"}'  \
     --request POST http://0.0.0.0:2345/api/0.01/task
```
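
The same POST can be made from Python, which may be handy when submitting tasks from a script. This is a minimal sketch using only the standard library. The helper names `build_task_request` and `submit_task` are hypothetical and not part of Fossa; the URL and JSON payload are copied from the curl example above. The format of the response body is not assumed, so it is returned as plain text.

```python
import json
import urllib.request

# Assumption: address and API path are taken from the curl example above.
FOSSA_TASK_URL = "http://0.0.0.0:2345/api/0.01/task"


def build_task_request(model_class, url=FOSSA_TASK_URL):
    """Build (but do not send) the POST request for a task specification."""
    payload = json.dumps({"model_class": model_class}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(model_class, url=FOSSA_TASK_URL):
    """Send the task to a running Fossa instance.

    Returns the HTTP status code and the raw response body as text.
    """
    request = build_task_request(model_class, url)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With a local Fossa instance running, calling `submit_task("SimpleExampleEtl")` should be equivalent to the curl command above.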


### Distributed (but still local) processing

TODO

```shell
curl --header "Content-Type: application/json" \
     --data '{"model_class":"PartitionedExampleEtl"}'  \
     --request POST http://0.0.0.0:2345/api/0.01/task
```
