Metadata-Version: 2.1
Name: airflow-dbt-dinigo
Version: 0.5.10
Summary: Apache Airflow integration for dbt
License-File: LICENSE.txt
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: apache-airflow>=1.10.3
Provides-Extra: google
Requires-Dist: apache-airflow-providers-google; extra == 'google'
Requires-Dist: google-cloud-build; extra == 'google'
Description-Content-Type: text/markdown

# airflow-dbt

This is a collection of [Airflow](https://airflow.apache.org/) operators to provide easy integration with [dbt](https://www.getdbt.com).

```py
from airflow import DAG
from airflow_dbt.operators.dbt_operator import (
    DbtSeedOperator,
    DbtSnapshotOperator,
    DbtRunOperator,
    DbtTestOperator
)
from airflow.utils.dates import days_ago

default_args = {
  'dir': '/srv/app/dbt',
  'start_date': days_ago(0)
}

with DAG(dag_id='dbt', default_args=default_args, schedule_interval='@daily') as dag:

  dbt_seed = DbtSeedOperator(
    task_id='dbt_seed',
  )

  dbt_snapshot = DbtSnapshotOperator(
    task_id='dbt_snapshot',
  )

  dbt_run = DbtRunOperator(
    task_id='dbt_run',
  )

  dbt_test = DbtTestOperator(
    task_id='dbt_test',
    retries=0,  # Failing tests would fail the task, and we don't want Airflow to try again
  )

  dbt_seed >> dbt_snapshot >> dbt_run >> dbt_test
```

## Installation

Install from PyPI:

```sh
pip install airflow-dbt
```

The operators also need access to the `dbt` CLI, which should either be on your `PATH` or be set with the `dbt_bin` argument in each operator.

## Usage

There are six operators currently implemented:

* `DbtDocsGenerateOperator`
  * Calls [`dbt docs generate`](https://docs.getdbt.com/reference/commands/cmd-docs)
* `DbtDepsOperator`
  * Calls [`dbt deps`](https://docs.getdbt.com/docs/deps)
* `DbtSeedOperator`
  * Calls [`dbt seed`](https://docs.getdbt.com/docs/seed)
* `DbtSnapshotOperator`
  * Calls [`dbt snapshot`](https://docs.getdbt.com/docs/snapshot)
* `DbtRunOperator`
  * Calls [`dbt run`](https://docs.getdbt.com/docs/run)
* `DbtTestOperator`
  * Calls [`dbt test`](https://docs.getdbt.com/docs/test)


Each of the above operators accepts the arguments defined in [`dbt_command_config`](airflow_dbt/dbt_command_config.py). The main ones are:

* `profiles_dir`
  * If set, passed as the `--profiles-dir` argument to the `dbt` command
* `target`
  * If set, passed as the `--target` argument to the `dbt` command
* `dir`
  * The directory to run the `dbt` command in
* `full_refresh`
  * If set to `True`, passes `--full-refresh`
* `vars`
  * If set, passed as the `--vars` argument to the `dbt` command. Should be set as a Python dictionary, as it will be passed to the `dbt` command as YAML
* `models`
  * If set, passed as the `--models` argument to the `dbt` command
* `exclude`
  * If set, passed as the `--exclude` argument to the `dbt` command
* `select`
  * If set, passed as the `--select` argument to the `dbt` command
* `dbt_bin`
  * The `dbt` CLI. Defaults to `dbt`, so assumes it's on your `PATH`
* `verbose`
  * The operator will log verbosely to the Airflow logs
* `warn_error`
  * If set to `True`, passes the `--warn-error` argument to the `dbt` command, which treats warnings as errors
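
To make the mapping above concrete, here is a minimal sketch of how such options could be turned into a `dbt` command line. It is an illustration only, not the package's actual implementation: the `build_dbt_command` helper is hypothetical, and serialising `vars` with `json.dumps` relies on JSON being valid YAML. The `dir` and `verbose` options are left out because they affect where the command runs and how it is logged rather than the arguments themselves.

```python
import json


def build_dbt_command(command, options):
    """Translate an options dict into a dbt argv list (illustrative only)."""
    argv = [options.get('dbt_bin', 'dbt')]

    # Assumption: the flag is placed before the sub-command,
    # as in `dbt --warn-error run`.
    if options.get('warn_error'):
        argv.append('--warn-error')

    argv.append(command)

    # Options that take a value are only emitted when they are set.
    for key, flag in [
        ('profiles_dir', '--profiles-dir'),
        ('target', '--target'),
        ('models', '--models'),
        ('exclude', '--exclude'),
        ('select', '--select'),
    ]:
        if options.get(key) is not None:
            argv.extend([flag, options[key]])

    # JSON is valid YAML, so json.dumps is enough to serialise the dict.
    if options.get('vars') is not None:
        argv.extend(['--vars', json.dumps(options['vars'])])

    if options.get('full_refresh'):
        argv.append('--full-refresh')

    return argv


example = build_dbt_command('run', {
    'target': 'prod',
    'vars': {'run_date': '2021-01-01'},
    'full_refresh': True,
})
print(example)
```

Options that are left unset produce no flag at all, so `dbt` falls back to its own defaults for them.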

Typically you will want to use the `DbtRunOperator`, followed by the `DbtTestOperator`, as shown earlier.

You can also use the hook directly. This is typically useful when you need to combine the `dbt` command with another step in the same operator, for example running `dbt docs` and uploading the docs to somewhere they can be served from.

## A more advanced example

If you want to run your `dbt` project somewhere other than in the Airflow worker, you can use
the `DbtCloudBuildHook` and apply it to the `DbtBaseOperator`, or simply use the
provided `DbtCloudBuildOperator`:

```python
from airflow_dbt.hooks import DbtCloudBuildHook
from airflow_dbt.operators import DbtBaseOperator, DbtCloudBuildOperator
DbtBaseOperator(
    task_id='provide_hook',
    command='run',
    use_colors=False,
    config={
        'profiles_dir': './jaffle-shop',
        'project_dir': './jaffle-shop',
    },
    dbt_hook=DbtCloudBuildHook(
        gcs_staging_location='gs://my-bucket/compressed-dbt-project.tar.gz'
    )
)

DbtCloudBuildOperator(
    task_id='default_hook_cloudbuild',
    gcs_staging_location='gs://my-bucket/compressed-dbt-project.tar.gz',
    command='run',
    use_colors=False,
    config={
        'profiles_dir': './jaffle-shop',
        'project_dir': './jaffle-shop',
    },
)
```

You can either pass the dbt params, config options and flags directly to the
operator or group them into a `config` param. Both forms are validated, but only
`config` supports templating. The following two tasks are equivalent:

```python
from airflow_dbt.operators.dbt_operator import DbtBaseOperator

DbtBaseOperator(
    task_id='config_param',
    command='run',
    config={
        'profiles_dir': './jaffle-shop',
        'project_dir': './jaffle-shop',
        'dbt_bin': '/usr/local/airflow/.local/bin/dbt',
        'use_colors': False
    }
)

DbtBaseOperator(
    task_id='flat_config',
    command='run',
    profiles_dir='./jaffle-shop',
    project_dir='./jaffle-shop',
    dbt_bin='/usr/local/airflow/.local/bin/dbt',
    use_colors=False
)
```
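
One way to picture the equivalence is a small normalisation step that folds both forms into a single dictionary. The sketch below is an assumption about how that could work, not the package's code: `resolve_config` is a hypothetical helper, and rejecting conflicting values is a design choice made here for illustration.

```python
def resolve_config(config=None, **flat_params):
    """Merge a `config` dict with flat keyword arguments (illustrative only)."""
    merged = dict(config or {})
    for key, value in flat_params.items():
        # Assumption: refuse to guess when the same option is given twice
        # with different values.
        if key in merged and merged[key] != value:
            raise ValueError(f'conflicting values for {key!r}')
        merged[key] = value
    return merged


from_config = resolve_config(config={
    'profiles_dir': './jaffle-shop',
    'project_dir': './jaffle-shop',
    'use_colors': False,
})
from_flat = resolve_config(
    profiles_dir='./jaffle-shop',
    project_dir='./jaffle-shop',
    use_colors=False,
)
print(from_config == from_flat)
```

Under these assumptions both calls resolve to the same dictionary, which is why the two task definitions behave identically.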

## Building Locally

To install from the repository, it's recommended to first create a virtual environment:
```bash
python3 -m venv .venv

source .venv/bin/activate
```

Install using `pip`:
```bash
pip install .
```

## Testing

To run tests locally, first create a virtual environment (see the [Building Locally](https://github.com/gocardless/airflow-dbt#building-locally) section).

Install dependencies:
```bash
pip install . pytest
```

Run the tests:
```bash
pytest tests/
```

## Code style
This project uses [flake8](https://flake8.pycqa.org/en/latest/).

To check your code, first create a virtual environment (see the [Building Locally](https://github.com/gocardless/airflow-dbt#building-locally) section), then run:
```bash
pip install flake8
flake8 airflow_dbt/ tests/ setup.py
```

## Package management

If you use dbt's package manager you should include all dependencies before deploying your dbt project.

For Docker users, packages specified in `packages.yml` should be included as part of your Docker image by calling `dbt deps` in your `Dockerfile`.

## Amazon Managed Workflows for Apache Airflow (MWAA)

If you use MWAA, you just need to update the `requirements.txt` file and add `airflow-dbt` and `dbt` to it.

Then you can have your dbt code inside a folder `{DBT_FOLDER}` in the dags folder on S3 and configure the dbt task like below:

```python
from airflow_dbt.operators.dbt_operator import DbtRunOperator

dbt_run = DbtRunOperator(
  task_id='dbt_run',
  dbt_bin='/usr/local/airflow/.local/bin/dbt',
  profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
  dir='/usr/local/airflow/dags/{DBT_FOLDER}/'
)
```
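
Since `{DBT_FOLDER}` is a placeholder, it can help to build the paths once and reuse them. This is a minimal sketch, assuming a hypothetical folder name `my_dbt_project`; the other paths are the ones used in the example above.

```python
import posixpath

DAGS_ROOT = '/usr/local/airflow/dags'  # dags folder used in the example above
DBT_FOLDER = 'my_dbt_project'  # hypothetical folder name, replace with your own

# posixpath keeps forward slashes regardless of where the DAG file is edited.
dbt_project_path = posixpath.join(DAGS_ROOT, DBT_FOLDER)

dbt_task_kwargs = {
    'dbt_bin': '/usr/local/airflow/.local/bin/dbt',
    'profiles_dir': dbt_project_path,
    'dir': dbt_project_path,
}
print(dbt_task_kwargs)
```

The resulting dictionary can then be unpacked into the operator, for example `DbtRunOperator(task_id='dbt_run', **dbt_task_kwargs)`.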

## License & Contributing

* This is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
* Bug reports and pull requests are welcome on GitHub at https://github.com/gocardless/airflow-dbt.

GoCardless ♥ open source. If you do too, come [join us](https://gocardless.com/about/jobs).
