Metadata-Version: 2.1
Name: PyAthenaJDBC-qubole
Version: 1.3.4
Summary: Python DB API 2.0 (PEP 249) compliant wrapper for Amazon Athena JDBC driver
Home-page: https://github.com/tanaya1234/PyAthenaJDBC-1
Author: laughingman7743
Author-email: laughingman7743@gmail.com
License: MIT License
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database :: Front-Ends
Classifier: Programming Language :: Java
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Provides-Extra: SQLAlchemy
Provides-Extra: Pandas
Requires-Dist: future
Requires-Dist: jpype1 (>=0.6.0)
Requires-Dist: botocore (>=1.0.0)
Provides-Extra: Pandas
Requires-Dist: pandas (>=0.19.0); extra == 'Pandas'
Provides-Extra: SQLAlchemy
Requires-Dist: SQLAlchemy (>=1.0.0); extra == 'SQLAlchemy'

.. image:: https://img.shields.io/pypi/pyversions/PyAthenaJDBC.svg
    :target: https://pypi.python.org/pypi/PyAthenaJDBC/

.. image:: https://travis-ci.org/laughingman7743/PyAthenaJDBC.svg?branch=master
    :target: https://travis-ci.org/laughingman7743/PyAthenaJDBC

.. image:: https://codecov.io/gh/laughingman7743/PyAthenaJDBC/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/laughingman7743/PyAthenaJDBC

.. image:: https://img.shields.io/pypi/l/PyAthenaJDBC.svg
    :target: https://github.com/laughingman7743/PyAthenaJDBC/blob/master/LICENSE


PyAthenaJDBC
============

PyAthenaJDBC is a Python `DB API 2.0 (PEP 249)`_ compliant wrapper for `Amazon Athena JDBC driver`_.

.. _`DB API 2.0 (PEP 249)`: https://www.python.org/dev/peps/pep-0249/
.. _`Amazon Athena JDBC driver`: http://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html

Requirements
------------

* Python

  - CPython 2,7, 3,4, 3.5, 3.6

* Java

  - Java >= 8

Installation
------------

.. code:: bash

    $ pip install PyAthenaJDBC

Extra packages:

+---------------+------------------------------------------+----------+
| Package       | Install command                          | Version  |
+===============+==========================================+==========+
| Pandas        | ``pip install PyAthenaJDBC[Pandas]``     | >=0.19.0 |
+---------------+------------------------------------------+----------+
| SQLAlchemy    | ``pip install PyAthenaJDBC[SQLAlchemy]`` | >=1.0.0  |
+---------------+------------------------------------------+----------+

Usage
-----

Basic usage
~~~~~~~~~~~

.. code:: python

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM one_row
            """)
            print(cursor.description)
            print(cursor.fetchall())
    finally:
        conn.close()

Cursor iteration
~~~~~~~~~~~~~~~~

.. code:: python

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows LIMIT 10
            """)
            for row in cursor:
                print(row)
    finally:
        conn.close()

Query with parameter
~~~~~~~~~~~~~~~~~~~~

Supported `DB API paramstyle`_ is only ``PyFormat``.
``PyFormat`` only supports `named placeholders`_ with old ``%`` operator style and parameters specify dictionary format.

.. code:: python

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT col_string FROM one_row_complex
            WHERE col_string = %(param)s
            """, {'param': 'a string'})
            print(cursor.fetchall())
    finally:
        conn.close()

if ``%`` character is contained in your query, it must be escaped with ``%%`` like the following:

.. code:: sql

    SELECT col_string FROM one_row_complex
    WHERE col_string = %(param)s OR col_string LIKE 'a%%'

.. _`DB API paramstyle`: https://www.python.org/dev/peps/pep-0249/#paramstyle
.. _`named placeholders`: https://pyformat.info/#named_placeholders

JVM Options
~~~~~~~~~~~

In the connect method or connection object, you can specify JVM options with a string array.

You can increase the JVM heap size like the following:

.. code:: python

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2',
                   jvm_options=['-Xms1024m', '-Xmx4096m'])
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows
            """)
            print(cursor.fetchall())
    finally:
        conn.close()

SQLAlchemy
~~~~~~~~~~

Install SQLAlchemy with ``pip install SQLAlchemy>=1.0.0`` or ``pip install PyAthenaJDBC[SQLAlchemy]``.
Supported SQLAlchemy is 1.0.0 or higher.

.. code:: python

    import contextlib
    from urllib.parse import quote_plus  # PY2: from urllib import quote_plus
    from sqlalchemy.engine import create_engine
    from sqlalchemy.sql.expression import select
    from sqlalchemy.sql.functions import func
    from sqlalchemy.sql.schema import Table, MetaData

    conn_str = 'awsathena+jdbc://{access_key}:{secret_key}@athena.{region_name}.amazonaws.com:443/'\
               '{schema_name}?s3_staging_dir={s3_staging_dir}'
    engine = create_engine(conn_str.format(
        access_key=quote_plus('YOUR_ACCESS_KEY'),
        secret_key=quote_plus('YOUR_SECRET_ACCESS_KEY'),
        region_name='us-west-2',
        schema_name='default',
        s3_staging_dir=quote_plus('s3://YOUR_S3_BUCKET/path/to/')))
    try:
        with contextlib.closing(engine.connect()) as conn:
            many_rows = Table('many_rows', MetaData(bind=engine), autoload=True)
            print(select([func.count('*')], from_obj=many_rows).scalar())
    finally:
        engine.dispose()

The connection string has the following format:

.. code:: python

    awsathena+jdbc://{access_key}:{secret_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}&driver_path={driver_path}&...

NOTE: ``s3_staging_dir`` requires quote. If ``access_key``, ``secret_key`` and other parameter contain special characters, quote is also required.

Pandas
~~~~~~

Minimal example for Pandas DataFrame:

.. code:: python

    from pyathenajdbc import connect
    import pandas as pd

    conn = connect(access_key='YOUR_ACCESS_KEY_ID',
                   secret_key='YOUR_SECRET_ACCESS_KEY',
                   s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2',
                   jvm_path='/path/to/jvm')  # optional, as used by JPype
    df = pd.read_sql("SELECT * FROM many_rows LIMIT 10", conn)

As Pandas DataFrame:

.. code:: python

    import contextlib
    from pyathenajdbc import connect
    from pyathenajdbc.util import as_pandas

    with contextlib.closing(
            connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/'
                    region_name='us-west-2'))) as conn:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows
            """)
            df = as_pandas(cursor)
    print(df.describe())

Examples
--------

Redash_ query runner example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See `examples/redash/athena.py`_

.. _Redash: https://github.com/getredash/redash
.. _`examples/redash/athena.py`: examples/redash/athena.py

Credential
----------

Support `AWS CLI credentials`_, `Properties file credentials`_ and `AWS credentials provider chain`_.

.. _`AWS CLI credentials`: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
.. _`Properties file credentials`: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/PropertiesFileCredentialsProvider.html
.. _`AWS credentials provider chain`: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html

Credential Files
~~~~~~~~~~~~~~~~

~/.aws/credentials

.. code:: cfg

    [default]
    aws_access_key_id=YOUR_ACCESS_KEY_ID
    aws_secret_access_key=YOUR_SECRET_ACCESS_KEY

~/.aws/config

.. code:: cfg

    [default]
    region=us-west-2
    output=json

Environment variables
~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    $ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    $ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    $ export AWS_DEFAULT_REGION=us-west-2

Additional environment variable:

.. code:: bash

    $ export AWS_ATHENA_S3_STAGING_DIR=s3://YOUR_S3_BUCKET/path/to/

Properties file credentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a property file of the following format.

/path/to/AWSCredentials.properties

.. code:: properties

    accessKeyId:YOUR_ACCESS_KEY_ID
    secretKey:YOUR_SECRET_ACCESS_KEY

Specify the property file path with ``credential_file`` of the connect method or connection object.

.. code:: python

    from pyathenajdbc import connect

    conn = connect(credential_file='/path/to/AWSCredentials.properties',
                   s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')

PyAthenaJDBC uses the property file to authenticate Amazon Athena.

AWS credentials provider chain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See `AWS credentials provider chain`_

    AWS credentials provider chain that looks for credentials in this order:

        * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)
        * Java System Properties - aws.accessKeyId and aws.secretKey
        * Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
        * Credentials delivered through the Amazon EC2 container service if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" environment variable is set and security manager has permission to access the variable,
        * Instance profile credentials delivered through the Amazon EC2 metadata service

In the connect method or connection object, you can connect by specifying at least ``s3_staging_dir`` and ``region_name``.
It is not necessary to specify ``access_key`` and ``secret_key``.

.. code:: python

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')

Terraform_ Instance profile example:

See `examples/terraform/`_

.. _Terraform: https://github.com/hashicorp/terraform
.. _`examples/terraform/`: examples/terraform/


Testing
-------

Depends on the following environment variables:

.. code:: bash

    $ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    $ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    $ export AWS_DEFAULT_REGION=us-west-2
    $ export AWS_ATHENA_S3_STAGING_DIR=s3://YOUR_S3_BUCKET/path/to/

Run test
~~~~~~~~

.. code:: bash

    $ pip install pipenv
    $ pipenv install --dev
    $ pipenv run scripts/test_data/upload_test_data.sh
    $ pipenv run pytest
    $ pipenv run scripts/test_data/delete_test_data.sh

Run test multiple Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    $ pip install pipenv
    $ pipenv install --dev
    $ pipenv run scripts/test_data/upload_test_data.sh
    $ pyenv local 3.6.5 3.5.5 3.4.8 2.7.14
    $ pipenv run tox
    $ pipenv run scripts/test_data/delete_test_data.sh


