Metadata-Version: 2.1
Name: bigquery-spark-streaming-connector
Version: 0.1.0
Summary: spark structured streaming connector utility for gcp bigquery using storage api services.
Author-email: Indranil Tarafdar <indranil.tarafdar@databricks.com>
Description-Content-Type: text/markdown
Requires-Dist: google-cloud-bigquery-storage ==2.25.0
Requires-Dist: google-cloud-bigquery[fastavro]
Requires-Dist: pyarrow
Requires-Dist: fastavro


bigquery-streaming-connector
=======
<b>This is a library for bigquery streaming connector for pyspark.</b> <b/>

The underlying connector uses [bigquery storage api services] to pull bigquery table data at scale using spark workers.

[bigquery storage api services]: https://cloud.google.com/bigquery/docs/reference/storage

The storage api services is cheaper and faster than the traditional Bigquery query api services enabling faster & cheaper Bigquery migration incrementally in a continuous fashion.

<b> Pre-requisite </b>

```
pip install build.whl
```
<b>Pyspark usage:</b>

```
from streaming_connector import bq_stream_register

query=(spark.readStream.format("bigquery-streaming")
 .option("project_id", <bq_project_id>)
 .option("incremental_checkpoint_field",<table_incremental_ts_based_col>)
 .option("dataset",<bq_dataset_name>)
 .option("table",<bq_table_name>)
 .load()
 ## The above will ingest table data incrementally using the provided timestamp based field and latest value is checkpointed using offset semantics.
 ## Without the incremental input field full table ingestion is done.
```

