Metadata-Version: 2.1
Name: SparkStream
Version: 1.3.0
Summary: A simple spark streaming handler.
Home-page: https://github.com/HassanRady/SparkStream
Author: Hassan Rady
Author-email: hassan.khaled.rady@gmail.com
License: MIT license
Keywords: Tweets
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nltk (==3.7)
Requires-Dist: pyspark (==3.2.0)
Requires-Dist: numpy (==1.23.1)
Requires-Dist: python-dotenv (==0.20.0)

# Spark Streaming Package
Package: <a href="https://pypi.org/project/SparkStream/#description">SparkStream-pypi</a>

## What is it?
SparkStream is a handler for processing streaming text data from a Kafka topic into Cassandra and Redis.

## How does it work?
Stream processing proceeds in the following steps:
1. Read data from the Kafka topic
2. Parse the data into a Spark DataFrame with a schema
3. Clean the data: remove unwanted characters, expand abbreviations, remove stop words, and drop empty fields
4. Save the data into Cassandra and Redis
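The cleaning step (3) can be sketched in plain Python. The stop-word list and abbreviation table below are small hypothetical stand-ins for illustration only; the real package relies on nltk's stop-word list, and in the pipeline this logic would run per row (e.g. via a Spark UDF):

```python
import re

# Hypothetical stand-ins: the package itself uses nltk's stop-word list
# and its own abbreviation table.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in"}
ABBREVIATIONS = {"u": "you", "r": "are", "pls": "please"}

def clean_text(text: str) -> str:
    """Apply the cleaning rules from step 3 to one text field."""
    # Remove unwanted characters (keep only letters, digits, whitespace).
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text.lower())
    words = []
    for word in text.split():
        # Expand common abbreviations.
        word = ABBREVIATIONS.get(word, word)
        # Drop stop words; empty fields disappear naturally via split().
        if word not in STOP_WORDS:
            words.append(word)
    return " ".join(words)

print(clean_text("Pls check the docs, u r late!!!"))
# -> "please check docs you are late"
```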

## How to use it?
Use its API: <a href="https://github.com/HassanRady/Spark-Stream-Api">SparkStream-API github</a>

## Dependency
The package requires the following dependency:
- spark-redis_2.12-3.1.0-jar-with-dependencies.jar (<a href="https://mvnrepository.com/artifact/com.redislabs/spark-redis_2.12/3.1.0">mvn Repository</a>)

This jar is required so the package can write data into Redis.
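One way to make the jar available to the Spark job is to pass it at submit time. The jar path and script name below are placeholders, not the package's documented invocation:

```shell
# Supply the spark-redis jar to the Spark job; adjust the path to wherever
# you downloaded the jar from the Maven repository linked above.
spark-submit \
  --jars /path/to/spark-redis_2.12-3.1.0-jar-with-dependencies.jar \
  your_streaming_app.py
```

Alternatively, the same jar can be set on the `spark.jars` configuration key when building the `SparkSession`.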
