Metadata-Version: 2.0
Name: baleen
Version: 0.2
Summary: An automated ingestion service for blogs to construct a corpus for NLP research.
Home-page: https://github.com/bbengfort/baleen
Author: Benjamin Bengfort
Author-email: benjamin@bengfort.com
License: MIT
Download-URL: https://github.com/bbengfort/baleen/tarball/v0.2
Keywords: nlp,baleen,ingestion,blogs,rss
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Dist: PyYAML (==3.11)
Requires-Dist: beautifulsoup4 (==4.4.1)
Requires-Dist: blinker (==1.4)
Requires-Dist: confire (==0.2.0)
Requires-Dist: feedparser (==5.2.1)
Requires-Dist: lxml (==3.5.0)
Requires-Dist: mongoengine (==0.10.6)
Requires-Dist: pymongo (==3.2.1)
Requires-Dist: python-dateutil (==2.4.2)
Requires-Dist: requests (==2.9.1)
Requires-Dist: schedule (==0.3.2)
Requires-Dist: six (==1.10.0)

Baleen is a tool for ingesting formal natural language data from the writing of professional and amateur authors, such as bloggers and news outlets. Rather than scraping web pages, Baleen collects data through RSS feeds. It gathers as much raw data as it can and saves it in a MongoDB document store.
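
The feed-to-document idea can be illustrated with a minimal sketch. This is not Baleen's actual API: the names `SAMPLE_RSS` and `parse_feed` are hypothetical, and the sketch uses only the Python standard library so that it stays self-contained, whereas Baleen itself lists feedparser and mongoengine among its dependencies.

```python
import xml.etree.ElementTree as ET

# A tiny inline RSS 2.0 feed standing in for a real blog feed (hypothetical data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Hello, corpus.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>More text.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Turn an RSS 2.0 document into a list of plain dicts, one per item."""
    channel = ET.fromstring(xml_text).find("channel")
    feed_title = channel.findtext("title", default="")
    posts = []
    for item in channel.findall("item"):
        # Each dict is shaped like a document that a document store could hold.
        posts.append({
            "feed": feed_title,
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "content": item.findtext("description", default=""),
        })
    return posts


posts = parse_feed(SAMPLE_RSS)
```

Each resulting dict could then be handed to a MongoDB client, for example with pymongo's `insert_many` on a collection. That step is omitted here because it needs a running database.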

For more information, see the full documentation at: http://baleen-ingest.readthedocs.org/en/latest/


