Metadata-Version: 2.1
Name: NewsLookout
Version: 1.9.0
Summary: News scraping application
Home-page: https://github.com/sandeep-sandhu/NewsLookout
Author: Sandeep Singh Sandhu
Author-email: sandeep.sandhu@gmx.com
Maintainer: Sandeep Singh Sandhu
License: GPL-3
Keywords: Web-scraping,News,NLP,Information-Retrieval,crawler
Platform: Operating System :: MacOS :: MacOS X
Platform: Operating System :: Microsoft :: Windows
Platform: Operating System :: POSIX
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: No Input/Output (Daemon)
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Dist: newspaper3k
Requires-Dist: beautifulsoup4
Requires-Dist: lxml
Requires-Dist: nltk
Requires-Dist: requests
Requires-Dist: tqdm
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: tld
Requires-Dist: spacy
Requires-Dist: urllib3
Requires-Dist: configparser


NewsLookout is a web scraping application for financial events. It is a scalable, fault-tolerant, modular and configurable multi-threaded Python console application. It is enterprise ready: it can run behind a proxy and be launched by automated schedulers. The application is readily extended through its 'plugin' architecture, which accepts custom modules for additional news sources, custom data pre-processing and NLP-based news text analytics (e.g. entity recognition, negative event classification, economy trends, industry trends). For more details, refer to https://github.com/sandeep-sandhu/NewsLookout
Installation: Although the application runs with its default parameters without any special configuration, the parameters in the default config file should be customized, especially the file and folder locations for the data, config file, log file, PID file, etc. Most importantly, model data must be downloaded for the NLTK and spaCy NLP libraries as part of installation. For spaCy, run the following command:
`python -m spacy download en_core_web_lg`
For NLTK, run the following commands within the Python shell:
`import nltk`
`nltk.download()`
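To confirm that the installation steps above took effect, the packages can be checked for importability. The following is a minimal sketch, assuming the spaCy model is installed as an importable package named `en_core_web_lg` (the usual result of `spacy download`). The helper name `missing_packages` is illustrative and is not part of NewsLookout.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be imported."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # nltk and spacy are the libraries; en_core_web_lg is the spaCy model package.
    missing = missing_packages(["nltk", "spacy", "en_core_web_lg"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All NLP packages are importable.")
```

This checks packages only. The corpora fetched by `nltk.download()` are data files, so they are not covered by this check.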
You can extend its functionality to scrape any additional website by customising the template file 'template_for_plugin.py'. Give your custom plugin file the same name as its class. Place it in the plugins_contrib folder and add the plugin's name to the configuration file. It will be picked up automatically the next time the application runs. For examples of how a plugin can be written, see the code of the plugins that are already implemented.
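The naming rule matters because a loader can then find a plugin's class from its file name alone. The sketch below illustrates that idea with the standard library; it is not NewsLookout's actual loader, and the function name `load_plugin_class` and the plugin name used in the usage note are hypothetical.

```python
import importlib.util
from pathlib import Path


def load_plugin_class(plugins_dir, plugin_name):
    """Load <plugins_dir>/<plugin_name>.py and return the class named <plugin_name>."""
    plugin_path = Path(plugins_dir) / f"{plugin_name}.py"
    spec = importlib.util.spec_from_file_location(plugin_name, plugin_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load plugin file: {plugin_path}")
    module = importlib.util.module_from_spec(spec)
    # Execute the plugin file so that its class definitions become available.
    spec.loader.exec_module(module)
    # The file name and the class name are expected to match.
    if not hasattr(module, plugin_name):
        raise ImportError(f"{plugin_path} does not define a class named {plugin_name}")
    return getattr(module, plugin_name)
```

For example, `load_plugin_class("plugins_contrib", "mod_example_news")` would return the class `mod_example_news` defined in `plugins_contrib/mod_example_news.py`, and would raise `ImportError` if the class name did not match the file name.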

