Metadata-Version: 2.1
Name: WebXplore
Version: 1.0.2
Summary: Explore Web Pages - Scrapers and Crawlers
Home-page: https://github.com/arnavn101/WebXplore
Author: Arnav Nidumolu
Author-email: arnav.nidumolu@gmail.com
License: MIT
Keywords: web crawling scraping nlp
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: nltk
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: google
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: textblob
Requires-Dist: scikit-learn
Requires-Dist: newsapi-python
Requires-Dist: praw
Requires-Dist: tweepy
Requires-Dist: readability-lxml


## WebXplore (v1.0.2)

[![Build Status](https://travis-ci.org/arnavn101/WebXplore.svg?branch=master)](https://travis-ci.org/arnavn101/WebXplore)
![PyPI - License](https://img.shields.io/pypi/l/webxplore)
[![codecov](https://codecov.io/gh/arnavn101/WebXplore/branch/master/graph/badge.svg)](https://codecov.io/gh/arnavn101/WebXplore) 

WebXplore offers a collection of tools for web scraping and crawling,
and for analyzing the scraped text to determine its sentiment
or the tone of its author.

The package retrieves information from these sources:

1) **Google Search:** Get links for any *Google search query*.

2) **Website Text:** Use an *intelligent parser* to strip the HTML markup from a webpage and keep its text content.

3) **Twitter:** Given a word or phrase, get *related tweets*.

4) **Reddit:** Get the *hottest posts* for a given subreddit and key phrase.

5) **NewsAPI:** Retrieve *news articles* for a given topic or phrase.

## Installation
```bash
$ pip install webxplore
```

Or clone the repository:

```bash
$ git clone https://github.com/arnavn101/WebXplore.git
```

## Getting Started

The steps below cover the main features of *webxplore*.

#### 	1. Get Links from Google Search

```python
from webxplore import WebSearcher

searchQuery = WebSearcher.SearchWeb("Artificial Intelligence", 5)
print(searchQuery.returnListLinks())
```

#### 	2. Scrape a Website

```python
from webxplore import WebScraper

webScraper = WebScraper.ScrapeWebsite("https://en.wikipedia.org/wiki/Artificial_intelligence")
print(webScraper.return_article())
```
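To make "stripping the HTML" concrete, here is a minimal sketch that uses only the standard library. It is not how `WebScraper` works internally; the names `TextExtractor` and `strip_html` are hypothetical, and a real article extractor does considerably more (for example, removing navigation and ads).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_html(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `strip_html` on a page source returns its text joined by single spaces, which is enough to see why the raw HTML is unsuitable as direct input for sentiment or summary steps.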

#### 	3. Get Sentiments from Text

```python
from webxplore.utils import SentimentAnalyzer

sentimentAnalyzer = SentimentAnalyzer.RetrieveSentiments("I am a good person")
print(sentimentAnalyzer.returnFinalSentiment())
```
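Steps 1 to 3 can be chained into a small pipeline. The sketch below is one possible way to do it, not part of the package: `analyze_query` and `analyze_with_webxplore` are hypothetical helpers, and the code assumes that `returnListLinks()` yields URLs, that `return_article()` returns the article text, and that the second argument of `SearchWeb` is the number of links to fetch.

```python
def analyze_query(query, search, scrape, analyze):
    """Search for `query`, scrape each link, and score each article.

    `search`, `scrape`, and `analyze` are plain callables, so the pipeline
    does not depend on any particular library.
    Returns a list of (link, sentiment) pairs; links that fail are skipped.
    """
    results = []
    for link in search(query):
        try:
            article = scrape(link)
        except Exception as error:  # network and parsing errors vary by site
            print(f"Skipping {link}: {error}")
            continue
        if article:
            results.append((link, analyze(article)))
    return results


def analyze_with_webxplore(query, link_count=5):
    """Wire the pipeline to WebXplore, using only the calls from steps 1 to 3."""
    # Imported here so analyze_query stays usable without WebXplore installed.
    from webxplore import WebScraper, WebSearcher
    from webxplore.utils import SentimentAnalyzer

    return analyze_query(
        query,
        search=lambda q: WebSearcher.SearchWeb(q, link_count).returnListLinks(),
        scrape=lambda url: WebScraper.ScrapeWebsite(url).return_article(),
        analyze=lambda text: SentimentAnalyzer.RetrieveSentiments(text).returnFinalSentiment(),
    )
```

Passing the three stages as callables keeps one failing page from aborting the whole run and makes the pipeline easy to test with stand-in functions.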

#### 	4. Get Summary of the Text

```python
from webxplore.utils import TextSummarizer

textSummarizer = TextSummarizer.SummarizeText("I am very scared. Please do not leave me.", 2)
print(textSummarizer.returnFinalSummary())
```

#### 	5. Get Tone of the Text (for each sentence)

```python
from webxplore.utils import ToneAnalyzer

# Replace "watsonApiKey" with your own API key.
textTone = ToneAnalyzer.ToneAnalysis("I am an incredibly gifted person. I am also a good man.", "watsonApiKey")
print(textTone.returnTone())
```

#### 	6. Get the Latest Articles from NewsAPI

```python
from webxplore.searchBeyond import SearchNews

# Replace "newsApiKey" with your own NewsAPI key.
newsArticles = SearchNews.RetrieveNewsArticle('Politics', 5, "newsApiKey")
print(newsArticles.return_articleSentences())
```

#### 	7. Get Posts from a Subreddit

```python
from webxplore.searchBeyond import SearchReddit

# Replace the three placeholder strings with your own Reddit API credentials.
redditPosts = SearchReddit.CrawlSubReddit("stocks", "amazon", 10, "RedditClientId",
                                          "RedditClientSecret", "RedditUserAgent")
print(redditPosts.return_listSentences())
```

#### 	8. Get Tweets Containing a Keyword

```python
from webxplore.searchBeyond import SearchTwitter

# Replace the four placeholder strings with your own Twitter API credentials.
retrieveTweets = SearchTwitter.CrawlTwitter('tesla', 10, "TwitterConsumerKey", "TwitterConsumerSecret",
                                            "TwitterAccountKey", "TwitterAccountSecret")
print(retrieveTweets.return_tweets())
```
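The examples above pass API keys as string literals. One common alternative is to read them from environment variables so that they stay out of source control. The sketch below shows that approach; `load_credentials` and the variable names are hypothetical and are not defined by WebXplore.

```python
import os


def load_credentials(names, environ=os.environ):
    """Return {name: value} for each variable, or raise if any are missing."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Hypothetical variable names; choose whatever fits your deployment.
TWITTER_VARIABLES = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCOUNT_KEY",
    "TWITTER_ACCOUNT_SECRET",
]
```

With those variables set, `load_credentials(TWITTER_VARIABLES)` returns a dictionary whose values can be passed to `SearchTwitter.CrawlTwitter` in the same order as in step 8. Failing early with a list of the missing names tends to be easier to debug than an authentication error from the remote API.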

## Contributions

Contributions are welcome. Please open a pull request and make sure
it passes all the CI tests.

## License

MIT License. Copyright (c) 2020, Arnav Nidumolu



