Metadata-Version: 2.1
Name: PyPaperBot
Version: 1.1.0
Summary: PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub.
Home-page: https://github.com/ferru97/PyPaperBot
Author: Vito Ferrulli
Author-email: vitof970@gmail.com
License: MIT
Download-URL: https://github.com/ferru97/PyPaperBot/archive/v1.1.0.tar.gz
Keywords: download-papers,google-scholar,scihub,scholar,crossref,papers
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
Requires-Dist: astroid (==2.4.2)
Requires-Dist: beautifulsoup4 (==4.9.1)
Requires-Dist: bibtexparser (==1.2.0)
Requires-Dist: certifi (==2020.6.20)
Requires-Dist: chardet (==3.0.4)
Requires-Dist: colorama (==0.4.3)
Requires-Dist: crossref-commons (==0.0.7)
Requires-Dist: future (==0.18.2)
Requires-Dist: HTMLParser (==0.0.2)
Requires-Dist: idna (==2.10)
Requires-Dist: isort (==5.4.2)
Requires-Dist: lazy-object-proxy (==1.4.3)
Requires-Dist: mccabe (==0.6.1)
Requires-Dist: numpy (==1.20.1)
Requires-Dist: pandas (==1.2.2)
Requires-Dist: pylint (==2.6.0)
Requires-Dist: pyparsing (==2.4.7)
Requires-Dist: python-dateutil (==2.8.1)
Requires-Dist: pytz (==2020.1)
Requires-Dist: ratelimit (==2.2.1)
Requires-Dist: requests (==2.24.0)
Requires-Dist: six (==1.15.0)
Requires-Dist: soupsieve (==2.0.1)
Requires-Dist: toml (==0.10.1)
Requires-Dist: urllib3 (==1.25.10)
Requires-Dist: wrapt (==1.12.1)

[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.me/ferru97)

# PyPaperBot

PyPaperBot is a Python tool for **downloading scientific papers** using Google Scholar, Crossref, and SciHub.
It tries to download each paper from several sources: the PDF linked directly on Scholar, Scholar-related links, and SciHub.
PyPaperBot can also download the **BibTeX** entry of each paper.

## Features

- Download papers given a query
- Download papers given a list of paper DOIs
- Download papers given a Google Scholar link
- Generate the BibTeX of the downloaded papers
- Filter downloaded papers by year, journal, and number of citations

## Installation

Use `pip` to install from PyPI:

```bash
pip install PyPaperBot
```

## How to use

PyPaperBot arguments:

| Arguments          | Description                                                                              | Type   |
| ------------------ | ---------------------------------------------------------------------------------------- | ------ |
| \-\-query          | Query to make on Google Scholar or Google Scholar page link                              | string |
| \-\-doi            | DOI of the paper to download (this option uses only SciHub to download)                  | string |
| \-\-doi-file       | Path of a .txt file containing the list of paper DOIs to download                        | string |
| \-\-scholar-pages  | Number or range of Google Scholar pages to inspect. Each page has a maximum of 10 papers | string |
| \-\-dwn-dir        | Directory path in which to save the result                                               | string |
| \-\-min-year       | Minimal publication year of the paper to download                                        | int    |
| \-\-max-dwn-year   | Maximum number of papers to download sorted by year                                      | int    |
| \-\-max-dwn-cites  | Maximum number of papers to download sorted by number of citations                       | int    |
| \-\-journal-filter | CSV file path of the journal filter (More info on github)                                | string |
| \-\-restrict       | 0: Download only the BibTeX - 1: Download only the papers' PDFs                          | int    |
| \-\-scihub-mirror  | Mirror for downloading papers from sci-hub. If not set, it is selected automatically     | string |
| \-h                | Shows the help                                                                           | --     |

### Note

You can use only one of the arguments in each of the following groups:

- *\-\-query*, *\-\-doi-file*, and *\-\-doi*
- *\-\-max-dwn-year* and *\-\-max-dwn-cites*

One of the arguments *\-\-query*, *\-\-doi*, and *\-\-doi-file* is mandatory.
The argument *\-\-scholar-pages* is mandatory when using *\-\-query*.
The argument *\-\-dwn-dir* is mandatory.

The argument *\-\-journal-filter* requires the path of a CSV file containing a list of journal names, each paired with a boolean that indicates whether to consider that journal (0: don't consider / 1: consider) [Example](https://github.com/ferru97/PyPaperBot/blob/master/file_examples/jurnals.csv)
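As an illustration only (the linked example file is authoritative for the exact layout; the journal names, delimiter, and flags below are assumptions), such a filter might look like:

```csv
Journal of Machine Learning Research,1
Pattern Recognition,1
Some Unwanted Journal,0
```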

The argument *\-\-doi-file* requires the path of a .txt file containing the list of paper DOIs to download, one DOI per line [Example](https://github.com/ferru97/PyPaperBot/blob/master/file_examples/papers.txt)
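For illustration, such a file simply lists one DOI per line (the DOIs below are placeholders, not real papers):

```text
10.1000/placeholder.0001
10.1000/placeholder.0002
10.1000/placeholder.0003
```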

## SciHub access

If access to SciHub is blocked in your country, consider using a free VPN service like [ProtonVPN](https://protonvpn.com/).

## Example

Download a maximum of 30 papers from the first 3 pages of results for a query, restricted to papers published from 2018 onward, using the mirror https://sci-hub.do:

```bash
python -m PyPaperBot --query="Machine learning" --scholar-pages=3  --min-year=2018 --dwn-dir="C:\User\example\papers" --scihub-mirror="https://sci-hub.do"
```

Download papers from pages 4 to 7 (7th included) given a query:

```bash
python -m PyPaperBot --query="Machine learning" --scholar-pages=4-7 --dwn-dir="C:\User\example\papers"
```

Download a paper given the DOI:

```bash
python -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"
```

Download papers given a file containing the DOIs:

```bash
python -m PyPaperBot --doi-file="C:\User\example\papers\file.txt" --dwn-dir="C:\User\example\papers"
```

If the `python` command is not found, try `py` instead, e.g.:

```bash
py -m PyPaperBot --doi="10.0086/s41037-711-0132-1" --dwn-dir="C:\User\example\papers"
```

## Contributions

Feel free to contribute to this project by proposing any change, fix, or enhancement on the **dev** branch.

### To do

- Tests
- Code documentation
- General improvements

## Disclaimer

This application is for educational purposes only. I do not take responsibility for what you choose to do with this application.

## Donation

If you like this project, you can buy me a cup of coffee :)

[![paypal](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://www.paypal.me/ferru97)


