Metadata-Version: 2.1
Name: arxiv-summarizer
Version: 0.1.1
Summary: A happy toolkit for arxiv paper summarization and understanding.
Home-page: https://github.com/ArchanGhosh/Arxiv-Summarizer
Author: Archan Ghosh, Arnav Das
Author-email: gharchan@gmail.com, arnav.das88@gmail.com
Maintainer: Archan Ghosh, Arnav Das, Debgandhar Ghosh, Subhayu Bala
Maintainer-email: gharchan@gmail.com, arnav.das88@gmail.com, debgandhar4000@gmail.com, balasubhayu99@gmail.com
License: Apache
Project-URL: GitHub, https://github.com/ArchanGhosh/Arxiv-Summarizer
Project-URL: Homepage, https://github.com/ArchanGhosh/Arxiv-Summarizer
Keywords: arxiv sdk summarization summary
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Communications
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Console
Classifier: Environment :: GPU
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: langchain
Requires-Dist: arxiv
Requires-Dist: pymupdf
Requires-Dist: transformers
Requires-Dist: torch==2.1.2
Requires-Dist: click
Requires-Dist: rich
Requires-Dist: pytest
Provides-Extra: dev
Requires-Dist: nox; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: sphinxemoji; extra == "docs"
Requires-Dist: pydata-sphinx-theme; extra == "docs"
Requires-Dist: numpydoc; extra == "docs"
Requires-Dist: sphinx_panels; extra == "docs"
Requires-Dist: matplotlib; extra == "docs"
Requires-Dist: Ipython; extra == "docs"
Requires-Dist: sphinx-hoverxref; extra == "docs"

# Arxiv Summarizer

`ArxivSummarizer` is a Python class for summarizing arXiv papers with Hugging Face's Transformers library. It can be configured with a custom `SummarizationModel`, with a pre-trained model specified by name, or with default models.

## Table of Contents

- [Arxiv Summarizer](#arxiv-summarizer)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
  - [Usage](#usage)
    - [As CLI](#as-cli)
    - [With Custom SummarizationModel](#with-custom-summarizationmodel)
    - [With Pre-trained Model by Name](#with-pre-trained-model-by-name)
    - [With Default Models](#with-default-models)
    - [Fetching a list of papers](#fetching-a-list-of-papers)
  - [Examples](#examples)
  - [Contributing](#contributing)
  - [License](#license)

## Installation

Make sure you have Python 3.8 or later installed, then install Arxiv Summarizer with `pip`:
```bash
pip install arxiv-summarizer
```

Developers who want to tinker can `git clone` this repository and install it from the project root:
```bash
pip install .
```

## Usage

### As CLI

Arxiv Summarizer can be used as a CLI tool: pass an arXiv ID to get that paper summarized.

```sh
$ python3 -m arxiv_summarizer 1234.56789v1
```
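
The README does not say whether the CLI validates its argument, so the following is only a minimal sketch of how an identifier such as `1234.56789v1` could be split into a base ID and a version before being passed on. The helper `parse_arxiv_id` and the `ParsedArxivId` type are hypothetical names, not part of `arxiv_summarizer`, and the pattern only covers new-style identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional `vN` suffix).

```python
import re
from typing import NamedTuple, Optional

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
_ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(?:v(\d+))?$")


class ParsedArxivId(NamedTuple):
    base_id: str            # e.g. "1206.5533"
    version: Optional[int]  # e.g. 2, or None when no version suffix is given


def parse_arxiv_id(raw: str) -> ParsedArxivId:
    """Split an identifier into base ID and optional version (hypothetical helper)."""
    match = _ARXIV_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"Not a new-style arXiv identifier: {raw!r}")
    base_id, version = match.groups()
    return ParsedArxivId(base_id, int(version) if version else None)


print(parse_arxiv_id("1206.5533v2"))
print(parse_arxiv_id("1703.07718"))
```

Rejecting malformed input early like this avoids a network round trip for an ID that cannot exist.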

### With Custom SummarizationModel

If you have a custom `SummarizationModel` and `Tokenizer`, you can use them with `ArxivSummarizer` directly:

```python
from arxiv_summarizer import SummarizationModel, ArxivSummarizer

# Initialize your custom SummarizationModel
custom_model = SummarizationModel(
    model="your_custom_model", 
    tokenizer="your_custom_tokenizer", 
    max_length=512, do_sample=True
)

# Initialize ArxivSummarizer with your custom model
summarizer = ArxivSummarizer(summarizer=custom_model)

# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
```

### With Pre-trained Model by Name

You can use a pre-trained model from Hugging Face's model hub by specifying its name:

```python
from arxiv_summarizer import ArxivSummarizer

# Initialize ArxivSummarizer with a pre-trained model by name
summarizer = ArxivSummarizer(model="facebook/bart-large-cnn")

# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
```

### With Default Models

If you don't provide a specific model, `ArxivSummarizer` will use default models based on GPU availability:

```python
from arxiv_summarizer import ArxivSummarizer

# Initialize ArxivSummarizer with default models
summarizer = ArxivSummarizer()

# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
```
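
Which defaults the package actually picks is not documented here. As a rough sketch of what "default models based on GPU availability" could look like, the snippet below checks for a CUDA device and chooses between two model names. The names `GPU_MODEL`, `CPU_MODEL`, `gpu_available`, and `pick_default_model` are illustrative assumptions, and the specific model choices are placeholders rather than the defaults used by `ArxivSummarizer`.

```python
def gpu_available() -> bool:
    """Return True when PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# Assumption: placeholder choices, not the package's real defaults.
GPU_MODEL = "facebook/bart-large-cnn"        # larger model when a GPU is present
CPU_MODEL = "sshleifer/distilbart-cnn-12-6"  # smaller model for CPU-only machines


def pick_default_model(has_gpu: bool) -> str:
    """Choose a model name from the two placeholders above."""
    return GPU_MODEL if has_gpu else CPU_MODEL


print(pick_default_model(gpu_available()))
```

The selected name could then be passed explicitly, as in `ArxivSummarizer(model=...)` from the previous section, if you prefer not to rely on the built-in defaults.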

### Fetching a list of papers

First, we can search for a list of papers using the `fetch_paper()` function.
```python
from rich.progress import Progress
from rich.console import Console
from rich.table import Table

from typing import List
from arxiv_summarizer.fetch_paper import fetch_paper, ArxivPaper

# Get the list of papers
papers = fetch_paper("Yoshua Bengio", max_docs=15)
results: List[ArxivPaper] = list(papers)

print(f"{len(results)} Papers Found !!!")
```

This will load the papers, their metadata, and their summaries. Now we can download the content of each paper and show the progress using a progress bar from `rich`.
```python
# Download the papers
with Progress() as progress:
    task = progress.add_task("[cyan] Downloading content...", total = len(results))

    for paper in results:
        progress.update(task, advance=1, description=f"Downloading content for paper {paper.arxiv_id}")
        _ = paper.content  # Accessing .content downloads it automatically.
```
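
The loop above relies on `.content` being fetched on first access. How `ArxivPaper` implements this is not shown in this README. One common way to get that behavior is a cached property, sketched below with a hypothetical `LazyPaper` class that simulates the download instead of performing one.

```python
from functools import cached_property


class LazyPaper:
    """Hypothetical stand-in for a paper object; not the package's ArxivPaper."""

    def __init__(self, arxiv_id: str) -> None:
        self.arxiv_id = arxiv_id
        self.download_count = 0  # counts how often the "download" actually runs

    @cached_property
    def content(self) -> str:
        # A real implementation would fetch and parse the PDF here.
        self.download_count += 1
        return f"<full text of {self.arxiv_id}>"


paper = LazyPaper("1206.5533v2")
_ = paper.content  # first access triggers the simulated download
_ = paper.content  # second access reuses the cached value
print(paper.download_count)  # prints 1
```

With this pattern, reading `.content` a second time (for example when building the table in the next step) costs nothing extra.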

Once all the content has been downloaded, we can display the results as a table using a `rich` `Table`.
```python
# Print the data
console = Console()

table = Table(show_header=True, header_style="bold magenta")
table.add_column("ID", style="dim")
table.add_column("Title", style="dim")
table.add_column("Authors", style="dim")
table.add_column("Content Size", style="dim")

for entry in results:
    entry: ArxivPaper
    table.add_row(entry.arxiv_id, entry.name, ", ".join(entry.authors), str(len(entry.content)))

console.print(table)
```

Running the three snippets in order prints output like the following:

```
15 Papers Found !!!
Downloading content for paper 1203.4416v1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID           ┃ Title                                      ┃ Authors                                    ┃ Content Size ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ 1206.5533v2  │ Practical recommendations for              │ Yoshua Bengio                              │ 134815       │
│              │ gradient-based training of deep            │                                            │              │
│              │ architectures                              │                                            │              │
│ 1207.4404v1  │ Better Mixing via Deep Representations     │ Yoshua Bengio, Grégoire Mesnil, Yann       │ 31767        │
│              │                                            │ Dauphin, Salah Rifai                       │              │
│ 1305.0445v2  │ Deep Learning of Representations: Looking  │ Yoshua Bengio                              │ 121365       │
│              │ Forward                                    │                                            │              │
│ 1212.2686v1  │ Joint Training of Deep Boltzmann Machines  │ Ian Goodfellow, Aaron Courville, Yoshua    │ 13806        │
│              │                                            │ Bengio                                     │              │
│ 1703.07718v1 │ Independently Controllable Features        │ Emmanuel Bengio, Valentin Thomas, Joelle   │ 19385        │
│              │                                            │ Pineau, Doina Precup, Yoshua Bengio        │              │
│ 1211.5063v2  │ On the difficulty of training Recurrent    │ Razvan Pascanu, Tomas Mikolov, Yoshua      │ 50908        │
│              │ Neural Networks                            │ Bengio                                     │              │
│ 1206.5538v3  │ Representation Learning: A Review and New  │ Yoshua Bengio, Aaron Courville, Pascal     │ 194906       │
│              │ Perspectives                               │ Vincent                                    │              │
│ 1207.0057v1  │ Implicit Density Estimation by Local       │ Yoshua Bengio, Guillaume Alain, Salah      │ 35635        │
│              │ Moment Matching to Sample from             │ Rifai                                      │              │
│              │ Auto-Encoders                              │                                            │              │
│ 1305.6663v4  │ Generalized Denoising Auto-Encoders as     │ Yoshua Bengio, Li Yao, Guillaume Alain,    │ 33769        │
│              │ Generative Models                          │ Pascal Vincent                             │              │
│ 1311.6184v4  │ Bounding the Test Log-Likelihood of        │ Yoshua Bengio, Li Yao, Kyunghyun Cho       │ 23711        │
│              │ Generative Models                          │                                            │              │
│ 1510.02777v2 │ Early Inference in Energy-Based Models     │ Yoshua Bengio, Asja Fischer                │ 26477        │
│              │ Approximates Back-Propagation              │                                            │              │
│ 1509.05936v2 │ STDP as presynaptic activity times rate of │ Yoshua Bengio, Thomas Mesnard, Asja        │ 22030        │
│              │ change of postsynaptic activity            │ Fischer, Saizheng Zhang, Yuhuai Wu         │              │
│ 1103.2832v1  │ Autotagging music with conditional         │ Michael Mandel, Razvan Pascanu, Hugo       │ 47698        │
│              │ restricted Boltzmann machines              │ Larochelle, Yoshua Bengio                  │              │
│ 2007.15139v2 │ Deriving Differential Target Propagation   │ Yoshua Bengio                              │ 63661        │
│              │ from Iterating Approximate Inverses        │                                            │              │
│ 1203.4416v1  │ On Training Deep Boltzmann Machines        │ Guillaume Desjardins, Aaron Courville,     │ 20531        │
│              │                                            │ Yoshua Bengio                              │              │
└──────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴──────────────┘
```

## Examples

For more detailed examples, refer to the [Examples](examples/) directory.

## Contributing

Contributions are welcome! Please refer to the [Contributing Guidelines](CONTRIBUTING.md) for details on how to contribute to this project.

## License

This project is licensed under the [Apache License](LICENSE).
