Metadata-Version: 2.1
Name: article-extraction
Version: 0.3.0
Summary: Article text extraction library
Home-page: https://github.com/pmatigakis/article-extraction
License: MIT
Keywords: article extraction
Author: Matigakis Panagiotis
Author-email: pmatigakis@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: lxml (>=4.7.1)
Project-URL: Repository, https://github.com/pmatigakis/article-extraction
Description-Content-Type: text/markdown

# Article extraction library

article-extraction is a package that can be used to extract the article content
from an HTML page.

# Installation

Use [Poetry](https://python-poetry.org/) to install the library directly from its GitHub repository.

```bash
poetry add "git+https://github.com/pmatigakis/article-extraction.git"
```

# Usage

To extract the content of an article, download the page and pass its HTML to
`MSSArticleExtractor.extract_article`.

```python
from urllib.request import urlopen

from articles.mss.extractors import MSSArticleExtractor

# Download the raw HTML of the page.
document = urlopen("https://www.bbc.com/sport/formula1/64983451").read()

# Create the extractor and run it on the downloaded document.
article_extractor = MSSArticleExtractor()
article = article_extractor.extract_article(document)
print(article)
```
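The download step above can also be wrapped in a small helper that sets a timeout and a `User-Agent` header, since some sites reject requests without one. This is only a sketch that uses the standard library: `fetch_html`, its default timeout, and the header value are hypothetical and not part of article-extraction.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return the raw response body as bytes.

    Hypothetical helper, not part of article-extraction.
    """
    # Assumption: a generic User-Agent value; adjust it to your needs.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The context manager closes the response even if read() raises.
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

The returned bytes can then be passed to `extract_article` in place of the `urlopen(...).read()` call in the example above.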

