Metadata-Version: 2.1
Name: RAGScraper
Version: 11.4.2023
Summary: RAGScraper is a Python library designed for efficient and intelligent scraping of web documentation and content. Tailored for Retrieval-Augmented Generation systems, RAGScraper extracts and preprocesses text into structured, machine-learning-ready formats. It emphasizes precision, context preservation, and ease of integration with RAG models, making it an ideal tool for developers looking to enhance AI-driven applications with rich, web-sourced knowledge.
License: MIT
Author: kdcokenny
Author-email: kenny@elapse.ai
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: beautifulsoup4 (>=4.12.2,<5.0.0)
Requires-Dist: html2text (>=2020.1.16,<2021.0.0)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Description-Content-Type: text/markdown

# RAGScraper

RAGScraper is a small Python package that fetches web pages and converts their HTML to Markdown for use in Retrieval-Augmented Generation (RAG) pipelines.

## Installation

To install RAGScraper, simply run:

```bash
pip install ragscraper
```

## Usage

To use RAGScraper as a command-line tool:

```bash
rag-scraper <URL>
```

To use RAGScraper in a Python script:

```python
from rag_scraper.scraper import Scraper
from rag_scraper.converter import Converter

# Fetch HTML content
url = "https://example.com"
html_content = Scraper.fetch_html(url)

# Convert to Markdown
markdown_content = Converter.html_to_markdown(html_content)
print(markdown_content)
```
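
Once you have the Markdown, a common next step in a RAG pipeline is to split it into smaller chunks before indexing. The sketch below shows one possible way to do that with only the standard library, splitting on ATX headings (`#` through `######`). The `split_markdown_by_heading` helper is hypothetical and is not part of RAGScraper; it assumes the converted Markdown uses `#`-style headings.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(markdown: str) -> list[dict[str, str]]:
    """Split Markdown into chunks, one per heading section.

    Hypothetical helper, not part of RAGScraper.
    """
    chunks: list[dict[str, str]] = []
    title = ""  # text before the first heading gets an empty title
    body_lines: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body_lines).strip()
        if title or text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        # Track fenced code blocks so '#' comments inside them
        # are not mistaken for headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body_lines = []
        else:
            body_lines.append(line)

    flush()
    return chunks


sample = "# Intro\nHello.\n\n## Usage\nRun it."
chunks = split_markdown_by_heading(sample)
for chunk in chunks:
    print(chunk["title"], "->", len(chunk["text"]), "characters")
```

In practice you would pass `markdown_content` from the example above instead of `sample`. Splitting on headings keeps each chunk tied to a single topic, but very long sections may still need a further size-based split.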

## Development

To run the tests for RAGScraper, navigate to the package directory and run:

```bash
python -m unittest discover tests
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
