Metadata-Version: 2.4
Name: bitchute-scraper
Version: 1.0.0
Summary: A modern, API-based package to scrape BitChute platform data.
Home-page: https://github.com/bumatic/bitchute-scraper
Download-URL: https://github.com/bumatic/bitchute-scraper/archive/v1.0.0.tar.gz
Author: Marcus Burkhardt
Author-email: Marcus Burkhardt <marcus.burkhardt@gmail.com>
Maintainer: Marcus Burkhardt
Maintainer-email: Marcus Burkhardt <marcus.burkhardt@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/bumatic/bitchute-scraper
Project-URL: Repository, https://github.com/bumatic/bitchute-scraper
Project-URL: Documentation, https://github.com/bumatic/bitchute-scraper/blob/main/README.md
Project-URL: Bug Reports, https://github.com/bumatic/bitchute-scraper/issues
Project-URL: Changelog, https://github.com/bumatic/bitchute-scraper/blob/main/CHANGELOG.md
Keywords: bitchute,api,scraper,video,data-collection,download,media,research,social-media,content-analysis,web-scraping,data-science,automation,bulk-download
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Operating System :: OS Independent
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: python-dateutil>=2.8.0
Requires-Dist: retrying>=1.3.0
Requires-Dist: selenium>=4.10.0
Requires-Dist: webdriver-manager>=3.8.0
Requires-Dist: urllib3>=1.26.0
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: full
Requires-Dist: tqdm>=4.64.0; extra == "full"
Requires-Dist: pyarrow>=10.0.0; extra == "full"
Requires-Dist: psutil>=5.8.0; extra == "full"
Requires-Dist: pyyaml>=6.0; extra == "full"
Provides-Extra: progress
Requires-Dist: tqdm>=4.64.0; extra == "progress"
Provides-Extra: fast
Requires-Dist: pyarrow>=10.0.0; extra == "fast"
Provides-Extra: monitoring
Requires-Dist: psutil>=5.8.0; extra == "monitoring"
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == "config"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-requests>=2.28.0; extra == "dev"
Requires-Dist: types-python-dateutil>=2.8.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.18.0; extra == "docs"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Dynamic: author
Dynamic: download-url
Dynamic: home-page
Dynamic: license-file
Dynamic: maintainer
Dynamic: platform
Dynamic: requires-python

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5643102.svg)](https://doi.org/10.5281/zenodo.5643102)

# BitChute Scraper

Python scraper for the BitChute video platform. It lets you search for videos and retrieve platform recommendations such as trending videos, popular videos (now called "fresh"), and trending tags. Version 1.0.0 is a major update: it collects data through an API rather than through the Selenium-based scraping used by earlier, now-defunct versions. Because the codebase was completely rewritten in collaboration with Claude AI, backward compatibility is not provided.

## Features

- **Fast API-based data collection** - 10x faster than HTML parsing approaches
- **Automatic media downloads** - Thumbnails and videos with smart caching
- **Comprehensive data models** - Videos, channels, hashtags with computed properties
- **Concurrent processing** - Parallel requests with configurable rate limiting
- **Multiple export formats** - CSV, JSON, Excel, Parquet with timestamps
- **Command-line interface** - Easy automation and scripting support
- **Robust error handling** - Automatic retries and graceful fallbacks

## Installation

Install from PyPI:

```bash
pip install bitchute-scraper
```

For full functionality including progress bars and fast data formats:

```bash
pip install "bitchute-scraper[full]"
```

### System Requirements

- Python 3.7+
- Google Chrome or Chromium browser
- ChromeDriver (auto-managed)

## Quick Start

### Basic Usage

```python
import bitchute

# Initialize API client
api = bitchute.BitChuteAPI(verbose=True)

# Get trending videos
trending = api.get_trending_videos('day', limit=50)
print(f"Retrieved {len(trending)} trending videos")

# Search for videos
results = api.search_videos('climate change', limit=100)

# Get video details
video_info = api.get_video_info('VIDEO_ID', include_counts=True)
```

### Download Support

```python
import bitchute

# Initialize with downloads enabled
api = bitchute.BitChuteAPI(
    enable_downloads=True,
    download_base_dir="downloads",
    verbose=True
)

# Download videos with thumbnails
videos = api.get_trending_videos(
    'week',
    limit=20,
    download_thumbnails=True,
    download_videos=True
)
```

### Data Export

```python
import bitchute
from bitchute.utils import DataExporter

api = bitchute.BitChuteAPI(verbose=True)

# Get data and export to multiple formats
videos = api.get_popular_videos(limit=100)

exporter = DataExporter()
exported_files = exporter.export_data(
    videos,
    'popular_videos',
    ['csv', 'json', 'xlsx']
)
```
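
If you only need a quick export and do not want to depend on `DataExporter`, plain pandas can write timestamped files as well. The following is a minimal sketch, assuming the returned data is a regular pandas DataFrame; the helper `export_with_timestamp` and the sample column names are hypothetical and not part of bitchute-scraper.

```python
from datetime import datetime
from pathlib import Path
import tempfile

import pandas as pd


def export_with_timestamp(df, basename, formats, out_dir="."):
    """Write df once per format, tagging each filename with a timestamp.

    Hypothetical helper for illustration; not part of bitchute-scraper.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for fmt in formats:
        path = out_dir / f"{basename}_{stamp}.{fmt}"
        if fmt == "csv":
            df.to_csv(path, index=False)
        elif fmt == "json":
            df.to_json(path, orient="records", indent=2)
        else:
            raise ValueError(f"unsupported format: {fmt}")
        written[fmt] = path
    return written


# Stand-in for a DataFrame returned by the API (column names are assumed).
videos = pd.DataFrame({"id": ["a1", "b2"], "title": ["First", "Second"]})
export_dir = tempfile.mkdtemp()
exported = export_with_timestamp(videos, "popular_videos", ["csv", "json"], export_dir)
```

The function returns a dictionary mapping each format to the path it wrote, which makes it easy to log or post-process the files.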

### Command Line Interface

```bash
# Get trending videos
bitchute trending --timeframe day --limit 50 --format csv

# Search videos with details
bitchute search "bitcoin" --limit 100 --sort views --analyze

# Export to Excel
bitchute popular --limit 200 --format xlsx --analyze
```

## API Overview

### Core Methods

**Platform Recommendations:**
- `get_trending_videos(timeframe, limit)` - Trending by day/week/month
- `get_popular_videos(limit)` - Popular videos
- `get_recent_videos(limit)` - Most recent uploads
- `get_short_videos(limit)` - Short-form content

**Search Functions:**
- `search_videos(query, sensitivity, sort, limit)` - Video search
- `search_channels(query, sensitivity, limit)` - Channel search

**Individual Items:**
- `get_video_info(video_id, include_counts, include_media)` - Single video details
- `get_channel_info(channel_id)` - Channel information

**Hashtags:**
- `get_trending_hashtags(limit)` - Trending hashtags
- `get_videos_by_hashtag(hashtag, limit)` - Videos by hashtag

### Configuration Options

```python
api = bitchute.BitChuteAPI(
    verbose=True,                    # Enable logging
    enable_downloads=True,           # Enable media downloads
    download_base_dir="data",        # Download directory
    max_concurrent_downloads=5,      # Concurrent downloads
    rate_limit=0.3,                 # Seconds between requests
    timeout=60                      # Request timeout
)
```
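
To make the meaning of "seconds between requests" concrete, here is a generic throttle written with the standard library only. It is a sketch of the general idea, not the library's actual implementation; the `Throttle` class is hypothetical and the interval is shortened so the example runs quickly.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (illustrative only)."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_call = None

    def wait(self):
        """Sleep just long enough to keep calls min_interval seconds apart."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # a real client would send its request here
elapsed = time.monotonic() - start
```

The first call passes immediately and each later call is delayed until the interval has elapsed, so three calls take at least two intervals.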

### Data Models

All methods return pandas DataFrames with consistent schemas:

- **Video**: Complete metadata with engagement metrics and download paths
- **Channel**: Channel information with statistics and social links
- **Hashtag**: Trending hashtags with rankings and video counts
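
Because results are DataFrames, standard pandas operations apply directly. The sketch below aggregates views per channel; the column names `channel_name` and `view_count` are assumptions for illustration, so check the columns of your own results before reusing it.

```python
import pandas as pd

# Stand-in for a DataFrame of videos; the column names are assumptions.
videos = pd.DataFrame(
    {
        "channel_name": ["alpha", "beta", "alpha", "gamma"],
        "view_count": [1200, 300, 800, 50],
    }
)

# Total views and video count per channel, largest channels first.
per_channel = (
    videos.groupby("channel_name")
    .agg(total_views=("view_count", "sum"), videos=("view_count", "size"))
    .sort_values("total_views", ascending=False)
)
```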

## Advanced Usage

### Bulk Data Collection

```python
import bitchute
from bitchute.utils import ContentFilter

api = bitchute.BitChuteAPI(verbose=True)

# Get large datasets efficiently
all_videos = api.get_all_videos(limit=5000, include_details=True)

# Filter by minimum view count, then by keywords
filtered = ContentFilter.filter_by_views(all_videos, min_views=1000)
crypto_videos = ContentFilter.filter_by_keywords(filtered, ['bitcoin', 'crypto'])
```
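
The same kind of filtering can be expressed in plain pandas, which is useful when you need custom conditions. This is a rough equivalent under stated assumptions, not a description of how `ContentFilter` works internally: the column names `title` and `view_count` are assumed, and keywords are matched case-insensitively against the title only.

```python
import re

import pandas as pd

# Stand-in for collected videos; the column names are assumptions.
all_videos = pd.DataFrame(
    {
        "title": [
            "Bitcoin price update",
            "Gardening tips",
            "Crypto news",
            "bitcoin basics",
        ],
        "view_count": [5000, 20000, 1500, 200],
    }
)

min_views = 1000
keywords = ["bitcoin", "crypto"]

# Keep videos at or above the view threshold.
popular = all_videos[all_videos["view_count"] >= min_views]

# Build one escaped alternation pattern and match it against titles.
pattern = "|".join(re.escape(k) for k in keywords)
matches = popular[popular["title"].str.contains(pattern, case=False, regex=True)]
```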

### Performance Monitoring

```python
# Track download performance
stats = api.get_download_stats()
print(f"Success rate: {stats['success_rate']:.1%}")
print(f"Total downloaded: {stats['total_bytes_formatted']}")
```

## Documentation

- **API Reference**: Complete method documentation with examples
- **User Guide**: Detailed tutorials and best practices
- **CLI Reference**: Command-line usage and automation examples

## Contributing

We welcome contributions! To contribute:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

### Development Setup

```bash
git clone https://github.com/bumatic/bitchute-scraper.git
cd bitchute-scraper
pip install -e ".[dev]"
pytest
```

## License

MIT License - see LICENSE file for details.

## Support

- **Issues**: [GitHub Issues](https://github.com/bumatic/bitchute-scraper/issues)
- **Discussions**: [GitHub Discussions](https://github.com/bumatic/bitchute-scraper/discussions)


## Disclaimer

This software is intended for educational and research purposes only.
Users are responsible for complying with BitChute's Terms of Service and all applicable laws.
The software authors disclaim all liability for any misuse of this software.
