Metadata-Version: 2.4
Name: casts_down
Version: 2.1.8
Summary: Cross-platform CLI for downloading and transcribing podcasts with local Whisper speech-to-text
Author: Casts Down Contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/clemente0731/casts_down
Project-URL: Repository, https://github.com/clemente0731/casts_down
Project-URL: Issues, https://github.com/clemente0731/casts_down/issues
Keywords: podcast,downloader,cli,transcribe,speech-to-text,whisper,srt,subtitle,apple-podcasts,xiaoyuzhou,rss,faster-whisper,mlx-whisper,audio,asr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Environment :: Console
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: beautifulsoup4>=4.11.0
Requires-Dist: click>=8.1.0
Requires-Dist: feedparser>=6.0.10
Requires-Dist: tqdm>=4.65.0
Requires-Dist: faster-whisper<2.0.0,>=1.0.0
Provides-Extra: metal
Requires-Dist: mlx-whisper<1.0.0,>=0.4.0; extra == "metal"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"

```
   ____          _         ____
  / ___|__ _ ___| |_ ___  |  _ \  _____      ___ __
 | |   / _` / __| __/ __| | | | |/ _ \ \ /\ / / '_ \
 | |__| (_| \__ \ |_\__ \ | |_| | (_) \ V  V /| | | |
  \____\__,_|___/\__|___/ |____/ \___/ \_/\_/ |_| |_|

      Intelligent Podcast Downloader & Transcriber
```

A cross-platform CLI tool for downloading and transcribing podcasts. Supports Apple Podcasts, Xiaoyuzhou, and RSS feeds with built-in local speech-to-text powered by Whisper.

---

## Disclaimer

> **This tool is for EDUCATIONAL and PERSONAL USE ONLY.**
>
> By using this software, you agree to:
> - use it for personal learning and research only;
> - respect copyright laws and intellectual property;
> - support content creators through official channels;
> - comply with platform terms of service.
>
> **Prohibited:**
> - commercial redistribution;
> - mass downloading for public sharing;
> - bypassing paid subscriptions;
> - any activity that harms content creators or platforms.
>
> The developers fully support and uphold the rights of content creators and platforms.

---

## Features

- **Smart URL Detection** - Automatically identifies the platform from the URL, so there is no need to specify a downloader
- **Multi-Platform Support**
  - Apple Podcasts (single episodes and podcast pages)
  - Xiaoyuzhou / 小宇宙 (single episodes and podcast feeds)
  - Standard RSS 2.0 feeds
- **Async Concurrent Downloads** - Configurable concurrency for faster batch downloads
- **Auto Transcription** - Downloads are automatically transcribed to text after completion
- **Built-in Speech-to-Text** - Local transcription via faster-whisper (CUDA/CPU), with optional mlx-whisper (Metal) for Mac
- **Subtitle Output** - Generates SRT (millisecond precision) and timestamped TXT files
- **Progress Display** - Real-time download and transcription progress tracking
- **Episode Selection** - Download all, latest N, or specific episodes from Apple Podcasts links
- **Smart File Management** - Auto-naming, skip existing files, resume-safe temp files
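Smart URL detection can be done by classifying the URL's hostname. The sketch below is a minimal illustration, not the tool's actual code: `detect_platform` is a hypothetical helper, and it assumes that any URL not on `podcasts.apple.com` or `xiaoyuzhoufm.com` is a plain RSS feed.

```python
from urllib.parse import urlparse


def detect_platform(url: str) -> str:
    """Hypothetical helper: classify a URL as "apple", "xiaoyuzhou", or "rss"."""
    host = (urlparse(url).hostname or "").lower()
    if host == "podcasts.apple.com":
        return "apple"
    # Match the bare domain and its subdomains (e.g. www.), nothing else.
    if host == "xiaoyuzhoufm.com" or host.endswith(".xiaoyuzhoufm.com"):
        return "xiaoyuzhou"
    # Assumption: anything else is a direct RSS 2.0 feed URL.
    return "rss"
```

Matching on the parsed hostname rather than a substring of the whole URL keeps a feed URL that merely mentions "apple" from being misrouted.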

## Installation

### Install via pip

```bash
pip install casts_down
```

Installs all dependencies needed for downloading and transcription. The Whisper model itself is downloaded automatically the first time you transcribe.

### macOS Apple Silicon (Metal acceleration)

```bash
pip install "casts_down[metal]"
```

Adds mlx-whisper for Metal GPU acceleration. If mlx-whisper is unavailable, transcription falls back to faster-whisper on the CPU.

### Install from source

```bash
git clone https://github.com/clemente0731/casts_down.git
cd casts_down
pip install -e ".[dev]"
```

### Build & Publish

```bash
git clone https://github.com/clemente0731/casts_down.git
cd casts_down

make build          # .pyz standalone executable (<1s)
make dist           # wheel + sdist for PyPI
make publish        # build + upload to PyPI
make publish-test   # build + upload to TestPyPI
make release        # clean + build all (.pyz + wheel + sdist)
```

See [BUILD.md](BUILD.md) for details.

## Quick Start

```bash
# Download and transcribe (transcription is automatic)
casts-down "https://podcasts.apple.com/podcast/id123"

# Download all episodes
casts-down "https://feeds.example.com/podcast.rss" --all

# Download without transcription
casts-down "https://feeds.example.com/podcast.rss" --no-transcribe

# Xiaoyuzhou
casts-down "https://www.xiaoyuzhoufm.com/episode/xxx"

# Transcribe existing audio files
casts-down transcribe ./podcasts/episode.mp3
casts-down transcribe ./podcasts/          # entire directory
```

## Usage

### Download (+ Auto Transcribe)

```bash
casts-down <URL> [OPTIONS]
```

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--all` | `-a` | Download all episodes | off (latest 1 only) |
| `--latest N` | `-l N` | Download latest N episodes | 1 |
| `--output DIR` | `-o DIR` | Output directory | `./podcasts` |
| `--concurrent N` | `-c N` | Parallel downloads | 3 |
| `--skip-existing` | `-s` | Skip already downloaded files | off |
| `--transcribe/--no-transcribe` | `-t` | Transcribe after download | **on** |
| `--model NAME` | `-m` | Whisper model for transcription | `small` |
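`--concurrent N` caps how many downloads run at the same time. A common asyncio pattern for this is a semaphore. The sketch below assumes a caller-supplied `fetch` coroutine (in the real tool this would be an aiohttp download); `download_all` and `demo` are hypothetical names, not part of casts_down.

```python
import asyncio


async def download_all(urls, fetch, concurrent=3):
    """Run fetch(url) for every URL with at most `concurrent` calls in flight."""
    semaphore = asyncio.Semaphore(concurrent)

    async def bounded(url):
        # Waits here while `concurrent` downloads are already running.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def demo(concurrent=3):
    """Simulated downloads that record the peak number running at once."""
    active = 0
    peak = 0

    async def fake_fetch(url):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for network I/O
        active -= 1
        return f"saved {url}"

    urls = [f"https://example.com/ep{i}.mp3" for i in range(10)]
    results = await download_all(urls, fake_fetch, concurrent)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "downloads, peak concurrency", peak)
```

Lowering the limit (for example `--concurrent 1`) trades speed for fewer simultaneous connections, which is gentler on servers that rate-limit.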

### Transcribe

```bash
casts-down transcribe <FILE>... [OPTIONS]
```

Transcribe audio files or directories. Outputs `.srt` (subtitle) and `.txt` (timestamped text) alongside each audio file.

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--model NAME` | `-m` | Whisper model (`tiny`, `base`, `small`, `medium`, `large-v3`) | `small` |
| `--language CODE` | | Language code (`zh`, `en`, etc.) | auto-detect |
| `--skip-transcribed` | | Skip files already transcribed | on |
| `--overwrite` | | Force re-transcribe existing outputs | off |

### Setup (Optional)

```bash
casts-down setup-transcribe
```

Pre-downloads the Whisper model so the first transcription does not have to wait for it. On Apple Silicon Macs, it also installs mlx-whisper for Metal GPU acceleration.

| Platform | Engine | Acceleration |
|----------|--------|-------------|
| macOS Apple Silicon | mlx-whisper + faster-whisper | Metal GPU |
| macOS Intel | faster-whisper | CPU |
| Linux + NVIDIA | faster-whisper | CUDA |
| Linux (no GPU) | faster-whisper | CPU |
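The table above amounts to a simple selection rule. The sketch below is one hypothetical way to express it, assuming mlx-whisper is preferred only on native Apple Silicon Python (`platform.machine()` reports `arm64`) when the `mlx_whisper` module is importable. The CUDA/CPU split is left to faster-whisper, whose `WhisperModel` accepts `device="auto"`.

```python
import importlib.util
import platform


def choose_engine(system: str, machine: str, mlx_installed: bool) -> str:
    """Hypothetical rule: mlx-whisper only on Apple Silicon with mlx installed."""
    if system == "Darwin" and machine == "arm64" and mlx_installed:
        return "mlx-whisper"
    # Everywhere else, and as the fallback, faster-whisper runs on CUDA or CPU.
    return "faster-whisper"


def detect_engine() -> str:
    """Apply choose_engine() to the current interpreter and environment."""
    return choose_engine(
        platform.system(),
        platform.machine(),
        importlib.util.find_spec("mlx_whisper") is not None,
    )
```

Keeping the decision in a pure function (`choose_engine`) makes it testable without an actual Mac or GPU.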

## Platform Support

### Fully Supported

**Apple Podcasts**
- [x] Podcast homepage (download all or latest N episodes)
- [x] Single episode links (smart matching and download)
- [x] Automatic RSS extraction via iTunes API
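Apple's public iTunes Lookup API (`https://itunes.apple.com/lookup?id=<ID>`) returns JSON whose podcast results include a `feedUrl` field. A minimal sketch of that step follows, assuming the podcast ID is the digits after `/id` in the URL (this also holds for episode links with a `?i=` parameter). The helper names are hypothetical, and the network request itself is omitted.

```python
import re
from urllib.parse import urlencode


def apple_podcast_id(url: str) -> str:
    """Extract the numeric podcast ID from an Apple Podcasts URL."""
    match = re.search(r"/id(\d+)", url)
    if match is None:
        raise ValueError(f"no podcast ID (like /id1234567) in {url!r}")
    return match.group(1)


def itunes_lookup_url(podcast_id: str) -> str:
    """Build the iTunes Lookup API request for a podcast ID."""
    return "https://itunes.apple.com/lookup?" + urlencode({"id": podcast_id})


def feed_url_from_lookup(payload: dict) -> str:
    """Pick the RSS feed URL out of a decoded Lookup API JSON response."""
    for result in payload.get("results", []):
        if result.get("feedUrl"):
            return result["feedUrl"]
    raise LookupError("lookup response has no feedUrl")
```

In practice the lookup URL would be fetched (for example with `urllib.request.urlopen` and `json.load`, or aiohttp) and the decoded JSON passed to `feed_url_from_lookup`.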

**Xiaoyuzhou / 小宇宙**
- [x] Single episode links
- [x] Podcast links (first 15 episodes)
- [ ] Full podcast list (requires additional reverse engineering)

**RSS Feeds**
- [x] Standard RSS 2.0 podcast feeds (most reliable method)
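In an RSS 2.0 feed, each episode is an `<item>` whose audio file is referenced by the `url` attribute of its `<enclosure>`. The tool itself uses feedparser; the sketch below uses the standard library's ElementTree instead so that it stays self-contained. It assumes items are listed newest first, which is common but not guaranteed, and `latest_episodes` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def latest_episodes(rss_xml: str, n: int | None = 1) -> list[tuple[str, str]]:
    """Return (title, audio URL) for the first n items that have an enclosure.

    n=None returns every episode, similar in spirit to --all.
    """
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.findall("./channel/item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # e.g. text-only posts without audio
        title = (item.findtext("title") or "").strip()
        episodes.append((title, enclosure.get("url")))
    return episodes if n is None else episodes[:n]


SAMPLE_FEED = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Episode 3</title>
  <enclosure url="https://example.com/ep3.mp3" type="audio/mpeg" length="1"/></item>
<item><title>Announcement</title></item>
<item><title>Episode 1</title>
  <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="1"/></item>
</channel></rss>"""

if __name__ == "__main__":
    print(latest_episodes(SAMPLE_FEED, n=1))
```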

### Not Supported

**Pocket Casts** - A podcast client that does not host audio files itself. Use the podcast's original RSS feed instead.

## Output Example

```
podcasts/
  My Podcast - Episode 1.mp3
  My Podcast - Episode 1.srt     # SRT subtitle (00:01:23,456 --> 00:01:27,890)
  My Podcast - Episode 1.txt     # [00:01:23] Timestamped plain text
```
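Both output formats derive from the same segment data: a start time, an end time (in seconds), and text. faster-whisper's `WhisperModel.transcribe()` yields segments with `start`, `end`, and `text` attributes. The sketch below formats plain `(start, end, text)` tuples into the layouts shown above. It is an illustration of the formats, not the tool's own writer, and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def txt_timestamp(seconds: float) -> str:
    """Format seconds as [HH:MM:SS], truncating fractional seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def to_srt(segments) -> str:
    """Numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


def to_txt(segments) -> str:
    """One '[HH:MM:SS] text' line per segment."""
    return "\n".join(f"{txt_timestamp(start)} {text.strip()}" for start, _end, text in segments)
```

For example, `srt_timestamp(83.456)` gives `00:01:23,456` and `txt_timestamp(83.456)` gives `[00:01:23]`, matching the sample lines above.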

## Examples

### Download NPR's "Up First" podcast

```bash
casts-down "https://feeds.npr.org/510318/podcast.xml" --latest 3
```

### Download from Apple Podcasts

```bash
casts-down "https://podcasts.apple.com/us/podcast/the-daily/id1200361736" --all
```

### Download only (no transcription)

```bash
casts-down "https://feeds.example.com/podcast.rss" --latest 5 --no-transcribe
```

### Batch download with skip existing

```bash
casts-down "https://feeds.example.com/podcast.rss" --all -o ./downloads --skip-existing
```

### Transcribe a directory of audio files

```bash
casts-down transcribe ./podcasts/ --model medium --language zh
```

## Technical Stack

| Component | Technology |
|-----------|-----------|
| Language | Python 3.10+ |
| CLI Framework | click |
| HTTP Client | aiohttp (async concurrent) |
| RSS Parsing | feedparser |
| HTML Parsing | BeautifulSoup4 |
| Progress Display | tqdm |
| ASR Engine | faster-whisper (built-in) / mlx-whisper (optional Metal) |

## Notes

> **Important considerations:**
> 1. **RSS Feed Expiration** - Some feeds may require authentication or contain expired URLs
> 2. **Audio URL Validity** - Some audio URLs contain time-limited tokens that may expire
> 3. **Rate Limiting** - Frequent requests may trigger platform restrictions
> 4. **Copyright** - Ensure all downloads are for personal use only
> 5. **Model Download** - First transcription auto-downloads the Whisper model (~466 MB for `small`). Run `casts-down setup-transcribe` to pre-download.

## Troubleshooting

### Cannot extract Apple Podcasts RSS

- Ensure the URL contains the podcast ID (e.g. `/id1234567`)
- Check network connection
- Try using the RSS feed URL directly if available

### Download timeout

- Reduce concurrency: `--concurrent 1`
- Check network connection and proxy settings
- Some servers may have rate limiting

### Transcription fails

- Try a smaller model: `--model base` or `--model tiny`
- Check available disk space (Whisper models range from about 75 MB to 3 GB)
- For Chinese content, specify language: `--language zh`
- On Mac Apple Silicon, install Metal support: `pip install "casts_down[metal]"`

### Abnormal file names

- The tool automatically sanitizes characters that are illegal in filenames
- If issues persist, please submit an [Issue](https://github.com/clemente0731/casts_down/issues)
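A typical sanitizer replaces characters that Windows, macOS, or Linux reject in filenames, trims trailing dots and spaces (which Windows disallows), and caps the length. The sketch below is a hypothetical example of that approach, not the tool's actual rules. The replacement character `_` and the 200-character limit are assumptions, and it does not handle reserved Windows names such as `CON`.

```python
import re

# Characters invalid on Windows plus ASCII control characters (0x00-0x1F).
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, max_length: int = 200) -> str:
    """Hypothetical sanitizer for episode titles used as filenames."""
    cleaned = _ILLEGAL.sub("_", name)
    cleaned = re.sub(r"\s+", " ", cleaned)  # collapse runs of whitespace
    cleaned = cleaned[:max_length].strip(" .")  # no trailing dots or spaces
    return cleaned or "untitled"
```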

## Quick Test

```bash
# Test download + transcription
casts-down "https://feeds.npr.org/510318/podcast.xml" --latest 1

# Test download only
casts-down "https://podcasts.apple.com/us/podcast/the-daily/id1200361736" --latest 1 --no-transcribe

# Test standalone transcription
casts-down transcribe ./podcasts/episode.mp3 --model tiny
```

## License

MIT License. Copyright (c) 2024 Casts Down Contributors.

## Contributing

Contributions are welcome! Please submit [Issues](https://github.com/clemente0731/casts_down/issues) and Pull Requests.

---

```
Made with <3 by open source contributors
```
