Metadata-Version: 2.4
Name: eqpredict
Version: 0.1.0
Summary: Research-grade probabilistic earthquake nowcasting and forecasting
License-Expression: AGPL-3.0-or-later
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.11
Requires-Dist: tqdm>=4.66
Requires-Dist: requests>=2.31
Requires-Dist: joblib>=1.3
Requires-Dist: scikit-learn>=1.3
Requires-Dist: matplotlib>=3.8
Requires-Dist: skyfield>=1.49
Provides-Extra: tectonics
Requires-Dist: shapely>=2.0; extra == "tectonics"
Requires-Dist: pyproj>=3.6; extra == "tectonics"
Requires-Dist: geopandas>=0.14; extra == "tectonics"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# eqpredict: research-grade global **probabilistic** earthquake nowcasting

> **Important scientific caveat**  
> As of 2025-10-13, no agency or research group can *predict* the exact time, place, and magnitude of a future large earthquake. This repository provides an **exploratory, research-grade** pipeline for *probabilistic nowcasting/forecasting* of earthquake occurrence rates using historical seismicity plus optional solar and planetary features. **Do not** use outputs for public alerts, emergency decisions, or safety-critical applications. See the sources linked below and `DISCLAIMER.md`.

---

## What this project does

- Downloads global **historical earthquakes** (USGS ComCat by default), with optional ISC‐GEM and Global CMT catalogs for robustness.
- Builds a global spatio‑temporal grid (default 1° × 1°, 7‑day time steps) and constructs features:
  - classical seismicity features (recent counts, Gutenberg–Richter summaries),
  - simple **ETAS-like** cluster intensity proxy,
  - optional **tectonic context** (distance to plate boundaries, active faults; requires GeoPandas),
  - **solar/space‑weather** features (sunspot number, F10.7 radio flux),
  - **Sun, Moon, and planetary** geometry features using NASA JPL ephemerides via `skyfield` (lunar phase, Earth–Moon/Sun distances, crude "alignment index").
- Trains baseline probabilistic models (logistic regression, gradient boosting) to forecast the **probability that ≥1 event with M ≥ M0 occurs in the next Δt** in each grid cell.
- Evaluates with CSEP‑style metrics (log‑likelihood, Brier score, information gain vs. uniform) and rare‑event metrics (ROC‑AUC, PR‑AUC).
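
As a rough illustration of the gridding and target definition above, the sketch below assigns events to 1° × 1° cells and 7-day steps, then marks which cell-steps contain at least one event with M ≥ M0. It is not the code in `src/eqpredict/features/build.py`: the function names, the 2010-01-01 epoch, and the `(time, lat, lon, mag)` tuple layout are assumptions made for this example.

```python
import math
from datetime import datetime, timezone

GRID_DEG = 1.0       # cell size in degrees (README default)
TIMESTEP_DAYS = 7    # time-step length in days (README default)
# Hypothetical origin of the time axis; the real pipeline may choose another.
EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def cell_index(lat, lon, grid_deg=GRID_DEG):
    """Map latitude/longitude in degrees to integer (row, col) grid indices."""
    n_rows = int(round(180.0 / grid_deg))
    n_cols = int(round(360.0 / grid_deg))
    # Clamp so that lat = +90 falls into the top row instead of a new one.
    row = min(int(math.floor((lat + 90.0) / grid_deg)), n_rows - 1)
    # Wrap longitude so that +180 and -180 share the same column.
    col = int(math.floor(((lon + 180.0) % 360.0) / grid_deg)) % n_cols
    return row, col


def time_index(when, epoch=EPOCH, timestep_days=TIMESTEP_DAYS):
    """Index of the time step containing the timezone-aware datetime `when`."""
    return (when - epoch).days // timestep_days


def label_cells(events, min_mag=5.0):
    """Return the set of (step, row, col) holding >= 1 event with M >= min_mag."""
    positives = set()
    for when, lat, lon, mag in events:
        if mag >= min_mag:
            positives.add((time_index(when), *cell_index(lat, lon)))
    return positives


if __name__ == "__main__":
    events = [
        (datetime(2011, 3, 11, 5, 46, tzinfo=timezone.utc), 38.30, 142.37, 9.1),
        (datetime(2011, 3, 12, 1, 0, tzinfo=timezone.utc), 38.10, 142.90, 4.6),
    ]
    print(sorted(label_cells(events)))
```

To forecast the *next* Δt, a training row would presumably pair the features available up to step `t` with the label of step `t + 1` for the same cell, so that no information from the target window leaks into the inputs.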

This is intended for **hypothesis testing** (e.g., "do solar or planetary features add information *beyond* recent seismicity?") — not for operational prediction.
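
One way to frame such a hypothesis test is to score two forecasts on the same held-out cell-steps and compare their log-likelihoods. The sketch below computes a Bernoulli log-likelihood, the Brier score, and a mean log-likelihood difference against a reference forecast. The function names are illustrative and are not the API of `src/eqpredict/evaluation/csep.py`. Normalising by the number of cell-steps is an assumption made here; CSEP conventions often normalise by the number of observed events instead.

```python
import math


def bernoulli_log_likelihood(probs, outcomes, eps=1e-12):
    """Sum of log-probabilities assigned to the observed 0/1 outcomes."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) for overconfident forecasts
        total += math.log(p) if y else math.log(1.0 - p)
    return total


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def information_gain(probs, outcomes, ref_probs):
    """Mean log-likelihood gain (nats per cell-step) over a reference forecast."""
    gain = bernoulli_log_likelihood(probs, outcomes)
    gain -= bernoulli_log_likelihood(ref_probs, outcomes)
    return gain / len(outcomes)


if __name__ == "__main__":
    outcomes = [0, 0, 1, 0]            # toy observations for four cell-steps
    uniform = [0.25] * len(outcomes)   # uniform-rate reference forecast
    model = [0.1, 0.1, 0.7, 0.1]       # toy model forecast
    print("Brier (model):  ", brier_score(model, outcomes))
    print("Brier (uniform):", brier_score(uniform, outcomes))
    print("Information gain vs. uniform:", information_gain(model, outcomes, uniform))
```

To ask whether solar or planetary features help, pass the seismicity-only model's probabilities as `ref_probs` and the augmented model's as `probs`. A gain near zero, or a negative gain, on held-out data would suggest the extra features add nothing.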

---

## Installation

### Prerequisites

- Python 3.9+ (tested on 3.11)
- pip or conda for package management

### Option 1: pip (virtualenv)

```bash
# Clone the repository
git clone https://github.com/yourusername/eqpredict.git
cd eqpredict

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode (recommended)
pip install -e .

# For tests and tooling (quotes stop shells such as zsh from globbing the brackets)
pip install -e ".[dev]"

# Optional tectonics/geospatial features
pip install -e ".[tectonics]"
# or: pip install -r requirements-tectonics.txt
```

### Option 2: Conda (recommended for GeoPandas)

GeoPandas can be tricky to install with pip on some systems. Conda handles the spatial dependencies more reliably:

```bash
# Create conda environment
conda create -n eqpredict -c conda-forge python=3.11 \
    geopandas shapely pyproj scikit-learn numpy pandas scipy \
    requests skyfield tqdm matplotlib pytest

conda activate eqpredict

# Clone and install
git clone https://github.com/yourusername/eqpredict.git
cd eqpredict
pip install -e .
```

---

## Quickstart

```bash
# 1) Activate your environment
source .venv/bin/activate  # or: conda activate eqpredict

# 2) Download data (defaults: 2010..2024, global, M>=5.0)
# Note: date-only --end is treated as inclusive.
python scripts/download_usgs.py --start 2010-01-01 --end 2024-12-31 --minmag 5.0

# Optional: sunspots + daily F10.7
python scripts/download_solar.py --start 2009-01-01 --end 2025-12-31

# 3) Build grid dataset (1° grid, 7-day steps, target=M>=5.0)
python scripts/build_dataset.py --grid_deg 1.0 --timestep_days 7 --minmag 5.0

# 4) Train a model
python scripts/train.py --model gradient_boosting --n_estimators 300

# 5) Evaluate (CSEP-like and rare-event metrics)
python scripts/evaluate.py

# 6) Produce a probabilistic forecast map for the next week (illustrative)
python scripts/predict.py
```

Outputs are stored under `artifacts/` (trained models, metrics, figures, and per‑cell probability grids as CSV).
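
For readers curious how the download step in the Quickstart might treat a date-only `--end` as inclusive, here is a minimal sketch that builds a query URL for the USGS FDSN event service. It assumes the inclusive end date is implemented by moving the exclusive bound forward one day; `build_query_url` is a hypothetical helper, not necessarily what `scripts/download_usgs.py` does.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

USGS_QUERY_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(start, end_inclusive, min_mag):
    """Build a CSV query URL covering start..end_inclusive (ISO dates, UTC)."""
    # A date-only end time is read by the service as 00:00 UTC of that day,
    # so shift it by one day to keep events from the final day.
    end_exclusive = date.fromisoformat(end_inclusive) + timedelta(days=1)
    params = {
        "format": "csv",
        "starttime": date.fromisoformat(start).isoformat(),
        "endtime": end_exclusive.isoformat(),
        "minmagnitude": min_mag,
        "orderby": "time-asc",
    }
    return f"{USGS_QUERY_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("2010-01-01", "2024-12-31", 5.0))
```

The service caps the number of events returned per request, so a downloader covering many years would likely need to split the range into smaller windows and concatenate the results.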

The planetary-feature code prefers the packaged `src/eqpredict/data/de440s.bsp` file when it is present. If the file is missing, Skyfield downloads the ephemeris to `~/.skyfield`.
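
The fallback described above could look roughly like the following. This is a sketch under assumptions, not the contents of `src/eqpredict/data/planetary.py`: `choose_ephemeris_source` and `load_ephemeris` are hypothetical names, and the packaged path is taken relative to the repository root. `load_file` and `Loader` are part of `skyfield.api`.

```python
from pathlib import Path

EPHEMERIS_NAME = "de440s.bsp"
PACKAGED_PATH = Path("src/eqpredict/data") / EPHEMERIS_NAME  # relative to repo root
CACHE_DIR = Path.home() / ".skyfield"


def choose_ephemeris_source(packaged=PACKAGED_PATH, cache_dir=CACHE_DIR):
    """Return ("packaged", file) if the bundled file exists, else ("download", dir)."""
    if Path(packaged).is_file():
        return "packaged", Path(packaged)
    return "download", Path(cache_dir)


def load_ephemeris(packaged=PACKAGED_PATH, cache_dir=CACHE_DIR):
    """Load the ephemeris, downloading it only if the packaged copy is missing."""
    # Imported lazily so the path logic above works without Skyfield installed.
    from skyfield.api import Loader, load_file

    kind, location = choose_ephemeris_source(packaged, cache_dir)
    if kind == "packaged":
        return load_file(str(location))
    # Loader caches downloads in the given directory and reuses them later.
    return Loader(str(location))(EPHEMERIS_NAME)
```

Keeping the source decision in a separate function makes it easy to test offline and to log which copy of the ephemeris a run actually used.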

---

## Project Structure

```
eqpredict/
├── README.md              # This file
├── DISCLAIMER.md          # Scientific/ethical disclaimers
├── DATA_SOURCES.md        # Data sources, licenses, APIs
├── LICENSE                # GNU AGPL-3.0-or-later
├── NOTICE                 # Attribution notice for derivatives
├── requirements.txt       # Python dependencies
├── pytest.ini             # pytest configuration
│
├── src/eqpredict/         # Main package
│   ├── __init__.py
│   ├── data/              # Data fetching modules
│   │   ├── usgs.py        # USGS earthquake catalog
│   │   ├── solar.py       # Solar/space weather data
│   │   └── planetary.py   # Planetary geometry features
│   ├── features/          # Feature engineering
│   │   └── build.py       # Grid creation, ETAS, dataset building
│   ├── models/            # ML models
│   │   ├── ml.py          # Gradient boosting, logistic regression
│   │   └── baselines.py   # Uniform rate baseline
│   ├── evaluation/        # Evaluation metrics
│   │   └── csep.py        # CSEP-style information gain
│   └── utils/             # Utilities
│       ├── io.py          # File I/O
│       └── logger.py      # Structured logging
│
├── scripts/               # CLI entry points
│   ├── download_usgs.py   # Download earthquake data
│   ├── download_solar.py  # Download solar data
│   ├── build_dataset.py   # Build training dataset
│   ├── train.py           # Train models
│   ├── evaluate.py        # Evaluate model performance
│   └── predict.py         # Generate forecasts
│
├── tests/                 # Unit tests
│   ├── test_features.py
│   ├── test_models.py
│   ├── test_evaluation.py
│   └── test_utils.py
│
└── artifacts/             # Created on first run
    ├── data/              # Downloaded and processed data
    ├── models/            # Trained models (.joblib)
    ├── metrics/           # Evaluation metrics (.json)
    └── figures/           # Generated plots
```

---

## Running Tests

```bash
# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_features.py -v
```

---

## Data Sources

See `DATA_SOURCES.md` for complete information. Key sources:

- **USGS Earthquake Catalog (ComCat, FDSN event service)** — global historical earthquake parameters
- **ISC‑GEM Global Instrumental Earthquake Catalogue** (optionally merged)
- **Global CMT** (moment tensors for M≥~5 since 1976; optional)
- **Plate boundaries (PB2002)** and **GEM Global Active Faults** (optional tectonic context)
- **Solar & geomagnetic**: SILSO daily sunspot number; NOAA SWPC F10.7; GFZ/NOAA Kp index
- **Planetary geometry**: NASA JPL DE ephemerides via `skyfield`

> ⚠️ **Licenses** vary by dataset (e.g., ISC‑GEM is CC‑BY‑SA, GFZ Kp is CC‑BY‑4.0). Review `DATA_SOURCES.md` and each dataset's terms before redistribution.

---

## Contributing

Contributions are welcome! Here's how to get started:

### Development Setup

```bash
# Fork and clone the repository
git clone https://github.com/yourusername/eqpredict.git
cd eqpredict

# Create development environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Install the package with its development dependencies
# (the dev extra provides pytest, pytest-cov, black, isort, mypy, and ruff)
pip install -e ".[dev]"
```

### Code Style

- Format code with `black` and `isort`
- Add type hints to all functions
- Write docstrings for public functions
- Keep line length under 100 characters

### Submitting Changes

1. Create a feature branch: `git checkout -b feature/your-feature`
2. Make your changes with clear commit messages
3. Run tests: `pytest tests/ -v`
4. Push and open a pull request

### Reporting Issues

Please include:
- Python version and OS
- Steps to reproduce
- Expected vs actual behavior
- Error messages/tracebacks

---

## Security note on model loading

The CLI scripts `scripts/evaluate.py` and `scripts/predict.py` load models via `joblib`, which uses Python pickle under the hood. Only load models you created yourself or otherwise fully trust.
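
One simple precaution, sketched below, is to record a SHA-256 digest when you save a model and check it before loading. This only confirms that the file is the one you recorded; it does not make an untrusted pickle safe. The helper names are hypothetical and this is not a description of the validation the project's scripts perform.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_trusted_model(path, expected_sha256):
    """Load a joblib model only if its digest matches the recorded one."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"Digest mismatch for {path}: refusing to unpickle")
    # Imported here so the digest check runs before any unpickling code is touched.
    import joblib

    return joblib.load(path)
```

A typical flow would be to call `sha256_of` right after training, store the digest alongside the metrics, and pass it to `load_trusted_model` in later evaluation or prediction runs.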

---

## Scientific/Ethical Notes

- The **USGS** states plainly that earthquakes **cannot be predicted**; only **probabilities** can be estimated over windows of time and space. This project therefore focuses on **probabilistic** outputs and CSEP‑style evaluation.
- Evidence for **lunar/solar tidal triggering** exists in specific contexts, but effects are small and not reliably predictive. Evidence for **planetary alignments** beyond Sun/Moon is weak or disputed. Treat those features as exploratory; expect them to contribute little beyond recent seismicity.
- Always defer to official sources (USGS, EMSC, JMA, GFZ) for situational awareness, and never use these outputs for safety decisions.

See `DISCLAIMER.md` for more detail and citations.

---

## Reproducibility

- Deterministic seeds are used where applicable
- All preprocessing logic lives under `src/eqpredict/` and is covered by unit tests
- The pipeline uses only public endpoints and avoids fragile scrapers

---

## License

GNU AGPL-3.0-or-later. See `LICENSE` for the full license text and `NOTICE` for
the project attribution notice that derivatives must preserve.

---

## Changelog

### 2026-01-02

- Added comprehensive unit tests (44 tests)
- Implemented structured logging system
- Added input validation to all CLI scripts
- Vectorized ETAS intensity calculation for performance
- Added memory-efficient chunked dataset building
- Added retry logic with exponential backoff to solar fetchers
- Added planetary feature caching
- Added comprehensive type hints
- Added model validation for secure loading
- Improved documentation

---

## Citations (selected)

- Ogata, Y. "Space–time ETAS models…" (see `DATA_SOURCES.md` for links)  
- CSEP testing framework (pyCSEP)  
- USGS FAQs on earthquake prediction; SILSO; NOAA SWPC; GFZ Kp; JPL DE ephemerides / Skyfield

---
