Metadata-Version: 2.4
Name: libbee
Version: 0.0.7
Summary: US public-library funding, usage, and operations data plus community civics data (1992–2023), integrated from IMLS, Census, and HUD into one tidy `facts` table.
Project-URL: Homepage, https://github.com/postphotos/libbee
Project-URL: Documentation, https://github.com/postphotos/libbee#readme
Project-URL: Repository, https://github.com/postphotos/libbee
Project-URL: Source Code, https://github.com/postphotos/libbee/tree/main/libbee
Project-URL: Bug Tracker, https://github.com/postphotos/libbee/issues
Project-URL: Changelog, https://github.com/postphotos/libbee/releases
Project-URL: Discussions, https://github.com/postphotos/libbee/discussions
Author-email: Leo Postovoit <leo@leopostovoit.com>
Maintainer-email: Leo Postovoit <leo@leopostovoit.com>
License-Expression: MIT
License-File: LICENSE
Keywords: HUD,IMLS,budget-allocation,census,civic-data,civic-tech,data-product,homelessness,open-data,polars,policy-analysis,public-libraries,resource-allocation
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Sociology
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: adbc-driver-sqlite>=1.1.0
Requires-Dist: fastexcel>=0.10.0
Requires-Dist: lxml>=5.0.0
Requires-Dist: numpy>=2.0
Requires-Dist: pandas>=2.2.0
Requires-Dist: polars>=1.41.2
Requires-Dist: pyarrow>=24.0.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: requests>=2.32
Requires-Dist: ruff>=0.15.15
Provides-Extra: all
Requires-Dist: anywidget>=0.11.0; extra == 'all'
Requires-Dist: duckdb>=1.1; extra == 'all'
Requires-Dist: marimo>=0.23.8; extra == 'all'
Requires-Dist: nbformat>=5.10.4; extra == 'all'
Requires-Dist: scikit-learn>=1.5; extra == 'all'
Requires-Dist: scipy>=1.13.0; extra == 'all'
Requires-Dist: shap>=0.50; extra == 'all'
Requires-Dist: statsmodels>=0.14.0; extra == 'all'
Provides-Extra: analysis
Requires-Dist: scikit-learn>=1.5; extra == 'analysis'
Requires-Dist: scipy>=1.13.0; extra == 'analysis'
Requires-Dist: shap>=0.50; extra == 'analysis'
Requires-Dist: statsmodels>=0.14.0; extra == 'analysis'
Provides-Extra: dev
Requires-Dist: altair>=5.0.0; extra == 'dev'
Requires-Dist: anywidget>=0.11.0; extra == 'dev'
Requires-Dist: duckdb>=1.1; extra == 'dev'
Requires-Dist: marimo>=0.23.8; extra == 'dev'
Requires-Dist: nbformat>=5.10.4; extra == 'dev'
Requires-Dist: pyright>=1.1.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: ruff>=0.9.0; extra == 'dev'
Requires-Dist: scikit-learn>=1.5; extra == 'dev'
Requires-Dist: scipy>=1.13.0; extra == 'dev'
Requires-Dist: statsmodels>=0.14.0; extra == 'dev'
Provides-Extra: duckdb
Requires-Dist: duckdb>=1.1; extra == 'duckdb'
Provides-Extra: notebook
Requires-Dist: anywidget>=0.11.0; extra == 'notebook'
Requires-Dist: docstring-to-markdown>=0.13; extra == 'notebook'
Requires-Dist: marimo>=0.23.8; extra == 'notebook'
Requires-Dist: nbformat>=5.10.4; extra == 'notebook'
Description-Content-Type: text/markdown

# 🐝 libbee

[![PyPI Version](https://img.shields.io/pypi/v/libbee.svg)](https://pypi.org/project/libbee/)
[![Python Version](https://img.shields.io/pypi/pyversions/libbee.svg)](https://pypi.org/project/libbee/)
[![License](https://img.shields.io/pypi/l/libbee.svg)](https://github.com/postphotos/libbee/blob/main/LICENSE)
[![Ruff Style](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Test Coverage](https://img.shields.io/badge/coverage-100%25-green.svg)](https://github.com/postphotos/libbee)

An unofficial Python package for US public library and community civics data.

`libbee` consolidates over 30 years of public data into a single, cleanly structured `facts` table. It pulls from four primary data sources:
* **IMLS (Institute of Museum and Library Services)**: The authoritative Public Libraries Survey (PLS) tracking funding, visits, staff, and circulation for every library system in America (1992–2023).
* **US Census Bureau (ACS)**: The American Community Survey 5-Year estimates, supplying cross-sectional community need metrics like median household income, poverty rates, and broadband access.
* **HUD (Dept. of Housing and Urban Development)**: Point-in-Time (PIT) estimates and Continuum of Care (CoC) funding awards, providing local homelessness statistics.
* **California State Library (LibPAS)**: Highly detailed annual library statistics and budget allocations specifically for California library systems.

## Why does this exist?
Working directly with the raw source data for public libraries is notoriously painful. If you try to do this from scratch, you have to deal with:
* **Shifting schemas:** Over 32 years, column names shift constantly (e.g., `HOURS` to `OP_HRS` to `HRS_OPEN`).
* **Silent data corruption:** Missing values are encoded as special negative codes (`-1`, `-3`, `-9`) instead of standard nulls. Loaded as-is, they silently skew your sums and averages.
* **Geospatial nightmares:** Joining library panel data against Census or HUD datasets requires cross-walking mismatched boundaries.
* **Formatting mess:** The raw data is scattered across 100+ MB of ZIP archives, DBF tables, and binary Excel files, in a mix of ASCII, Windows-1252, and UTF-8 encodings.

`libbee` automates the downloading, cleaning, and conforming. It maps the entities, replaces the negative codes with real nulls, normalizes the encodings, and packs everything into fast, columnar Parquet tables (via Polars/PyArrow) that take up about 11 MB on disk. The raw source downloads, by contrast, can total more than 200 MB.
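
Normalizing a mix of ASCII, Windows-1252, and UTF-8 text usually comes down to a decode-with-fallback step. A minimal sketch using only the standard library; the `decode_raw` helper and the encoding order are illustrative assumptions, not libbee's code.

```python
# Ordered from strictest to most permissive. "utf-8-sig" also decodes plain
# UTF-8 (and therefore ASCII) and strips a leading byte-order mark if present.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252")


def decode_raw(data: bytes) -> str:
    """Decode bytes of unknown encoding: UTF-8 first, then Windows-1252."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort never fails.
    return data.decode("latin-1")
```

Trying UTF-8 first is the safer order: Windows-1252 text with accented characters is almost never valid UTF-8, whereas UTF-8 decoded as Windows-1252 "succeeds" and silently yields mojibake such as `Ã©` for `é`.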

## Install

```bash
pip install libbee                 # Core data loading
pip install "libbee[analysis]"     # Adds scikit-learn, scipy, statsmodels, shap
pip install "libbee[duckdb]"       # Adds DuckDB SQL engine
pip install "libbee[notebook]"     # Adds the interactive marimo dashboard
pip install "libbee[all]"          # Everything
```
*Note: Python ≥ 3.11 is required. The package ships code-only; data is built and cached locally.*

## Quickstart

Because we don't ship the raw data in the wheel, you need to build the local cache once:

```bash
# Downloads, cleans, and caches everything (takes a minute or two)
libbee build
```

Then, use it in Python:

```python
import libbee
import polars as pl

# 1. Load the unified facts table (720k+ rows, conformed to one schema)
df_facts = libbee.facts()

# 2. Or load a specific conformed frame by name
df_equity = libbee.load("county_equity")

# 3. Use lazy evaluation for fast filtered queries
lf = libbee.scan("facts")
df_summary = (
    lf.filter(
        (pl.col("geo_level") == "state") &
        (pl.col("metric") == "visits_pc") &
        (pl.col("year") == 2019)
    )
    .sort("value", descending=True)
    .collect()
)
```

## Documentation & Advanced Usage
Looking for the Data Dictionary, DuckDB integration, inflation adjustments, or analysis examples (like Difference-in-Differences models)?

👉 **[Read the Docs](DOCS.md)**

## Citation & License
When using these conformed datasets in publications, please cite the primary publishing agencies: IMLS (Public Libraries Survey), US Census Bureau (ACS), HUD (PIT/CoC), and the CA State Library.

Licensed under the MIT License.
