Metadata-Version: 2.3
Name: Sp00kyVectors
Version: 0.1.15
Summary: A spooky vector analysis library
License: MIT
Author: Lila James
Author-email: lilaresearch@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: appnope (==0.1.4)
Requires-Dist: asttokens (==3.0.0)
Requires-Dist: certifi (==2025.4.26)
Requires-Dist: charset-normalizer (==3.4.2)
Requires-Dist: comm (==0.2.2)
Requires-Dist: contourpy (==1.3.2)
Requires-Dist: cycler (==0.12.1)
Requires-Dist: debugpy (==1.8.14)
Requires-Dist: decorator (==5.2.1)
Requires-Dist: docutils (==0.21.2)
Requires-Dist: executing (==2.2.0)
Requires-Dist: fonttools (==4.58.0)
Requires-Dist: id (==1.5.0)
Requires-Dist: idna (==3.10)
Requires-Dist: iniconfig (==2.1.0)
Requires-Dist: ipykernel (==6.29.5)
Requires-Dist: ipython (==9.2.0)
Requires-Dist: ipython-pygments-lexers (==1.1.1)
Requires-Dist: jaraco-classes (==3.4.0)
Requires-Dist: jaraco-context (==6.0.1)
Requires-Dist: jaraco-functools (==4.1.0)
Requires-Dist: jedi (==0.19.2)
Requires-Dist: joblib (==1.5.0)
Requires-Dist: jupyter-client (==8.6.3)
Requires-Dist: jupyter-core (==5.7.2)
Requires-Dist: keyring (==25.6.0)
Requires-Dist: kiwisolver (==1.4.8)
Requires-Dist: markdown-it-py (==3.0.0)
Requires-Dist: matplotlib (==3.10.1)
Requires-Dist: matplotlib-inline (==0.1.7)
Requires-Dist: mdurl (==0.1.2)
Requires-Dist: more-itertools (==10.7.0)
Requires-Dist: nest-asyncio (==1.6.0)
Requires-Dist: nh3 (==0.2.21)
Requires-Dist: numpy (==1.26.4)
Requires-Dist: opencv-python (>=4.9.0.80,<5.0.0.0)
Requires-Dist: packaging (==25.0)
Requires-Dist: pandas (==2.2.3)
Requires-Dist: parso (==0.8.4)
Requires-Dist: pexpect (==4.9.0)
Requires-Dist: pillow (==11.2.1)
Requires-Dist: platformdirs (==4.3.8)
Requires-Dist: pluggy (==1.6.0)
Requires-Dist: prompt-toolkit (==3.0.51)
Requires-Dist: psutil (==7.0.0)
Requires-Dist: ptyprocess (==0.7.0)
Requires-Dist: pure-eval (==0.2.3)
Requires-Dist: pygments (==2.19.1)
Requires-Dist: pyparsing (==3.2.3)
Requires-Dist: pytest (==8.3.5)
Requires-Dist: python-dateutil (==2.9.0.post0)
Requires-Dist: pytz (==2025.2)
Requires-Dist: pyzmq (==26.4.0)
Requires-Dist: readme-renderer (==44.0)
Requires-Dist: requests (==2.32.3)
Requires-Dist: requests-toolbelt (==1.0.0)
Requires-Dist: rfc3986 (==2.0.0)
Requires-Dist: rich (==14.0.0)
Requires-Dist: scikit-learn (==1.6.1)
Requires-Dist: scipy (==1.15.2)
Requires-Dist: six (==1.17.0)
Requires-Dist: stack-data (==0.6.3)
Requires-Dist: threadpoolctl (==3.6.0)
Requires-Dist: torch (>=2.3.0,<3.0.0)
Requires-Dist: tornado (==6.5)
Requires-Dist: tqdm (==4.67.1)
Requires-Dist: traitlets (==5.14.3)
Requires-Dist: twine (==6.1.0)
Requires-Dist: tzdata (==2025.2)
Requires-Dist: urllib3 (==2.4.0)
Requires-Dist: wcwidth (==0.2.13)
Description-Content-Type: text/markdown

# Sp00kyVectors: Vector Analysis Wrapper for Python

Welcome to **Sp00kyVectors**, the software powering your Tricorder. 🛸

These eerily intuitive Python modules work seamlessly as one toolkit for:

- 🧲 **Data ingestion**
- 🧼 **Cleaning**
- 🧮 **Vector analysis**
- 📊 **Statistical computation**
- 🧠 **Bespoke neural net creation**
- 🌌 **Visualizations** 🪄👻

Perfect for any away missions 🖖

> 100% open-source and always summoning new engineers to help!

## 🧼 Analysis Examples

**On-the-go data manipulation** across space, time, and spreadsheets:

| Before | After |
|--------|-------|
| ![Before Cleaning](https://github.com/LilaShiba/sp00kyvectors/raw/main/imgs/temp_before_clean.png) | ![After Cleaning](https://github.com/LilaShiba/sp00kyvectors/raw/main/imgs/temp_after_clean.png) |
| ![Before Bin](https://github.com/LilaShiba/sp00kyvectors/raw/main/imgs/beforebin.png) | ![After Bin](https://github.com/LilaShiba/sp00kyvectors/raw/main/imgs/afterbin.png) |
| ![Vector Projections](https://github.com/LilaShiba/sp00kyvectors/raw/main/imgs/output_add.png) | ![Normalize](https://github.com/LilaShiba/sp00kyvectors/raw/main/imgs/output.png) |

## 🧹 Dirty Data
#### Load without worry
Easily load and align mismatched CSV files (**hello, IoT**). This utility collects, normalizes, and organizes messy datasets so you can focus on the analysis, not the cleanup. 🚀

`Vector.load_folder(path)` loads a folder of CSV files with potentially mismatched or missing columns,  
aligns all columns based on their headers, and combines them into a single clean DataFrame.  
Missing columns in any file are automatically filled with `NaN` values to maintain consistency.

Perfect for messy datasets where CSVs don't share the exact same structure!
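
The column alignment described above can be approximated in plain pandas. This is a minimal sketch, not the library's implementation: `load_csv_folder` is a hypothetical helper name, and it assumes every file in the folder is a header-first CSV.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV in `folder` and stack them into one DataFrame.

    Hypothetical helper for illustration; not part of sp00kyvectors.
    """
    frames = [pd.read_csv(p) for p in sorted(Path(folder).glob("*.csv"))]
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    # concat aligns on column names; join="outer" keeps the union of all
    # columns and fills cells a file did not provide with NaN.
    return pd.concat(frames, ignore_index=True, join="outer", sort=False)
```

A file that lacks a column simply contributes `NaN` for it, so the combined frame always has the union of all headers.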

Cleaning is done one layer up with `sp00kyDF.get_clean_df()` ✨🧹

This method returns a cleaned version of the DataFrame by performing the following steps:

1. 🧩 Removes duplicate rows (performed twice to ensure thorough cleaning)  
2. 🚫📊 Clips outlier values based on the Z-score method *(an Interquartile Range (IQR) method is also available)*  
3. 🏷️ Standardizes column names for consistency  
4. ❌🕳️ *(Optionally drops null values — currently commented out)*

Finally, it returns the cleaned DataFrame ready for analysis. 🎯
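
The steps above can be sketched with pandas alone. The function name `clean_frame` and its `z_thresh` default are assumptions made for illustration; the library's own `get_clean_df()` may differ in detail.

```python
import pandas as pd


def clean_frame(df, z_thresh=3.0):
    """Deduplicate, clip outliers by Z-score, and tidy column names.

    Illustrative only; not the sp00kyvectors implementation.
    """
    out = df.drop_duplicates().copy()
    for col in out.select_dtypes(include="number").columns:
        values = out[col].astype(float)
        mean, std = values.mean(), values.std()
        if pd.isna(std) or std == 0:
            continue  # constant or single-value column: nothing to clip
        # Clip (rather than drop) anything beyond z_thresh standard deviations.
        out[col] = values.clip(mean - z_thresh * std, mean + z_thresh * std)
    # Standardize names: trimmed, lowercase, underscores instead of spaces.
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # Second pass, mirroring the "performed twice" note in step 1.
    return out.drop_duplicates().reset_index(drop=True)
```

Clipping keeps the row count stable, which matters when other columns in the same row are still valid.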


# 🎛️⚙️✨ sp.Vectors
## 🧠 Features

- 🧮 **Vector Magic**:
  - Load 1D or 2D arrays into `Vector` objects
  - X/Y decomposition for 2D data
  - Linear algebra methods like magnitude, angle, dot, and projection

- 📊 **Statistical Potions**:
  - Mean, median, standard deviation 💀  
  - Probability vectors and PDFs 🧪  
  - Z-score normalization 🧼  
  - Entropy between aligned vectors 🌀  
  - Internal entropy of a vector  

- 🖼️ **Visualizations**:
  - Linear and log-scale histogramming  
  - Vector plots with tails, heads, and haunted trails  
  - Optional "entropy mode" that colors plots based on mysterious disorder 👀  

- 🔧 **Tools of the Craft**:
  - Gaussian kernel smoothing for smoothing out your nightmares  
  - Elementwise operations: `.normalize()`, `.project()`, `.difference()`, and more  
  - Pretty `__repr__` so your print statements conjure elegant summaries
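
For readers who want to see the underlying math, here is a small NumPy sketch of the linear algebra operations named above. It uses plain arrays rather than `Vector` objects, so none of these calls are the library's API.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

magnitude = np.linalg.norm(a)        # Euclidean length of a
dot = float(a @ b)                   # dot product of a and b
angle = np.arctan2(a[1], a[0])       # direction of a, in radians
projection = (a @ b) / (b @ b) * b   # component of a along b
unit = a / magnitude                 # a scaled to length 1
```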


# 📚 Documentation

## 🌙 Pipeline 🔮

This guide shows how to take messy tabular data, purify it with `sp.DF`, explore it with `sp.vector`, and train a custom neural network using `sp.NN`. This package is a wrapper around scientific Python modules and an open-source education project!

## Abstraction
- `sp.DF` sits on top of pandas, NumPy, and Matplotlib
- `sp.NN` sits on top of `sp.DF` and PyTorch

---

## **1. Imports and Cleaning**
<pre><code>
import sp00kyvectors as sp  # ✨ The full spooky toolbox
# Your standard np, pd, and plt commands still work, since this wrapper sits on top of them all

path_to_messy_csv_folder = "data/"  # placeholder: folder containing your CSV files
df = sp.df(path_to_messy_csv_folder)
df.drop_nulls(threshold=0.4)       # Drop columns with >40% nulls
df.fill_nulls(strategy='median')   # Fill remaining nulls with median
df.standardize_column_names()      # Lowercase + underscores
df.clip_outliers(z_thresh=3)       # Clip values beyond 3 standard deviations
df_clean = df.get_clean_df()       # Fully cleaned DataFrame
</code></pre>

---

## **2. Vectorize Columns**
Each numeric column becomes a **Vector** for statistical exploration and visualization. A vector here is a NumPy array backing one column of the pandas DataFrame, so each column represents one dimension of the data.

Now each column can be **plotted**, **scaled**, **combined**, or **compared** using `Vector` operations, which run as fast, vectorized NumPy code.
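
As a rough picture of what vectorizing columns means, the sketch below pulls each numeric column of a pandas DataFrame into its own NumPy array. The toy data and the `vectors` dictionary are assumptions for illustration; the library wraps this step in its own `Vector` type.

```python
import pandas as pd

df = pd.DataFrame(
    {"temp": [20.5, 21.0, 19.5], "humidity": [40, 42, 45], "site": ["a", "b", "c"]}
)

# One NumPy array per numeric column; non-numeric columns are skipped.
numeric_cols = df.select_dtypes(include="number").columns
vectors = {name: df[name].to_numpy(dtype=float) for name in numeric_cols}

# Whole-column arithmetic runs in compiled NumPy code, with no Python loop.
temp = vectors["temp"]
z_temp = (temp - temp.mean()) / temp.std()
```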

---


## 🔮 Phase 2: Custom Neural Network (`NN`) in `sp00kyvectors` 🌙

The `sp.NN` module provides a simple, customizable feed‑forward network with **random activation layers**. It's a PyTorch model with a few peer-reviewed optimization tricks and easier layer control. Use it to turn your cleaned and vectorized features into predictions.

---

#### `__init__` **Arguments**
- **`input_size`** (*int*): Number of input features (dimensionality of your `X`).  
- **`hidden_sizes`** (*List[int]*): Number and sizes of the hidden layers, e.g. `[64, 32]` for two hidden layers of 64 and 32 units.  
- **`output_size`** (*int*): Number of outputs (e.g. `1` for a single regression target).

---

### ✨ Description  
- Stacks `Linear` → *RandomActivation* pairs for each hidden layer.  
- Final `Linear` projects to your desired output size.  
- Random activations chosen per layer from `[ReLU, Tanh, Sigmoid, ELU]`.
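
The layer stacking described above can be sketched in plain PyTorch. `build_random_mlp` is a hypothetical helper, and it assumes each activation is picked once at construction time; the source does not say whether `sp.NN` does it that way.

```python
import random

import torch
from torch import nn

ACTIVATIONS = [nn.ReLU, nn.Tanh, nn.Sigmoid, nn.ELU]


def build_random_mlp(input_size, hidden_sizes, output_size, seed=None):
    """Stack Linear -> random activation pairs, then a final Linear.

    Illustrative only; not the sp.NN implementation.
    """
    rng = random.Random(seed)
    layers = []
    prev = input_size
    for width in hidden_sizes:
        layers.append(nn.Linear(prev, width))
        layers.append(rng.choice(ACTIVATIONS)())  # one activation per hidden layer
        prev = width
    layers.append(nn.Linear(prev, output_size))   # final projection, no activation
    return nn.Sequential(*layers)


model = build_random_mlp(8, [64, 32], 1, seed=0)
out = model(torch.randn(4, 8))  # batch of 4 rows with 8 features each
```

Passing a `seed` makes the activation choice reproducible, which helps when comparing runs.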


---

## **1. Build & Train the Neural Network 🌙**
<pre><code>
model = sp.NN(input_size=X.shape[1], hidden_sizes=[64, 32], output_size=1)
model.train_model(train_loader, epochs=20, lr=0.001)
</code></pre>
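
The snippet above uses `X` and `train_loader` without defining them. One way to build them, assuming `train_model` accepts a standard PyTorch `DataLoader` (the source does not confirm this), is sketched here with synthetic data:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for cleaned features and a regression target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)).astype(np.float32)
y = X.sum(axis=1, keepdims=True).astype(np.float32)

dataset = TensorDataset(torch.from_numpy(X), torch.from_numpy(y))
# 80/20 split so evaluation can use rows the model never trained on.
train_set, test_set = random_split(
    dataset, [80, 20], generator=torch.Generator().manual_seed(0)
)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
test_loader = DataLoader(test_set, batch_size=16)
```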

---

## **2. Evaluate the Model**
<pre><code>
test_loss = model.test_model(test_loader)  # evaluate on held-out data, not the training set
print(f"Test Loss: {test_loss:.4f}")
</code></pre>

## **3. Predict**
<pre><code>

predictions = model.forward(inputs)  # inputs: a tensor with input_size features per row
</code></pre>

---

## 📈 Plotting
Every column in `sp.DF` is a NumPy vector, represented by `v` below.

### `.histogram(log=False)`

Plots a histogram of the vector values. Set `log=True` for logarithmic scale.

<pre><code>
v.histogram()
v.histogram(log=True)
</code></pre>
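
For comparison, the same two views can be drawn directly with Matplotlib. This sketch assumes `log=True` refers to a logarithmic count axis, which the source does not spell out, and it uses synthetic data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).lognormal(size=1000)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
ax_lin.hist(data, bins=30)            # linear count axis
ax_lin.set_title("linear")
ax_log.hist(data, bins=30, log=True)  # logarithmic count axis
ax_log.set_title("log")
fig.tight_layout()
```

The log view makes sparse bins in a long tail visible that the linear view flattens to near zero.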

---

### `.plot_vectors(mode="line", entropy=False)`

Plots 2D vectors.

- `mode`: `"line"`, `"arrow"`, or `"trail"`
- `entropy`: if `True`, colorizes vectors by entropy

<pre><code>
v2d.plot_vectors(mode="arrow", entropy=True)
</code></pre>

---

## 🔮 Utilities

### `.gaussian_smooth(sigma=1.0)`

Applies Gaussian smoothing to the vector.

<pre><code>
v_smooth = v.gaussian_smooth(sigma=2.0)
</code></pre>
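
Gaussian smoothing replaces each value with a weighted average of its neighbors, with weights drawn from a bell curve of width `sigma`. The NumPy sketch below shows the idea; the three-sigma kernel radius and the edge padding are assumptions, not necessarily what the library does.

```python
import numpy as np


def gaussian_smooth(values, sigma=1.0):
    """Smooth a 1D sequence with a normalized Gaussian kernel (illustrative)."""
    values = np.asarray(values, dtype=float)
    radius = max(1, int(round(3 * sigma)))     # kernel covers about +/- 3 sigma
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                     # weights sum to 1
    # Repeat the edge values so the output keeps the input length
    # and the ends are not pulled toward zero.
    padded = np.pad(values, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

A larger `sigma` widens the kernel and removes more high-frequency noise, at the cost of blurring sharp features.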

---

## 💀 Dunder Methods

### `__repr__()`

Pretty string representation.

<pre><code>
print(v)  # Vector(mean=3.0, std=1.58, ...)
</code></pre>

---

## 🛠 Developer Notes

- Internal data is stored as `numpy.ndarray`
- Methods use `scipy.stats`, `numpy`, and `matplotlib`
- Entropy assumes aligned distributions (normalized first)
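
To make the entropy note concrete, here is a sketch using `scipy.stats.entropy`. Reading "entropy between aligned vectors" as relative entropy (KL divergence) is an assumption, and the counts are made up for illustration.

```python
import numpy as np
from scipy.stats import entropy

# Two histograms over the same four bins (aligned), as raw counts.
p_counts = np.array([10, 20, 30, 40])
q_counts = np.array([25, 25, 25, 25])

# Normalize counts into probability vectors before comparing.
p = p_counts / p_counts.sum()
q = q_counts / q_counts.sum()

internal = entropy(p, base=2)     # Shannon entropy of p, in bits
relative = entropy(p, q, base=2)  # KL divergence D(p || q), in bits
```

Both vectors must share the same bins in the same order; otherwise the element-wise comparison is meaningless.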

---

## 🧛 License

MIT — haunt and hack as you please.

---

## 🕸️ Coming Soon

- 3D support
- More spooky plots
- Command-line interface: `spookify file.csv --plot`

---

## 👻 Contributing

Spirits and sorcerers of all levels are welcome. Open an issue, fork the repo, or summon a pull request.

---


✨ Stay spooky, and may your vectors always point toward the unknown. 🕸️

# Student Opportunities 🎓💻

Learning to code, using GitHub, or just curious? Reach out and join the team!  
We’re currently looking for volunteers of all skill levels. Everyone’s welcome!

