Metadata-Version: 2.4
Name: CGMissingData
Version: 0.1.7
Summary: MICE + ARIMA + XGBoost to handle missing values of CGM device
Author: HS Shad, Shubh Saraswat, Dr. Xiaohua Douglas Zhang
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn
Requires-Dist: statsmodels
Requires-Dist: xgboost
Requires-Dist: rpy2

# CGMissingData

CGMissingData is a simple missing-data benchmarking package that runs:

- MICE imputation (`IterativeImputer`)
- ARIMA
- XGBoost

Your CSV must include at least these columns:

- Glucose value (target): `glucose_col`
- Time series data: `TimeSeries`
- Timestamp data: `timestamp_col`
- Subject ID: `subjectid`

## How to Run

1. Set up the environment (Windows paths shown):
   - Go to the project folder: `cd "C:\Path\To\Your\Project"`
   - Create a virtual environment: `python -m venv .venv`
   - Activate the environment: `.\.venv\Scripts\activate`
2. Upgrade pip and install the package:
   - `python -m pip install --upgrade pip`
   - `pip install -e .`
3. Run the benchmark. Make sure your dataset (e.g., `MyData.csv`) is in the project folder, then run the following from the command line to generate `results.csv`:

   `.\.venv\Scripts\python.exe -c "from CGMissingData import run_missingness_benchmark; r=run_missingness_benchmark('MyData.csv', mask_rates=[0.05, 0.10, 0.20, 0.30, 0.40]); print(r); r.to_csv('results.csv', index=False)"`
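
The `mask_rates` argument suggests a mask-and-score benchmark: hide a fraction of the known glucose readings, impute them, and compare the estimates against the hidden values. The sketch below illustrates that idea in plain Python under stated assumptions. `mask_values`, `forward_fill`, and `rmse` are hypothetical helpers, forward fill is only a stand-in for the package's MICE/ARIMA/XGBoost imputers, and RMSE is an assumed metric. None of this is taken from the CGMissingData source.

```python
import math
import random


def mask_values(values, rate, seed=0):
    """Hide a fraction `rate` of the observed values (hypothetical helper).

    Returns the masked copy and the sorted indices that were hidden.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    n_hide = round(rate * len(observed))
    hidden = sorted(rng.sample(observed, n_hide))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def forward_fill(values):
    """Stand-in imputer: carry the last seen value forward.

    A leading gap is filled with the first observed value.
    """
    filled = list(values)
    last = next((v for v in filled if v is not None), None)
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    return filled


def rmse(truth, estimate, indices):
    """Root mean squared error over the hidden positions only (assumed metric)."""
    return math.sqrt(
        sum((truth[i] - estimate[i]) ** 2 for i in indices) / len(indices)
    )


# Synthetic glucose-like signal, used purely for illustration.
glucose = [100 + 10 * math.sin(i / 5) for i in range(200)]

scores = {}
for rate in [0.05, 0.10, 0.20, 0.30, 0.40]:
    masked, hidden = mask_values(glucose, rate)
    scores[rate] = rmse(glucose, forward_fill(masked), hidden)
    print(f"mask_rate={rate:.2f}  hidden={len(hidden)}  RMSE={scores[rate]:.2f}")
```

Scoring only the hidden positions keeps the comparison fair, because every imputer is judged on values it never saw.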


## Using Google Colab

1. Install the package: `!pip -q install CGMissingData==0.1.2` (change the version number to match the latest release, or omit it: `!pip -q install CGMissingData`).
2. Import the benchmark function: `from CGMissingData import run_missingness_benchmark`
3. Set the path to your dataset: `df = "/content/drive/MyDrive/CGMExampleData.csv"`
4. Run the benchmark: `results = run_missingness_benchmark("CGMExampleData.csv", mask_rates=[0.05, 0.10, 0.20, 0.30, 0.40])` (you can pass `df` instead of the file name).
5. Print the results: `print(results)`
6. Save the results: `results.to_csv("results.csv", index=False)`



# cgmmissing

CGM missing value imputation pipeline:

1) Convert `timestamp` to a numeric equal-interval `TimeSeries` using `CGManalyzer::equalInterval.fn` via `rpy2`
2) Add lag features per `subjectid`: `lag1`, `lag2`, `lag3`, `rollmean`
3) Compute missing rate on `glucose_value`
4) If missing rate <= 5%: MICE + ARIMA (segmentwise over missing blocks)
   Else: MICE + XGBoost
5) Output original dataset plus `imputed_glucose_value`
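
Steps 2 to 4 above can be sketched with pandas as follows. This is a minimal illustration, not the package's implementation: `add_lag_features` and `choose_method` are hypothetical names, the definition of `rollmean` (mean of the three lags) is an assumption, and the rows are assumed to be already sorted by time within each subject.

```python
import numpy as np
import pandas as pd


def add_lag_features(df, value_col="glucose_value", group_col="subjectid"):
    """Add lag1..lag3 and rollmean per subject (hypothetical helper).

    Assumption: rollmean is the mean of the available lag values.
    """
    out = df.copy()
    grouped = df.groupby(group_col)[value_col]
    for k in (1, 2, 3):
        # shift() within each group, so lags never leak across subjects
        out[f"lag{k}"] = grouped.shift(k)
    # mean(axis=1) skips NaN lags; a row with no lags at all stays NaN
    out["rollmean"] = out[["lag1", "lag2", "lag3"]].mean(axis=1)
    return out


def choose_method(df, value_col="glucose_value", threshold=0.05):
    """Return the missing rate and the method implied by the 5% rule."""
    missing_rate = df[value_col].isna().mean()
    method = "MICE + ARIMA" if missing_rate <= threshold else "MICE + XGBoost"
    return missing_rate, method


# Tiny made-up dataset: two subjects, five readings each, two gaps.
df = pd.DataFrame({
    "subjectid": ["A"] * 5 + ["B"] * 5,
    "TimeSeries": list(range(5)) * 2,
    "glucose_value": [100, 105, np.nan, 110, 108, 90, 92, 95, np.nan, 97],
})

features = add_lag_features(df)
rate, method = choose_method(df)
print(features)
print(f"missing rate = {rate:.0%} -> {method}")
```

Here 2 of 10 readings are missing, so the rate exceeds 5% and the rule selects MICE + XGBoost.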

## Install
```bash
pip install -e .
```