Metadata-Version: 2.1
Name: EDAExcelReport
Version: 0.1.3
Summary: A package for generating EDA reports
Home-page: https://github.com/rohit180497/EDAExcelReport
Author: Rohit Kosamkar
Author-email: rohitkosamkar97@gmail.com
Keywords: EDA Excel exploratory data analysis report pandas numpy openpyxl machine learning data science data analysis rohit kosamkar EDAExcelReport
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Visualization
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: openpyxl
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: datetime

# EDAExcelReport

EDAExcelReport is a Python package that generates detailed exploratory data analysis (EDA) reports for datasets with a binary target variable. Each report is an Excel workbook of statistics and table-based visualizations that show how each feature is distributed and how it relates to the target variable.

## Features

- Calculates frequency and distribution of feature values.
- Computes target rate, percentage of total target, and lift for each feature value.
- Automatically handles numeric and categorical data.
- Generates Excel reports with well-formatted tables and conditional formatting.
- Removes gridlines and adds borders for better readability.
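
The per-value metrics listed above can be illustrated with a small, dependency-free sketch. It assumes the common definitions (target rate = positives / rows for a value, lift = target rate / overall target rate); the package's exact formulas are not documented here, and `value_metrics` and the toy data are hypothetical names used only for illustration.

```python
from collections import defaultdict


def value_metrics(values, targets):
    """Per-value count, target rate, share of all positives, and lift.

    Assumes `targets` holds 0/1 labels with at least one positive.
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for value, target in zip(values, targets):
        counts[value] += 1
        positives[value] += target

    total_rows = len(targets)
    total_positives = sum(targets)
    overall_rate = total_positives / total_rows

    metrics = {}
    for value, count in counts.items():
        rate = positives[value] / count
        metrics[value] = {
            "count": count,
            "pct_of_rows": count / total_rows,
            "target_rate": rate,
            "pct_of_total_target": positives[value] / total_positives,
            "lift": rate / overall_rate,
        }
    return metrics


# Toy data (hypothetical, not taken from the credit dataset)
gender = ["F", "F", "F", "M", "M", "M", "M", "M"]
target = [1, 0, 0, 1, 1, 1, 0, 0]

metrics = value_metrics(gender, target)
for value, row in metrics.items():
    print(value, row)
```

A lift above 1 means the value has a higher target rate than the dataset as a whole; a lift below 1 means a lower one.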

## Installation

You can install the package via pip:

```sh
pip install EDAExcelReport
```

## Usage

```python
# Import the report class
from EDAR.excel_report import EDAExcelReport
```


```python
# Import necessary libraries
import pandas as pd
import numpy as np
import os
from EDAR.excel_report import EDAExcelReport

```

```python
# Loading the credit dataset
df = pd.read_csv(r"tests\credit_data.csv")
```

```python
df.columns
```
    Index(['ID', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN',
           'AMT_INCOME_TOTAL', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
           'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'DAYS_BIRTH',
           'DAYS_EMPLOYED', 'FLAG_MOBIL', 'FLAG_WORK_PHONE', 'FLAG_PHONE',
           'FLAG_EMAIL', 'OCCUPATION_TYPE', 'CNT_FAM_MEMBERS', 'target'],
          dtype='object')


```python
df.isna().sum()
```
    ID                         0
    CODE_GENDER                0
    FLAG_OWN_CAR               0
    FLAG_OWN_REALTY            0
    CNT_CHILDREN               0
    AMT_INCOME_TOTAL           0
    NAME_INCOME_TYPE           0
    NAME_EDUCATION_TYPE        0
    NAME_FAMILY_STATUS         0
    NAME_HOUSING_TYPE          0
    DAYS_BIRTH                 0
    DAYS_EMPLOYED              0
    FLAG_MOBIL                 0
    FLAG_WORK_PHONE            0
    FLAG_PHONE                 0
    FLAG_EMAIL                 0
    OCCUPATION_TYPE        11323
    CNT_FAM_MEMBERS            0
    target                     0
    dtype: int64


```python
ignore_feats = ["ID", "OCCUPATION_TYPE", "DAYS_BIRTH", "DAYS_EMPLOYED", "FLAG_MOBIL"]
```

```python
EDAExcelReport(df, 'target', r'tests\test_eda_report.xlsx', ignore_cols=ignore_feats)
```

    Your EDA report is ready at tests\test_eda_report_20240610_153828.xlsx
    
    <ed_report.excel_report.EDAExcelReport at 0x188c09ee9f0>
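
The report path printed above carries a timestamp suffix (`_20240610_153828`) that was not in the path passed in. The sketch below shows one way such a name could be built with the standard library. It is an assumption about the naming pattern inferred from the output, not the package's actual code, and `timestamped_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(path, now=None):
    """Return `path` with a _YYYYMMDD_HHMMSS suffix inserted before the extension."""
    now = now or datetime.now()
    p = Path(path)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return p.with_name(f"{p.stem}_{stamp}{p.suffix}")


# Fixed datetime so the result is reproducible
out = timestamped_path("tests/test_eda_report.xlsx", datetime(2024, 6, 10, 15, 38, 28))
print(out.name)  # test_eda_report_20240610_153828.xlsx
```

Because the suffix changes on every run, look up the generated file from the printed message rather than from the path you passed in.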

### Download the EDA Excel report generated for the credit data above

[Download Excel File](https://github.com/rohit180497/EDAExcelReport/blob/main/tests/test_eda_report_20240610_153828.xlsx)



