Metadata-Version: 2.4
Name: MalwareClassifier
Version: 0.1.0
Summary: A malware classifier template with built-in logging.
Author-email: cchunhuang <cchunhuang147@gmail.com>
License: MIT License
        
        Copyright (c) 2025 cchunhuang
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/cchunhuang/MalwareClassifier
Project-URL: Issues, https://github.com/cchunhuang/MalwareClassifier/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# MalwareClassifier

MalwareClassifier is a Python package that provides a **template** for building a malware classification system with a **built-in logging system** and **configurable settings**.
It is designed to be **modular**, **extensible**, and **easy to install** using `pip` or `conda`.

---

## Table of Contents

* [Features](#features)
* [Project Structure](#project-structure)
* [Installation](#installation)
* [Quick Start](#quick-start)
* [Configuration (`config.json`)](#configuration-configjson)
* [Logging Usage](#logging-usage)
* [Development Guide](#development-guide)
* [Publishing (PyPI / conda-forge)](#publishing-pypi--conda-forge)
* [License](#license)
* [Contact](#contact)

---

## Features

* **Malware Classification Template**
  A structured skeleton for implementing training and prediction workflows.
* **Built-in Logging**
  Provides flexible logging with console and file outputs, automatic timestamped log filenames, and customizable log directories.
* **Centralized Configuration**
  Uses `config.json` to define dataset paths, output folders, and caching behavior.
* **Packaging Ready**
  Supports `pip install -e .` for development mode and is prepared for publishing to PyPI and conda-forge.

---

## Project Structure

```text
MalwareClassifier/
├─ pyproject.toml
├─ LICENSE
├─ README.md
├─ requirements.txt
└─ src/
   └─ MalwareClassifier/
      ├─ __init__.py
      ├─ Logging.py
      ├─ MalwareClassifier.py
      ├─ config.json
      └─ config_loader.py
```

> * The package name is **`MalwareClassifier`** (case-sensitive).
> * `config.json` defines default dataset paths, output directories, and behavior flags.

---

## Installation

> It is recommended to use a virtual environment (`conda` or `venv`).

### Option A: Development mode (recommended)

```bash
# Clone the repository
git clone https://github.com/cchunhuang/MalwareClassifier.git
cd MalwareClassifier

# Install in editable mode
pip install -e .

# Install additional dependencies if needed
pip install -r requirements.txt
```

### Option B: Standard installation

```bash
pip install .
```

> **Note:**
>
> * Check `requirements.txt` for additional dependencies.
> * Example: `python-box==7.3.2` (imported as `from box import Box`).

---

## Quick Start

### 1) Minimal example (using the default `config.json`)

```python
import MalwareClassifier as MC

# Initialize logging (by default, logs are written to ./output/log/)
MC.setup_logging()

# Create the classifier instance
clf = MC.MalwareClassifier(config_path="./config.json")

# Typical workflow (override these methods in subclasses if needed)
clf.feature()                 # Feature extraction
clf.vectorize()               # Feature vectorization
clf.model(action="train")     # Train or load model
clf.predict()                 # Run predictions
```

> The `MalwareClassifier` class in `MalwareClassifier.py` defines the **workflow skeleton**.
> Subclass it to override `feature()`, `vectorize()`, `model()`, and `predict()`.
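
As a rough illustration of the subclassing pattern, the sketch below overrides all four steps with a toy length-threshold rule. `WorkflowSkeleton` is a hypothetical stand-in for `MC.MalwareClassifier` so that the example runs on its own; the attribute names, the constructor arguments, and the return value of `predict()` are assumptions, not the package's actual API.

```python
class WorkflowSkeleton:
    """Hypothetical stand-in for MC.MalwareClassifier (the real base class may differ)."""

    def __init__(self, config_path=None):
        self.config_path = config_path


class ByteLengthClassifier(WorkflowSkeleton):
    """Toy subclass: labels a sample 1 when its byte length exceeds a learned threshold."""

    def __init__(self, samples, labels, config_path=None):
        super().__init__(config_path)
        self.samples = samples
        self.labels = labels

    def feature(self):
        # Feature extraction: one raw feature per sample (its length in bytes).
        self.features = [len(sample) for sample in self.samples]

    def vectorize(self):
        # Vectorization: wrap each feature in a one-element float vector.
        self.vectors = [[float(value)] for value in self.features]

    def model(self, action="train"):
        # "Training": put the threshold midway between the two classes.
        if action != "train":
            raise ValueError(f"unsupported action: {action!r}")
        positives = [v[0] for v, y in zip(self.vectors, self.labels) if y == 1]
        negatives = [v[0] for v, y in zip(self.vectors, self.labels) if y == 0]
        self.threshold = (min(positives) + max(negatives)) / 2

    def predict(self):
        # Prediction: compare each vector against the learned threshold.
        self.predictions = [int(v[0] > self.threshold) for v in self.vectors]
        return self.predictions


clf = ByteLengthClassifier(
    samples=[b"ab", b"abcd", b"abcdefgh", b"abcdefghij"],
    labels=[0, 0, 1, 1],
)
clf.feature()
clf.vectorize()
clf.model(action="train")
print(clf.predict())
```

In a real project the subclass would inherit from `MC.MalwareClassifier` instead of the stand-in and would read its paths from the loaded configuration.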

### 2) Specify a custom log directory

```python
MC.setup_logging(log_dir="./output/log")
```

Or via an environment variable (lower priority than the function argument):

```bash
# Linux/macOS
export MALCLASS_LOG_DIR=./output/log

# Windows PowerShell
$env:MALCLASS_LOG_DIR="./output/log"
```

---

## Configuration (`config.json`)

The package includes a default `config.json`:

```json
{
  "file": {
    "label": "./dataset/label.csv"
  },
  "folder": {
    "log": "./output/log/",
    "dataset": "./dataset/",
    "feature": "./output/feature/",
    "vectorize": "./output/vectorize/",
    "model": "./output/model/",
    "predict": "./output/predict/"
  },
  "parameter": {
    "feature": { "save": true, "load": false },
    "vectorize": { "save": true, "load": false },
    "model": { "save": true, "load": false },
    "predict": { "save": true, "load": false }
  }
}
```

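* **file** → Defines input files, such as the label CSV.
* **folder** → Defines the dataset directory and the output directories for logs, features, vectors, models, and predictions.
* **parameter.\*** → Flags that control whether each stage saves its results (`save`) or loads previously saved results (`load`).
* You can provide your own `config.json` when creating a classifier: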

```python
clf = MC.MalwareClassifier(config_path="./my_config.json")
```
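
To make the role of the `folder` and `parameter` sections concrete, here is a minimal sketch of how a configuration file of this shape could be consumed with the standard library alone. The helper names `load_config`, `ensure_folders`, and `should_load` are hypothetical and are not part of the package, whose own `config_loader.py` may work differently (for example, by using `python-box`).

```python
import json
from pathlib import Path


def load_config(path):
    """Read a config.json-style file into a plain dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def ensure_folders(config, root="."):
    """Create every directory listed under "folder", relative to root."""
    created = []
    for rel_path in config.get("folder", {}).values():
        target = Path(root) / rel_path
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


def should_load(config, stage):
    """Return True when parameter.<stage>.load is set (missing keys count as False)."""
    return bool(config.get("parameter", {}).get(stage, {}).get("load", False))
```

A stage such as `feature()` could then call `should_load(config, "feature")` to decide whether to reuse cached results or recompute them.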

---

## Logging Usage

The logging system is defined in `src/MalwareClassifier/Logging.py`.

### Available functions

* `setup_logging(config=None, reset_handlers=True, log_dir=None)`
  Initialize logging with optional config overrides.
* `get_logger(name=None)`
  Retrieve a logger for any module.

### Default behavior

* Logs are written to both the **console** and a **file**.
* Log files are named automatically using the pattern:
  `malware_classifier-YYYYMMDD-HHMMSS.log`
* The log directory is resolved in the following order of priority:

  1. Using the `log_dir` argument in `setup_logging`
  2. Using the environment variable `MALCLASS_LOG_DIR`
  3. Defaults to `./output/log/`
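
The priority order and the filename pattern above can be sketched as follows. This illustrates the documented behavior and is not the code in `Logging.py`; the function names `resolve_log_dir` and `timestamped_log_name` and the `env` parameter are assumptions introduced for the example.

```python
import os
from datetime import datetime
from pathlib import Path

DEFAULT_LOG_DIR = "./output/log/"


def resolve_log_dir(log_dir=None, env=None):
    """Pick the log directory: explicit argument, then MALCLASS_LOG_DIR, then the default."""
    env = os.environ if env is None else env
    if log_dir:
        return Path(log_dir)
    if env.get("MALCLASS_LOG_DIR"):
        return Path(env["MALCLASS_LOG_DIR"])
    return Path(DEFAULT_LOG_DIR)


def timestamped_log_name(now=None):
    """Build a name of the form malware_classifier-YYYYMMDD-HHMMSS.log."""
    now = now or datetime.now()
    return now.strftime("malware_classifier-%Y%m%d-%H%M%S.log")


log_path = resolve_log_dir() / timestamped_log_name()
print(log_path)
```

Passing `env` explicitly keeps the function easy to test without modifying the real process environment.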

### Environment variables

| Variable                 | Description                                  | Example         |
| ------------------------ | -------------------------------------------- | --------------- |
| `MALCLASS_LOG_LEVEL`     | Set log level                                | `DEBUG`, `INFO` |
| `MALCLASS_LOG_FILE`      | Full path for the log file                   | `/tmp/log.txt`  |
| `MALCLASS_LOG_DIR`       | Directory for log files                      | `./output/log`  |
| `MALCLASS_LOG_FORMATTER` | Choose formatter: `basic`, `verbose`, `json` | `verbose`       |

**Note:** JSON logging requires installing [`python-json-logger`](https://pypi.org/project/python-json-logger/).
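
For readers curious how such variables typically map onto the standard `logging` module, the sketch below reads `MALCLASS_LOG_LEVEL` and `MALCLASS_LOG_FORMATTER`. The format strings, the fallback to `INFO` and `basic`, and the helper names are assumptions made for illustration; the package's actual defaults may differ. The `json` formatter is left out because it depends on the optional `python-json-logger` package.

```python
import logging
import os

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}

# Assumed format strings; the package's real formats are not documented here.
FORMATS = {
    "basic": "%(levelname)s %(message)s",
    "verbose": "%(asctime)s %(name)s %(levelname)s %(message)s",
}


def level_from_env(env=None, default=logging.INFO):
    """Translate MALCLASS_LOG_LEVEL into a numeric level, ignoring case."""
    env = os.environ if env is None else env
    return LEVELS.get(env.get("MALCLASS_LOG_LEVEL", "").upper(), default)


def formatter_from_env(env=None):
    """Build a Formatter from MALCLASS_LOG_FORMATTER, falling back to "basic"."""
    env = os.environ if env is None else env
    key = env.get("MALCLASS_LOG_FORMATTER", "basic").lower()
    return logging.Formatter(FORMATS.get(key, FORMATS["basic"]))


handler = logging.StreamHandler()
handler.setLevel(level_from_env())
handler.setFormatter(formatter_from_env())
```

Unknown values fall back to the defaults rather than raising, so a typo in the environment does not prevent logging from starting.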

### Example usage in modules

```python
from MalwareClassifier import get_logger

logger = get_logger(__name__)
logger.info("This is an info message")
logger.debug("This is a debug message")
```

---

## Development Guide

### Setup development environment

```bash
# Create a virtual environment
conda create -n malclass python=3.10
conda activate malclass

# Install the package in editable mode
pip install -e .
pip install -r requirements.txt
```

### Verify installation

```bash
python -c "import MalwareClassifier as MC; MC.setup_logging(); print(MC.__version__)"
```

### Clean build artifacts

```bash
rm -rf build/ dist/ *.egg-info src/*.egg-info
```

---

## Publishing (PyPI / conda-forge)

### Publish to PyPI

```bash
pip install build twine
python -m build
twine upload dist/*
```

### Publish to conda-forge (summary)

1. Publish to PyPI first.
2. Submit a PR to [conda-forge/staged-recipes](https://github.com/conda-forge/staged-recipes) with a `meta.yaml`.
3. Once the PR is merged, conda-forge creates a feedstock repository, and its bot opens pull requests automatically when new versions are released on PyPI.

---

## License

This project is licensed under the terms of the [MIT License](LICENSE).

---

## Contact

* **Homepage:** [https://github.com/cchunhuang/MalwareClassifier](https://github.com/cchunhuang/MalwareClassifier)
* **Issues:** [https://github.com/cchunhuang/MalwareClassifier/issues](https://github.com/cchunhuang/MalwareClassifier/issues)
