Metadata-Version: 2.4
Name: XER_Technologies_metadata_extractor
Version: 0.3
Summary: Internal data extraction utilities
Author-email: Jakob Wiren <jakob.wiren@xer-tech.com>
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.0.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: boto3>=1.34.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.0.290; extra == "dev"
Requires-Dist: types-boto3>=1.0.2; extra == "dev"

# XER Technologies Metadata Extractor

A Python package for extracting comprehensive flight metadata from CSV files generated by XER Technologies' flight controllers. The package processes flight telemetry data and extracts key performance metrics, timing information, and system statistics.

## Features

- **Flight Data Processing**: Extracts metadata from flight telemetry CSV files
- **Intelligent Filtering**: All statistics calculated only during actual flight time (`droneInFlight == 1`)
- **Unix Timestamp Support**: Handles Unix timestamps in milliseconds and standard datetime formats
- **Power Calculations**: Calculates PMU power, engine power, and system efficiency
- **Duration Tracking**: Tracks engine runtime, flight time, and total log duration
- **Serial Number Detection**: Automatically finds 3-digit serial numbers in data
- **Robust Validation**: Validates data quality and handles missing columns gracefully

## Installation

```bash
pip install XER_Technologies_metadata_extractor
```

## Quick Start

```python
from XER_Technologies_metadata_extractor import extract_csv_metadata

# Extract metadata from a CSV file
with open("flight_data.csv", "rb") as f:
    metadata = extract_csv_metadata(
        csv_data=f,
        csv_filename="Flight_Test_20240516_084236.csv",
        verbose=False
    )

print(metadata)
```

## Core Function

### `extract_csv_metadata()`

```python
from io import BytesIO
from pathlib import Path
from typing import Any, Dict, Union

def extract_csv_metadata(
    csv_data: Union[str, BytesIO, Path],
    csv_filename: str,
    verbose: bool = False,
) -> Dict[str, Any]: ...
```

**Parameters:**
- `csv_data`: CSV content as string, BytesIO object, or file path
- `csv_filename`: Original filename for metadata extraction
- `verbose`: Enable detailed logging (default: False)

**Returns:** Dictionary containing comprehensive flight metadata

## Metadata Output

The package extracts the following metadata categories:

### Timing Information
- `log_duration`: Total log duration (HH:MM:SS)
- `start_time`: Flight start time (HH:MM:SS)
- `end_time`: Flight end time (HH:MM:SS)
- `flight_date`: Flight date (YYYY-MM-DD)

### Power Data
- `max_pmu_power`: Maximum PMU power in Watts
- `avg_pmu_power`: Average PMU power in Watts
- `max_engine_power`: Maximum engine power in Watts
- `avg_engine_power`: Average engine power in Watts
- `avg_system_efficiency`: Average system efficiency in %

### Generator Data
- `max_rpm`: Maximum generator RPM
- `avg_rpm`: Average generator RPM
- `total_engine_hours`: Total engine runtime in hours
- `total_flight_hours`: Total flight time in hours

### Flight Summary
- `num_flights`: Number of distinct flights
- `engine_starts`: Number of engine start cycles
- `serial_number`: Device serial number (3-digit format)
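
This README does not say how `num_flights` and `engine_starts` are counted. One plausible reading, shown here purely as an assumption, is to count every 0 to 1 transition in a status column such as `droneInFlight` or `isGeneratorRunning`. The helper name `count_rising_edges` is hypothetical and not part of the package.

```python
import pandas as pd


def count_rising_edges(flags: pd.Series) -> int:
    """Count 0 -> 1 transitions; a log that starts at 1 counts as one event."""
    # Compare each sample with the previous one (the first sample is compared with 0).
    previous = flags.shift(1, fill_value=0)
    return int(((flags == 1) & (previous == 0)).sum())


# Hypothetical sample with two separate in-flight periods.
in_flight = pd.Series([0, 1, 1, 0, 0, 1, 0])
num_flights = count_rising_edges(in_flight)
print(num_flights)  # 2
```

The same helper could be applied to `isGeneratorRunning` to approximate `engine_starts`.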

## Data Processing Features

### Flight Data Filtering
All statistical calculations (max, min, avg) are performed only on data points where `droneInFlight == 1`, ensuring metrics reflect actual flight performance.
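
A minimal pandas sketch of this filtering, assuming the data is already in a DataFrame with a `droneInFlight` column. The column name `pmu_power` and the helper `in_flight_stats` are illustrative assumptions, not the package's internals.

```python
import pandas as pd


def in_flight_stats(df: pd.DataFrame, column: str) -> dict:
    """Return max and average of `column` over rows where droneInFlight == 1."""
    values = df.loc[df["droneInFlight"] == 1, column]
    if values.empty:
        # No in-flight rows: report missing values instead of raising.
        return {"max": None, "avg": None}
    return {"max": float(values.max()), "avg": float(values.mean())}


# Hypothetical sample: only the two middle rows were recorded in flight.
df = pd.DataFrame(
    {
        "pmu_power": [50, 1800, 2200, 60],
        "droneInFlight": [0, 1, 1, 0],
    }
)
stats = in_flight_stats(df, "pmu_power")
print(stats)  # {'max': 2200.0, 'avg': 2000.0}
```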

### Timestamp Handling
- **Unix Timestamps**: Automatically detects and converts Unix timestamps in milliseconds
- **Standard Formats**: Supports ISO datetime strings and other standard formats
- **Time Formatting**: Start and end times formatted as HH:MM:SS for readability
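
The conversion can be illustrated with the standard library alone. This sketch assumes numeric values are Unix milliseconds interpreted as UTC and that strings are ISO 8601; the package's actual detection logic and time zone handling may differ, and `parse_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Union


def parse_timestamp(value: Union[int, float, str]) -> datetime:
    """Read numbers as Unix milliseconds (UTC) and strings as ISO 8601."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    return datetime.fromisoformat(value)


start = parse_timestamp(1715848956000)
print(start.strftime("%H:%M:%S"))  # 08:42:36
print(start.strftime("%Y-%m-%d"))  # 2024-05-16
```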

### Serial Number Detection
Scans the data for the first value that is a 3-digit serial number, skipping zeros and any other non-matching values.
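
As a rough sketch of such a scan, assuming "3-digit" means an integer from 100 to 999 (the README does not define the exact rule), a helper might look like this. The name `find_serial_number` is hypothetical.

```python
from typing import Iterable, Optional


def find_serial_number(values: Iterable[object]) -> Optional[str]:
    """Return the first value that parses as an integer in 100..999, else None."""
    for raw in values:
        try:
            number = int(raw)
        except (TypeError, ValueError):
            # Skip None, NaN, and text that is not an integer.
            continue
        if 100 <= number <= 999:
            return str(number)
    return None


serial = find_serial_number([0, 0, None, "n/a", 217, 305])
print(serial)  # 217
```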

### Derived Columns
The package automatically creates:
- `isGeneratorRunning`: 1 if `generator_rpm > 2000`, else 0
- `droneInFlight`: 1 if `generator_rpm > 5100`, else 0
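
The two thresholds above translate directly into pandas comparisons. This is a sketch of the rule as documented, not the package's own code; the function and constant names are assumptions.

```python
import pandas as pd

GENERATOR_RUNNING_RPM = 2000  # threshold documented for isGeneratorRunning
IN_FLIGHT_RPM = 5100          # threshold documented for droneInFlight


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the two derived 0/1 status columns added."""
    out = df.copy()
    rpm = out["generator_rpm"]
    out["isGeneratorRunning"] = (rpm > GENERATOR_RUNNING_RPM).astype(int)
    out["droneInFlight"] = (rpm > IN_FLIGHT_RPM).astype(int)
    return out


derived = add_derived_columns(
    pd.DataFrame({"generator_rpm": [0, 2000, 2500, 5100, 6200]})
)
print(derived["isGeneratorRunning"].tolist())  # [0, 0, 1, 1, 1]
print(derived["droneInFlight"].tolist())       # [0, 0, 0, 0, 1]
```

Both comparisons are strict, so a reading of exactly 2000 or 5100 RPM yields 0.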

## Usage Examples

### Basic File Processing
```python
from XER_Technologies_metadata_extractor import extract_csv_metadata

# Process a local CSV file
metadata = extract_csv_metadata(
    csv_data="path/to/flight_data.csv",
    csv_filename="Flight_Test_20240516_084236.csv"
)
```

### BytesIO Processing (for S3 integration)
```python
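from io import BytesIO

from XER_Technologies_metadata_extractor import extract_csv_metadata

# csv_content is the CSV text (str) obtained elsewhere, e.g. a decoded S3 object body
csv_buffer = BytesIO(csv_content.encode("utf-8"))
metadata = extract_csv_metadata(
    csv_data=csv_buffer,
    csv_filename="flight_data.csv"
)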
```

### Verbose Processing
```python
from XER_Technologies_metadata_extractor import extract_csv_metadata

metadata = extract_csv_metadata(
    csv_data="flight_data.csv",
    csv_filename="flight_data.csv",
    verbose=True  # Enable detailed logging
)
```

## Data Requirements

### CSV Format
- **Encoding**: UTF-8
- **Minimum Rows**: At least 100 data points required
- **Required Columns**: `time` (Unix timestamp in milliseconds)
- **Optional Columns**: All other columns handled gracefully with warnings
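
To check a file against these requirements before extraction, a pre-flight validation could look like the sketch below. It only mirrors the two rules listed above (row count and the `time` column); `check_requirements` is a hypothetical helper, and the package's own validation may report problems differently.

```python
from typing import List, Sequence

import pandas as pd


def check_requirements(
    df: pd.DataFrame,
    min_rows: int = 100,
    required: Sequence[str] = ("time",),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(df) < min_rows:
        problems.append(f"only {len(df)} rows, need at least {min_rows}")
    for column in required:
        if column not in df.columns:
            problems.append(f"missing required column: {column}")
    return problems


short_log = pd.DataFrame({"time": range(50)})
problems = check_requirements(short_log)
print(problems)  # ['only 50 rows, need at least 100']
```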

### Column Mapping
The package automatically maps legacy column names to standard formats and creates derived columns for analysis.

## Development

### Setup
```bash
git clone <repo-url>
cd XERMetaDataExtractor
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e ".[dev]"
```

### Testing
```bash
# Run comprehensive tests
python test_package.py

# Run the pytest suite under tests/
pytest tests/
```

### Quality Checks
```bash
pytest
black .
mypy .
ruff check .
```

## Configuration

The package uses a flexible metadata configuration system that defines:
- Field names and categories
- Calculation methods (max, min, avg, duration, etc.)
- Source columns and validation rules
- Conditional calculations based on flight status

## Error Handling

The package gracefully handles:
- Missing columns (with warnings)
- Invalid data formats
- Empty or corrupted files
- Insufficient data points

All errors are captured in the metadata output for debugging and monitoring.

## Performance

- **Memory Efficient**: Processes large files without loading entire dataset into memory
- **Fast Processing**: Optimized pandas operations for quick metadata extraction
- **Robust**: Handles various data formats and edge cases
