Metadata-Version: 2.1
Name: PositionData
Version: 0.1.10
Summary: Georeferenced CSV data processing
Home-page: https://github.com/ugcs/positiondata
Author: SPH Engineering
Author-email: ayankelevich@ugcs.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Description-Content-Type: text/markdown
License-File: LICENSE

# PositionData package

**PositionData** is a Python package specifically tailored for surveyors and geophysicists engaged in aerial data collection. This user-friendly tool is designed to streamline the processing of positional CSV data generated by [SkyHub](https://www.sphengineering.com/integrated-systems/skyhub), an advanced onboard computer system used in drones.

SkyHub is renowned for its versatility in data logging from a variety of drone-mounted sensors. These sensors range from methane detectors to wind sensors, magnetometers, echo sounders, and more, each providing valuable scalar georeferenced readings essential for a wide array of applications.

Our package simplifies the handling of this rich dataset, offering efficient ways to interpret, analyze, and visualize the collected geospatial data. Whether it's for environmental monitoring, resource exploration, or geographical mapping, **PositionData** enhances the capabilities of professionals in extracting meaningful insights from their aerial surveys.

Key features include trajectory analysis, data cleaning, spatial interpolation, and export functionalities that convert raw data into actionable intelligence. Designed with the needs of surveyors and geophysicists in mind, this package is an indispensable tool in the era of drone-assisted geophysical exploration and surveying.


This package is maintained by [SPH Engineering](https://www.sphengineering.com).

# Classes/Features
- [PositionData](#positiondata-class) - base methods for loading, filtering, clipping, exporting data
- [MethaneData](#methanedata-class) - methane map generation
- [Trajectory](#trajectory-class) - creating and exporting geographic trajectories 
- [WindData](#winddata-class) - true wind vector processing, wind rose generation

# Examples
## Mapping methane leaks

```python
from PositionData import PositionData
from PositionData import MethaneData

# Load the SkyHub CSV log; it must contain the methane and status columns
position_data = PositionData('path/to/your/data.csv')

# Create a MethaneData instance
methane_data = MethaneData(position_data)

# Path where the GeoTIFF will be saved
output_map_path = 'path/to/save/methane_map.tif'

# EPSG code for the area's coordinate system
area_epsg = '32635'

# Generate the methane concentration map
methane_data.map_methane(map_path=output_map_path, 
                         area_epsg=area_epsg, 
                         grid_rows=100, 
                         grid_columns=100, 
                         environment_methane_perc=95, 
                         ignore_invalid=True)
```

## Making the trajectory polyline from sensor readings
```python
from PositionData import Trajectory

# Assuming you have a PositionData instance named 'position_data'
# which includes 'Date' and 'Time' columns
trajectory = Trajectory(position_data, 'Date', 'Time', tolerance=5.0, projection='EPSG:32635')

# Calculating duration
duration_in_minutes = trajectory.duration(unit='minutes')
print(f"Duration of the trajectory: {duration_in_minutes} minutes")

# Generating polyline and estimating length
polyline_gdf, length = trajectory.polyline()
print(f"Length of the simplified trajectory: {length} meters")
```

## Wind data processing

This example demonstrates how to process wind data using the `PositionData` and `WindData` classes. The steps include loading data from a CSV file, clipping by a polygon, calculating the platform direction, and generating a windrose.

```python
from PositionData import PositionData
from PositionData import WindData

# Assuming 'data.csv' is your CSV file with wind data
position_data = PositionData('data.csv')

# Assuming 'clip_polygon.geojson' is your GeoJSON file with the clipping polygon
clipped_data = position_data.clip_by_polygon('clip_polygon.geojson')

# Calculate platform direction relative to north as a Direction column
data_with_direction = clipped_data.calculate_direction('Direction')

# Initialize wind data and generate true wind as TrueWindSpeed and TrueWindDirection
wind_data = WindData(data_with_direction, 'Air:Speed', 'Air:Direction', 'Velocity', 'Direction', 'TrueWindSpeed', 'TrueWindDirection')

# Save the windrose plot as 'windrose.png'
wind_data.build_windrose('TrueWindSpeed', 'TrueWindDirection', 'windrose.png')
```

# Reference
## PositionData Class

The `PositionData` class is designed for handling and processing geospatial data from CSV or GeoJSON files. It provides methods for cleaning data, filtering, clipping, computing statistics, and more.

## Initialization
### `PositionData(input_file, file_format='csv', latitude_prop='Latitude', longitude_prop='Longitude', crs="epsg:4326")`
Initializes the `PositionData` object with data from a CSV or GeoJSON file.

#### Parameters:
- `input_file`: Path to the CSV or GeoJSON file.
- `file_format`: The format of the input file ('csv' or 'geojson').
- `latitude_prop`: Name of the latitude column (default 'Latitude').
- `longitude_prop`: Name of the longitude column (default 'Longitude').
- `crs`: Coordinate reference system for the GeoDataFrame (default 'epsg:4326').

#### Example:
```python
position_data = PositionData("data.csv")
```

## Methods

### `clean_nan(columns)`
Cleans the data by removing rows with NaN values in the specified columns. This method is useful for ensuring data quality and integrity.

#### Parameters:
- `columns`: A list of column names to check for NaN values.

#### Example:
```python
# Assuming position_data is an instance of PositionData
cleaned_data = position_data.clean_nan(['Latitude', 'Longitude'])
```

### `shape()`
Returns the shape of the data, which includes the number of rows and columns in the GeoDataFrame. This method is essential for understanding the dimensions of your dataset.

#### Example:
```python
# Assuming position_data is an instance of PositionData
data_shape = position_data.shape()
print("Number of rows and columns:", data_shape)
```

### `filter_range(column_name, min, max)`
Filters the data by column value within a specified range. This method is particularly useful for narrowing down the dataset to a specific range of values in a given column, which can be essential for focused analysis or data visualization.

#### Parameters:
- `column_name`: Name of the column to apply the filter on.
- `min`: The minimum value of the range. If `None`, no lower limit is applied.
- `max`: The maximum value of the range. If `None`, no upper limit is applied.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Filter data where the values in 'Velocity' column are between 10 and 20
filtered_data = position_data.filter_range('Velocity', 10, 20)
print(filtered_data)
```
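The `None` semantics described above can be illustrated with a minimal, standalone sketch. It works on plain dictionaries rather than a GeoDataFrame, and treating both bounds as inclusive is an assumption, not confirmed behavior of the package.

```python
def filter_range(rows, column_name, min_value=None, max_value=None):
    """Keep rows whose value lies in [min_value, max_value].

    Illustrative helper only. A bound set to None is not applied.
    Inclusive bounds are an assumption.
    """
    kept = []
    for row in rows:
        value = row[column_name]
        if min_value is not None and value < min_value:
            continue
        if max_value is not None and value > max_value:
            continue
        kept.append(row)
    return kept


rows = [{"Velocity": 5.0}, {"Velocity": 12.5}, {"Velocity": 20.0}, {"Velocity": 31.0}]
both_bounds = filter_range(rows, "Velocity", 10, 20)
lower_only = filter_range(rows, "Velocity", 10, None)  # no upper limit
print(both_bounds)
print(lower_only)
```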
### `clip_by_polygon(clip_polygon_geojson)`
Clips the internal data to the boundaries of a provided polygon, as specified in a GeoJSON file. This method is useful for spatially subsetting the data to a specific geographic area, allowing for focused analysis within that area.

#### Parameters:
- `clip_polygon_geojson`: The path to the GeoJSON file containing the polygon against which the data will be clipped.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Clip the data using the boundaries defined in 'clip_polygon.geojson'
clipped_data = position_data.clip_by_polygon('clip_polygon.geojson')
print(clipped_data)
```
### `filter_noize(property_name, filter_type, window_size=3)`
Applies a moving window filter to a specified property of the GeoDataFrame. This method is useful for smoothing or reducing noise in the data, particularly in cases where the data contains fluctuations or irregularities that can obscure underlying trends or patterns.

#### Parameters:
- `property_name`: The name of the property (column) on which to apply the filter.
- `filter_type`: The type of filter to apply ('average' or 'median').
- `window_size`: The size of the moving window, defaulting to 3.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Apply a moving average filter with a window size of 5 to the 'Velocity' property
filtered_data = position_data.filter_noize('Velocity', 'average', 5)
print(filtered_data)
```
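To show what a moving window filter does to a noisy series, here is a small standalone sketch using only the standard library. The centered window and the truncated windows at both ends are assumptions made for illustration; the package may handle edges differently.

```python
from statistics import mean, median


def moving_filter(values, filter_type="average", window_size=3):
    """Smooth a list of numbers with a centered moving window.

    Illustrative helper only. Windows are truncated at the ends
    of the series (an assumption).
    """
    if filter_type not in ("average", "median"):
        raise ValueError("filter_type must be 'average' or 'median'")
    reducer = mean if filter_type == "average" else median
    half = window_size // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(reducer(window))
    return smoothed


readings = [1.0, 1.0, 10.0, 1.0, 1.0]  # a single spike in the middle
print(moving_filter(readings, "median", 3))
print(moving_filter(readings, "average", 3))
```

In this sketch the median filter removes an isolated spike entirely, while the average filter spreads it over neighboring samples.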
### `columns()`
Retrieves an array of column names from the GeoDataFrame within the `PositionData` instance. This method provides a quick way to access and review the columns present in the geospatial dataset, aiding in data exploration and analysis.

#### Returns:
- **Array of Column Names**: An array containing the names of all columns in the GeoDataFrame.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Retrieve and print the column names
column_names = position_data.columns()
print("Column names:", column_names)
```

### `statistics(column, bins=10)`
Calculates and returns key statistics and a probability distribution for a selected column in the GeoDataFrame. This method is instrumental for understanding the distribution and central tendencies of data in a particular column, which is crucial for data analysis and decision-making.

#### Parameters:
- `column`: The name of the column for which statistics are to be calculated.
- `bins`: The number of bins to use for the probability distribution histogram, with a default value of 10.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Calculate statistics for the 'Velocity' column
velocity_stats = position_data.statistics('Velocity')
print(velocity_stats)
```
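The structure of the returned value is not documented here. As a rough illustration of "key statistics plus a probability distribution", the following standalone sketch computes summary values and a normalized histogram; the dictionary keys are hypothetical and not the package's actual output.

```python
from statistics import mean, median, pstdev


def column_statistics(values, bins=10):
    """Summary statistics and a normalized histogram for a list of numbers.

    Illustrative helper only. The key names are hypothetical.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / bins
    if width == 0:
        width = 1.0  # all values equal: everything lands in the first bin
    counts = [0] * bins
    for value in values:
        # The maximum value is assigned to the last bin
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1
    total = len(values)
    return {
        "min": lowest,
        "max": highest,
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
        "distribution": [count / total for count in counts],
    }


velocities = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0]
stats = column_statistics(velocities, bins=5)
print(stats)
```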
### `calculate_direction(direction_property)`
Calculates the direction between consecutive points in the GeoDataFrame and stores it in a specified property. This method is valuable for analyzing the directional trends in spatial data, such as determining the course of movement in tracking data or understanding directional patterns.

#### Parameters:
- `direction_property`: The name of the property (column) where the calculated direction values will be stored.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Calculate the direction between consecutive points and store in a new column 'Direction'
direction_data = position_data.calculate_direction('Direction')
print(direction_data)
```
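The formula the package uses is not stated here. One common way to compute the direction between two consecutive latitude/longitude points is the great-circle initial bearing, sketched below as a standalone function (degrees clockwise from north). Treat it as an assumption about the approach, not as the package's implementation.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees
    clockwise from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


# Hypothetical track: first leg heads north, second leg heads east
track = [(56.95, 24.10), (56.96, 24.10), (56.96, 24.12)]
bearings = [initial_bearing(*a, *b) for a, b in zip(track, track[1:])]
print(bearings)
```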
### `export_as_geojson(output_path)`
Exports the current state of the GeoDataFrame to a GeoJSON file. This method is useful for saving processed or analyzed geospatial data in a standardized format, which can then be used in various GIS applications or further data analysis tools.

#### Parameters:
- `output_path`: The file path where the GeoJSON file will be saved.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Export the data to 'exported_data.geojson'
position_data.export_as_geojson('exported_data.geojson')
```

### `export_as_csv(output_path)`
Exports the current state of the GeoDataFrame to a CSV file. This method is useful for saving processed or analyzed data in a widely supported tabular format, which can then be opened in spreadsheets, GIS applications, or further data analysis tools.

#### Parameters:
- `output_path`: The file path where the CSV file will be saved.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Export the data to 'exported_data.csv'
position_data.export_as_csv('exported_data.csv')
```

### `deduplicate_skyhub_data()`
Deduplicates the GeoDataFrame stored in the `PositionData` instance. The method compares rows on a predefined set of SkyHub columns (such as 'GAS:Methane', 'GAS:Status', 'AIR:Speed', 'AIR:Direction', plus the latitude and longitude properties). Only the predefined columns that are actually present in the data are used for the comparison. Rows that repeat the same values in all of these columns are dropped, so only unique records are retained.

#### Returns:
- A new instance of `PositionData` containing the deduplicated data.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Deduplicate the data and store in a new instance
deduplicated_data = position_data.deduplicate_skyhub_data()
```
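The "intersection of predefined and present columns" logic can be sketched in plain Python as follows. The column list mirrors the names mentioned above, keeping the first occurrence of each duplicate is an assumption, and the rows are plain dictionaries rather than a GeoDataFrame.

```python
# Column names taken from the description above; the exact list used by
# the package may differ.
KEY_COLUMNS = ["GAS:Methane", "GAS:Status", "AIR:Speed", "AIR:Direction",
               "Latitude", "Longitude"]


def deduplicate(rows, key_columns=KEY_COLUMNS):
    """Drop rows that repeat the same values in the key columns.

    Illustrative helper only. Only key columns present in the data are
    compared, and the first occurrence is kept (an assumption).
    """
    if not rows:
        return []
    present = [column for column in key_columns if column in rows[0]]
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in present)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"Time": "10:00:00.0", "Latitude": 56.95, "Longitude": 24.10, "GAS:Methane": 2.1},
    {"Time": "10:00:00.1", "Latitude": 56.95, "Longitude": 24.10, "GAS:Methane": 2.1},
    {"Time": "10:00:00.2", "Latitude": 56.95, "Longitude": 24.10, "GAS:Methane": 2.4},
]
unique_rows = deduplicate(rows)
print(len(rows), "->", len(unique_rows))
```

Note that columns outside the key list, such as `Time` in this sketch, do not affect whether two rows count as duplicates.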

### `cut_useless_skyhub_columns()`
Streamlines the GeoDataFrame within the `PositionData` instance by retaining only a specified subset of columns. This method focuses on the columns listed in `self.skyhub_columns` and ensures that the essential 'geometry' column is also included. By filtering out unnecessary columns, this method helps in creating a more focused and relevant dataset, particularly useful in scenarios where only specific data points are of interest.

- Keeps only the columns that are both listed in `self.skyhub_columns` and present in the GeoDataFrame.
- Ensures the inclusion of the 'geometry' column, crucial for maintaining geospatial data integrity.
- Excludes all other columns not specified in `self.skyhub_columns` or absent from the DataFrame.

#### Returns:
- A new instance of `PositionData` containing the streamlined GeoDataFrame.

#### Example:
```python
# Assuming position_data is an instance of PositionData
# Streamline the data to include only specified columns
streamlined_data = position_data.cut_useless_skyhub_columns()
```

## MethaneData Class

## Class Overview

`MethaneData` is a Python class designed for processing and visualizing methane concentration data. It generates a GeoTIFF map based on methane readings, taking into account the location, status, and environmental thresholds of methane concentration.

## Initialization

### `MethaneData(position_data, methane_column='GAS:Methane', status_column='GAS:Status')`

Initializes the `MethaneData` object.

#### Parameters:

- `position_data` (`PositionData`): An instance of `PositionData` containing methane readings along with location data.
- `methane_column` (`str`): The name of the column in `PositionData` that contains methane readings. Default is `'GAS:Methane'`.
- `status_column` (`str`): The name of the column in `PositionData` that indicates the status of methane readings. Default is `'GAS:Status'`.

#### Description:

The constructor initializes the `MethaneData` instance, cleaning the data in `position_data` by removing NaN values from the specified methane and status columns. It also sets the `NO_DATA_MAX_LEVEL` and `NO_DATA_VALUE` for handling missing data in the interpolation process.

---

## Methods

### `map_methane(map_path, area_epsg, grid_rows=100, grid_columns=100, environment_methane_perc=95, ignore_invalid=True)`

Generates a GeoTIFF map representing methane concentration levels.

#### Parameters:

- `map_path` (`str`): File path where the GeoTIFF file will be saved.
- `area_epsg` (`str`): The EPSG code of the area for handling coordinate reference system conversions.
- `grid_rows` (`int`): Number of rows in the interpolation grid. Default is 100.
- `grid_columns` (`int`): Number of columns in the interpolation grid. Default is 100.
- `environment_methane_perc` (`int`): The percentage used to determine the environmental methane threshold. Default is 95.
- `ignore_invalid` (`bool`): If set to `True`, invalid readings (based on `status_column`) will be ignored. Default is `True`.

#### Description:

This method processes the methane data and generates a GeoTIFF map. It first filters out invalid readings if `ignore_invalid` is `True`. It then calculates an adjusted methane concentration by subtracting an environmental methane threshold (determined by `environment_methane_perc`) from the actual readings. The method interpolates these adjusted values over a specified grid and saves the result as a GeoTIFF file at `map_path`. 

#### Notes:

- The method checks if the coordinate reference system (CRS) of `position_data` is geographic (EPSG:4326). If it is not, CRS conversion is performed based on `area_epsg`.
- Zero values in the interpolated grid are replaced with `NO_DATA_VALUE` to represent areas with no data.
- The method handles NaN values and ensures that the output GeoTIFF correctly represents the methane concentration across the given area.
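
The background subtraction step can be sketched with the standard library alone. This sketch assumes that `environment_methane_perc` is interpreted as a percentile of the readings and that negative adjusted values are clipped to zero; both are assumptions for illustration, not confirmed details of `map_methane`.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def subtract_background(readings, environment_methane_perc=95):
    """Return (threshold, adjusted readings).

    Illustrative helper only. Readings at or below the threshold
    become 0.0 (an assumption).
    """
    threshold = percentile(readings, environment_methane_perc)
    adjusted = [max(0.0, reading - threshold) for reading in readings]
    return threshold, adjusted


readings = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical values with one strong peak
threshold, adjusted = subtract_background(readings, environment_methane_perc=50)
print(threshold, adjusted)
```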

---
## Trajectory Class

## Class Overview

The `Trajectory` class provides functionalities for creating, managing, and exporting geographic trajectories. It inherits from `PositionBase` and utilizes geographical data to generate simplified trajectory polylines and calculate durations.

## Initialization

### `Trajectory(position_data, date_column, time_column, tolerance, projection)`


#### Description

The `__init__` method initializes the `Trajectory` object, setting up essential parameters and generating the trajectory polyline. This method processes positional data with specified columns for date and time, creating a simplified trajectory representation.

#### Parameters

- `position_data` (PositionData): An instance of `PositionData` containing the positional information for the trajectory.
- `date_column` (str): The name of the column in `position_data` that contains the date information.
- `time_column` (str): The name of the column in `position_data` that contains the time information.
- `tolerance` (float): The tolerance distance in meters for simplifying the trajectory. Determines how much deviation from the original path is allowed.
- `projection` (str): The EPSG code of the projected coordinate system for distance calculations. A projected CRS is crucial for accurate distance measurements.

### Example Usage

```python
from PositionData import Trajectory

# Example: Creating a Trajectory instance
# Assume 'position_data' is an instance of PositionData with date and time columns
trajectory = Trajectory(position_data, 'DateColumn', 'TimeColumn', tolerance=5.0, projection='EPSG:32635')
```
---

## Methods

### `duration(unit='seconds')` 

#### Description

The `duration` method of the `Trajectory` class calculates the total duration between the first and last record in the trajectory data. This duration is useful for understanding the time span covered by the trajectory, which can be important for analyses like calculating average speeds, understanding usage patterns, or synchronizing with other time-dependent data.

#### Parameters

- `unit` (str, optional): Specifies the unit of time for the duration. The available options are `'seconds'`, `'minutes'`, and `'hours'`. The default is `'seconds'`.

#### Returns

- `float`: The duration between the first and last record in the specified unit of time.

#### Example Usage

```python
# Assuming 'trajectory' is an instance of the Trajectory class
duration_in_seconds = trajectory.duration(unit='seconds')
```
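
As a rough illustration of the calculation, the sketch below combines a date and a time string for the first and last record and converts the difference to the requested unit. The `%Y-%m-%d` and `%H:%M:%S` formats are assumptions; the format of the actual date and time columns may differ.

```python
from datetime import datetime

# Assumed timestamp layout, for illustration only
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def duration(first, last, unit="seconds"):
    """Time between two (date, time) string pairs in the given unit."""
    divisors = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}
    if unit not in divisors:
        raise ValueError("unit must be 'seconds', 'minutes', or 'hours'")
    start = datetime.strptime(f"{first[0]} {first[1]}", TIMESTAMP_FORMAT)
    end = datetime.strptime(f"{last[0]} {last[1]}", TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / divisors[unit]


first_record = ("2024-01-15", "10:00:00")
last_record = ("2024-01-15", "10:30:00")
print(duration(first_record, last_record, unit="minutes"))
```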

### `polyline()`

#### Description

The `polyline` method generates a simplified representation of the trajectory as a polyline. It simplifies the trajectory to a LineString geometry, which can be helpful for visualizing or analyzing the path in a more concise form.

#### Parameters

This method takes no arguments. Simplification uses the values passed to the `Trajectory` constructor:

- `tolerance` (float): The tolerance distance for simplification in meters. Smaller values result in a polyline closer to the original trajectory, while larger values produce a more simplified representation.
- `projection` (str): The projected coordinate system used for distance calculation during simplification, given as an EPSG code string.

#### Returns

- A tuple containing:
  - `GeoDataFrame`: A GeoDataFrame object containing the simplified polyline.
  - `float`: The length of the simplified polyline in the units of the projected coordinate system (meters for UTM zones such as EPSG:32635).

#### Example Usage

```python
# Assuming 'trajectory' is an instance of the Trajectory class
polyline_gdf, polyline_length = trajectory.polyline()
```
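
The simplification algorithm is not named in this document. A widely used tolerance-based technique is Douglas-Peucker, sketched below on projected (x, y) coordinates in meters. This is an assumption for illustration only, not a description of how the package simplifies trajectories.

```python
import math


def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop points closer than tolerance to the chord."""
    if len(points) < 3:
        return list(points)
    distances = [point_to_segment_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    farthest = max(range(len(distances)), key=distances.__getitem__)
    if distances[farthest] <= tolerance:
        return [points[0], points[-1]]
    split = farthest + 1  # index of the farthest point in `points`
    left = simplify(points[:split + 1], tolerance)
    right = simplify(points[split:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


def polyline_length(points):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical projected coordinates in meters
path = [(0, 0), (10, 1), (20, 0), (30, 40), (40, 0)]
simplified = simplify(path, tolerance=5.0)
print(simplified, polyline_length(simplified))
```

Because distances are compared against a tolerance in meters, the input must be in a projected coordinate system rather than in latitude/longitude degrees.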

### `export_as_geojson(output_path)`

#### Description

The `export_as_geojson` method exports the trajectory's simplified polyline as a GeoJSON file. This method is useful for creating a standard GeoJSON representation of the trajectory, which can be used in various GIS applications or for further geographic analyses. The method ensures that the exported GeoJSON is in the WGS 84 coordinate reference system (EPSG:4326), which is the standard for GeoJSON files.

#### Parameters

- `output_path` (str): The file path where the GeoJSON file will be saved.

#### Returns

This method does not return a value. It creates a GeoJSON file at the specified `output_path`.

#### Example Usage

```python
# Assuming 'trajectory' is an instance of the Trajectory class
trajectory.export_as_geojson('trajectory.geojson')
print("Trajectory exported as GeoJSON to 'trajectory.geojson'")
```
## WindData Class

The `WindData` class is designed for processing and analyzing wind data in a geospatial context. It includes methods for calculating true wind speed and direction, gridding measurements, and building windrose plots.

## Initialization
### `WindData(position_data, air_speed_prop, air_dir_prop, platform_speed_prop, platform_dir_prop, true_speed_prop, true_dir_prop, sensor_cw_rot=0, sensor_to_north=False)`
Initializes the `WindData` object with an instance of `PositionData` and properties related to wind and platform motion. It automatically calculates true wind vectors.

#### Parameters:
- `position_data`: An instance of `PositionData`.
- `air_speed_prop`: Property name for air speed.
- `air_dir_prop`: Property name for air direction.
- `platform_speed_prop`: Property name for platform speed.
- `platform_dir_prop`: Property name for platform direction.
- `true_speed_prop`: Property name for true wind speed.
- `true_dir_prop`: Property name for true wind direction.
- `sensor_cw_rot`: Clockwise rotation of the sensor relative to the platform nose (default `0`).
- `sensor_to_north`: If `True`, sensor readings are relative to north; otherwise, they are relative to the platform nose (default `False`).

#### Example:
```python
wind_data = WindData(position_data, 'Air:Speed', 'Air:Direction', 'Velocity', 'Direction', 'TrueWindSpeed', 'TrueWindDirection')
```
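The true wind calculation can be sketched as vector addition: the wind measured on a moving platform is the true wind minus the platform's own motion, so the true wind is the measured wind vector plus the platform velocity vector. The standalone function below assumes directions in degrees clockwise from north, wind directions that state where the wind comes from, and a `sensor_cw_rot` that is added to nose-relative readings. These conventions are assumptions and may not match the package.

```python
import math


def true_wind(air_speed, air_dir, platform_speed, platform_dir,
              sensor_cw_rot=0.0, sensor_to_north=False):
    """Return (true wind speed, true wind 'from' direction in degrees).

    Illustrative helper only; the sign and angle conventions are assumptions.
    """
    if sensor_to_north:
        apparent_from = air_dir % 360.0
    else:
        # Convert a nose-relative reading to a north-relative direction
        apparent_from = (platform_dir + sensor_cw_rot + air_dir) % 360.0

    # Air velocity relative to the platform (wind blows opposite to 'from')
    a = math.radians(apparent_from)
    apparent_east = -air_speed * math.sin(a)
    apparent_north = -air_speed * math.cos(a)

    # Platform velocity over ground
    h = math.radians(platform_dir)
    platform_east = platform_speed * math.sin(h)
    platform_north = platform_speed * math.cos(h)

    # True wind = apparent wind + platform velocity
    east = apparent_east + platform_east
    north = apparent_north + platform_north
    speed = math.hypot(east, north)
    direction = math.degrees(math.atan2(-east, -north)) % 360.0
    return speed, direction


# Platform flying north at 5 m/s in still air senses a 5 m/s headwind
still_air_speed, _ = true_wind(5.0, 0.0, 5.0, 0.0)
# Same platform with the apparent wind at 45 degrees off the nose
cross_speed, cross_dir = true_wind(math.hypot(5.0, 5.0), 45.0, 5.0, 0.0)
print(still_air_speed, cross_speed, cross_dir)
```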
## Methods
### `build_windrose(speed_col, direction_col, output_path, bins=[0,2,4,6,8,10], nsector=16, title="Windrose")`
Builds and saves a windrose plot. This method is valuable for visually representing the distribution of wind speeds and directions, which is crucial in meteorological studies and applications such as sailing, aviation, and architecture.

#### Parameters:
- `speed_col`: Name of the wind speed column.
- `direction_col`: Name of the wind direction column.
- `output_path`: Path to save the generated windrose image.
- `bins`: Binning for wind speed (default is `[0,2,4,6,8,10]`).
- `nsector`: Number of sectors for the windrose (default is `16`).
- `title`: Title of the windrose plot (default is `"Windrose"`).

#### Example:
```python
# Assuming wind_data is an instance of WindData
wind_data.build_windrose('TrueWindSpeed', 'TrueWindDirection', 'windrose.png', bins=[0,2,4,6,8,10], nsector=16, title="Windrose")
```
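
To clarify how `bins` and `nsector` shape the plot, the sketch below counts samples per direction sector and speed bin using plain Python. Centering the first sector on north and treating the last speed bin as open-ended are assumptions for illustration; the plotting backend may bin differently.

```python
def windrose_counts(speeds, directions, bins=(0, 2, 4, 6, 8, 10), nsector=16):
    """Count samples per (direction sector, speed bin).

    Illustrative helper only. Sector 0 is centered on north and the
    last speed bin has no upper limit (both assumptions).
    """
    sector_width = 360.0 / nsector
    table = [[0] * len(bins) for _ in range(nsector)]
    for speed, direction in zip(speeds, directions):
        shifted = (direction + sector_width / 2) % 360.0
        sector = int(shifted // sector_width)
        # Index of the highest bin edge that does not exceed the speed
        speed_bin = sum(1 for edge in bins if speed >= edge) - 1
        if speed_bin >= 0:
            table[sector][speed_bin] += 1
    return table


speeds = [1.0, 3.0, 12.0]
directions = [350.0, 10.0, 90.0]
table = windrose_counts(speeds, directions)
print(table[0], table[4])
```

With 16 sectors each sector spans 22.5 degrees, so in this sketch directions of 350 and 10 degrees both fall into the north sector.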

### `grid_wind(speed_property, direction_property, method='linear', resolution=100)`
Creates a gridded representation of the wind measurements. This method is useful for visualizing and analyzing spatial variations in wind patterns, particularly in applications like meteorology, environmental monitoring, and renewable energy studies.

#### Parameters:
- `speed_property`: The name of the column representing wind speed.
- `direction_property`: The name of the column representing wind direction.
- `method`: The interpolation method for gridding (default is 'linear'). Other options are available as per `scipy.interpolate.griddata`.
- `resolution`: The resolution of the grid (default is `100`). Higher values provide finer grids.

#### Example:
```python
# Assuming wind_data is an instance of WindData
gridded_wind_data = wind_data.grid_wind('TrueWindSpeed', 'TrueWindDirection', method='linear', resolution=100)
```


