Metadata-Version: 2.1
Name: akerbp.mlpet
Version: 1.0.1
Summary: Package to prepare well log data for ML projects.
Home-page: https://bitbucket.org/akerbp/akerbp.mlpet/src/master/
License: Apache-2.0
Author: Flavia Dias Casagrande
Author-email: flavia.dias.casagrande@akerbp.com
Requires-Python: >=3.8,<3.11
Classifier: License :: OSI Approved :: Apache Software License
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: PyYAML (==5.4.1)
Requires-Dist: cognite-sdk (>=2.31.0)
Requires-Dist: imbalanced-learn (>=0.8.0)
Requires-Dist: joblib (==1.0.1)
Requires-Dist: numpy (>=1.19.5)
Requires-Dist: pandas (>=1.3.2)
Requires-Dist: scikit-learn (>=0.24.2)
Requires-Dist: scipy (>=1.7.1)
Project-URL: Repository, https://bitbucket.org/akerbp/akerbp.mlpet/src/master/
Description-Content-Type: text/markdown

# MLPet

Preprocessing tools for petrophysics ML projects at Eureka.

## Installation

- Install the package by running the following (requires Python 3.8 or later, below 3.11):

        pip install akerbp.mlpet
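
To confirm that the interpreter is in the supported range and that the package is installed, a quick check can be run from Python. This is a minimal sketch using only the standard library. The helper names `python_supported` and `installed_version` are illustrative and not part of mlpet, and the version bounds are copied from the package metadata (`>=3.8,<3.11`).

```python
import sys
from importlib import metadata

# Bounds copied from the package metadata: Requires-Python >=3.8,<3.11
MIN_PYTHON = (3, 8)
MAX_PYTHON_EXCLUSIVE = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def installed_version(dist_name="akerbp.mlpet"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("akerbp.mlpet version:", installed_version())
```

If `installed_version()` returns `None`, the distribution is not installed in the active environment.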


## Quick start

- The example below shows how to use the mlpet `Dataset` class to preprocess data. See the tests folder of this repository for more examples:

        from akerbp.mlpet import Dataset
        from akerbp.mlpet import utilities

        # Instantiate an empty dataset object using the example settings and mappings provided
        ds = Dataset(
                settings=r"./support/settings_shear.yaml",
                mappings=r"./support/mappings.yaml",
                folder_path=r"./support/",
        )

        # Populate the dataset with data from a file (other file formats and direct CDF data collection are also supported)
        ds.load_from_pickle(r"./support/data/shear.pkl")

        # The original data will be kept in ds.df_original and will remain unchanged
        print(ds.df_original.head())

        # Split the data into train and test sets
        df_train, df_test = utilities.train_test_split(
                df=ds.df_original,
                target_column=ds.label_column,
                id_column=ds.id_column,
                test_size=0.3,
        )

        # Preprocess the data for training according to the default workflow
        # print(ds.default_preprocessing_workflow) <- Uncomment to see what the workflow does
        df_preprocessed = ds.preprocess(df_train)


The procedure is exactly the same for any other dataset class; only the settings differ. For a full list of possible settings keys, see [the documentation for the main Dataset class](https://bitbucket.org/akerbp/mlpet/src/documentation/docs/mlpet/Datasets.html). Make sure that the curve names in the settings are consistent with those in the dataset.

The loaded data is NOT mapped at load time but rather at preprocessing time (i.e. when `preprocess` is called).
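
The idea of deferring the mapping can be illustrated with a small standalone sketch. Nothing here is mlpet code: `LazyMappedDataset`, its methods, and the curve names (`GAMMA`, `DENS`, `GR`, `DEN`) are made up for illustration, and the real mapping step may well do more than rename columns. The sketch only shows the pattern of keeping loaded data untouched and applying mappings when preprocessing runs.

```python
import copy


class LazyMappedDataset:
    """Hypothetical stand-in for illustration; NOT the mlpet Dataset class."""

    def __init__(self, curve_mappings):
        # Assumed format: raw curve name -> standard curve name
        self.curve_mappings = curve_mappings
        self.rows_original = []

    def load(self, rows):
        # Loading stores the rows as-is; no renaming happens here.
        self.rows_original = copy.deepcopy(rows)

    def preprocess(self, rows=None):
        # Mappings are applied only now, and the result is a new list,
        # so the originally loaded rows stay unchanged.
        source = self.rows_original if rows is None else rows
        return [
            {self.curve_mappings.get(name, name): value for name, value in row.items()}
            for row in source
        ]


ds = LazyMappedDataset({"GAMMA": "GR", "DENS": "DEN"})
ds.load([{"GAMMA": 75.0, "DENS": 2.45, "well_name": "A-1"}])
processed = ds.preprocess()

print(ds.rows_original[0])  # {'GAMMA': 75.0, 'DENS': 2.45, 'well_name': 'A-1'}
print(processed[0])         # {'GR': 75.0, 'DEN': 2.45, 'well_name': 'A-1'}
```

A practical consequence of this pattern is that anything inspected right after loading still carries the raw curve names, so lookups by mapped names only make sense on the preprocessed output.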

## API Documentation

Full API documentation of the package can be found under [docs](https://bitbucket.org/akerbp/mlpet/src/documentation/docs/).

## For developers

- To update the API documentation, run the following from the root directory of the project:

        pip install pdoc
        pdoc --docformat google -o docs mlpet

- To install mlpet in editable mode for use in another project, there are two
  possible solutions depending on the tools being used:
   1. If the other package uses poetry, please refer to this [guide](https://github.com/python-poetry/poetry/discussions/1135#discussioncomment-145756).
   2. If you are not using poetry (using conda, pyenv or something else), you can `pip install -e` the package in the relevant virtual environment after activating it. Note that this approach has not been tested.

## License

MLPet Copyright 2021 AkerBP ASA

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

