Metadata-Version: 2.1
Name: KAVICA
Version: 1.3.4
Summary: KAVICA: Powerful Python Cluster Analysis and Inference Toolkit
Home-page: https://github.com/kavehmahdavi/kavica
Author: Kaveh Mahdavi
Author-email: kavehmahdavi74@yahoo.com
License: UNKNOWN
Project-URL: Documentation, http://kavehmahdavi.github.io/kavica/
Project-URL: Forum, http://kavehmahdavi.github.io/
Project-URL: Repository, https://github.com/kavehmahdavi/kavica
Project-URL: Issues, https://github.com/kavehmahdavi/kavica/issues
Project-URL: Author, http://kavehmahdavi.github.io/
Keywords: Cluster Inference System,Feature Selection,Factor Analysis,Parser,Clustering,Unsupervised,Self-organizing map,Organization Component Analysis,Feature Space Curvature Map,Multiline Transformation
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering 
Classifier: Natural Language :: English
Requires-Python: >=3
Description-Content-Type: text/markdown
License-File: LICENSE.rst

<div align="center">
  <img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/icon.png"><br>
</div>

-----------------

# KAVICA: Powerful Python Cluster Analysis and Inference Toolkit

[![PyPI Latest Release](https://img.shields.io/pypi/v/kavica.svg)](https://pypi.org/project/kavica/)
[![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/anaconda/pandas/)
[![Package Status](https://img.shields.io/pypi/status/kavica.svg)](https://pypi.org/project/kavica/)
[![License](https://img.shields.io/pypi/l/kavica.svg)](https://github.com/kavehmahdavi/kavica_container/blob/main/LICENSE)
[![Downloads](https://pepy.tech/badge/kavica/month)](https://pepy.tech/project/kavica)
[![Stack Overflow](https://img.shields.io/badge/stackoverflow-Ask%20questions-blue.svg)](https://stackoverflow.com/questions/tagged/kavica)

## What is it?

**kavica** is a Python package that provides semi-automated, flexible, and expressive clustering
analysis designed to make working with "unlabeled" data easy and intuitive.
It aims to be the fundamental high-level building block for doing practical, **real-world** cluster analysis in Python.
Additionally, it has the broader goal of becoming **a powerful and flexible open-source AutoML tool and pipeline for
unsupervised / clustering analysis**. It is already well on its way towards this goal.

## Main Features

Here are just a few of the things that kavica does well:

- Intelligent [**Density
  Mapping**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/space_curvature_map.py)
  to model the density structure of the data by analogy with
  [Einstein's theory of relativity](https://www.space.com/17661-theory-general-relativity.html),
  and automated [**Density
  Homogenizing**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/bilinear_transformation.py)
  to prepare the
  data for density-based clustering (e.g., DBSCAN)

- Automatic and powerful [**Organization Component
  Analysis**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/OCA.py) to interpret
  the clustering result by understanding the topological structure of each cluster

- Topological and powerful [**Self-Organizing Maps Inference
  System**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/somis.py) to
  use the self-learning ability of the SOM to understand the topological structure of the data

- Automated and Bayesian-based [**DBSCAN Hyper-parameter
  Tuner**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/tuner) to select the optimal
  hyper-parameter configuration of the DBSCAN clustering algorithm

- Efficient handling of [**feature
  selection**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/feature_selection/) in potentially
  high-dimensional and
  massive datasets

- Gravitational implementation of Kohonen [**Generational Self-Organizing Maps
  (GSOM)**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/som.py), useful
  for unsupervised learning and super-clustering, with rich graphics, plots, and animation features.

- Computational geometrical model [**Polygonal
  Cage**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/polygon_cage.py) to transfer
  feature vectors from a curved non-Euclidean feature space to a new Euclidean one.

- Robust [**factor analysis**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/factor_analysis) to reduce a
  large number of variables to a smaller set

- Easy handling of [**missing data**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/imputation/) (represented
  as `NaN`, `NA`, or `NaT`) in floating point
  as well as non-floating point data

- Flexible implementation of directed and undirected [**graph data
  structures**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/graph_data_structur.py) and
  algorithms.

- Intuitive [**resampling**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/resampling/) of data sets

- Powerful, flexible [**parser**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/parser) functionality to
  perform parsing, manipulating, and generating
  operations on flat, massive, and unstructured [Traces](https://tools.bsc.es/paraver/trace_generation) datasets
  which are generated by [MareNostrum](https://www.bsc.es/marenostrum/marenostrum)

- [**Utilities**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/utils) functionality: intuitive exploratory
  data analysis, plotting, loading and generating
  data, and more
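
As one illustration of the missing-data item in the list above, the sketch below fills `NaN` entries with the mean of the observed values in the same column. This is a generic mean-imputation example in plain Python, not kavica's imputation module; the function name `impute_column_means` is hypothetical.

```python
import math


def impute_column_means(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if not math.isnan(row[c])]
        # A column with no observed values has no mean; leave its NaNs in place.
        means.append(sum(observed) / len(observed) if observed else math.nan)
    return [
        [means[c] if math.isnan(value) else value for c, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: two columns, one missing value in each.
data = [
    [1.0, 10.0],
    [math.nan, 20.0],
    [3.0, math.nan],
]
print(impute_column_means(data))  # -> [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```

Mean imputation is only the simplest possible strategy; see the imputation module linked above for what the package itself provides.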

## Examples

- [**Feature Space Curvature Map**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/space_curvature_map.py)
  
  <div align="center"> 
  <img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/circel.gif" width="800"><br>
  </div>

- [**Density
  Homogenizing**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/bilinear_transformation.py)

  ![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/hemo.png)

  Application of the Feature Space Curvature Map (FSCM) to Synt10, a multi-density 2D dataset containing ten clusters.
  (a) A scatter plot of clusters with varied densities. The legend shows the size/N(μ, σ²) of each cluster, the colors
  represent the original labels of the data, and the red lines draw the initial Feature Space Fabric (FSF).
  (b) The Feature Space Curvature (FSC) model computed with our FSCM method; the red lines show the deformation of
  the FSF. (c) A scatter plot of the data in (a) after projection by our transformation through model (b).
  As a result, the differences in cluster density are scaled appropriately to achieve better density-based clustering
  performance.

  ![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/hemo_exam.png)
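
To see why varied densities are a problem for density-based clustering, the following sketch compares the average nearest-neighbour distance in a tight cluster and in a spread-out one. It is a plain-Python illustration of the motivation only and does not implement FSCM; the helper names and the toy clusters are assumptions made for this example (`math.dist` requires Python 3.8+).

```python
import math
import random


def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbour (brute force)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def gaussian_blob(rng, center, sigma, n):
    """Draw n points from an isotropic 2D Gaussian N(center, sigma^2)."""
    return [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
dense = gaussian_blob(rng, (0.0, 0.0), 0.1, 200)   # tight cluster
sparse = gaussian_blob(rng, (5.0, 5.0), 1.0, 200)  # same size, ten times the spread

print("dense :", mean_nn_distance(dense))
print("sparse:", mean_nn_distance(sparse))
```

The two averages differ by roughly the ratio of the spreads, so a single DBSCAN neighbourhood radius that suits one cluster is a poor fit for the other. Homogenizing the densities first is meant to remove that mismatch.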


- [**Polygonal Cage**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/polygon_cage.py)
  Multilinear transformation

  Feature Space Curvature  | Feature Space Fabric 
  --- |---
  ![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/BLT1.png) | ![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/BLT00.png) 

Data point transformation between a bent FSC (a) and a regular FSF (b), based on the multilinear transformation in
R<sup>2</sup>.
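
For intuition, a bilinear map is the simplest multilinear transformation in R<sup>2</sup>: it sends a point of the unit square into an arbitrary quadrilateral by blending the four corner positions. The sketch below is a textbook bilinear interpolation in plain Python, not the code in `polygon_cage.py`; the function name `bilinear_map` and the example corners are assumptions.

```python
def bilinear_map(u, v, p00, p10, p01, p11):
    """Map (u, v) in the unit square to a point inside a quadrilateral.

    p00, p10, p01, p11 are the images of the square corners
    (0,0), (1,0), (0,1), (1,1), each given as an (x, y) tuple.
    """
    # Bilinear weights: each is 1 at its own corner and 0 at the other three.
    w00 = (1 - u) * (1 - v)
    w10 = u * (1 - v)
    w01 = (1 - u) * v
    w11 = u * v
    x = w00 * p00[0] + w10 * p10[0] + w01 * p01[0] + w11 * p11[0]
    y = w00 * p00[1] + w10 * p10[1] + w01 * p01[1] + w11 * p11[1]
    return (x, y)


# Illustrative "bent" cell: a unit square whose top-right corner is pulled outward.
corners = dict(p00=(0.0, 0.0), p10=(1.0, 0.0), p01=(0.0, 1.0), p11=(2.0, 2.0))

print(bilinear_map(0.0, 0.0, **corners))  # a corner maps to its own image, p00
print(bilinear_map(0.5, 0.5, **corners))  # the centre maps to the mean of the four corners
```

This sketch covers only the direction from the regular cell to the bent one. Going the other way means inverting the map, which in general requires solving a quadratic equation for `u` and `v`.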

- [**Organization Component Analysis**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/OCA.py)
  <div align="center"> 
  <img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/oca.png" width="800"><br>
  </div>
  Application of OCA to the Iris dataset

## Video

  <div align="center"> 
   <a href="https://www.youtube.com/watch?v=lxL3niQmBcU&t=27s"> 
    <img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/OCA_presentation.png" width="600">
   </a>
  </div>

## Where to get it

The source code is currently hosted on GitHub at: [kavica](https://github.com/kavehmahdavi/kavica)

Binary installers for the latest released version are available at the
[Python Package Index (PyPI)](https://pypi.org/project/KAVICA/) and on [Conda](https://docs.conda.io/en/latest/).

The recommended way to install kavica is to use:

```sh
# PyPI
pip install kavica
```

But it can also be installed using:

```sh
# or conda
conda config --add channels conda-forge
conda install kavica
```

To verify your setup, start Python from the command line and run the following:

```python
import kavica
```
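
The installed version can also be checked without importing the package. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+); the helper `installed_version` is hypothetical and not part of kavica.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helper, not part of kavica: report whether the package is installed.
version = installed_version("kavica")
print("kavica", version if version else "is not installed")
```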

## Dependencies

See [requirements.txt](/requirements.txt) for the required packages, which can be installed with:

```sh
pip install -r requirements.txt
```

## Publications

[Unsupervised Feature Selection for Noisy Data](https://doi.org/10.1007/978-3-030-35231-8_6)

[Organization Component Analysis: The method for extracting insights from the shape of cluster](https://doi.org/10.1109/IJCNN52387.2021.9533650)

[Feature Space Curvature Map: A Method To Homogenize Cluster Densities](https://doi.org/10.1109/IJCNN55064.2022.9892921)

## Issue tracker

If you find a bug, please help us solve it by [filing a report](https://github.com/kavehmahdavi/kavica/issues).

## Contributing

If you want to contribute, check out the
[contribution guidelines](https://kavehmahdavi.github.io/kavica/main/contributions.html).

## License

The main library of **kavica** is
[released under the BSD 3-Clause License](https://kavehmahdavi.github.io/kavica/main/license.html).



