Metadata-Version: 2.1
Name: binsel
Version: 0.3.1
Summary: Feature selection for Hard Voting classifier
Home-page: http://github.com/kmedian/binsel
Author: Ulf Hamster
Author-email: 554c46@gmail.com
License: Apache License 2.0
Requires-Python: >=3.6
License-File: LICENSE

|PyPI version| |Language grade: Python| |Total alerts|

binsel
======

Feature selection for Hard Voting classifier.

Usage
-----

See the `binsel_hardvote example notebook <https://github.com/kmedian/binsel/blob/master/examples/binsel_hardvote.ipynb>`__
in the ``examples`` folder.

Algorithm
---------

The task is to select e.g. ``n_select=3`` binary features from a pool of
many binary features. These binary features might be the prediction of
binary classifiers. The selected binary features are then combined into
one hard-voting classifier.
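
As a minimal illustration (not the ``binsel`` API), a hard vote over
``n_select=3`` binary features can be written as a majority vote with
NumPy. The helper name ``hard_vote`` and the toy data are assumptions
made for this sketch.

.. code:: python

   import numpy as np

   def hard_vote(X):
       """Majority vote across the columns of a 0/1 matrix (hypothetical helper)."""
       X = np.asarray(X)
       # A sample is predicted 1 if more than half of the voters say 1.
       return (2 * X.sum(axis=1) > X.shape[1]).astype(int)

   # Four samples (rows), three binary voters (columns).
   X = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [1, 1, 1],
                 [0, 0, 0]])
   votes = hard_vote(X)
   print(votes)  # [1 0 1 0]

With an odd number of voters there are no ties, which is one reason to
pick an odd ``n_select``.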

A voting classifier should have the following properties:

-  each voter (a binary feature) should be highly correlated with the
   target variable
-  the selected binary features should be uncorrelated with each other.
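
Both properties can be checked from a single correlation matrix. The
sketch below uses the Pearson correlation from ``numpy.corrcoef`` as a
stand-in (an assumption; the correlation measure used by ``korr`` may
differ) on synthetic data, where three features are noisy copies of the
target and one is pure noise.

.. code:: python

   import numpy as np

   rng = np.random.default_rng(0)
   y = rng.integers(0, 2, size=200)

   # Three noisy copies of the target (20% of the labels flipped)
   # plus one feature that is unrelated to the target.
   flip = rng.random((200, 3)) < 0.2
   X_good = np.where(flip, 1 - y[:, None], y[:, None])
   X_noise = rng.integers(0, 2, size=(200, 1))
   X = np.hstack([X_good, X_noise])

   # Column 0 is the target, columns 1.. are the features.
   C = np.corrcoef(np.column_stack([y, X]), rowvar=False)
   corr_target = C[0, 1:]      # corr(Y, X_i): should be high
   corr_features = C[1:, 1:]   # corr(X_i, X_j): should be low off the diagonal

   print(np.round(corr_target, 2))
   print(np.round(corr_features, 2))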

The algorithm works as follows:

1. Generate multiple correlation matrices by bootstrapping (see
   `korr.bootcorr <https://github.com/kmedian/korr/blob/master/korr/bootcorr.py>`__).
   This includes the computation of ``corr(X_i, X_j)`` as well as
   ``corr(Y, X_i)``. Also store the out-of-bag (OOB) samples for
   evaluation.
2. For each correlation matrix do …

   a. Preselect the features ``i*`` with the highest
      ``abs(corr(Y, X_i))`` estimates (e.g. pick the ``n_pre=?`` highest
      absolute correlations).
   b. Slice a correlation matrix ``corr(X_i*, X_j*)`` and find the least
      correlated combination of ``n_select=?`` features (see
      `korr.mincorr <https://github.com/kmedian/korr/blob/master/korr/mincorr.py>`__).
   c. Compute the out-of-bag (OOB) performance (see step 1) of the
      hard-voter with the selected ``n_select=?`` binary features.

3. Select the binary feature combination with the best OOB performance
   as the final model.
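
The steps above can be sketched in plain NumPy. This is not the
``binsel`` or ``korr`` implementation: the names ``select_features`` and
``hard_vote`` are hypothetical, Pearson correlation stands in for the
correlation measure, "least correlated" is taken to mean the smallest
sum of absolute pairwise correlations (found by brute force), and OOB
performance is measured as accuracy.

.. code:: python

   import itertools
   import numpy as np

   def hard_vote(X):
       # Majority vote across binary columns.
       return (2 * X.sum(axis=1) > X.shape[1]).astype(int)

   def select_features(X, y, n_select=3, n_pre=5, n_boot=20, seed=0):
       rng = np.random.default_rng(seed)
       n = len(y)
       best_score, best_combo = -np.inf, None
       for _ in range(n_boot):
           # Step 1: bootstrap sample and its out-of-bag complement.
           idx = rng.integers(0, n, size=n)
           oob = np.setdiff1d(np.arange(n), idx)
           if oob.size == 0:
               continue
           C = np.corrcoef(np.column_stack([y[idx], X[idx]]), rowvar=False)
           C = np.nan_to_num(C)  # constant columns yield NaN correlations
           # Step 2a: preselect the n_pre features with highest abs(corr(Y, X_i)).
           pre = np.argsort(-np.abs(C[0, 1:]))[:n_pre].tolist()
           # Step 2b: least correlated combination among the preselected features.
           R = np.abs(C[1:, 1:])
           def total_corr(combo):
               return sum(R[i, j] for i, j in itertools.combinations(combo, 2))
           combo = min(itertools.combinations(sorted(pre), n_select), key=total_corr)
           # Step 2c: OOB accuracy of the hard voter.
           score = np.mean(hard_vote(X[oob][:, list(combo)]) == y[oob])
           # Step 3: keep the best combination seen so far.
           if score > best_score:
               best_score, best_combo = score, combo
       return best_combo, best_score

   # Synthetic data: six noisy copies of the target and four noise features.
   rng = np.random.default_rng(42)
   y = rng.integers(0, 2, size=300)
   flip = rng.random((300, 6)) < 0.25
   X = np.hstack([np.where(flip, 1 - y[:, None], y[:, None]),
                  rng.integers(0, 2, size=(300, 4))])
   combo, score = select_features(X, y)
   print(combo, round(float(score), 3))

Note that this sketch preselects on the absolute correlation, as
described in step 2a, but does not invert negatively correlated features
before voting; the toy data contains none.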

Appendix
--------

Installation
~~~~~~~~~~~~

The ``binsel`` `git repo <http://github.com/kmedian/binsel>`__ is
available as a `PyPI package <https://pypi.org/project/binsel>`__:

.. code:: sh

   pip install binsel

Install a virtual environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: sh

   python3.7 -m venv .venv
   source .venv/bin/activate
   pip install --upgrade pip
   pip install -r requirements.txt
   pip install -r requirements-dev.txt
   pip install -r requirements-demo.txt

(If your git repo is stored in a folder whose path contains whitespace,
don’t use the subfolder ``.venv``. Use an absolute path without
whitespace instead.)

Python commands
~~~~~~~~~~~~~~~

-  Jupyter for the examples: ``jupyter lab``
-  Check syntax:
   ``flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')``
-  Run Unit Tests: ``python -W ignore -m unittest discover``

Publish to PyPI:

.. code:: sh

   pandoc README.md --from markdown --to rst -s -o README.rst
   python setup.py sdist 
   twine upload -r pypi dist/*

Clean up
~~~~~~~~

.. code:: sh

   find . -type f -name "*.pyc" -delete
   find . -type d -name "__pycache__" -prune -exec rm -r {} +
   rm -r .venv

Support
-------

Please `open an issue <https://github.com/kmedian/binsel/issues/new>`__
for support.

Contributing
------------

Please contribute using `GitHub
Flow <https://guides.github.com/introduction/flow/>`__. Create a branch,
add commits, and `open a pull
request <https://github.com/kmedian/binsel/compare/>`__.

.. |PyPI version| image:: https://badge.fury.io/py/binsel.svg
   :target: https://badge.fury.io/py/binsel
.. |Language grade: Python| image:: https://img.shields.io/lgtm/grade/python/g/kmedian/binsel.svg?logo=lgtm&logoWidth=18
   :target: https://lgtm.com/projects/g/kmedian/binsel/context:python
.. |Total alerts| image:: https://img.shields.io/lgtm/alerts/g/kmedian/binsel.svg?logo=lgtm&logoWidth=18
   :target: https://lgtm.com/projects/g/kmedian/binsel/alerts/
