Metadata-Version: 1.1
Name: MLFeatureSelection
Version: 0.0.2
Summary: General feature selection based on a selected algorithm
Home-page: https://github.com/duxuhao/Feature-Selection
Author: Xuhao(Peter) Du
Author-email: duxuhao88@gmail.com
License: MIT Licence
Description: Features Selection
        ==================
        
        This code performs general feature selection based on a chosen
        machine learning algorithm and evaluation method.
        
        How to run (see demo.py)
        ------------------------
        
        The demo is based on the IJCAI-2018 data mining competition.
        
        -  Import the FeatureSelection module and the other necessary
           libraries
        
        .. code:: python
        
            from MLFeatureSelection import FeatureSelection as FS
            from sklearn.metrics import log_loss
            import lightgbm as lgbm
            import pandas as pd
            import numpy as np
        
        -  Prepare the dataset
        
        .. code:: python
        
            def prepareData():
                df = pd.read_csv('IJCAI-2018/data/train/trainb.csv')
                # keep only the rows that have a label
                df = df[~pd.isnull(df.is_trade)].copy()
                # encode each distinct category string as an integer code
                item_category_list_unique = list(np.unique(df.item_category_list))
                df['item_category_list'] = df['item_category_list'].replace(item_category_list_unique, list(np.arange(len(item_category_list_unique))))
                return df
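        
        As a small illustration of the encoding step in prepareData, the
        sketch below applies the same unique-value-to-integer mapping to a toy
        DataFrame. The column values are made up for illustration, and
        ``map`` with a dictionary is used here instead of ``replace``; this is
        not taken from demo.py.
        
        .. code:: python
        
            import numpy as np
            import pandas as pd
        
            # Toy data; the values are hypothetical, not from the competition set.
            toy = pd.DataFrame({'item_category_list': ['b;c', 'a', 'b;c', 'd']})
        
            # np.unique returns the sorted distinct values, so the integer
            # codes follow the sort order of the category strings.
            categories = list(np.unique(toy['item_category_list']))
            codes = {name: i for i, name in enumerate(categories)}
        
            # Replace each category string with its integer code.
            toy['item_category_list'] = toy['item_category_list'].map(codes)
            encoded = toy['item_category_list'].tolist()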
        
        -  Define your loss function
        
        .. code:: python
        
            def modelscore(y_test, y_pred):
                return log_loss(y_test, y_pred)
        
        -  Define the way to validate
        
        .. code:: python
        
            def validation(X,y,clf,lossfunction):
                totaltest = 0
                for D in [24]:
                    T = (X.day != D)
                    X_train, X_test = X[T], X[~T]
                    y_train, y_test = y[T], y[~T]
                    clf.fit(X_train,y_train, eval_set = [(X_train, y_train), (X_test, y_test)], eval_metric='logloss', verbose=False,early_stopping_rounds=200) #the train method must match your selected algorithm
                    totaltest += lossfunction(y_test, clf.predict_proba(X_test)[:,1])
                totaltest /= 1.0
                return totaltest
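        
        The validation function above holds out a single day (24). If you want
        to average the loss over several hold-out days, one possible shape is
        sketched below. The helper names (``day_holdout_masks``,
        ``mean_holdout_loss``, ``fit_predict``) and the toy model are
        hypothetical and only illustrate the splitting and averaging logic;
        they are not part of MLFeatureSelection.
        
        .. code:: python
        
            import numpy as np
            import pandas as pd
        
            def day_holdout_masks(X, days):
                """Yield (train_mask, test_mask) for each hold-out day."""
                for d in days:
                    test_mask = (X['day'] == d)
                    yield ~test_mask, test_mask
        
            def mean_holdout_loss(X, y, days, fit_predict, lossfunction):
                """Average the loss over all hold-out days."""
                losses = []
                for train_mask, test_mask in day_holdout_masks(X, days):
                    y_pred = fit_predict(X[train_mask], y[train_mask], X[test_mask])
                    losses.append(lossfunction(y[test_mask], y_pred))
                return float(np.mean(losses))
        
            # Toy check with a stand-in "model" that predicts the training mean.
            toy_X = pd.DataFrame({'day': [22, 22, 23, 23, 24, 24]})
            toy_y = pd.Series([0.0, 1.0, 0.0, 0.0, 1.0, 1.0])
        
            def predict_train_mean(X_train, y_train, X_test):
                return np.full(len(X_test), y_train.mean())
        
            def mean_abs_error(y_true, y_pred):
                return float(np.mean(np.abs(np.asarray(y_true) - y_pred)))
        
            score = mean_holdout_loss(toy_X, toy_y, [23, 24],
                                      predict_train_mean, mean_abs_error)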
        
        -  Define the cross method (required when *Cross = True*)
        
        .. code:: python
        
            def add(x,y):
                return x + y
        
            def subtract(x,y):
                return x - y
        
            def times(x,y):
                return x * y
        
            def divide(x,y):
                return (x + 0.001)/(y + 0.001)
        
            CrossMethod = {'+':add,
                           '-':subtract,
                           '*':times,
                           '/':divide,}
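        
        To make the role of such a dictionary concrete, the sketch below
        applies each operator to one pair of columns and stores the result as
        a new feature. How the library applies the cross methods internally is
        not shown here, so the ``cross_features`` helper and the generated
        column names are assumptions for illustration only. Note how the
        0.001 offset in ``divide`` keeps a zero denominator from raising or
        producing ``inf``.
        
        .. code:: python
        
            import pandas as pd
        
            def add(x, y):
                return x + y
        
            def divide(x, y):
                # the small offset avoids division by zero
                return (x + 0.001) / (y + 0.001)
        
            CrossMethod = {'+': add, '/': divide}
        
            def cross_features(df, left, right, methods):
                """Return a copy of df with one new column per cross method."""
                out = df.copy()
                for symbol, func in methods.items():
                    # hypothetical naming scheme, e.g. '(a+b)'
                    name = '({}{}{})'.format(left, symbol, right)
                    out[name] = func(df[left], df[right])
                return out
        
            toy = pd.DataFrame({'a': [1.0, 2.0], 'b': [0.0, 4.0]})
            crossed = cross_features(toy, 'a', 'b', CrossMethod)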
        
        -  Initialize the searcher with a customized procedure (sequence +
           random + cross)
        
        .. code:: python
        
            sf = FS.Select(Sequence = False, Random = True, Cross = False) # choose which search strategies to use
        
        -  Import loss function
        
        .. code:: python
        
            sf.ImportLossFunction(modelscore,direction = 'descend')
        
        -  Import dataset
        
        .. code:: python
        
            sf.ImportDF(prepareData(),label = 'is_trade')
        
        -  Import cross method (required when *Cross = True*)
        
        .. code:: python
        
            sf.ImportCrossMethod(CrossMethod)
        
        -  Define non-trainable features
        
        .. code:: python
        
            sf.NonTrainableFeatures = ['used','instance_id', 'item_property_list', 'context_id', 'context_timestamp', 'predict_category_property', 'is_trade']
        
        -  Define the initial feature combination
        
        .. code:: python
        
            sf.InitialFeatures(['item_category_list', 'item_price_level','item_sales_level','item_collected_level', 'item_pv_level','day'])
        
        -  Define algorithm
        
        .. code:: python
        
            sf.clf = lgbm.LGBMClassifier(random_state=1, num_leaves = 6, n_estimators=5000, max_depth=3, learning_rate = 0.05, n_jobs=8)
        
        -  Define log file name
        
        .. code:: python
        
            sf.logfile = 'record.log'
        
        -  Run with the self-defined validation method
        
        .. code:: python
        
            sf.run(validation)
        
        -  This code takes a while to run. You can stop it at any time and
           restart by passing the best feature combination found so far to
           sf.InitialFeatures()
        
        This feature selection method achieved
        --------------------------------------
        
        -  **1st** in Rong360
           (https://github.com/duxuhao/rong360-season2)
        
        -  **12th** in IJCAI-2018 1st round
        
        Algorithm details
        -----------------
        
        .. figure:: https://github.com/duxuhao/Feature-Selection/blob/master/Procedure.png
           :alt: Procedure
        
           Procedure
        
Keywords: pypi easy_install pip
Platform: Linux
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: Implementation
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Software Development :: Libraries
