Metadata-Version: 1.1
Name: MLFeatureSelection
Version: 0.0.6.3
Summary: Feature selection based on a user-selected algorithm, loss function, and validation method
Home-page: https://github.com/duxuhao/Feature-Selection
Author: Xuhao(Peter) Du
Author-email: duxuhao88@gmail.com
License: MIT Licence
Description: Features Selection
        ==================
        
        This package performs general feature selection based on a
        machine learning algorithm and evaluation method of your choice.
        
        You can customize the validation method and the loss function
        yourself.
        
        How to run
        ------------------------
        
        The demo is based on the IJCAI-2018 data mining competition.
        
        -  Import the library from FeatureSelection.py along with the other
           required libraries
        
        .. code:: python
        
            from MLFeatureSelection import FeatureSelection as FS
            from sklearn.metrics import log_loss
            import lightgbm as lgbm
            import pandas as pd
            import numpy as np
        
        -  Prepare the dataset
        
        .. code:: python
        
            def prepareData():
                df = pd.read_csv('IJCAI-2018/data/train/trainb.csv')
                # keep only the rows that have a label
                df = df[~pd.isnull(df.is_trade)].copy()
                # encode each category string as its index in the sorted unique list
                item_category_list_unique = list(np.unique(df.item_category_list))
                codes = {c: i for i, c in enumerate(item_category_list_unique)}
                df['item_category_list'] = df['item_category_list'].map(codes)
                return df
        
        -  Define your loss function
        
        .. code:: python
        
            def modelscore(y_test, y_pred):
                return log_loss(y_test, y_pred)
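        
        For reference, the quantity that ``log_loss`` returns for binary
        labels can be sketched in plain Python. This is an illustration only,
        not scikit-learn's implementation; ``binary_log_loss`` and the
        clipping constant ``eps`` are names chosen here. Lower values mean
        better predictions.
        
        .. code:: python
        
            import math

            def binary_log_loss(y_true, y_pred, eps=1e-15):
                """Mean negative log-likelihood of binary labels (illustrative only)."""
                total = 0.0
                for y, p in zip(y_true, y_pred):
                    # clip probabilities away from 0 and 1 so that log() stays finite
                    p = min(max(p, eps), 1 - eps)
                    total -= y * math.log(p) + (1 - y) * math.log(1 - p)
                return total / len(y_true)

            # confident, correct predictions score lower than hesitant ones
            confident = binary_log_loss([1, 0], [0.9, 0.1])
            hesitant = binary_log_loss([1, 0], [0.6, 0.4])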
        
        -  Define the way to validate
        
        .. code:: python
        
            def validation(X, y, features, clf, lossfunction):
                totaltest = 0
                days = [24]  # each listed day is held out once as the test set
                for D in days:
                    T = (X.day != D)
                    X_train, X_test = X[T], X[~T]
                    X_train, X_test = X_train[features], X_test[features]
                    y_train, y_test = y[T], y[~T]
                    # the fit call must match your selected algorithm
                    clf.fit(X_train, y_train,
                            eval_set=[(X_train, y_train), (X_test, y_test)],
                            eval_metric='logloss', verbose=False,
                            early_stopping_rounds=200)
                    totaltest += lossfunction(y_test, clf.predict_proba(X_test)[:, 1])
                return totaltest / len(days)  # average loss over the held-out days
        
        -  Define the cross method (required when *Cross = True*)
        
        .. code:: python
        
            def add(x, y):
                return x + y
        
            def subtract(x, y):
                return x - y
        
            def times(x, y):
                return x * y
        
            def divide(x, y):
                # the small offset avoids division by zero
                return (x + 0.001) / (y + 0.001)
        
            CrossMethod = {'+': add,
                           '-': subtract,
                           '*': times,
                           '/': divide}
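        
        The sketch below shows what two of these functions compute on small
        example columns. Plain lists stand in for DataFrame columns, and
        ``cross_columns`` is a hypothetical helper written for this
        illustration, not part of the library. The ``0.001`` offset in
        ``divide`` keeps the ratio finite when the denominator is zero.
        
        .. code:: python
        
            def add(x, y):
                return x + y

            def divide(x, y):
                return (x + 0.001) / (y + 0.001)

            CrossMethod = {'+': add, '/': divide}

            def cross_columns(a, b, method):
                # hypothetical helper: apply a cross function element by element
                return [method(x, y) for x, y in zip(a, b)]

            price = [2.0, 4.0, 6.0]
            sales = [1.0, 0.0, 3.0]

            summed = cross_columns(price, sales, CrossMethod['+'])
            # the second entry has a zero denominator and is still finite
            ratio = cross_columns(price, sales, CrossMethod['/'])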
        
        -  Initialize the searcher with a customized procedure (sequence +
           random + cross)
        
        .. code:: python
        
            sf = FS.Select(Sequence = False, Random = True, Cross = False) # choose which search procedures to run
        
        -  Import loss function
        
        .. code:: python
        
            sf.ImportLossFunction(modelscore,direction = 'descend')
        
        -  Import dataset
        
        .. code:: python
        
            sf.ImportDF(prepareData(),label = 'is_trade')
        
        -  Import cross method (required when *Cross = True*)
        
        .. code:: python
        
            sf.ImportCrossMethod(CrossMethod)
        
        -  Define non-trainable features
        
        .. code:: python
        
            sf.InitialNonTrainableFeatures(['used','instance_id', 'item_property_list', 'context_id', 'context_timestamp', 'predict_category_property', 'is_trade'])
        
        -  Define the initial feature combination
        
        .. code:: python
        
            sf.InitialFeatures(['item_category_list', 'item_price_level','item_sales_level','item_collected_level', 'item_pv_level','day'])
        
        -  Define potential features that can be added later
        
        .. code:: python
        
            sf.AddPotentialFeatures(['user_age_level'])
        
        -  Define algorithm
        
        .. code:: python
        
            sf.clf = lgbm.LGBMClassifier(random_state=1, num_leaves = 6, n_estimators=5000, max_depth=3, learning_rate = 0.05, n_jobs=8)
        
        -  Define log file name
        
        .. code:: python
        
            sf.SetLogFile('record.log')
        
        -  Set the maximum number of features
        
        .. code:: python
        
            sf.SetFeaturesLimit(40) #maximum number of features
        
        -  Set maximum time limit (in minutes)
        
        .. code:: python
        
            sf.SetTimeLimit(100) #maximum running time in minutes
        
        -  Set the sample ratio of the total dataset. When samplemode is 0,
           the same subset is used on every run; when samplemode is 1, a
           different subset is drawn each time
        
        .. code:: python
        
            sf.SetSample(0.1, samplemode = 0)
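        
        One way to picture the two sample modes is a fixed seed versus a
        fresh seed. This is a sketch of the idea only, not the library's
        implementation; ``sample_rows`` and its ``seed`` argument are
        hypothetical.
        
        .. code:: python
        
            import random

            def sample_rows(n_rows, ratio, samplemode, seed=0):
                """Return sorted row indices for a subset of about ratio * n_rows rows."""
                k = int(n_rows * ratio)
                # mode 0: fixed seed, so every call yields the same subset
                # mode 1: unseeded generator, so each call draws a new subset
                rng = random.Random(seed) if samplemode == 0 else random.Random()
                return sorted(rng.sample(range(n_rows), k))

            first = sample_rows(1000, 0.1, samplemode=0)
            second = sample_rows(1000, 0.1, samplemode=0)
            fresh = sample_rows(1000, 0.1, samplemode=1)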
        
        -  Generate the feature library; you can specify a keyword and a
           selection step
        
        .. code:: python
        
            sf.GenerateCol(key = 'mean', selectstep = 2) #can iterate different features set
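        
        One plausible reading of the two arguments is sketched below: keep
        only the candidate columns whose name contains ``key``, then take
        every ``selectstep``-th of them. This is an assumption about the
        behaviour, not taken from the library's source; ``candidate_columns``
        and the column names are hypothetical.
        
        .. code:: python
        
            def candidate_columns(columns, non_trainable, key=None, selectstep=1):
                # hypothetical helper illustrating the assumed filtering
                cols = [c for c in columns if c not in non_trainable]
                if key is not None:
                    cols = [c for c in cols if key in c]
                return cols[::selectstep]

            columns = ['instance_id', 'price_mean', 'sales_mean', 'price_max', 'pv_mean']
            picked = candidate_columns(columns, ['instance_id'], key='mean', selectstep=2)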
        
        -  Run with the self-defined validation method
        
        .. code:: python
        
            sf.run(validation)
        
        -  This code takes a while to run. You can stop it at any time and
           restart by passing the best feature combination found so far to
           sf.InitialFeatures()
        
        This feature selection method achieved
        --------------------------------------
        
        -  **1st** in Rong360
           (https://github.com/duxuhao/rong360-season2)
        
        -  **Temporary Top 10** in JData-2018
        
        -  **12th** in IJCAI-2018 1st round
        
        Algorithm details
        -----------------
        
        .. figure:: https://github.com/duxuhao/Feature-Selection/blob/master/Procedure.png
           :alt: Procedure
        
           Procedure
        
Keywords: pypi easy_install pip
Platform: Linux
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: Implementation
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Software Development :: Libraries
