Metadata-Version: 2.1
Name: KTBoost
Version: 0.0.21
Summary: Implements several boosting algorithms in Python
Home-page: https://github.com/fabsig/KTBoost
Author: Fabio Sigrist
Author-email: fabiosigrist@gmail.com
License: UNKNOWN
Description: # KTBoost - A Python Package for Boosting
        
        This Python package implements several boosting algorithms with different combinations of base learners, optimization algorithms, and loss functions.
        
        ## Description
        
        Concerning **base learners**, KTBoost includes:
        
        * Trees 
        * Reproducing kernel Hilbert space (RKHS) ridge regression functions (i.e., posterior means of Gaussian processes)
        * A combination of the two (i.e., the KTBoost algorithm) 
        
        
        Concerning the **optimization** step for finding the boosting updates, the package supports:
        
        * Gradient descent
        * Newton method (if applicable)
        * A hybrid version of the two for trees as base learners
        
        
        The package implements the following **loss functions** (selected via the `loss` argument of the model constructors; see the sketch after this list):
        
        * Continuous data ("regression"): quadratic loss (L2 loss), absolute error (L1 loss), Huber loss, quantile regression loss, Gamma regression loss, negative Gaussian log-likelihood with both the mean and the standard deviation as functions of features
        * Count data ("regression"): Poisson regression loss
        * (Unordered) categorical data ("classification"): logistic regression loss (log loss), exponential loss, cross entropy loss with softmax
        * Mixed continuous-categorical data ("censored regression"): negative Tobit likelihood (i.e., the Grabit model)
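        
        For illustration, these choices map to the `loss`, `base_learner`, and `update_step` arguments of the model constructors. A minimal sketch using only option names that appear in the examples further below (the other losses are selected analogously):
        
        ```python
        import KTBoost.KTBoost as KTBoost
        
        ## Quadratic (L2) loss, trees as base learners, hybrid gradient-Newton updates
        model_l2 = KTBoost.BoostingRegressor(loss='ls')
        ## Tobit loss (Grabit model) with lower and upper limits at 0 and 100
        model_tobit = KTBoost.BoostingRegressor(loss='tobit',yl=0,yu=100)
        ## Logistic regression loss, combined kernel and tree base learners,
        ## and Newton updates (the KTBoost algorithm)
        model_kt = KTBoost.BoostingClassifier(loss='deviance',base_learner='combined',
                                              update_step='newton',theta=1)
        ```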
        
        
        
        
        ## Installation
        
        The package can be **installed** using 
        ```
        pip install -U KTBoost
        ```
        and then loaded using 
        ```
        import KTBoost.KTBoost as KTBoost
        ```
        
        ## Usage and examples
        The package re-uses code from scikit-learn, and its workflow is very similar to that of scikit-learn.
        
        The two main classes are `KTBoost.BoostingClassifier` and `KTBoost.BoostingRegressor`. 
        
        The following **code example** defines models, trains them, and makes predictions.
        
        ```python
        import KTBoost.KTBoost as KTBoost
        import numpy as np
        
        ###############################
        ## Simulate some example data ##
        ###############################
        Xtrain=np.random.rand(1000,10)
        ytrain=2*Xtrain[:,0]+2*Xtrain[:,1]+np.random.rand(1000)
        Xpred=np.random.rand(100,10)
        
        ################################################
        ## Define model (see below for more examples) ##
        ################################################
        ## Standard tree boosting for regression with quadratic loss and hybrid gradient-Newton updates as in Friedman (2001)
        model = KTBoost.BoostingRegressor(loss='ls')
        
        
        ##################
        ## Train models ##
        ##################
        model.fit(Xtrain,ytrain)
        
        
        ######################
        ## Make predictions ##
        ######################
        model.predict(Xpred)
        
        
        #############################
        ## More examples of models ##
        #############################
        ## Boosted Tobit model, i.e. Grabit model (Sigrist and Hirnschall, 2017), 
        ## with lower and upper limits at 0 and 100
        model = KTBoost.BoostingRegressor(loss='tobit',yl=0,yu=100)
        ## KTBoost algorithm (combined kernel and tree boosting) for classification with Newton updates
        model = KTBoost.BoostingClassifier(loss='deviance',base_learner='combined',
                                            update_step='newton',theta=1)
        ## Gradient boosting for classification with trees as base learners
        model = KTBoost.BoostingClassifier(loss='deviance',update_step='gradient')
        ## Newton boosting for classification model with trees as base learners
        model = KTBoost.BoostingClassifier(loss='deviance',update_step='newton')
        ## Hybrid gradient-Newton boosting (Friedman, 2001) for classification with 
        ## trees as base learners (this is the version that scikit-learn implements)
        model = KTBoost.BoostingClassifier(loss='deviance',update_step='hybrid')
        ## Kernel boosting for regression with quadratic loss
        model = KTBoost.BoostingRegressor(loss='ls',base_learner='kernel',theta=1)
        ## Kernel boosting with the Nystroem method and the range parameter theta chosen 
        ## as the average distance to the 100-nearest neighbors (of the Nystroem samples)
        model = KTBoost.BoostingRegressor(loss='ls',base_learner='kernel',nystroem=True,
                                          n_components=1000,theta=None,n_neighbors=100)
        ## Regression model where both the mean and the standard deviation depend 
        ## on the covariates / features
        model = KTBoost.BoostingRegressor(loss='msr')
        
        
        #########################
        ## Feature importances ## (only defined for trees as base learners)
        #########################
        Xtrain=np.random.rand(1000,10)
        ytrain=2*Xtrain[:,0]+2*Xtrain[:,1]+np.random.rand(1000)
        
        model = KTBoost.BoostingRegressor()
        model.fit(Xtrain,ytrain)
        ## Extract feature importances calculated as described in Friedman (2001)
        feat_imp = model.feature_importances_
        
        ## Alternatively, plot feature importances directly
        feature_names = ['feature_'+str(i) for i in range(Xtrain.shape[1])]  ## generic names, for illustration
        KTBoost.plot_feature_importances(model=model,feature_names=feature_names,maxFeat=10)
        
        
        ##############################
        ## Partial dependence plots ## (currently only implemented for trees as base learners)
        ##############################
        from KTBoost.partial_dependence import plot_partial_dependence
        import matplotlib.pyplot as plt
        features = [0,1,2,3,4,5]
        fig, axs = plot_partial_dependence(model,Xtrain,features,percentiles=(0,1),figsize=(8,6))
        plt.subplots_adjust(top=0.9)
        fig.suptitle('Partial dependence plots')
        
        ## Alternatively, get partial dependencies in numerical form
        from KTBoost.partial_dependence import partial_dependence
        kwargs = dict(X=Xtrain, percentiles=(0, 1))
        partial_dependence(model,[0],**kwargs)
        ```
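        
        Since the workflow follows that of scikit-learn, the models can also be combined with scikit-learn's model selection tools. The following is a minimal sketch, assuming the estimators implement the scikit-learn estimator interface; the tuning parameter names `n_estimators` and `learning_rate` are assumptions based on scikit-learn's gradient boosting, on which the package is based, and are not documented above.
        
        ```python
        import numpy as np
        import KTBoost.KTBoost as KTBoost
        from sklearn.model_selection import GridSearchCV
        
        Xtrain=np.random.rand(1000,10)
        ytrain=2*Xtrain[:,0]+2*Xtrain[:,1]+np.random.rand(1000)
        
        ## Hypothetical tuning grid; parameter names assumed from scikit-learn
        param_grid = {'n_estimators': [100,200], 'learning_rate': [0.05,0.1]}
        grid = GridSearchCV(KTBoost.BoostingRegressor(loss='ls'),param_grid,cv=3)
        grid.fit(Xtrain,ytrain)
        print(grid.best_params_)
        ```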
        
        ## Author
        Fabio Sigrist
        
        ## References
        
        * Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. The Annals of Statistics, 28(2), 337-407.
        * Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5), 1189-1232.
        * Sigrist, F. (2018). Gradient and Newton Boosting for Classification and Regression. arXiv preprint arXiv:1808.03064.
        * Sigrist, F., & Hirnschall, C. (2017). Grabit: Gradient Tree Boosted Tobit Models for Default Prediction. arXiv preprint arXiv:1711.08695.
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
