Metadata-Version: 1.1
Name: AdvancedAnalytics
Version: 0.3.0
Summary: Python support for 'The Art and Science of Data Analytics'
Home-page: https://github.com/tandonneur/AdvancedAnalytics
Author: Edward R Jones
Author-email: ejones@tamu.edu
License: UNKNOWN
Description: AdvancedAnalytics
        ===================
        
        A collection of Python modules, classes, and methods that simplify building machine learning solutions.  It was developed to make learning Python easier, and it accompanies the book *The Art and Science of Data Analytics*.
        
        Description
        ===========
        
        Machine learning applications progress through three stages:
        
            1. Data Preprocessing
            2. Modeling or Analytics
            3. Postprocessing
        
        The classes and methods in **AdvancedAnalytics** primarily support the first and last stages of machine learning applications. 
        
        Data scientists report that they typically spend 80% of their total effort on data preprocessing and postprocessing. The first stage prepares the data for analysis, which includes:
        
            1. identifying and correcting outliers, 
            2. imputing missing values, and 
            3. encoding data. 
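        
        The three preparation steps above can be sketched in plain Python.  This is only an illustration and not the **AdvancedAnalytics** API: the function names, the sample data, and the choice to treat out-of-range values as missing and fill them with the median are all assumptions.
        
        .. code-block:: python
        
            from statistics import median
        
            def clip_outliers(values, low, high):
                """Mark values outside [low, high] as missing (None)."""
                return [v if v is not None and low <= v <= high else None
                        for v in values]
        
            def impute_median(values):
                """Fill missing entries with the median of the observed values."""
                observed = [v for v in values if v is not None]
                fill = median(observed)
                return [fill if v is None else v for v in values]
        
            def one_hot(values):
                """Encode categories as 0/1 indicator columns, one per category."""
                categories = sorted(set(values))
                return {c: [1 if v == c else 0 for v in values] for c in categories}
        
            # Hypothetical sample data: 410 is an impossible age, None is missing.
            ages = [34, 29, None, 410, 41]
            cleaned = impute_median(clip_outliers(ages, 0, 120))
            colors = one_hot(["red", "blue", "red"])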
        
        The last stage, solution postprocessing, involves displaying and graphing solution summaries as well as metrics and graphics used to evaluate the quality of the solution.
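        
        As a rough illustration of the kind of quality metrics computed during postprocessing, the sketch below derives accuracy, precision, and recall for a binary classifier in plain Python.  It is not the **AdvancedAnalytics** API; the function name and the sample labels are assumptions made for this example.
        
        .. code-block:: python
        
            def classification_metrics(actual, predicted, positive=1):
                """Return accuracy, precision, and recall for binary labels."""
                pairs = list(zip(actual, predicted))
                tp = sum(1 for a, p in pairs if a == positive and p == positive)
                fp = sum(1 for a, p in pairs if a != positive and p == positive)
                fn = sum(1 for a, p in pairs if a == positive and p != positive)
                tn = len(pairs) - tp - fp - fn
                return {
                    "accuracy": (tp + tn) / len(pairs),
                    # Guard against division by zero when a class is never predicted
                    # or never observed.
                    "precision": tp / (tp + fp) if tp + fp else 0.0,
                    "recall": tp / (tp + fn) if tp + fn else 0.0,
                }
        
            # Hypothetical labels for six observations.
            actual = [1, 0, 1, 1, 0, 0]
            predicted = [1, 0, 0, 1, 1, 0]
            metrics = classification_metrics(actual, predicted)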
        
        Usage
        =====
        
        Currently, the package is most often used to support solutions developed with these popular machine learning packages:
        
            * scikit-learn
            * StatsModels
            * NLTK
        
        Current Modules and Classes
        =============================
        
        ReplaceImputeEncode
            Classes for Data Preprocessing
                * DT defines new data types used in the data dictionary
                * ReplaceImputeEncode a class for data preprocessing
        
        Regression
            Classes for Linear and Logistic Regression
                * linreg support for linear regression
                * logreg support for logistic regression
                * stepwise a variable selection class
        
        Tree
            Classes for Decision Tree Solutions
                * tree_regressor support for regressor decision trees
                * tree_classifier support for classification decision trees
        
        Forest
            Classes for Random Forests
                * forest_regressor support for regressor random forests
                * forest_classifier support for classification random forests
        
        NeuralNetwork
            Classes for Neural Networks
                * nn_regressor support for regressor neural networks
                * nn_classifier support for classification neural networks
        
        TextAnalytics
            Classes for Text Analytics
                * text_analysis support for topic analysis
                * sentiment_analysis support for sentiment analysis
        
        Internet
            Classes for Internet Applications
                * scrape support for web scraping
                * metrics a class for solution metrics
        
        Documentation and Examples
        ============================
        
        The API and documentation for all classes and examples are available at https://github.com/tandonneur/AdvancedAnalytics . 
        
        Installation and Dependencies
        =============================
        
        **AdvancedAnalytics** is designed to work on any operating system running Python 3.  It can be installed using **pip** or **conda**.
        
        .. code-block:: console
        
            pip install AdvancedAnalytics
            # or
            conda install -c conda-forge AdvancedAnalytics
        
        General Dependencies
            Most classes import one or more modules from
            **scikit-learn**, referenced as *sklearn* in module imports, and
            **StatsModels**.  Both are installed with current versions
            of **Anaconda**, a popular distribution for developing Python solutions.
        
        Decision Tree and Random Forest Dependencies
            The *Tree* and *Forest* modules plot decision trees and importance
            metrics using **pydotplus** and the **graphviz** packages.  If these
            are not installed and you are planning to use the *Tree* or *Forest*
            modules, they can be installed using the following code.
        
            .. code-block:: console
        
                conda install -c conda-forge pydotplus
                conda install -c conda-forge graphviz
                pip install graphviz
        
            Note that the second conda install does not complete the install of
            the graphviz package.  To complete the graphviz install, run the
            pip install after the conda graphviz install.
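        
            Before using the *Tree* or *Forest* modules, it can help to confirm
            that both packages are importable.  The check below is a minimal
            sketch using only the standard library; the helper name
            ``missing_modules`` is an assumption for this example and is not
            part of **AdvancedAnalytics**.
        
            .. code-block:: python
        
                from importlib.util import find_spec
        
                def missing_modules(names):
                    """Return the top-level module names that cannot be imported."""
                    return [name for name in names if find_spec(name) is None]
        
                # Report any plotting dependency that still needs to be installed.
                missing = missing_modules(["pydotplus", "graphviz"])
                if missing:
                    print("Install before using Tree or Forest:", ", ".join(missing))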
        
        Text Analytics Dependencies
            The *TextAnalytics* module is based on the **NLTK** and **scikit-learn**
            text analytics packages.  Both are installed with the current
            version of Anaconda.
        
            However, *TextAnalytics* includes options to produce word clouds, 
            which are graphic displays of the word collections associated with 
            topic or data clusters.  The **wordcloud** package is used to produce
            these graphs.  If you are using the *TextAnalytics* module you can
            install the **wordcloud** package with the following code.
        
            .. code-block:: console
        
                conda install -c conda-forge wordcloud
        
            In addition, data used by the **NLTK** package is not automatically 
            installed with this package.  These data include the text 
            dictionary and other data tables.
        
            The following nltk.download commands should be run before using 
            **TextAnalytics**. However, it is only necessary to run these once to 
            download and install the data NLTK uses for text analytics.
        
            .. code-block:: python
        
                # Run the following NLTK commands once to
                # download and install the NLTK data.
                import nltk
        
                nltk.download('punkt')
                nltk.download('averaged_perceptron_tagger')
                nltk.download('stopwords')
                nltk.download('wordnet')
        
        Code of Conduct
        ---------------
        
        Everyone interacting in the AdvancedAnalytics project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct: https://www.pypa.io/en/latest/code-of-conduct/ .
        
        
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
