Metadata-Version: 1.1
Name: APacheDEX
Version: 1.2
Summary: Compute APDEX from Apache-style logs.
Home-page: http://git.erp5.org/gitweb/apachedex.git
Author: Vincent Pelletier
Author-email: vincent@nexedi.com
License: GPL 2+
Description: .. contents::
        
        Compute APDEX from Apache-style logs.
        
        Overview
        ========
        
        Parses Apache-style logs and generates several statistics intended for a
        website developer audience:
        
        - APDEX (Application Performance inDEX, see http://www.apdex.org) ratio
          (plotted)
        
          Because you want to know how satisfied your users are.
        
        - hit count (plotted)
        
          Because achieving 100% APDEX is easy when there is nobody around.
        
        - HTTP status codes, with optional detailed output of the most frequent URLs
          per error status code, along with their most frequent referers
        
          Because you forgot to update a link to that conditionally-used browser
          compatibility javascript you renamed.
        
        - Hottest pages (pages which use the most rendering time)
        
          Because you want to know where to invest time to get the highest user
          experience improvement.
        
        - ERP5 sites: per-module statistics, with module and document views separated
        
          Because module and document types are not born equal in usage patterns.
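        
        As a rough illustration of the first statistic: the standard APDEX
        definition counts requests served within a threshold T as satisfied, those
        served within 4T as tolerating, and slower ones as frustrated. The sketch
        below is not APacheDEX's own code; the function name and the 1-second
        threshold are assumptions made for the example::
        
          def apdex(durations, threshold=1.0):
              """Return the APDEX ratio (0.0 to 1.0) for durations in seconds."""
              if not durations:
                  raise ValueError('no samples')
              satisfied = sum(1 for d in durations if d <= threshold)
              tolerating = sum(
                  1 for d in durations if threshold < d <= 4 * threshold
              )
              # Tolerating requests count for half, frustrated ones for nothing.
              return (satisfied + tolerating / 2.0) / len(durations)
          
          # 2 satisfied, 2 tolerating and 1 frustrated request.
          print(apdex([0.2, 0.8, 1.5, 3.9, 6.0]))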
        
        Some parsing performance figures:
        
        On a 2.3 GHz Core i5, apachedex achieves 97000 lines/s
        (pypy-c-jit-62994-bd32583a3f11-linux64) and 43000 lines/s (CPython 2.7).
        These figures were measured on a 3000000-hit log file, with 3 --skip-base,
        1 --erp5-base, 3 --base and --default set. The --\*base values were similar
        in simplicity to the ones provided in the examples below.
        
        What APacheDEX is not
        =====================
        
        APacheDEX does not produce website audience statistics the way AWStats,
        Google Analytics and similar tools do.
        
        APacheDEX does not monitor website availability and resource usage the way
        Zabbix, Cacti, Ganglia, Nagios and similar tools do.
        
        Requirements
        ============
        
        Dependencies
        ------------
        
        apachedex has no strict dependencies beyond a standard Python 2.7
        installation.
        However, the generated output needs a few JavaScript files which come from
        other projects:
        
        - jquery.js
        
        - jquery.flot.js
        
        - jquery.flot.time.js (official flot plugin)
        
        - jquery.flot.axislabels.js (third-party flot plugin)
        
        If you installed apachedex (from an egg or a distribution's package), you
        should already have them.
        If you are running from the repository, you need to fetch them first::
        
          python setup.py deps
        
        apachedex can also use backports.lzma
        (http://pypi.python.org/pypi/backports.lzma/), if it is installed, to read
        xz-compressed files.
        
        Input
        -----
        
        All default "combined" log format fields are supported (more can easily be
        added), plus %D.
        
        Mandatory fields are (in any order) `%t`, `%r` (for request's URL), `%>s`,
        `%{Referer}i`, `%D`. Just tell apachedex the value from your apache log
        configuration (see `--logformat` argument documentation).
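        
        To illustrate what those fields look like, here is a minimal sketch which
        extracts them from one line written in the "combined" format followed by
        `%D` (request duration, in microseconds). The regular expression and the
        sample line are assumptions made for this example; apachedex itself relies
        on the `--logformat` value instead::
        
          import re
          
          # "combined" log format followed by %D.
          LINE_RE = re.compile(
              r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
              r'\[(?P<time>[^\]]+)\] '
              r'"(?P<request>[^"]*)" '
              r'(?P<status>\d{3}) (?P<size>\S+) '
              r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
              r'(?P<duration>\d+)$'
          )
          
          # Made-up log line, for illustration only.
          SAMPLE = (
              '127.0.0.1 - - [10/Oct/2012:13:55:36 +0200] '
              '"GET /site1/index.html HTTP/1.1" 200 2326 '
              '"http://example.com/start" "Mozilla/5.0" 153021'
          )
          
          fields = LINE_RE.match(SAMPLE).groupdict()
          # %r is "METHOD URL PROTOCOL": keep the URL part.
          url = fields['request'].split(' ')[1]
          # %D is expressed in microseconds: convert to seconds.
          seconds = int(fields['duration']) / 1e6
          print(url, fields['status'], fields['referer'], seconds)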
        
        Input files may be provided uncompressed or compressed in:
        
        - bzip2
        
        - gzip
        
        - xz (if module backports.lzma is installed)
        
        Input filename "-" is understood as stdin.
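        
        The following sketch shows one way such inputs could be opened
        transparently. It is written for Python 3, where `lzma` is part of the
        standard library, and it picks the decompressor from the file extension;
        both choices, as well as the `open_log` name, are assumptions and not
        necessarily what apachedex does::
        
          import bz2
          import gzip
          import lzma
          import sys
          
          # File name suffix to opener, for the supported compressed formats.
          OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open}
          
          def open_log(filename):
              """Return a text-mode file object for a log file ("-" is stdin)."""
              if filename == '-':
                  return sys.stdin
              for suffix, opener in OPENERS.items():
                  if filename.endswith(suffix):
                      return opener(filename, 'rt')
              # No known suffix: treat the file as uncompressed.
              return open(filename, 'r')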
        
        Output
        ------
        
        The output is HTML + CSS + JS, so you need a web browser to read it.
        
        Output filename "-" is understood as stdout.
        
        Usage
        =====
        
        A few usage examples. See embedded help (`-h`/`--help`) for further options.
        
        Most basic usage::
        
          apachedex --default website access.log
        
        Generate stand-alone output (suitable for inclusion in a mail, for example)::
        
          apachedex --default website --js-embed access.log --out attachment.html
        
        A log file with requests for 2 websites for which individual stats are
        desired, ignoring hits outside those base URLs::
        
          apachedex --base "/site1(/|$|\?)" "/site2(/|$|\?)"
        
        A log file with a site section to ignore. Order does not matter::
        
          apachedex --skip-base "/ignored(/|$|\?)" --default website
        
        A mix of both examples above. Order matters!::
        
          apachedex --skip-base "/site1/ignored(/|$|\?)"
          --base "/site1(/|$|\?)" "/site2(/|$|\?)"
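        
        The expressions used in these examples share one shape: a base path
        followed by a group accepting `/`, the end of the URL or `?`, so that
        `/site1` does not also catch `/site10`. The sketch below checks this with
        Python's `re` module and shows why order can matter when one expression
        covers a subset of another. Matching from the start of the URL and letting
        the first matching expression win are assumptions made for the
        illustration, not a description of apachedex internals::
        
          import re
          
          # (label, expression) pairs, tried in order. Labels are hypothetical.
          RULES = [
              ('skip', re.compile(r'/site1/ignored(/|$|\?)')),
              ('site1', re.compile(r'/site1(/|$|\?)')),
              ('site2', re.compile(r'/site2(/|$|\?)')),
          ]
          
          def classify(url, rules=RULES):
              """Return the label of the first rule matching url, or None."""
              for label, pattern in rules:
                  if pattern.match(url):
                      return label
              return None
          
          for url in ('/site1', '/site1/page', '/site1?x=1', '/site10',
                      '/site1/ignored/page', '/site2/'):
              print(url, classify(url))
        
        In this sketch, trying the `site1` rule before the `skip` rule would label
        `/site1/ignored/page` as `site1`, since the broader expression matches
        first.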
        
        Saving the result of an analysis for faster reuse::
        
          apachedex --default foo --format json --out save_state.json --period day
          access.log
        
        Although not required, it is strongly advised to provide the `--period`
        argument, as mixing states saved with different periods (fixed or
        auto-detected from data) gives hard-to-read results and can cause problems
        if loaded data gets converted to a larger period.
        
        Continuing a saved analysis, updating collected data::
        
          apachedex --default foo --format json --state-file save_state.json
          --out save_state.json --period day access.2.log
        
        Generating HTML output from two state files, aggregating their content
        without parsing more logs::
        
          apachedex --default foo --state-file save_state.json save_state.2.json
          --period day --out index.html
        
        Performance
        ===========
        
        For better performance...
        
        - pipe decompressed files to apachedex instead of having apachedex decompress
          files itself::
        
            bzcat access.log.bz2 | apachedex [...] -
        
        - parse log files in parallel processes, saving each analysis output and
          aggregating them at the end::
        
            for LOG in access*.log; do
              apachedex "$@" --format json --out "$LOG.json" "$LOG" &
            done
            wait
            apachedex "$@" --out access.html --state-file access.*.json
        
        Notes
        =====
        
        When there are no hits for more than a graph period, placeholders are
        generated for 0 hit (which is the reality) and 100% apdex (this is
        arbitrary). Those placeholders only affect graphs; they affect neither
        averages nor table content.
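        
        A minimal sketch of such gap filling, assuming hits are already grouped in
        a dictionary keyed by a period index (this data layout and the `fill_gaps`
        name are hypothetical, and for simplicity every empty period gets a
        placeholder)::
        
          def fill_gaps(stats):
              """stats maps a period index to (hit_count, apdex_ratio)."""
              filled = []
              for period in range(min(stats), max(stats) + 1):
                  # Empty periods get 0 hits (factual) and 1.0 APDEX (arbitrary).
                  filled.append((period,) + stats.get(period, (0, 1.0)))
              return filled
          
          # Periods 1 and 2 have no hits and receive placeholders.
          print(fill_gaps({0: (120, 0.93), 3: (40, 0.80)}))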
        
        Loading saved states generated with different sets of parameters is not
        prevented, but can produce nonsense/unreadable results. Or it can save the day
        if you do want to mix different parameters (e.g. you have some logs
        generated with %T, others with %D).
        
        It is unclear how the saved state format will evolve. Be prepared to have
        to regenerate saved states if you upgrade APacheDEX.
        
Platform: any
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: System :: Logging
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Text Processing :: Markup :: HTML
