TExtractor
==========

Extract text content from many file types in pure Python. TExtractor extracts
plain text from common office file types and needs only three external
libraries, all of them pure Python. The result is a list of words with the
most common stop words removed (English and German only).

Install with: ``pip install TExtractor``

Usage::

    >>> from textractor import TExtractor
    >>> extractor = TExtractor()
    >>> extractor.index('test.docx', lang='en')
    ['workflow_history', 'portal_workflow', 'review_history',
     'implementation', 'organizations', 'Illustrations', ...]
    >>> extractor.index('test.pdf', lang='en')
    ['workflow_history', 'portal_workflow', 'review_history',
     'implementation', 'organizations', 'Illustrations', ...]
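
To illustrate what "stop words stripped out" means, here is a minimal sketch of
stop-word filtering. It is not TExtractor's actual implementation: the function
name ``strip_stop_words``, the tiny ``STOP_WORDS`` lists, and the choice to keep
unique words in first-seen order are all assumptions made for this example.

```python
import re

# Tiny illustrative stop-word lists (hypothetical); real lists are much longer.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "or", "of", "to", "in", "is", "with"},
    "de": {"der", "die", "das", "und", "oder", "ein", "eine", "in", "ist", "mit"},
}


def strip_stop_words(text, lang="en"):
    """Return the unique words of ``text`` in first-seen order, minus stop words.

    Hypothetical helper for illustration only; not part of TExtractor.
    """
    stop = STOP_WORDS[lang]
    seen = set()
    words = []
    for word in re.findall(r"\w+", text):
        key = word.lower()  # compare case-insensitively, keep original casing
        if key in stop or key in seen:
            continue
        seen.add(key)
        words.append(word)
    return words


print(strip_stop_words("The workflow of the portal is a workflow"))
print(strip_stop_words("Der Hund und die Katze", lang="de"))
```

The ``lang`` argument here mirrors the ``lang='en'`` argument passed to
``index`` in the usage above; an unsupported language code raises ``KeyError``
in this sketch.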

