Metadata-Version: 2.1
Name: TExtractor
Version: 0.1.2
Summary: Extract text content from many filetypes.
Home-page: http://bitbucket.org/whitie/textractor-py3/
Author: Thorsten Weimann
Author-email: weimann.th@yahoo.com
License: MIT
Description: TExtractor
        ==========
        
        Extract text content from many file types in pure Python. This package extracts
        plain text from many office file formats and needs only three external (pure
        Python) libraries. After extraction you get a list of words with the most
        common stop words stripped out (English and German only).
        
        Install with: ``pip install TExtractor``
        
        Usage::
        
            >>> from textractor import TExtractor
            >>> extractor = TExtractor()
            >>> extractor.index('test.docx', lang='en')
            ['workflow_history', 'portal_workflow', 'review_history',
             'implementation', 'organizations', 'Illustrations', ...]
            >>> extractor.index('test.pdf', lang='en')
            ['workflow_history', 'portal_workflow', 'review_history',
             'implementation', 'organizations', 'Illustrations', ...]
        
        
Keywords: text extract pdf docx
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/x-rst
