Metadata-Version: 2.1
Name: GraDiAn
Version: 0.0.0.1
Summary: A grammatical distribution analyser for NLP datasets.
Home-page: https://github.com/adamjhawley/GraDiAn
Author: Adam Hawley
Author-email: ajh651@york.ac.uk
License: UNKNOWN
Description: # GraDiAn
        The Grammatical Distribution Analyser (GraDiAn) analyses grammatical distributions, particularly those of popular NLP datasets.
        
        At the moment, GraDiAn does this by providing two abstract data types: the Syntactic Dependency Counter and the SentTree.
        
        ## SentTree
        `SentTree` represents a given sentence in a tree structure.
        Importantly, the `SentTree` can be used to analyse the parse tree with regard to different properties of the text, including part-of-speech tags, syntactic dependencies and (with the help of [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob)) sentiment.
        
        ## Syntactic Dependency Counter (SDC)
        An `SDC` does what it says on the tin.
        Inheriting from Python's `collections.Counter` class, it maintains a count of syntactic dependency labels.
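        
        To illustrate the inheritance idea, here is a minimal sketch of a `Counter` subclass that tallies dependency labels. `LabelCounter` and `from_labels` are hypothetical names used only for this illustration; they are not part of GraDiAn, and the real `SDC` may be implemented differently.
        
        ```python
        from collections import Counter
        
        
        class LabelCounter(Counter):
            """Hypothetical stand-in for an SDC: a Counter keyed by dependency label."""
        
            @classmethod
            def from_labels(cls, labels):
                # Counter's constructor tallies any iterable of hashable items.
                return cls(labels)
        
        
        # Example dependency labels (assumed input, normally produced by a parser).
        labels = ['nsubj', 'ROOT', 'det', 'compound', 'attr', 'punct']
        counter = LabelCounter.from_labels(labels)
        
        # Counter.update adds to the existing counts rather than replacing them.
        counter.update(['ROOT', 'nsubj'])
        ```
        
        Subclassing `Counter` means the usual methods (`update`, `most_common`, arithmetic between counters) come for free.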
        
        ## Usage
        
        ### Syntactic Dependency Counter
        Syntactic Dependency Counter from text:
        ```
        >>> from gradian import SDC
        >>> sdc = SDC.from_string('This is a test sentence!')
        >>> sdc
        SDC({'nsubj': 1, 'ROOT': 1, 'det': 1, 'compound': 1, 'attr': 1, 'punct': 1})
        ```
        
        Or from a series of texts:
        ```
        >>> from gradian import SDC
        >>> sdc = SDC.from_string_arr(['This is a test sentence!', 'This is another sentence',
                                       'How about another?'])
        >>> sdc
        SDC({'ROOT': 3, 'nsubj': 2, 'det': 2, 'attr': 2, 'punct': 2, 'compound': 1, 'advmod': 1, 'pobj': 1})
        ```
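        
        Since an `SDC` inherits from `collections.Counter`, the standard `Counter` methods should apply to it. The sketch below uses a plain `Counter` as a stand-in (so it runs without GraDiAn installed) and turns the counts from the example above into relative frequencies. The variable names are illustrative only.
        
        ```python
        from collections import Counter
        
        # Plain Counter standing in for an SDC; counts copied from the output above.
        counts = Counter({'ROOT': 3, 'nsubj': 2, 'det': 2, 'attr': 2, 'punct': 2,
                          'compound': 1, 'advmod': 1, 'pobj': 1})
        
        # Total number of dependency labels observed.
        total = sum(counts.values())
        
        # Relative frequency of each label, i.e. the label distribution.
        distribution = {label: n / total for label, n in counts.items()}
        
        # The single most frequent label with its count.
        top = counts.most_common(1)
        ```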
        
        ### SentTree
        SentTree from text:
        ```
        >>> from gradian import SentTree
        >>> sent_trees = SentTree.from_string('This is a test sentence! But this is another!')
        >>> # SentTree.from_string produces a list of trees, one for each sentence
        >>> sent_trees[0].attr_tree('pos')  # Get the Tree with respect to the sentence's POS-Tags
        Tree('AUX', ['DET', Tree('NOUN', ['DET', 'NOUN']), 'PUNCT'])
        ```
        
        `attr_tree` can be used with any attribute of the tree including syntactic dependencies, POS-tags and (if spaCyTextBlob is enabled) sentiment.
        ```
        >>> sent_trees[0].attr_tree('dependency')
        Tree('ROOT', ['nsubj', Tree('attr', ['det', 'compound']), 'punct'])
        ```
        The function can be called with `token=True` to see the attributes alongside the relevant tokens:
        ```
        >>> # token can also be passed positionally, so the keyword is optional
        >>> sent_trees[0].attr_tree('pos', token=True)
        Tree('is:  AUX', ['This: DET', Tree('sentence:  NOUN', ['a: DET', 'test: NOUN']), '!: PUNCT'])
        ```
        
        A `SentTree` can also create multi-attribute trees.
        ```
        >>> sent_trees[0].multi_attr_tree(['pos', 'dependency'], True)
        Tree('is:AUX:ROOT', ['This:DET:nsubj', Tree('sentence:NOUN:attr', ['a:DET:det', 'test:NOUN:compound']), '!:PUNCT:punct'])
        ```
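        
        The node labels in the multi-attribute output above appear to be colon-separated (`token:attr1:attr2`). Assuming that format holds, a label can be split back into its parts as sketched below. `split_label` is a hypothetical helper for illustration, not a GraDiAn function.
        
        ```python
        def split_label(label, n_attrs):
            """Split 'token:attr1:...:attrN' into (token, [attr1, ..., attrN])."""
            # Split from the right so a token that itself contains ':' stays intact.
            token, *attrs = label.rsplit(':', n_attrs)
            return token, attrs
        
        
        root = split_label('is:AUX:ROOT', 2)
        colon_token = split_label('::PUNCT:punct', 2)  # the token here is ':' itself
        ```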
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 1 - Planning
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.6
Description-Content-Type: text/markdown
