Metadata-Version: 2.1
Name: awlify
Version: 1.1.2
Summary: a simple utility to take in a sentence and output information about the AWL words in it
Home-page: https://github.com/lpmi-13/awlify-python
Author: Adam Leskis
Author-email: leskis@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Description-Content-Type: text/markdown
Requires-Dist: spacy (>=2.0.16)

# Awlify

[![made-with-python](https://img.shields.io/badge/Made%20with-Python3.6-1f425f.svg)](https://www.python.org/)
[![GitHub license](https://img.shields.io/github/license/lpmi-13/awlify-python.svg)](https://github.com/lpmi-13/awlify-python/blob/master/LICENSE)


A very basic tool that takes in a sentence of text and outputs
the same text, annotated with information about whether any of
its words are in the [Academic Word List](https://www.victoria.ac.nz/lals/resources/academicwordlist/information).

## installing
`pip install awlify`

If you haven't used spaCy on this system before, you'll also need
to download the English model the package uses:

`python -m spacy download en_core_web_sm`

## tests
`python -m unittest`

## usage inside a file
```
from awlify import awlify

result = awlify('please inform me of the academic words in this sentence')

print(result)
# prints:
# {"data": {"sentence": "please inform me of the academic words in this sentence", "awl_words": [{"index": 5, "word": "academic", "meta": {"head": "academy", "sublist": 5}}]}}
```
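
The printed output above uses double quotes, which suggests the result may be a JSON string rather than a dict. That is an assumption, not something this README confirms. A minimal sketch for reading fields out of the result either way is shown here. It uses a literal copy of the output above so that it runs without awlify installed, and `to_dict` is a hypothetical helper, not part of the package:

```python
import json

# Literal copy of the output shown above, standing in for a real awlify() call.
sample_result = (
    '{"data": {"sentence": "please inform me of the academic words in this sentence", '
    '"awl_words": [{"index": 5, "word": "academic", '
    '"meta": {"head": "academy", "sublist": 5}}]}}'
)


def to_dict(result):
    """Hypothetical helper: accept either a JSON string or an already-parsed dict."""
    if isinstance(result, str):
        return json.loads(result)
    return result


parsed = to_dict(sample_result)

# Walk the AWL words and print the fields documented in this README.
for entry in parsed["data"]["awl_words"]:
    print(entry["index"], entry["word"], entry["meta"]["head"], entry["meta"]["sublist"])
```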

## usage from the command line
`python -m awlify 'this is a sentence to check'`

`{"data": {"sentence": "this is a sentence to check", "awl_words": []}}`

## expected input / output
format for output (`index` is the zero-based position of the word in the sentence, as in the examples below):
```
{
  "data": {
    "sentence": "THIS IS THE ORIGINAL SENTENCE",
    "awl_words": [
      {
        "index": INDEX_OF_AWL_WORD_FOUND,
        "word": "AWL_WORD_FOUND",
        "meta": {
          "head": "THE_HEADWORD_FROM_THE_AWL",
          "sublist": THE_AWL_SUBLIST_OF_THE_WORD
        }
      }
    ]
  }
}
```

example input for a simple sentence (no AWL words):
```
simple_sentence = awlify('this is a sentence')
```


example output for a simple sentence (no AWL words):
```
{
  "data": {
    "sentence": "this is a sentence",
    "awl_words": []
  }
}
```

example input for a complex sentence (a few AWL words):
```
complex_sentence = awlify('the economic recovery is ongoing and potentially problematic')
```

example output for a complex sentence (a few AWL words):
```
{
  "data": {
    "sentence": "the economic recovery is ongoing and potentially problematic",
    "awl_words": [
      {
        "index": 1,
        "word": "economic",
        "meta": {
          "head": "economy",
          "sublist": 1
        }
      },
      {
        "index": 2,
        "word": "recovery",
        "meta": {
          "head": "recover",
          "sublist": 6
        }
      },
      {
        "index": 6,
        "word": "potentially",
        "meta": {
          "head": "potential",
          "sublist": 2
        }
      }
    ]
  }
}
```
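
As a rough illustration of working with this structure, the sketch below groups the words found by their AWL sublist. It starts from a literal copy of the `awl_words` list above, and `words_by_sublist` is a hypothetical helper, not something the package provides:

```python
from collections import defaultdict

# Literal copy of the "awl_words" list from the example output above.
awl_words = [
    {"index": 1, "word": "economic", "meta": {"head": "economy", "sublist": 1}},
    {"index": 2, "word": "recovery", "meta": {"head": "recover", "sublist": 6}},
    {"index": 6, "word": "potentially", "meta": {"head": "potential", "sublist": 2}},
]


def words_by_sublist(entries):
    """Hypothetical helper: map each AWL sublist number to the words found in it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["meta"]["sublist"]].append(entry["word"])
    return dict(grouped)


by_sublist = words_by_sublist(awl_words)
print(by_sublist)
```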

## NOTES

Sentence tokenization currently relies on spaCy, which makes the
package heavier than strictly necessary, since none of spaCy's
more advanced features are used.

In theory, a simple regex could probably perform almost as well,
so I might add that as an option in the future if no real use
case turns up that needs the full weight of spaCy.
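
To make the regex idea concrete, here is a minimal sketch of what such a variant might look like. It is not the package's implementation: `awlify_regex`, `WORD_RE`, and `AWL_SAMPLE` are hypothetical names, and the lookup table contains only the four words that appear in the examples in this README rather than the full AWL.

```python
import json
import re

# Hypothetical, tiny lookup table built only from the examples in this README;
# a real version would need the full Academic Word List.
AWL_SAMPLE = {
    "academic": {"head": "academy", "sublist": 5},
    "economic": {"head": "economy", "sublist": 1},
    "recovery": {"head": "recover", "sublist": 6},
    "potentially": {"head": "potential", "sublist": 2},
}

# Runs of letters, optionally followed by an apostrophe and more letters.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def awlify_regex(sentence):
    """Sketch of a regex-only variant that mimics the documented output format."""
    tokens = WORD_RE.findall(sentence)
    awl_words = [
        {"index": i, "word": token, "meta": AWL_SAMPLE[token.lower()]}
        for i, token in enumerate(tokens)
        if token.lower() in AWL_SAMPLE
    ]
    return json.dumps({"data": {"sentence": sentence, "awl_words": awl_words}})


regex_result = awlify_regex("the economic recovery is ongoing and potentially problematic")
print(regex_result)
```

One trade-off to keep in mind: this regex skips punctuation and keeps contractions whole, whereas spaCy emits punctuation and contraction parts as separate tokens, so the `index` values could differ between the two approaches for sentences that contain either.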

## REFERENCES
Coxhead, Averil (2000) A New Academic Word List. TESOL Quarterly, 34(2): 213-238.


