Metadata-Version: 2.1
Name: alpino-query
Version: 2.1.9
Summary: Generating XPATH queries based on a Dutch Alpino syntax tree and user-specified token properties.
Home-page: https://github.com/UUDigitalHumanitieslab/alpino-query
Author: Digital Humanities Lab, Utrecht University
Author-email: digitalhumanities@uu.nl
License: CC BY-NC-SA 4.0
Platform: UNKNOWN
Requires-Python: >=3.7, <4
Description-Content-Type: text/markdown
License-File: LICENSE

[![Actions Status](https://github.com/UUDigitalHumanitiesLab/alpino-query/workflows/Python%20package/badge.svg)](https://github.com/UUDigitalHumanitiesLab/alpino-query/actions)

# Alpino Query

```bash
pip install alpino-query
```

When running locally without installing, instead of `alpino-query` use `python -m alpino_query`.

## Parse

Parse a tokenized sentence using the Alpino instance running on [gretel.hum.uu.nl](https://gretel.hum.uu.nl).

For example:

```bash
alpino-query parse Dit is een voorbeeldzin .
```

Note that the period is a separate token.

It also works when the sentence is passed as a single argument.

```bash
alpino-query parse "Dit is een voorbeeldzin ."
```

## Mark

Mark which part of the treebank should be selected for filtering. It takes three inputs:

1. [Lassy/Alpino XML](https://www.let.rug.nl/~vannoord/Lassy/)
2. the tokens of the sentence
3. the properties to mark for each token

For example:

```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"
```

It is also possible to mark multiple properties for a token by separating them with a comma. Each property can also be negated by prefixing it with `-`; negated properties are marked as 'exclude' in the tree.

```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
```
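
To make the property format concrete, here is a minimal sketch of how such a specification string could be split into included and excluded properties per token. The helper `split_properties` is hypothetical and only illustrates the format; it is not the package's own parsing code.

```python
def split_properties(spec):
    """Split a property specification into (include, exclude) lists per token.

    Hypothetical helper for illustration only, not part of alpino-query.
    """
    result = []
    # Tokens are separated by whitespace, properties by commas.
    for token_spec in spec.split():
        include, exclude = [], []
        for prop in token_spec.split(","):
            if prop.startswith("-"):
                # A leading "-" negates the property.
                exclude.append(prop[1:])
            else:
                include.append(prop)
        result.append((include, exclude))
    return result


tokens = "Dit is een voorbeeldzin .".split()
properties = split_properties("pos pos,-word,rel pos pos pos")
for token, (include, exclude) in zip(tokens, properties):
    print(token, "include:", include, "exclude:", exclude)
```

For the second token (`is`) this yields `pos` and `rel` as included properties and `word` as an excluded one.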

## Subtree

Generates a subtree containing only the marked properties. The subtree also contains additional attributes indicating which properties should be excluded and/or matched case-sensitively.

The second argument can be empty, `cat`, `rel` or both (i.e. `catrel` or `cat,rel`). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.

```bash
alpino-query subtree "$(<tests/data/001.marked.xml)" cat
```

## XPath

Generates an XPath to query a treebank from the generated subtree. The second argument indicates whether the generated query should be order-sensitive.

```bash
alpino-query xpath "$(<tests/data/001.subtree.xml)" 0
```

## Using as Module

```python
from alpino_query import AlpinoQuery

tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]

query = AlpinoQuery()
alpino_xml = query.parse(tokens)
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml) # query.marked contains the lxml Element

query.generate_subtree(["rel", "cat"])
print(query.subtree_xml) # query.subtree contains the lxml Element

query.generate_xpath(False) # True to make order sensitive
print(query.xpath)
```

## Considerations

### Exclusive

A query on a node can be exclusive in more than one way.
For example:

* a node should not be a noun `node[@pos!="noun"]`
* it should not have a node which is a noun `not(node[@pos="noun"])`

The first statement *requires* the existence of a node, whereas the second also holds true if there is no node at all. When a token is only exclusive (e.g. not a noun), a query of the second form will be generated. If a token has both inclusive and exclusive properties, a query of the first form will be generated.
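
The difference between the two forms can be sketched in plain Python. The snippet below does not run the XPath expressions themselves; it emulates their semantics on small hand-written trees using the standard library, and the helper names `has_non_noun_child` and `has_no_noun_child` are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def has_non_noun_child(parent):
    # Emulates node[@pos!="noun"]: a child node must exist,
    # and it must have a pos attribute that is not "noun".
    return any(
        child.get("pos") is not None and child.get("pos") != "noun"
        for child in parent.findall("node")
    )


def has_no_noun_child(parent):
    # Emulates not(node[@pos="noun"]): true as long as no child
    # is a noun, including when there are no children at all.
    return not any(
        child.get("pos") == "noun" for child in parent.findall("node")
    )


examples = {
    "verb child": ET.fromstring('<node cat="top"><node pos="verb"/></node>'),
    "noun child": ET.fromstring('<node cat="top"><node pos="noun"/></node>'),
    "no children": ET.fromstring('<node cat="top"/>'),
}

for name, tree in examples.items():
    print(name, has_non_noun_child(tree), has_no_noun_child(tree))
```

Only the tree without children separates the two checks: the first form fails because there is no node to match, while the second form succeeds.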

### Relations

`@cat` and `@rel` are always preserved for nodes which have children. The only way to drop them is to remove all the children by specifying the `na` property for the child tokens.

## Upload to PyPI

```bash
pip install twine
python setup.py sdist
twine upload dist/*
```


