Metadata-Version: 2.1
Name: bagpipes-spacy
Version: 0.1.2
Summary: A collection of spaCy components for rules-based detection.
Home-page: https://github.com/wjbmattingly/keyword-spacy
Author: WJB Mattingly
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown

[![GitHub Stars](https://img.shields.io/github/stars/wjbmattingly/bagpipes-spacy?style=social)](https://github.com/wjbmattingly/bagpipes-spacy)
[![PyPi Version](https://img.shields.io/pypi/v/bagpipes-spacy)](https://pypi.org/project/bagpipes-spacy/0.0.1/)
[![PyPi Downloads](https://img.shields.io/pypi/dm/bagpipes-spacy)](https://pypi.org/project/bagpipes-spacy/0.0.1/)

# Bagpipes spaCy

![number spacy logo](https://github.com/wjbmattingly/bagpipes-spacy/blob/main/images/bagpipes-spacy-logo.png?raw=true)


Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. These components include:

1. **Quote Detector**: Identifies and extracts quotes from the text.
2. **Phrases Extractor**: Extracts various types of phrases such as prepositional, noun, verb, adjective, and adverbial phrases.
3. **Normalizer**: Normalizes the text by expanding contractions, removing special characters, and more.
4. **Triple Detector**: Extracts triples (subject, predicate, object) from the text.
5. **Entity Similarity**: Computes similarity between entities in the text and maps similar entities.

## Table of Contents

- [Bagpipes spaCy](#bagpipes-spacy)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
  - [Usage](#usage)
    - [Integrating the Components into your spaCy Pipeline](#integrating-the-components-into-your-spacy-pipeline)
    - [Text Processing with the Pipeline](#text-processing-with-the-pipeline)
      - [Retrieving the Extracted Quotes](#retrieving-the-extracted-quotes)
      - [Retrieving the Extracted Phrases](#retrieving-the-extracted-phrases)
      - [Retrieving the Normalized Text](#retrieving-the-normalized-text)
      - [Retrieving the Extracted Triples](#retrieving-the-extracted-triples)
      - [Retrieving the Entity Similarities and Mappings](#retrieving-the-entity-similarities-and-mappings)

## Installation

To install Bagpipes spaCy, execute:

```
pip install bagpipes-spacy
```

## Usage

### Integrating the Components into your spaCy Pipeline

Begin by importing the components and then integrating them into your spaCy pipeline:

```python
import spacy
from bagpipes_spacy import QuoteDetector, PhrasesExtractor, Normalizer, TripleDetector, EntitySimilarity

# Initialize your preferred spaCy model
nlp = spacy.blank('en')

# Integrate the components into the pipeline
nlp.add_pipe('quote_detector')
nlp.add_pipe('phrases_extractor')
nlp.add_pipe('normalizer')
nlp.add_pipe('triple_detector')
nlp.add_pipe('entity_similarity')
```

### Text Processing with the Pipeline

After adding the components, you can process text as you typically would:

```python
text = "She said, \"I'm going to the store.\" The store is located near the river."
doc = nlp(text)
```

#### Retrieving the Extracted Quotes

You can access the extracted quotes from the `doc._.quotes` attribute:

```python
for quote in doc._.quotes:
    print(quote)
```

Output:

```
"I'm going to the store."
```

#### Retrieving the Extracted Phrases

You can access the extracted phrases from the `doc._` attributes:

```python
for prep_phrase in doc._.prep_phrases:
    print(prep_phrase)
```

Output:

```
near the river
```

Repeat for `doc._.noun_phrases`, `doc._.verb_phrases`, `doc._.adj_phrases`, and `doc._.adv_phrases`.

#### Retrieving the Normalized Text

The normalizer modifies the `doc` object itself, so you can access the normalized text as you usually would:

```python
print(doc.text)
```

Output:

```
She said, "I am going to the store." The store is located near the river.
```

#### Retrieving the Extracted Triples

You can access the extracted triples from the `doc._.triples` attribute:

```python
for triple in doc._.triples:
    print(triple)
```

Output:

```
('store', 'located near', 'river')
```

#### Retrieving the Entity Similarities and Mappings

You can access the entity similarities and mappings from the `doc._.ent_similarity` and `doc._.ent_mappings` attributes:

```python
for ent1, ent2, similarity in doc._.ent_similarity:
    print(ent1, ent2, similarity)

for ent, mappings in doc._.ent_mappings.items():
    print(ent, mappings)
