Metadata-Version: 2.1
Name: apsisocr
Version: 0.0.2
Summary: Bangla English Mixed Languge Scene / printed OCR Toolkit
Home-page: https://github.com/mnansary/apsis-ocr/
Author: Nazmuddoha Ansary
Author-email: nazmuddoha.ansary@apsissolutions.com
License: MIT
Keywords: ocr,multilingual-ocr,scene ocr,apsisnet
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Education
Classifier: Operating System :: POSIX :: Linux
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/markdown

# apsis-ocr
![](/deployment/images/apsis.png) 

Apsis-OCR is a Mixed language ocr system for Printed Documents developed at [Apsis Solutions limited](https://apsissolutions.com/)

The full system is build with 3 components: 
* Text detection : DBNet
* Text recognition:
    * Bangla Text : ApsisNet 
        * ApsisNet is a model developed at Apsis Solutions Limited. 
        * It is used by [bbOCR](https://github.com/BengaliAI/bbocr/blob/dev/modules.md) as the recognition model 
        * ApsisNet is found to be the best among other available recognition models (such as tesseract and easyOCR) in the linked [paper](https://arxiv.org/abs/2308.10647)
    * English Text : SVTR-LCNet
* Text classification : DenseNet121    


# **Installation**


## **As module/pypi package**
### **cpu installation**

```bash
pip install apsisocr
pip install onnxruntime
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

### **gpu installation**

It is recommended to use conda environment . Specially for GPU.

* **installing cudatoolkit and cudnn**: 

```bash
conda install cudatoolkit
conda install cudnn
```

* **installing packages**

```bash
pip install apsisocr
pip install onnxruntime-gpu
python -m pip install -U fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

* **exporting environment variables**

```bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```

## **Building from source : Linux/Ubuntu**
It is recommended to use conda environment .

* **clone the repository** : 
```bash
git clone https://github.com/mnansary/apsis-ocr.git
```


* **create a conda environment**: 

```bash
conda create -n apsisocr python=3.9
```

* **activate conda environment**: 

```bash
conda activate apsisocr

```
* **cpu installation**  :

```bash
bash install.sh cpu
``` 
* **gpu installation**  :
    
```bash
bash install.sh gpu
``` 

# Useage


## Apsisnet : Bangla Recognizer

* useage
```python
from apsisocr import ApsisNet
bnocr=ApsisNet()
bnocr.infer(crops)
```
* docstring for ```ApsisNet.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).
    normalize_unicode (bool): Flag to normalize unicode (default: True).

Returns:
    list[str]: List of inferred texts.
"""
```

## SVTR-LCNet : English Recognizer

```python
from apsisocr import SVTRLCNet
enocr=SVTRLCNet()
enocr.infer(crops)
```

* docstring for ```SVTRLCNet.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference.

Returns:
    list[str]: List of recognized texts.
"""
```


## DenseNet121BnEnClassifier : Language classifier

```python
from apsisocr import DenseNet121BnEnClassifier
lang=DenseNet121BnEnClassifier()
lang.infer(crops)
```

* docstring for ```DenseNet121BnEnClassifier.infer```

```python
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).

Returns:
    list[str]: List of inferred languages.
"""
```

## PaddleDBNet : Text Detector

* check [paddleOCR](https://github.com/PaddlePaddle/PaddleOCR) official website for better understanding of the model

```python
# initialization
from apsisocr import PaddleDBNet
detector=PaddleDBNet()
# getting word boxes
word_boxes=detector.get_word_boxes(img)
# getting line boxes
line_boxes=detector.get_line_boxes(img)
# getting crop with either of the results
crops=detector.get_crops(img,word_boxes)
```

## ApsisOCR : Overall System

```python
from apsisocr import ApsisOCR
ocr=ApsisOCR()
results=ocr(img_path)
```

* docstring for ```ApsisOCR.__call__```

```python
"""
Perform OCR on an image.

Args:
    img_path (str): Path to the image file.

Returns:
    dict: OCR results containing recognized text and associated information. The dictionary has the following structre
            {
            "text" : multiline text with newline separators
            "result" : list a dictionaries that contains the following structre:
                        {
                        "line_no" : the line number of the word
                        "word_no" : the word number in the line 
                        "poly"    : the four point polygonal bounding box of the word in the image
                        "text"    : the recognized text 
                        "lang"    : the classified language code
                        }
            }
"""  
```

**check ```useage/useage.ipynb``` for examples**


# **Deployment**
* ```cd deployment```: change directory to deployment folder
* change the configs as required in ```config.py```

```python
# This port will be used by the api_ocr.py 
OCR_API_PORT=3032
# This api address is diplayed after deploying api_ocr.py and this is used in app.py  
OCR_API="http://172.20.4.53:3032/ocr"
```
* running the api and app:

```bash
python api_ocr.py # deploys at port 3032 by defautl
streamlit run app.py --server.port 3033 # deploys streamlit built frontend at 3033 port
```
* The **api_ocr.py** lets the api to be used without any UI (a postman screenshot is attached below)

![](/deployment/images/api_ocr.png) 

* The **app.py** runs a streamlit UI 

![](/deployment/images/app.png) 


**TESTED GPU INFERENCE SERVER CONFIG**  

```python
OS          : Ubuntu 20.04.6 LTS      
Memory      : 62.4 GiB 
Processor   : Intel® Xeon(R) Silver 4214R CPU @ 2.40GHz × 24    
Graphics    : NVIDIA RTX A6000/PCIe/SSE2
Gnome       : 3.36.8
```
# License
Contents of this repository are restricted to non-commercial research purposes only under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). 

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>



Change Log
===========

0.0.1 (24/09/2023)
-------------------
- added useage and examples

0.0.2 (24/09/2023)
-------------------
- added liscencing
