Metadata-Version: 2.1
Name: bankstatementextractor-sau
Version: 1.4.0.0
Summary: This repository contains a Python program designed to extract Optical Character Recognition (OCR) data from bank statements in Saudi, detect income and classify expenses
Home-page: 
Author: Anjali
Author-email: anjalimenon217@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: setuptools
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: pandas
Requires-Dist: Unidecode
Requires-Dist: DateTime
Requires-Dist: Pillow
Requires-Dist: PyPDF2
Requires-Dist: Python-IO
Requires-Dist: pdf2image
Requires-Dist: torch
Requires-Dist: opencv-python
Requires-Dist: scikit-learn
Requires-Dist: transformers
Requires-Dist: pdfplumber
Requires-Dist: google-cloud-translate

# Saudi Bank Statement Extraction
This repository contains a Python program that extracts data from Saudi-based bank statements.

## Table of Contents
1. Introduction
2. Prerequisites
3. Usage
4. Modules Description

## Introduction
The Python program imports several packages necessary for extracting data from bank statements. It accepts a PDF in bytes format and checks it for tampering. If no tampering is detected, it converts the first page to an image and runs a custom-trained YOLOv5 model to detect the bank. The resulting bank label selects the code that extracts data from that bank's statement format. The output of the extraction is a JSON file containing the revenues, expenses, and cash flows by month.
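
The last two steps (dispatching on the bank label and producing a monthly summary) can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the names `EXTRACTORS`, `extract_example_bank`, `summarize_by_month`, and `process`, the `"example_bank"` label, and the JSON field names are all hypothetical, and the extractor is a stub that returns hard-coded transactions instead of parsing a PDF.

```python
import json
from collections import defaultdict
from datetime import date


def summarize_by_month(transactions):
    """Group (date, amount) pairs by month.

    Assumption: positive amounts are revenue, negative amounts are expenses.
    """
    totals = defaultdict(lambda: {"revenue": 0.0, "expenses": 0.0})
    for day, amount in transactions:
        month = day.strftime("%Y-%m")
        if amount >= 0:
            totals[month]["revenue"] += amount
        else:
            totals[month]["expenses"] += -amount
    return {
        month: {
            "revenue": round(values["revenue"], 2),
            "expenses": round(values["expenses"], 2),
            "cash_flow": round(values["revenue"] - values["expenses"], 2),
        }
        for month, values in sorted(totals.items())
    }


def extract_example_bank(pdf_bytes):
    """Hypothetical stub: a real extractor would parse the statement here."""
    return [(date(2023, 1, 5), 1000.0), (date(2023, 1, 20), -250.5)]


# Hypothetical mapping from a detected bank label to its extractor.
EXTRACTORS = {"example_bank": extract_example_bank}


def process(pdf_bytes, bank_label):
    """Pick the extractor for the detected bank and return a JSON summary."""
    extractor = EXTRACTORS.get(bank_label)
    if extractor is None:
        raise ValueError(f"No extractor registered for bank label {bank_label!r}")
    return json.dumps(summarize_by_month(extractor(pdf_bytes)), indent=2)


if __name__ == "__main__":
    print(process(b"%PDF-", "example_bank"))
```

A dictionary keyed by label keeps the bank-specific code separate, so supporting another bank would only mean registering one more function.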

## Prerequisites
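Ensure the following packages are installed:

- setuptools
- numpy
- scipy
- pandas
- Unidecode
- DateTime
- Pillow
- PyPDF2
- Python-IO
- pdf2image
- torch
- opencv-python
- scikit-learn
- transformers
- pdfplumber
- google-cloud-translate

You can install these packages using pip by passing the names above to `pip install`, for example `pip install numpy scipy pandas`.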
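
To see which prerequisites are still missing from an environment, a small check along these lines may help. It is only a sketch: the `missing_packages` helper and the `REQUIRED` list are illustrative names rather than part of this package, and `importlib.metadata` needs Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the package metadata.
REQUIRED = [
    "setuptools", "numpy", "scipy", "pandas", "Unidecode", "DateTime",
    "Pillow", "PyPDF2", "Python-IO", "pdf2image", "torch", "opencv-python",
    "scikit-learn", "transformers", "pdfplumber", "google-cloud-translate",
]


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    if absent:
        print("Missing:", " ".join(absent))
    else:
        print("All prerequisites are installed.")
```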

## Usage
To use this program, clone the repository, place your images in the same directory, and modify the `IMAGES` list accordingly. Then run the program from your terminal or command prompt:
`python ocr_and_facial_recognition.py`

Please note that this program does not include any user interface and does not handle any errors or exceptions beyond what is included in the code.

## Modules Description
## Importing Necessary Packages:
The program begins by importing all the necessary packages used in Bank Statement Extraction.

## Data Introduction:
This section defines a list of image file names that will be used as input for the OCR and facial recognition steps of the program.

## Load EasyOCR and Anti-Spoofing Model:
Two functions load the EasyOCR package with English language support and the anti-spoofing model, respectively.

## Data Preprocessing:
Several functions are defined here to open and read an image file, convert it to grayscale, perform a Radon transform, find the busiest rotation, and rotate the image accordingly.
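
One way such a rotation estimate can work is sketched below. It assumes that the "busiest" rotation is the angle whose projection has the highest root-mean-square value, and it computes each Radon projection by rotating the image and summing its rows. The function names, the search range of -10 to 10 degrees, and the use of `scipy.ndimage.rotate` are choices made for this illustration; the program's own functions may differ.

```python
import numpy as np
from scipy import ndimage


def estimate_rotation(gray, angles=None):
    """Return the candidate angle (degrees) with the busiest projection."""
    if angles is None:
        angles = np.arange(-10.0, 10.5, 0.5)  # assumed search range
    # Invert so that dark text on a light page carries the signal.
    ink = gray.max() - gray.astype(float)
    best_angle, best_score = 0.0, -1.0
    for angle in angles:
        rotated = ndimage.rotate(ink, angle, reshape=False, order=1)
        # Summing each row gives one projection of the Radon transform.
        profile = rotated.sum(axis=1)
        # The RMS is largest when text lines align with the image rows.
        score = float(np.sqrt(np.mean(profile ** 2)))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


def deskew(gray):
    """Rotate a grayscale page by the estimated correction angle."""
    angle = estimate_rotation(gray)
    return ndimage.rotate(gray, angle, reshape=False, order=1, mode="nearest")
```

The returned angle is the correction to apply, so a page that was skewed by 5 degrees should yield an estimate close to -5.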

## Facial Recognition:
This section is dedicated to detecting faces in an image using a HOG (Histogram of Oriented Gradients) face detector, extracting features, and computing the similarity between two sets of features using the cosine similarity metric.
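
The cosine similarity metric mentioned here is the dot product of two feature vectors divided by the product of their lengths. A minimal pure-Python sketch follows; the function name and the choice to return 0.0 for a zero-length vector are assumptions made for this illustration, not taken from the program.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("Feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)
```

Values close to 1 mean the two feature sets point in nearly the same direction, while values near 0 mean they are unrelated.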

## Information Extraction:
Finally, the program uses OCR to extract information from an image, computes the similarity between faces in different images, and outputs this information in a JSON file.

This is a basic explanation of the project and its usage. The project was last updated on 24th May 2023. For more detailed explanations, please refer to the comments in the source code.
