Metadata-Version: 2.1
Name: ImageDuplicateFinder
Version: 0.6.0
Summary: Simple duplicate finder for images: matches on file names, then compares image hashes.
Home-page: http://pypi.python.org/pypi/picture_duplicate_finder/
Author: Michael Hermelschmidt
Author-email: Michael Hermelschmidt <mail.hermel@gmail.com>
License:  Copyright (c) 2022 Michael Hermelschmidt
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
         EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
         MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
         IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
         DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
         OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
         OR OTHER DEALINGS IN THE SOFTWARE.
Project-URL: Homepage, https://github.com/zeronyk/image_duplication
Keywords: image,duplicates,duplicate-images,deletion,imagehash,remover,duplication-detection
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# ImageDuplicationFinder
Finds duplicate images in folders. Candidates are first matched by file name, so **both images must have the same name**. Once a name match is found, the image hashes are compared to make sure both images are identical.
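
The two-step matching described above (names first, then hashes) can be illustrated with a minimal, standard-library-only sketch. The function names `index_by_name`, `file_digest` and `find_duplicate_pairs` are hypothetical and not part of this package's API. One deliberate substitution: the package compares image hashes (its keywords mention `imagehash`, which computes perceptual hashes), whereas this sketch uses a byte-level SHA-256 digest, so it only catches byte-identical files.

```python
import hashlib
import os
from collections import defaultdict


def index_by_name(root):
    """Map each file name to every full path under `root` that carries that name."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name].append(os.path.join(dirpath, name))
    return index


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of the file's bytes, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_pairs(path1, path2):
    """Pair files with identical names, then keep only pairs whose content hashes match."""
    left = index_by_name(path1)
    right = index_by_name(path2)
    digests = {}  # hash each file at most once

    def digest(path):
        if path not in digests:
            digests[path] = file_digest(path)
        return digests[path]

    pairs = []
    for name in sorted(left.keys() & right.keys()):  # name match first
        for a in left[name]:
            for b in right[name]:
                if a != b and digest(a) == digest(b):  # then compare hashes
                    pairs.append((a, b))
    return pairs
```

A perceptual hash would also match re-encoded or resized copies of the same picture; a cryptographic digest is stricter but needs no third-party dependency.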

-------

Use case: you have multiple hard drives that all contain pictures, but they were copied messily (for example, after a data recovery). Copy both hard drives onto a single one, or remove every duplicate picture from your hard drives.


## Installation
-----
> pip install ImageDuplicateFinder

or clone this repository and run:

> pip install .

## Overview
-----
 - There are 3 stages:

   - 0: Syntax match (find identical file names)
   - 1: Semantic match (compare the images based on their pixel values, via image hashes)
   - 2: Deletion (delete files that are both a syntax AND a semantic match)

 - If you only want to check for duplicates, use the `-o`/`--output-csv` option: it writes a CSV file listing the duplicates found to the given path and skips the deletion stage.

 - When run with `-d`, this program removes duplicates from both path1 AND path2! Duplicates within the path1 folder itself will be found as well.
 
 - This program is designed with big workloads (> 1 TB) in mind. It supports multithreading (`-t`) for a speedup and spawns as many threads as there are CPU cores.

 - The program writes a log file to the path given by `-l`/`--log-file`; by default it creates `duplicates.log` in the current folder.

 - Deletion happens at the very end, so if you abort during the comparison stage, nothing will be deleted.
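
The stage order and the "delete only at the end" guarantee can be sketched as a small driver. Everything here is hypothetical: `run_pipeline`, `sha256_of`, and the choice to delete the second file of each pair are assumptions for illustration, and SHA-256 again stands in for the package's image hash. The sketch hashes files on a thread pool sized to the number of CPU cores, offers a CSV-only mode that never reaches the deletion stage, and defers every deletion until all hashes are known.

```python
import csv
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path):
    """Byte-level hash, used here as a stand-in for an image hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def run_pipeline(candidates, csv_path=None, delete=False, threads=False):
    """Hypothetical driver for stages 1 and 2.

    candidates: (original, copy) path pairs that already share a file name (stage 0 output).
    """
    paths = sorted({p for pair in candidates for p in pair})
    # One worker per CPU core when threading is enabled, otherwise a single worker.
    workers = (os.cpu_count() or 1) if threads else 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = dict(zip(paths, pool.map(sha256_of, paths)))

    # Stage 1 result: name matches whose hashes are equal.
    duplicates = [(a, b) for a, b in candidates if digests[a] == digests[b]]

    if csv_path is not None:
        # Check-only mode: report the duplicates and skip the deletion stage entirely.
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["original", "duplicate"])
            writer.writerows(duplicates)
        return duplicates

    if delete:
        # Stage 2 runs only after every hash is known; an abort above deletes nothing.
        for dup_path in sorted({b for _a, b in duplicates}):
            os.remove(dup_path)
    return duplicates
```

Threads suit the hashing stage because it is dominated by disk reads rather than computation.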


## Features in progress
---- 
- Make syntax (name) matching optional (via the `lvl` parameter)
- Copy all data to a destination folder after duplicate removal


## Formats
-----
Images and junk are distinguished by file format (this only matters when running with the `-rem`/`--remove-other` option):

- **Not junk**: ('.wav', '.mp3', '.png', '.jpg', '.jpeg', '.gif', '.tiff', '.psd', '.bmp', '.eps', '.ai', '.indd', '.raw', '.webm', '.mkv', '.flv', '.vob', '.ogv', '.ogg', '.drc', '.mng', '.avi', '.mts', '.m2ts', '.ts', '.mov', '.qt', '.wmv', '.yuv', '.rm', '.rmvb', '.viv', '.asf', '.mp4', '.m4p', '.m4v', '.mpg', '.mp2', '.mpeg', '.mpe', '.mpv', '.m2v', '.svi', '.3gp', '.3g2', '.mxf', '.roq', '.nsv', '.f4v', '.f4p', '.f4a', '.f4b', '.doc', '.pdf', '.docx', '.docm', '.dot', '.odt', '.rtf', '.txt', '.csv', '.dif', '.xls')

- **Images**: ('.png', '.jpg', '.jpeg', '.gif')
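
A minimal sketch of how a file could be sorted into these categories by its extension. `classify` and its three labels are hypothetical names, the sets are copied from the lists above, and treating extensions case-insensitively is an assumption that this README does not state. Under `-rem` (together with `-d`), only files labelled junk would be candidates for removal.

```python
from pathlib import Path

# Extension sets copied from the lists above (duplicate entries removed).
IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".gif"})
NOT_JUNK_EXTENSIONS = frozenset({
    ".wav", ".mp3", ".png", ".jpg", ".jpeg", ".gif", ".tiff", ".psd", ".bmp",
    ".eps", ".ai", ".indd", ".raw", ".webm", ".mkv", ".flv", ".vob", ".ogv",
    ".ogg", ".drc", ".mng", ".avi", ".mts", ".m2ts", ".ts", ".mov", ".qt",
    ".wmv", ".yuv", ".rm", ".rmvb", ".viv", ".asf", ".mp4", ".m4p", ".m4v",
    ".mpg", ".mp2", ".mpeg", ".mpe", ".mpv", ".m2v", ".svi", ".3gp", ".3g2",
    ".mxf", ".roq", ".nsv", ".f4v", ".f4p", ".f4a", ".f4b", ".doc", ".pdf",
    ".docx", ".docm", ".dot", ".odt", ".rtf", ".txt", ".csv", ".dif", ".xls",
})


def classify(path):
    """Label a path 'image', 'keep' or 'junk' from its extension alone (case-insensitive)."""
    suffix = Path(path).suffix.lower()  # e.g. ".JPG" -> ".jpg"; no extension -> ""
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in NOT_JUNK_EXTENSIONS:
        return "keep"
    return "junk"
```

Only the last extension counts here, so a file such as `backup.tar.gz` is judged by `.gz` alone.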


## Usage
-------

> idf -h

```
positional arguments:
  path1                 original path or list of paths
  path2                 path to check and optionally delete duplicates in

optional arguments:
  -h, --help            show this help message and exit
  -l LOG_FILE, --log-file LOG_FILE
                        path of the log file to be written, defaults to duplicates.log in current folder
  -o OUTPUT_CSV, --output-csv OUTPUT_CSV
                        outputs csv list of duplicates
  -d, --delete          automatically deletes duplicates
  -t, --threading       use multithreading to help speedup the process
  -ts IMAGEHASH_THRESHOLD, --imagehash-threshold IMAGEHASH_THRESHOLD
                        if not used -a how much similarity must be on the imagehash of the pictures (values will be interpreted as percent) default is 100
  -rem, --remove-other  removes other files that are not considered documents (good if there is a lot of junk); only works with -d
```
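Or use it as a Python function:

```python
from image_duplicate_finder import find_duplicates

# Hypothetical example folders; replace them with your own.
path1 = "/media/drive_a/pictures"
path2 = "/media/drive_b/pictures"

# csv, delete, t, ts and remove_others appear to mirror the CLI options -o, -d, -t, -ts and -rem.
find_duplicates(path1, path2, csv=None, delete=False, t=False, ts=100, lvl=1, remove_others=False)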

```
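
The `-ts` option (and the `ts` argument) sets a similarity threshold in percent, but the help text does not define how that percentage is computed. One plausible interpretation, assumed here, is the share of identical bits between two fixed-length hashes; the 64-bit hash size and the helpers `hamming_distance`, `hash_similarity` and `is_duplicate` are hypothetical and not part of the package.

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Number of bit positions in which two non-negative integer hashes differ."""
    return bin(h1 ^ h2).count("1")  # bin().count works on Python 3.8+


def hash_similarity(h1: int, h2: int, bits: int = 64) -> float:
    """Similarity in percent: 100 means identical hashes, 0 means every bit differs."""
    return 100.0 * (bits - hamming_distance(h1, h2)) / bits


def is_duplicate(h1: int, h2: int, threshold: float = 100, bits: int = 64) -> bool:
    """Treat two images as duplicates when their hash similarity reaches `threshold` percent."""
    return hash_similarity(h1, h2, bits) >= threshold
```

Under this reading, two 64-bit hashes that differ in a single bit are 98.4375 % similar, so they would pass a threshold of 98 but not the default of 100, which accepts only identical hashes.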
