Metadata-Version: 1.0
Name: PlagiarismDetector
Version: 0.1.3
Summary: Calculates percentage of N-Tuples from file1 found in another file2
Home-page: http://pypi.python.org/pypi/PlagiarismDetector/
Author: Ayush Singh
Author-email: singhay@ccs.neu.edu
License: MIT
Description-Content-Type: UNKNOWN
Description: ===================
        Plagiarism Detector
        ===================
        
        Calculates percentage of N-Tuples from file1 found in another file2::
        
            from plagiarismdetector.detector import Detector
        
            print Detector.detect(synonyms_file_name,
                                   eval_file_name,
                                   source_file_name,
                                   n_tuples_value=3)
        
        ======================
        Assumptions & Overview
        ======================
        * English language text, reason being it divides strings based off that might fail for e.g. in Hindi since sentence separators and punctuations are entirely different.
        * Used Penn TreeBank tokenizer for splitting up strings
        * The module is optimzed to be as fast as possible, some of the optimizations are:
            * Only n-grams for file 2 are generated and stored, file 1 n-tuples are generated but not stored.
            * Not holding generated n-grams in memory, a generator is used
            * Dictionary of n-grams is created from file2 n-grams for constant time lookup of file1 tuples
            * Keys in file2 n-gram dictionary contains hashes for tuples instead of actual tuples to reduce space complexity.
            * Since we are only concerned with percentage of file1 n-tuples found in file2, we do not need to store any tuples. Therefore, we first generate n-grams for file2 and then calculate count for file1 on the fly instead of generating all n-tuples of file1 and cross referencing it with those of file2.
        
        
        Testing
        =========
        ``python -m unittest discover tests``
        
        Help
        =========
        ``python main.py -h``
        
        positional arguments
        --------------------
          synonym_file_path     Path to file to be used for synonyms
          evaluation_file_path  Path to file to be evaluated
          source_file_path      Path to file to be used as source for matching
          n-tuples              Number of N-tuples, Optional and Defaults to 3
        
        optional arguments
        ------------------
          -h, --help            show this help message and exit
        
        Example
        =========
          ================   ===============
          Returns            100.0
          ================   ===============
          Synonyms           run jog spring
          Evaluation File    go for a run
          Source File        go for a jog
          N-tuples           3
          ================   ===============
Platform: UNKNOWN
