Metadata-Version: 2.4
Name: documentprocessinghub-ljd
Version: 0.6.0
Summary: File type identification and validation for document processing workflows
Author-email: LJD-UwU <himexpe.interns@hisense.com>
Maintainer-email: LJD-UwU <himexpe.interns@hisense.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/LJD-UwU/Document-Processing-Hub
Project-URL: Documentation, https://github.com/LJD-UwU/Document-Processing-Hub#readme
Project-URL: Repository, https://github.com/LJD-UwU/Document-Processing-Hub.git
Project-URL: Issues, https://github.com/LJD-UwU/Document-Processing-Hub/issues
Keywords: file-processing,file-validation,document-processing,file-type-detection
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: pandas>=3.0.2
Requires-Dist: tqdm>=4.67.3
Dynamic: license-file

# Document Processing Hub

[![PyPI version](https://img.shields.io/pypi/v/documentprocessinghub-ljd.svg)](https://pypi.org/project/documentprocessinghub-ljd/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python library for finding, copying, and moving document files. It locates the latest file of a given type and lets you copy or move it in one chained call.

## 🌟 Features

- **Smart File Search**: Find the latest file of a specific type in predefined locations or custom folders
- **Intelligent Sorting**: Each file type has its own ordering rule (date, version, revision, and so on), described under File Selection Criteria
- **Copy & Move Operations**: Perform file operations with simple method chaining
- **Two Search Modes**:
  - `search_file.exists` - Search in predefined system locations
  - `search_file.local` - Search in user-specified folders
- **Multiple File Types**: Support for ZJ, Manpower Budget, Manpower Documents, Manpower, Real-time Production
- **Minimal Dependencies**: Requires only `openpyxl`, `pandas`, and `tqdm`, all installed automatically by pip

## 📦 Installation

### From PyPI

```bash
pip install documentprocessinghub-ljd
```

### From GitHub

```bash
git clone https://github.com/LJD-UwU/Document-Processing-Hub.git
cd Document-Processing-Hub
pip install .
```

## 🚀 Quick Start

```python
from documentprocessinghub import search_file

# Find latest file in predefined locations
ruta = search_file.exists.manpower_budget()
print(f"Found: {ruta}")

# Find and copy to another location
resultado = search_file.exists.manpower_budget(r"C:\Backup").copy()
print(f"Copied to: {resultado}")

# Find in local folder and move to another
resultado = search_file.local.zj(r"C:\Documentos", r"C:\Procesados").move()
print(f"Moved to: {resultado}")
```

## 🔍 API Reference: `search_file`

### Overview

The `search_file` API provides two modes for finding and manipulating files:

1. **`search_file.exists`** - Search in predefined system locations (fast & automatic)
2. **`search_file.local`** - Search in folders you specify (flexible & explicit)

**Return types:**
- Without destination: Returns `str` (file path)
- With destination: Returns `FileResult` object (for `.copy()` or `.move()`)
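
The difference between the two return shapes can be pictured with a small stand-in. This is a sketch, not the library's implementation: `FileResultSketch`, its fields, and the choice to create the destination folder on demand are all assumptions made for illustration.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileResultSketch:
    """Hypothetical stand-in for FileResult: a found file plus a destination folder."""

    source: Path
    destination: Path

    def _target(self) -> Path:
        # Assumption: the destination folder is created if it does not exist.
        self.destination.mkdir(parents=True, exist_ok=True)
        return self.destination / self.source.name

    def copy(self) -> str:
        # Copying leaves the original file in place.
        return str(shutil.copy2(self.source, self._target()))

    def move(self) -> str:
        # Moving removes the file from its original location.
        return str(shutil.move(str(self.source), str(self._target())))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "Manpower Budget Rev 18.2.xlsx"
        source.write_text("demo")
        result = FileResultSketch(source, Path(tmp) / "Backups")
        print(result.copy())
```

The point of the deferred object is that nothing touches the disk until `.copy()` or `.move()` is called, which is what makes the chained style possible.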

### Mode 1: `search_file.exists` - Predefined Locations

Only `manpower_budget` is available in exists mode.

#### Example 1: Get File Path Only

```python
from documentprocessinghub import search_file

# Returns the path of the latest Manpower Budget file found in predefined locations
ruta = search_file.exists.manpower_budget()

if ruta:
    print(f"Latest file: {ruta}")
    # Output: C:\Sistema\Archivos\Manpower Budget Rev 18.2.xlsx

    # Use it in your code (procesar_archivo is a placeholder for your own function)
    procesar_archivo(ruta)
else:
    print("No file found")
```

#### Example 2: Find and Copy

```python
# Find latest file and copy it (original remains in place)
backup_path = search_file.exists.manpower_budget(r"C:\Backups").copy()

print(f"Backed up to: {backup_path}")
# Output: C:\Backups\Manpower Budget Rev 18.2.xlsx

# Practical use: Daily backup
from datetime import date
today = date.today().strftime("%Y%m%d")
backup_folder = f"C:\\Backups\\{today}"
search_file.exists.manpower_budget(backup_folder).copy()
```

#### Example 3: Find and Move

```python
# Find latest file and move it to another location
procesado_path = search_file.exists.manpower_budget(r"C:\Procesados").move()

print(f"Moved to: {procesado_path}")
# Output: C:\Procesados\Manpower Budget Rev 18.2.xlsx

# Note: The file is removed from original location
```

---

### Mode 2: `search_file.local` - Custom Folders

All file types are available in local mode:
- `manpower_budget`
- `manpower_documents`
- `zj`
- `manpower`
- `real_time_production`

#### Example 1: Get File Path from Local Folder

```python
# Search in a specific folder and get the latest file
ruta = search_file.local.zj(r"C:\Documentos")

if ruta:
    print(f"Latest ZJ file: {ruta}")
    # Output: C:\Documentos\ZJ26042912-8105.xlsx
else:
    print("No ZJ files found in folder")

# Multiple searches
zj_file = search_file.local.zj(r"C:\Docs")
budget_file = search_file.local.manpower_budget(r"C:\Docs")
docs_file = search_file.local.manpower_documents(r"C:\Docs")

print(f"ZJ: {zj_file}")
print(f"Budget: {budget_file}")
print(f"Documents: {docs_file}")
```

#### Example 2: Search Local and Copy

```python
# Find in one folder and copy to another
copia_path = search_file.local.manpower_budget(
    r"C:\Documentos\Entrada",
    r"C:\Documentos\Copia"
).copy()

print(f"Copied to: {copia_path}")
# Output: C:\Documentos\Copia\Manpower Budget Rev 18.1.xlsx

# Practical use: Process and backup
search_file.local.zj(
    r"C:\Entrada",
    r"C:\Backup"
).copy()  # Backup before processing
```

#### Example 3: Search Local and Move

```python
# Find in source folder and move to destination
procesado_path = search_file.local.manpower_documents(
    r"C:\Documentos\Entrada",
    r"C:\Documentos\Procesados"
).move()

print(f"Moved to: {procesado_path}")
# Output: C:\Documentos\Procesados\Manpower Documents Q1 2026.xlsx

# Practical use: Processing pipeline
for carpeta_entrada in [r"C:\Q1", r"C:\Q2", r"C:\Q3"]:
    resultado = search_file.local.manpower_documents(
        carpeta_entrada,
        r"C:\Procesados"
    ).move()
    if resultado:
        print(f"Processed: {resultado}")
```

#### Example 4: Process Multiple File Types

```python
# Search for different types in the same folder
origen = r"C:\Documentos"
destino = r"C:\Procesados"

# Process each type
zj = search_file.local.zj(origen, destino).move()
budget = search_file.local.manpower_budget(origen, destino).move()
docs = search_file.local.manpower_documents(origen, destino).move()

# Log results
if zj:
    print(f"ZJ: {zj}")
if budget:
    print(f"Budget: {budget}")
if docs:
    print(f"Documents: {docs}")
```

---

### File Selection Criteria

The "latest" file is selected based on the file type:

**ZJ Files**: By date, version, and duplicate count
```
ZJ26042912-8105(4) > ZJ26042912-8105(1) > ZJ26042822-8005
↑ newest             ↑ same date,         ↑ older date
                       lower duplicate count
```
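
One way to reproduce this ordering is a tuple sort key. The sketch below is an assumption about how such names could be parsed, not the library's parser: it treats the digits after `ZJ` as one number, the digits after the hyphen as a second number, and a missing `(n)` suffix as a duplicate count of 0. The names `ZJ_PATTERN` and `zj_sort_key` are hypothetical.

```python
import re

# Digits after "ZJ", digits after the hyphen, optional "(n)" duplicate counter.
ZJ_PATTERN = re.compile(r"^ZJ(\d+)-(\d+)(?:\((\d+)\))?")


def zj_sort_key(name: str) -> tuple[int, int, int]:
    """Return a comparable key; names that do not match sort last."""
    match = ZJ_PATTERN.match(name)
    if match is None:
        return (-1, -1, -1)
    stamp, suffix, duplicate = match.groups()
    return (int(stamp), int(suffix), int(duplicate or 0))


names = [
    "ZJ26042822-8005.xlsx",
    "ZJ26042912-8105(1).xlsx",
    "ZJ26042912-8105(4).xlsx",
]
latest = max(names, key=zj_sort_key)
print(latest)  # ZJ26042912-8105(4).xlsx
```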

**MANPOWER_BUDGET**: By version and month
```
Rev 18.2 > Rev 18.1 > Rev 17.0 April
 ↑ newer   ↑ same major version
```
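
A revision-based ordering like this can be sketched with a regular expression. The code below is illustrative only: `budget_sort_key` is a hypothetical helper, and using the month name as a tie-breaker after the revision number is an assumption about what "by version and month" means.

```python
import re

REV_PATTERN = re.compile(r"Rev\s*(\d+)\.(\d+)", re.IGNORECASE)
MONTH_NAMES = (
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
)


def budget_sort_key(name: str) -> tuple[int, int, int]:
    """Return (major, minor, month); month is 0 when no month name appears."""
    match = REV_PATTERN.search(name)
    if match is None:
        return (-1, -1, 0)
    major, minor = (int(part) for part in match.groups())
    lowered = name.lower()
    # Assumption: the first month name found in the file name is the one that counts.
    month = next(
        (index for index, word in enumerate(MONTH_NAMES, start=1) if word in lowered),
        0,
    )
    return (major, minor, month)


names = [
    "Manpower Budget Rev 17.0 April.xlsx",
    "Manpower Budget Rev 18.2.xlsx",
    "Manpower Budget Rev 18.1.xlsx",
]
ordered = sorted(names, key=budget_sort_key, reverse=True)
print(ordered[0])  # Manpower Budget Rev 18.2.xlsx
```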

**MANPOWER_DOCUMENTS**: By year, month, and quarter
```
2026 Q2 > 2026 Q1 > 2025 Q4
↑ newer   ↑ newer in same year
```
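
The year and quarter comparison can be sketched as follows. This is a simplified illustration, not the library's code: `documents_sort_key` is a hypothetical helper, it looks for the year and the quarter independently so either order in the file name works, and it ignores the month component entirely.

```python
import re

YEAR_PATTERN = re.compile(r"(?<!\d)(20\d{2})(?!\d)")
QUARTER_PATTERN = re.compile(r"\bQ([1-4])\b", re.IGNORECASE)


def documents_sort_key(name: str) -> tuple[int, int]:
    """Return (year, quarter); a missing part counts as 0."""
    year = YEAR_PATTERN.search(name)
    quarter = QUARTER_PATTERN.search(name)
    return (
        int(year.group(1)) if year else 0,
        int(quarter.group(1)) if quarter else 0,
    )


names = [
    "Manpower Documents Q4 2025.xlsx",
    "Manpower Documents Q1 2026.xlsx",
    "Manpower Documents Q2 2026.xlsx",
]
print(max(names, key=documents_sort_key))  # Manpower Documents Q2 2026.xlsx
```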

**MANPOWER**: By month and day
```
April_29 > April_28 > March_28
 ↑ newer    ↑ newer in month
```
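
A month-and-day comparison for names of the form `Month_Day` might look like the sketch below. The helper name `manpower_sort_key` and the sample file names are hypothetical, and the sketch does not handle a year rollover (December versus January), since the rule above mentions only month and day.

```python
import re

MONTH_NAMES = (
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
)
MONTH_DAY_PATTERN = re.compile(
    r"(" + "|".join(MONTH_NAMES) + r")_(\d{1,2})", re.IGNORECASE
)


def manpower_sort_key(name: str) -> tuple[int, int]:
    """Return (month number, day); names without a Month_Day part sort last."""
    match = MONTH_DAY_PATTERN.search(name)
    if match is None:
        return (0, 0)
    month = MONTH_NAMES.index(match.group(1).lower()) + 1
    return (month, int(match.group(2)))


names = ["Manpower March_28.xlsx", "Manpower April_28.xlsx", "Manpower April_29.xlsx"]
print(max(names, key=manpower_sort_key))  # Manpower April_29.xlsx
```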

**REAL_TIME_PRODUCTION**: By modification date (most recent first)
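
Selecting by modification date needs no name parsing at all. The following sketch uses only the standard library; `newest_by_mtime` and its `pattern` parameter are hypothetical names, and the `*.xlsx` default is an assumption about which files are candidates.

```python
from pathlib import Path
from typing import Optional


def newest_by_mtime(folder, pattern: str = "*.xlsx") -> Optional[Path]:
    """Return the most recently modified matching file, or None if there is none."""
    candidates = [path for path in Path(folder).glob(pattern) if path.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Modification time changes whenever a file is re-saved or copied without preserving metadata, so this rule is best suited to files that are written once.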

---

## 💡 Common Patterns

### Pattern 1: Daily Backup

```python
from documentprocessinghub import search_file
from datetime import date

def daily_backup():
    today = date.today().strftime("%Y%m%d")
    backup_folder = f"C:\\Backups\\{today}"
    
    ruta = search_file.exists.manpower_budget(backup_folder).copy()
    if ruta:
        print(f"✓ Backup successful: {ruta}")
    else:
        print("✗ No file found to backup")

# Run daily
daily_backup()
```

### Pattern 2: Processing Pipeline

```python
from documentprocessinghub import search_file

def process_documents():
    entrada = r"C:\Entrada"
    procesados = r"C:\Procesados"
    
    # Process each type
    for tipo in ["zj", "manpower_budget", "manpower_documents"]:
        # Get the function dynamically
        search_func = getattr(search_file.local, tipo)
        
        resultado = search_func(entrada, procesados).move()
        if resultado:
            print(f"Processed ({tipo}): {resultado}")

process_documents()
```

### Pattern 3: Safe Backup Before Processing

```python
from documentprocessinghub import search_file

def safe_process(carpeta_entrada):
    # Step 1: Backup the file
    backup = search_file.local.manpower_budget(
        carpeta_entrada,
        r"C:\Backup"
    ).copy()
    
    if not backup:
        print("✗ Error: No file found")
        return
    
    # Step 2: Process the original
    procesado = search_file.local.manpower_budget(
        carpeta_entrada,
        r"C:\Procesados"
    ).move()
    
    print(f"✓ Backed up: {backup}")
    print(f"✓ Processed: {procesado}")

safe_process(r"C:\Entrada")
```

---

## 🏗️ Project Structure

```
document-processing-hub/
├── documentprocessinghub/          # Main package
│   ├── __init__.py                # Package initialization
│   ├── fileSelector.py            # search_file API implementation
│   ├── scanNameFiles.py           # File type identification
│   ├── validators.py              # Format validation
│   └── paths_config.py            # Predefined paths configuration
├── examples/                       # Usage examples
│   ├── main.py                    # Interactive examples
│   └── USAGE.md                   # Detailed usage guide
├── pyproject.toml                 # Project configuration
├── README.md                       # This file
├── LICENSE                         # MIT License
└── .gitignore                      # Git ignore rules
```

---

## 📝 Changelog

### Version 0.4.0 (2026-04-30)

**Major Changes**

- **Renamed API**: `find_latest_file` → `search_file` (clearer intent)
- **Simplified behavior**: 
  - Without destination: Returns `str` (file path)
  - With destination: Returns `FileResult` for `.copy()` or `.move()`
- **Restricted exists mode**: Only `manpower_budget` available in `search_file.exists`
- **Enhanced documentation**: Complete docstrings for IDE support
- **All types in local**: All file types available in `search_file.local`

### Version 0.3.1 (2026-04-30)

**Fixes**

- Fixed FileResult.copy() missing destination argument
- Improved API parameter handling

### Version 0.3.0 (2026-04-29)

**New Features**

- Fluent API with dynamic methods for each file type
- FileResult class for file operations

### Version 0.2.0 (2026-04-29)

**New Features**

- Initial file search functionality
- Support for multiple file types
- Smart file selection based on date and version

---

## 📄 License

MIT License - See the [LICENSE](LICENSE) file for details.

## 👨‍💻 Author

**LJD-UwU**
- Email: himexpe.interns@hisense.com
- GitHub: [@LJD-UwU](https://github.com/LJD-UwU)

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

## 🔧 Data Processing: `process_file`

The `process_file` module provides tools for cleaning and processing Excel data.

### `clean_sheet()` - Clean and Flatten Excel Data

Automatically detects headers, cleans data, and creates a structured "Datos_Limpios" sheet.

#### Features:
- ✅ Automatic header detection
- ✅ Data normalization and cleaning
- ✅ Professional formatting (colors, borders, frozen headers)
- ✅ Intelligent column width adjustment
- ✅ Removes empty rows and duplicate columns
- ✅ Returns pandas DataFrame for further analysis

#### Usage:

```python
from documentprocessinghub import clean_sheet

# Option 1: Clean and overwrite
df = clean_sheet("datos.xlsx")

# Option 2: Save to new file
df = clean_sheet("entrada.xlsx", output_path="salida.xlsx")

# Option 3: Process specific sheet
df = clean_sheet("datos.xlsx", nombre_hoja="Producción")

# Result is a pandas DataFrame
print(df.shape)      # (rows, columns)
print(df.columns)    # Column names
print(df.head())     # First rows
```

#### What It Does:
1. Reads the Excel file
2. Detects headers and data rows
3. Cleans data (removes nulls, normalizes columns)
4. Applies professional formatting
5. Creates "Datos_Limpios" sheet with cleaned data
6. Returns DataFrame for analysis
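
Steps 2 and 3 can be illustrated on plain Python lists, without Excel or pandas. This is a sketch under stated assumptions and not how `clean_sheet()` is implemented: the heuristic (the header is the first row with at least two non-empty cells, all of them text) and the helper names `detect_header_row` and `clean_rows` are hypothetical.

```python
EMPTY = (None, "")


def detect_header_row(rows, min_filled=2):
    """Return the index of the first row that looks like a header, or None."""
    for index, row in enumerate(rows):
        filled = [cell for cell in row if cell not in EMPTY]
        if len(filled) >= min_filled and all(isinstance(cell, str) for cell in filled):
            return index
    return None


def clean_rows(rows):
    """Return (columns, data) with empty rows and duplicate columns removed."""
    header_index = detect_header_row(rows)
    if header_index is None:
        return [], []
    keep, columns, seen = [], [], set()
    for position, cell in enumerate(rows[header_index]):
        name = cell.strip() if isinstance(cell, str) else cell
        # Skip unnamed columns and repeats of a name already kept.
        if name in EMPTY or name in seen:
            continue
        seen.add(name)
        keep.append(position)
        columns.append(name)
    data = []
    for row in rows[header_index + 1:]:
        values = [row[pos] if pos < len(row) else None for pos in keep]
        if any(value not in EMPTY for value in values):
            data.append(values)
    return columns, data


raw = [
    ["Report", None, None],
    [None, None, None],
    ["Line", "Qty", "Line"],
    ["A1", 10, "A1"],
    [None, None, None],
    ["B2", 7, "B2"],
]
print(clean_rows(raw))  # (['Line', 'Qty'], [['A1', 10], ['B2', 7]])
```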

---

## 📚 References

- [PyPI Package](https://pypi.org/project/documentprocessinghub-ljd/)
- [GitHub Repository](https://github.com/LJD-UwU/Document-Processing-Hub)
- [Usage Guide](examples/USAGE.md)

---

**Made with care for document processing automation**
