Metadata-Version: 2.4
Name: scrapely-client
Version: 1.0.1
Summary: Python client for Scrapely browser automation service - simple, intuitive web scraping and automation
Home-page: https://github.com/YahyaRehman6/scrapely_client
Author: Yahya Rehman
Author-email: Yahya Rehman <yahyarehman57@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/YahyaRehman6/scrapely_client
Project-URL: Documentation, https://github.com/YahyaRehman6/scrapely_client#readme
Project-URL: Repository, https://github.com/YahyaRehman6/scrapely_client
Project-URL: Bug Tracker, https://github.com/YahyaRehman6/scrapely_client/issues
Project-URL: Changelog, https://github.com/YahyaRehman6/scrapely_client/blob/main/CHANGELOG.md
Keywords: scraping,web-scraping,automation,browser-automation,websocket,selenium-alternative,playwright-alternative,web-crawler,bot
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: websocket-client>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Scrapely Client

A Python client library for the Scrapely browser automation service. Scrapely provides a simple, intuitive API for web scraping and browser automation through WebSocket connections.


## Installation

```bash
pip install scrapely-client
```

## Prerequisites

Before using the Scrapely client, you **must** have the Scrapely service running. This client is a lightweight wrapper that connects to your self-hosted Scrapely server.

### Step 1: Set Up the Scrapely Service

The Scrapely service provides the browser automation backend with anti-bot capabilities. You need to deploy it first:

**🔗 Clone and set up the service from:** [https://github.com/YahyaRehman6/scrapely](https://github.com/YahyaRehman6/scrapely)

**Quick setup with Docker:**
```bash
# Clone the Scrapely service repository
git clone https://github.com/YahyaRehman6/scrapely.git
cd scrapely

# Build and run with Docker
docker build -t scrapely .
docker run -d \
  --name scrapely-server \
  -p 5050:5050 \
  --restart unless-stopped \
  scrapely

# Verify it's running
docker logs scrapely-server
```
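
If you would rather confirm from Python that the container is accepting connections, a plain TCP probe is usually enough. The sketch below is not part of the Scrapely client; `is_service_reachable` is a hypothetical helper, and it assumes the service listens on `localhost:5050` as in the Docker command above. It only checks that the port is open, not that the WebSocket handshake succeeds.

```python
import socket


def is_service_reachable(host: str = "localhost", port: int = 5050,
                         timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unresolved host) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_service_reachable()` before creating a client gives a quick yes/no answer without pulling in any WebSocket dependency.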

### Step 2: Install the Client

Once your Scrapely service is running, install this client package:
```bash
pip install scrapely-client
```

## Quick Start

### Basic Usage with Context Manager (Recommended)

```python
from scrapely import Scrapely

# Connect to Scrapely service
with Scrapely("ws://localhost:5050/connect") as browser:
    # Navigate to a website
    browser.goto("https://example.com")
    
    # Find and interact with elements
    search_box = browser.find("#search")
    search_box.type("Python web scraping")
    
    # Click a button
    browser.click("#search-button")
    
    # Extract data
    results = browser.find(".result-item", multiple=True)
    for result in results:
        title = result.get_text()
        link = result.get_attr("href")
        print(f"{title}: {link}")
```

### Manual Connection Management

```python
from scrapely import Scrapely

browser = Scrapely("ws://localhost:5050/connect")
browser.connect()

try:
    browser.goto("https://example.com")
    text = browser.get_text("h1")
    print(text)
finally:
    browser.close()
```

## Core Features

### Navigation

```python
# Navigate to URL
browser.goto("https://example.com", timeout=15, wait_for_load=True)

# Wait for specific page
browser.wait_for_page("https://example.com/results")

# Get current URL
current = browser.current_url()
print(current)
```

### Finding Elements

```python
# Find single element
button = browser.find("#submit-btn")

# Find multiple elements
items = browser.find(".list-item", multiple=True)

# Check if element exists
if browser.exists("#optional-element"):
    print("Element found!")

# Find elements within a parent
container = browser.find("#main-container")
child = container.find(".child-element")
```

### Element Interactions

```python
# Click elements
browser.click("#button")
button = browser.find("#button")
button.click()

# Type into input fields
browser.type("user@example.com", "#email")
browser.type("password123", "#password", clear=True)

# Scroll to element
browser.scroll_to("#footer")
element = browser.find("#section")
element.scroll_to()
```

### Extracting Data

```python
# Get text content
title = browser.get_text("h1")
element = browser.find(".article-title")
title = element.get_text()

# Get attributes
link = browser.get_attr("href", "a.download")
image_src = browser.find("img").get_attr("src")
data_id = browser.get_attr("data-id", ".item")
```
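
When an element may be missing, `exists` and `get_text` can be combined into a small guard. This is a sketch only: `text_or_default` is a hypothetical helper, not part of the client, and it assumes `exists(selector)` returns a truthy value when the element is present.

```python
def text_or_default(browser, selector: str, default: str = "") -> str:
    """Return the element's text, or `default` when the selector matches nothing.

    `browser` is any object exposing exists(selector) and get_text(selector),
    such as a connected Scrapely instance.
    """
    if browser.exists(selector):
        return browser.get_text(selector)
    return default
```

For example, `text_or_default(browser, ".subtitle", default="(no subtitle)")` avoids a failed operation on pages that omit the subtitle.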

### Advanced Features

#### JavaScript Execution

```python
# Execute JavaScript in page context
result = browser.run_js("return document.title")

# Execute JavaScript on specific element
browser.run_js_on_element("this.style.border = '2px solid red'", "#highlight")
```

#### XPath Support

```python
# Click element using XPath
browser.click_xpath("//button[contains(text(), 'Submit')]")
```

#### Dropdown Selection

```python
# Select by value
browser.select_option("#country", value="us")

# Select by index
browser.select_option("#country", index=2)
```

#### Shadow DOM

```python
# Access shadow DOM elements
shadow_root = browser.get_shadow_root("#shadow-host")
shadow_element = browser.find(element_id=shadow_root["text"])
text = shadow_element.get_text()
```

#### Drag and Drop

```python
# Drag one element to another
browser.drag_and_drop_to(".draggable-item", ".drop-zone")
```

### Batch Operations

Execute multiple operations efficiently in a single request:

```python
operations = [
    {
        "method": "goto",
        "kwargs": {"url": "https://example.com"}
    },
    {
        "method": "type",
        "kwargs": {"selector": "#email", "text": "user@example.com"}
    },
    {
        "method": "click",
        "kwargs": {"selector": "#submit"}
    },
    {
        "method": "get_text",
        "kwargs": {"selector": ".result"}
    }
]

results = browser.batch_operations(operations)
for result in results:
    print(f"Status: {result['status']}, Data: {result['data']}")
```
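
Writing the operation dictionaries by hand gets repetitive. The sketch below builds the same `{"method": ..., "kwargs": ...}` structure shown above; `op` and `login_operations` are hypothetical helpers introduced for illustration and are not provided by the client.

```python
from typing import Any, Dict, List


def op(method: str, **kwargs: Any) -> Dict[str, Any]:
    """Build one batch operation in the {"method", "kwargs"} shape used above."""
    return {"method": method, "kwargs": kwargs}


def login_operations(url: str, email: str) -> List[Dict[str, Any]]:
    """Assemble the four-step sequence from the example above for a given URL and email."""
    return [
        op("goto", url=url),
        op("type", selector="#email", text=email),
        op("click", selector="#submit"),
        op("get_text", selector=".result"),
    ]
```

The resulting list can be passed directly to `browser.batch_operations(...)`.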

## Complete Examples

### E-commerce Scraping

```python
from scrapely import Scrapely

with Scrapely() as browser:
    browser.goto("https://shop.example.com")
    
    # Search for products
    search = browser.find("#search-input")
    search.type("laptop")
    browser.click("#search-button")
    
    # Wait for results
    browser.wait_for_page("https://shop.example.com/search")
    
    # Extract product information
    products = browser.find(".product-card", multiple=True)
    
    for product in products:
        name = product.find(".product-name").get_text()
        price = product.find(".price").get_text()
        link = product.find("a").get_attr("href")
        
        print(f"Product: {name}")
        print(f"Price: {price}")
        print(f"URL: {link}")
        print("-" * 40)
```
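
Printing is fine for a demo, but scraped rows usually end up in a file. A minimal sketch using only the standard library follows; `products_to_csv` is a hypothetical helper, and the `name`, `price`, and `url` keys are assumptions that mirror the values extracted above.

```python
import csv
import io
from typing import Dict, Iterable


def products_to_csv(products: Iterable[Dict[str, str]]) -> str:
    """Serialize product dicts with 'name', 'price' and 'url' keys to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price", "url"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()
```

Inside the loop above, you could append `{"name": name, "price": price, "url": link}` to a list and pass that list to `products_to_csv` once scraping finishes.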

### Form Automation

```python
from scrapely import Scrapely

with Scrapely() as browser:
    browser.goto("https://forms.example.com")
    
    # Fill out form
    browser.type("John Doe", "#name")
    browser.type("john@example.com", "#email")
    browser.type("555-0123", "#phone")
    
    # Select dropdown
    browser.select_option("#country", value="us")
    
    # Check checkbox
    browser.click("#terms-checkbox")
    
    # Submit form
    browser.click("#submit-button")
    
    # Get confirmation message
    message = browser.get_text(".confirmation-message")
    print(message)
```
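
For longer forms, the repeated `type` calls can be driven from a mapping of selectors to values. This is a sketch under the assumption that `browser.type(text, selector, clear=True)` behaves as in the interaction examples earlier; `fill_form` is a hypothetical helper, not part of the client.

```python
from typing import Dict, List


def fill_form(browser, fields: Dict[str, str]) -> List[str]:
    """Type each value into its selector, clearing the field first.

    Returns the selectors that were filled, in insertion order.
    """
    filled = []
    for selector, text in fields.items():
        browser.type(text, selector, clear=True)
        filled.append(selector)
    return filled
```

With this helper, the three `type` calls above become `fill_form(browser, {"#name": "John Doe", "#email": "john@example.com", "#phone": "555-0123"})`.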

### Dynamic Content Handling

```python
import time

from scrapely import Scrapely

with Scrapely() as browser:
    browser.goto("https://dynamic-site.example.com")
    
    # Scroll to load more content
    for _ in range(5):
        browser.scroll_to(".load-more-trigger")
        time.sleep(1)  # Wait for content to load
    
    # Extract all loaded items
    items = browser.find(".content-item", multiple=True)
    print(f"Loaded {len(items)} items")
    
    for item in items:
        title = item.find("h2").get_text()
        description = item.find(".description").get_text()
        print(f"{title}: {description}")
```
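
A fixed number of scrolls either wastes time or stops too early. One alternative is to keep scrolling until the item count stops growing. The sketch below assumes `find(selector, multiple=True)` returns a list (as the `len(items)` call above implies) and that `scroll_to(selector)` triggers loading; `scroll_until_stable` is a hypothetical helper, not part of the client.

```python
import time
from typing import Callable, List


def scroll_until_stable(browser, item_selector: str, trigger_selector: str,
                        max_rounds: int = 10, pause: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> List:
    """Scroll to the trigger repeatedly until no new items appear or max_rounds is hit."""
    items = browser.find(item_selector, multiple=True)
    for _ in range(max_rounds):
        browser.scroll_to(trigger_selector)
        sleep(pause)  # give the page time to append new content
        current = browser.find(item_selector, multiple=True)
        if len(current) == len(items):
            break  # nothing new was loaded by the last scroll
        items = current
    return items
```

The `sleep` parameter exists only so the pause can be stubbed out in tests; in normal use the default `time.sleep` applies.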

## Configuration

### Custom WebSocket URL

```python
# Connect to custom service URL
browser = Scrapely("ws://192.168.1.100:8080/connect")
```

### Connection Timeout

```python
# Set custom connection timeout (in seconds)
browser = Scrapely("ws://localhost:5050/connect", timeout=60)
```
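
### Reading the URL from the Environment

Hard-coding the service URL makes it awkward to move between machines. A possible pattern is to read it from an environment variable. Note that `SCRAPELY_WS_URL` and `resolve_ws_url` are names invented for this sketch; the client itself does not read any environment variable as far as this README documents.

```python
import os
from typing import Mapping, Optional

DEFAULT_WS_URL = "ws://localhost:5050/connect"


def resolve_ws_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the WebSocket URL from SCRAPELY_WS_URL (hypothetical), or the local default."""
    env = os.environ if env is None else env
    url = env.get("SCRAPELY_WS_URL", DEFAULT_WS_URL)
    if not url.startswith(("ws://", "wss://")):
        raise ValueError(f"Expected a ws:// or wss:// URL, got: {url!r}")
    return url
```

The result can then be passed to the constructor, for example `Scrapely(resolve_ws_url(), timeout=60)`.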

## Error Handling

```python
from scrapely import Scrapely, ConnectionError, OperationError

try:
    with Scrapely() as browser:
        browser.goto("https://example.com")
        browser.click("#non-existent-button")
        
except ConnectionError as e:
    print(f"Failed to connect: {e}")
    
except OperationError as e:
    print(f"Operation failed: {e}")
```
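
Transient failures, such as a service that is still starting, are often worth retrying. The following is a generic sketch rather than a feature of the client: `with_retries` is a hypothetical helper that re-runs a callable with a linearly growing delay. Which exceptions are safe to retry is an assumption you need to make for your own workload.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action` up to `attempts` times, waiting delay * attempt between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay * attempt)
    raise AssertionError("unreachable")
```

To retry only connection problems, pass the exception imported above, for example `with_retries(run_job, retry_on=(ConnectionError,))`, where `run_job` is your own function wrapping the `with Scrapely() as browser:` block.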

## API Reference

### Scrapely Class

#### Connection Methods
- `connect()` - Establish WebSocket connection
- `close()` - Close connection and cleanup

#### Navigation Methods
- `goto(url, timeout, wait_for_load)` - Navigate to URL
- `wait_for_page(url, timeout)` - Wait for page URL to match
- `current_url()` - Get current page URL

#### Element Finding Methods
- `find(selector, element_id, timeout, multiple)` - Find element(s)
- `exists(selector, element_id)` - Check if element exists

#### Interaction Methods
- `click(selector, element_id, timeout)` - Click element
- `type(text, selector, element_id, timeout, clear)` - Type text
- `scroll_to(selector, element_id, timeout)` - Scroll to element

#### Data Extraction Methods
- `get_text(selector, element_id, timeout)` - Get element text
- `get_attr(attribute, selector, element_id, timeout)` - Get attribute value

#### Advanced Methods
- `run_js(script, *args)` - Execute JavaScript
- `run_js_on_element(script, selector, element_id, timeout)` - Execute JS on element
- `click_xpath(xpath)` - Click using XPath
- `select_option(selector, index, value, timeout)` - Select dropdown option
- `get_shadow_root(selector, element_id, timeout)` - Access shadow DOM
- `drag_and_drop_to(draggable, droppable, timeout)` - Drag and drop

#### Batch Methods
- `batch_operations(operations)` - Execute multiple operations

### Element Class

Element objects support chainable method calls:

- `click(timeout)` - Click this element
- `type(text, timeout, clear)` - Type into this element
- `get_text(timeout)` - Get text content
- `get_attr(attribute, timeout)` - Get attribute value
- `scroll_to(timeout)` - Scroll to this element
- `run_js(script, timeout)` - Execute JavaScript on this element
- `exists(timeout)` - Check if element still exists
- `find(selector, timeout)` - Find child element
- `find_all(selector, timeout)` - Find all child elements

## Requirements

- Python 3.8+
- websocket-client
- Running Scrapely service instance

## License

MIT License

## Support

For issues, questions, or contributions, please visit the [project repository](https://github.com/YahyaRehman6/scrapely_client).

## Changelog

### Version 1.0.0
- Initial release
- Core browser automation features
- Element caching and chaining
- Batch operations support
- Shadow DOM support
- Drag and drop functionality
