Metadata-Version: 2.1
Name: ResearchMe
Version: 0.1.1
Summary: A Python library for programmatic access to Library Genesis
Author: Tabinda Touqeer
License: MIT
Project-URL: Homepage, https://github.com/tabinda-touqeer/ResearchMe
Project-URL: Repository, https://github.com/tabinda-touqeer/ResearchMe
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: beautifulsoup4>=4.9.0
Requires-Dist: urllib3>=1.26.0

# ResearchMe

`researchme` is a Python library that provides a programmatic interface to [Library Genesis](https://libgen.li). It lets you search for books and other materials by title, author, and other fields, retrieve detailed metadata for each result, filter results by various criteria, and resolve download links for available resources.

## Features

- **Search Library Genesis**: Search for books and other materials using fields like title, author, ISBN, year, etc.
- **Retrieve Metadata**: Get detailed information about each item, including author, publisher, year, language, page count, and download links.
- **Download Links**: Programmatically resolve direct download links for available content.
- **Flexible Filtering**: Apply filters based on fields and categories to refine search results.

## Installation

You can install `researchme` directly from PyPI:

```bash
pip install researchme
```

## Usage

Below are examples of how to use `researchme` for common tasks such as searching for books and retrieving metadata.

### Basic Example: Searching for Books

```python
from researchme.libgen import Mirror1

# Initialize the Mirror1 class
mirror = Mirror1()

# Define search fields (e.g., search in title and author fields)
fields = mirror.search_fields(title=True, authors=True)

# Perform a search by title and author
mirror.search(query="Artificial Intelligence", search_by=fields)

# Retrieve metadata for the first 50 results
metadata = mirror.get_metadata(max_entries=50)

# Display metadata
for entry in metadata:
    print(entry)
```

### Using `search_categories` to Filter by Categories

The `search_categories` method allows you to specify categories, such as fiction, scientific articles, or magazines, to filter your search results.

```python
from researchme.libgen import Mirror1

# Initialize the Mirror1 class
mirror = Mirror1()

# Define search categories (e.g., search in fiction and scientific articles)
categories = mirror.search_categories(fiction=True, scientific_articles=True)

# Perform a search with category filters
mirror.search(query="Machine Learning", categories=categories)

# Retrieve metadata for the first 20 results
metadata = mirror.get_metadata(max_entries=20)

# Display metadata
for entry in metadata:
    print(entry)
```

In this example, the `search_categories` method constructs category-specific URL filters to narrow the search results to specific types of content.

### Retrieving JSON Data

The `get_json` method retrieves additional metadata in JSON format, allowing for easy access to fields such as MD5 hash or other metadata identifiers.

```python
# Retrieve JSON data for the current search
# (`mirror` is the Mirror1 instance used in the search examples above)
json_data = mirror.get_json()

# Access and print each item ID with its MD5 hash
for item_id, item_data in json_data.items():
    print(item_id, item_data['md5'])
```

### Filtering Search Results

The `filtered` method allows for filtering metadata based on specific criteria, such as author name or publication year.

```python
# Filter the `metadata` list returned by get_metadata() by author and year
filtered_results = mirror.filtered(metadata, authors="John Shawe-Taylor", year="1999")
for entry in filtered_results:
    print(entry)
```

### Resolving a Download Link

To get a direct download link for a specific item:

```python
# Get the first download URL from first entry of the metadata list
url = metadata[0]['content_url'][0]

# Resolve the full download link by passing the URL to the resolve_download method
download_link = mirror.resolve_download(url)
print(f"Resolved Download Link: {download_link}")
```
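
The lookup `metadata[0]['content_url'][0]` raises an error when the search returned nothing or the first entry has no links. The sketch below shows one defensive way to pick a URL. `first_content_url` is a hypothetical helper, not part of `researchme`, and it assumes only that each entry is a dictionary whose `content_url` value, when present, is a list of URL strings. The sample data is made up for illustration.

```python
from typing import Optional


def first_content_url(metadata: list) -> Optional[str]:
    """Return the first download URL found in a metadata list, or None.

    Hypothetical helper: scans entries in order and returns the first URL
    of the first entry that has a non-empty 'content_url' list.
    """
    for entry in metadata:
        urls = entry.get("content_url") or []
        if urls:
            return urls[0]
    return None


# Made-up sample data; real entries come from mirror.get_metadata().
sample = [
    {"title": "No links here", "content_url": []},
    {"title": "Has a link", "content_url": ["https://example.org/item/1"]},
]
print(first_content_url(sample))
print(first_content_url([]))
```

When the helper returns a URL, it can be passed to `mirror.resolve_download(url)`, which may itself return `None` if the link cannot be resolved.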

## API Reference

### Mirror1

The `Mirror1` class provides the core functionality for interacting with Library Genesis: searching, retrieving metadata, filtering results, and resolving download links.

---

#### `search(query: str, search_by=None, categories=None, max_results=100, page=1)`

Initiates a search on Library Genesis based on a query and optional filters for specific fields or categories.

- **Parameters**:
  - **`query` (str, required)**: The main search term, such as a book title, author, or keyword.
  - **`search_by` (list, optional)**: A list of search field filters (e.g., title, author) generated by the `search_fields` method. If not provided, all fields will be searched by default.
  - **`categories` (list, optional)**: A list of category filters (e.g., fiction, scientific articles) generated by the `search_categories` method. If not specified, all categories are included in the search by default.
  - **`max_results` (int, optional)**: Limits the number of results to 25, 50, or 100. Defaults to 100; any other value falls back to 100.
  - **`page` (int, optional)**: Specifies the page number for pagination of results. Defaults to 1.

- **Returns**: No result set is returned directly. The call initializes the session and stores the parsed HTML content, which can later be accessed using `get_metadata()`.
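
The `max_results` fallback and the `page` parameter described above can be sketched as follows. Both `normalize_max_results` and `collect_pages` are hypothetical helpers, not part of `researchme`. The sketch assumes that a single `Mirror1` instance can be reused for several searches and that `get_metadata()` reflects the most recent one; neither point is confirmed here.

```python
ALLOWED_RESULT_COUNTS = (25, 50, 100)


def normalize_max_results(value) -> int:
    """Apply the documented rule: only 25, 50, or 100 are kept, else 100."""
    if isinstance(value, int) and value in ALLOWED_RESULT_COUNTS:
        return value
    return 100


def collect_pages(mirror, query: str, pages: int, max_results: int = 100) -> list:
    """Run the same query for pages 1..pages and concatenate the metadata.

    `mirror` is any object exposing search() and get_metadata() with the
    signatures documented above (assumed to be a Mirror1 instance).
    """
    size = normalize_max_results(max_results)
    results = []
    for page in range(1, pages + 1):
        mirror.search(query=query, max_results=size, page=page)
        batch = mirror.get_metadata(max_entries=size)
        if not batch:  # stop early once a page comes back empty
            break
        results.extend(batch)
    return results
```

A call such as `collect_pages(Mirror1(), "Artificial Intelligence", pages=3)` would then gather up to three pages of results into one list.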

---

#### `get_metadata(max_entries=100)`

Retrieves metadata from the previously initialized search results, including fields such as title, author, publisher, year, language, and download links.

- **Parameters**:
  - **`max_entries` (int, optional)**: The maximum number of metadata entries to retrieve, with a default of 100.

- **Returns**: A list of dictionaries, each containing metadata for an item found in the search results.

---

#### `get_json()`

Fetches and returns a formatted JSON representation of additional metadata if available. Each key in the JSON data represents an item ID, with values as dictionaries containing metadata fields, such as the MD5 hash.

- **Parameters**: None

- **Returns**: A dictionary in JSON format, where each key represents an item ID, and the value is a dictionary containing metadata details like `md5` and other fields.
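
Given the shape described above (item IDs as keys, metadata dictionaries as values), an ID-to-MD5 lookup can be built in a few lines. This is a minimal sketch: `md5_index` is a hypothetical helper, and the sample data is made up rather than taken from a real response. Only the `md5` field is assumed, since it is the only field named here.

```python
def md5_index(json_data: dict) -> dict:
    """Map each item ID to its MD5 hash, skipping items without one."""
    return {
        item_id: item["md5"]
        for item_id, item in json_data.items()
        if isinstance(item, dict) and item.get("md5")
    }


# Made-up sample in the documented shape; real data comes from get_json().
sample = {
    "1001": {"md5": "0123456789abcdef0123456789abcdef", "title": "Example A"},
    "1002": {"title": "Example B"},  # no md5 field, so it is skipped
}
print(md5_index(sample))
```

Using `item.get("md5")` instead of `item['md5']` avoids a `KeyError` when an item lacks the field.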

---

#### `resolve_download(url: str)`

Resolves and returns the direct download link for a specified item URL from Library Genesis.

- **Parameters**:
  - **`url` (str, required)**: The URL of the Library Genesis page containing the item to be downloaded.

- **Returns**: A direct download link for the specified item, or `None` if the download link cannot be resolved.

---

#### `filtered(metadata, title='', authors='', language='', year='', publisher='')`

Filters a metadata list based on specified criteria, allowing further refinement of search results.

- **Parameters**:
  - **`metadata` (list, required)**: The metadata list returned by `get_metadata()`.
  - **`title` (str, optional)**: Filters results to include only items with titles that contain this string.
  - **`authors` (str, optional)**: Filters results to include only items with authors matching this string.
  - **`language` (str, optional)**: Filters results to include only items with languages matching this string.
  - **`year` (str, optional)**: Filters results to include only items published in this year.
  - **`publisher` (str, optional)**: Filters results to include only items from this publisher.

- **Returns**: A filtered list of metadata dictionaries based on the specified criteria.
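
The exact matching rules of `filtered` (exact versus substring match, case sensitivity) are not spelled out above. The sketch below shows one plausible reading, not the library's implementation: `filter_entries` is a hypothetical function that keeps an entry only if every non-empty criterion appears, case-insensitively, as a substring of the corresponding field. It assumes entries are dictionaries keyed by the same names as the parameters.

```python
def filter_entries(metadata: list, **criteria: str) -> list:
    """Keep entries whose fields contain every non-empty criterion.

    Assumption: case-insensitive substring matching on string-converted
    values; the library's own `filtered` may match differently.
    """
    active = {field: text.lower() for field, text in criteria.items() if text}
    return [
        entry
        for entry in metadata
        if all(
            needle in str(entry.get(field, "")).lower()
            for field, needle in active.items()
        )
    ]


# Made-up entries for illustration only.
entries = [
    {"title": "Kernel Methods", "authors": "J. Smith", "year": "1999"},
    {"title": "Kernel Tricks", "authors": "A. Jones", "year": "2004"},
]
print(filter_entries(entries, title="kernel", year="1999"))
```

Empty criteria are ignored, which matches the empty-string defaults in the `filtered` signature.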

---

### Static Methods for Search Parameters

The following static methods help create parameter filters to use with the `search()` method.

#### `search_fields(title=False, authors=False, series=False, year=False, publisher=False, isbn=False)`

Constructs a list of search field filters to be applied in a search. All flags default to `False`; set a flag to `True` to include that field.

- **Parameters**:
  - **`title` (bool, optional)**: Filter by title field if `True`.
  - **`authors` (bool, optional)**: Filter by author field if `True`.
  - **`series` (bool, optional)**: Filter by series field if `True`.
  - **`year` (bool, optional)**: Filter by publication year if `True`.
  - **`publisher` (bool, optional)**: Filter by publisher field if `True`.
  - **`isbn` (bool, optional)**: Filter by ISBN field if `True`.

- **Returns**: A list of field filter strings to be used as the `search_by` parameter in the `search()` method.

#### `search_categories(libgen=False, comics=False, fiction=False, scientific_articles=False, magazines=False, fiction_russian=False, standards=False)`

Constructs a list of category filters to use in a search, allowing for more specific results based on content type.

- **Parameters**:
  - **`libgen` (bool, optional)**: Include the general Library Genesis database if `True`.
  - **`comics` (bool, optional)**: Include comics in the search if `True`.
  - **`fiction` (bool, optional)**: Include fiction books in the search if `True`.
  - **`scientific_articles` (bool, optional)**: Include scientific articles in the search if `True`.
  - **`magazines` (bool, optional)**: Include magazines in the search if `True`.
  - **`fiction_russian` (bool, optional)**: Include Russian fiction if `True`.
  - **`standards` (bool, optional)**: Include standards and technical documents if `True`.

- **Returns**: A list of category filter strings to be used as the `categories` parameter in the `search()` method.

### Error Handling

The library logs and handles common errors to ensure stable scraping and session management, including:

- **Request Errors**: HTTP errors are caught and logged to prevent interruptions.
- **Parsing Errors**: Any errors during HTML parsing are logged, allowing the library to continue functioning without breaking.
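
Because errors are logged rather than raised, calling code may want to check for missing results itself. The following is a minimal sketch under stated assumptions: `search_or_empty` and the logger name are hypothetical, and what `get_metadata()` returns after a failed request (an empty list, `None`, or something else) is not specified here, so the helper treats any falsy result as "no results".

```python
import logging

logger = logging.getLogger("researchme_example")  # hypothetical logger name


def search_or_empty(mirror, query: str, max_entries: int = 100) -> list:
    """Run a search and always return a list, even if nothing came back.

    `mirror` is assumed to be a Mirror1 instance (or any object with the
    same search() and get_metadata() methods).
    """
    try:
        mirror.search(query=query)
        results = mirror.get_metadata(max_entries=max_entries)
    except Exception:  # defensive: covers errors not handled internally
        logger.exception("Search for %r failed", query)
        return []
    if not results:
        logger.warning("No metadata returned for %r", query)
        return []
    return list(results)
```

Callers can then iterate over the return value without first checking for `None`.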

## Contributing

Contributions are welcome! If you have suggestions for improving `researchme`, please submit a pull request or open an issue.

## License

`researchme` is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
