Metadata-Version: 2.4
Name: alawymdb
Version: 0.2.3
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Database
Classifier: Operating System :: OS Independent
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Summary: Linear-scaling in-memory database optimized for ML workloads
Keywords: database,columnar,in-memory,performance
Author-email: Tomio Kobayashi <tomkob99@yahoo.co.jp>
License: MIT
Requires-Python: >=3.7
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM


  

# AlawymDB

  

[![PyPI version](https://badge.fury.io/py/alawymdb.svg)](https://badge.fury.io/py/alawymdb)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

  

**A**lmost **L**inear **A**ny **W**ay **Y**ou **M**easure - A high-performance in-memory database that achieves near-linear O(n) scaling for operations that traditionally suffer from O(n log n) complexity.

  

## 🚀 Breakthrough Performance

  

AlawymDB (pronounced "ah-LAY-wim") lives up to its name - delivering almost linear performance any way you measure it:

  

### Column Scaling Performance

```

Columns: 10 → 100 → 1000 → 2000

Cells/sec: 18.5M → 7.5M → 5.8M → 5.2M

Scaling: 1× → 10× → 100× → 200× (columns)

1× → 4.1× → 5.3× → 6.1× (time)

```

  

**Result: O(n) with minimal logarithmic factor - effectively linear!** 🎯

  

## 🔧 Installation

  

```bash

pip  install  alawymdb

```

  

## 💡 Quick Start

  

```python

import alawymdb as db

  

# Initialize database

db.create_database()

  

# Create schema and table

db.create_schema("main")

db.create_table(

"main",

"users",

[

("id", "UINT64", False),

("name", "STRING", False),

("age", "INT64", True),

("email", "STRING", True),

("score", "FLOAT64", True)

]

)

  

# Insert data

users = [

(1, "Alice", 30, "alice@example.com", 95.5),

(2, "Bob", 25, "bob@example.com", 87.3),

(3, "Charlie", 35, "charlie@example.com", 92.1),

]

  

for user_id, name, age, email, score in users:

db.insert_row("main", "users", [

("id", user_id),

("name", name),

("age", age),

("email", email),

("score", score)

])

  

# Query with SQL

result = db.execute_sql("SELECT * FROM main.users")

print(result)

  

# SQL with WHERE clause

young_users = db.execute_sql("SELECT * FROM main.users WHERE age = 25")

print(f"Young users:\n{young_users}")

  

# Direct API queries

all_users = db.select_all("main", "users")

print(f"Total users: {db.count_rows('main', 'users')}")

```

  
  
  

## 🎯 Working Example: Toy example demonstrating various JOIN operations in AlawymDB

  

```python

import alawymdb as db

import time

  

def  setup_toy_database():

"""Create toy database with customers and orders tables"""

print("🔧 Setting up toy database...")

db.create_database()

db.create_schema("toy")

# Create customers table

db.create_table(

"toy",

"customers",

[

("customer_id", "UINT64", False),

("name", "STRING", False),

("country", "STRING", False),

("join_date", "STRING", False),

]

)

# Create orders table

db.create_table(

"toy",

"orders",

[

("order_id", "UINT64", False),

("customer_id", "UINT64", False),

("product", "STRING", False),

("amount", "FLOAT64", False),

("order_date", "STRING", False),

]

)

# Insert customers

customers = [

(1, "Alice", "USA", "2023-01-15"),

(2, "Bob", "Canada", "2023-02-20"),

(3, "Charlie", "UK", "2023-03-10"),

(4, "Diana", "Germany", "2023-04-05"),

(5, "Eve", "France", "2023-05-12"),

]

for cust in customers:

db.insert_row("toy", "customers", [

("customer_id", cust[0]),

("name", cust[1]),

("country", cust[2]),

("join_date", cust[3]),

])

# Insert orders (some customers have multiple orders, some have none)

orders = [

(101, 1, "Laptop", 1200.0, "2024-01-10"),

(102, 1, "Mouse", 25.0, "2024-01-15"),

(103, 2, "Keyboard", 75.0, "2024-01-20"),

(104, 3, "Monitor", 350.0, "2024-02-01"),

(105, 1, "Headphones", 150.0, "2024-02-15"),

(106, 3, "Webcam", 80.0, "2024-03-01"),

(107, 2, "USB Drive", 30.0, "2024-03-10"),

# Note: Diana (4) and Eve (5) have no orders

]

for order in orders:

db.insert_row("toy", "orders", [

("order_id", order[0]),

("customer_id", order[1]),

("product", order[2]),

("amount", order[3]),

("order_date", order[4]),

])

print("✅ Toy database created successfully!")

print(f" - Customers: {db.count_rows('toy', 'customers')}")

print(f" - Orders: {db.count_rows('toy', 'orders')}")

  

def  demonstrate_joins():

"""Demonstrate various JOIN operations"""

print("\n" + "="*80)

print("JOIN DEMONSTRATIONS")

print("="*80)

# 1. INNER JOIN

print("\n1️⃣ INNER JOIN - Customers with their orders")

print("-" * 60)

sql = """

SELECT

c.name,

c.country,

o.product,

o.amount

FROM toy.customers c

INNER JOIN toy.orders o ON c.customer_id = o.customer_id

ORDER BY c.name, o.amount DESC

"""

result = db.execute_sql(sql)

print(result)

# 2. LEFT JOIN

print("\n2️⃣ LEFT JOIN - All customers, including those without orders")

print("-" * 60)

sql = """

SELECT

c.name,

c.country,

o.order_id,

o.product,

o.amount

FROM toy.customers c

LEFT JOIN toy.orders o ON c.customer_id = o.customer_id

ORDER BY c.name

"""

result = db.execute_sql(sql)

print(result)

# 3. RIGHT JOIN

print("\n3️⃣ RIGHT JOIN - All orders with customer details")

print("-" * 60)

sql = """

SELECT

c.name,

c.country,

o.order_id,

o.product,

o.amount

FROM toy.customers c

RIGHT JOIN toy.orders o ON c.customer_id = o.customer_id

ORDER BY o.order_id

"""

result = db.execute_sql(sql)

print(result)

# 4. CROSS JOIN (Cartesian product)

print("\n4️⃣ CROSS JOIN - Every customer with every order (Cartesian product)")

print("-" * 60)

sql = """

SELECT

c.name,

o.product

FROM toy.customers c

CROSS JOIN toy.orders o

"""

result = db.execute_sql(sql)

print(f"Total combinations: {result.count('Product_')} rows")

print("(Showing first few rows only...)")

# Show just first few lines

lines = result.split('\n')[:10]

print('\n'.join(lines))

# 5. Aggregation with JOIN

print("\n5️⃣ JOIN with Aggregation - Customer order summary")

print("-" * 60)

sql = """

SELECT

c.name,

c.country,

COUNT(o.order_id) as order_count,

SUM(o.amount) as total_spent

FROM toy.customers c

LEFT JOIN toy.orders o ON c.customer_id = o.customer_id

GROUP BY c.customer_id, c.name, c.country

ORDER BY total_spent DESC

"""

# Note: If GROUP BY is not yet supported, use alternative approach

try:

result = db.execute_sql(sql)

print(result)

except:

print("GROUP BY not yet supported, showing alternative approach...")

# Alternative: Show customer-wise data

sql_alt = """

SELECT

c.name,

c.country,

o.amount

FROM toy.customers c

LEFT JOIN toy.orders o ON c.customer_id = o.customer_id

ORDER BY c.name

"""

result = db.execute_sql(sql_alt)

print(result)

# 6. Self-JOIN example (if we had hierarchical data)

print("\n6️⃣ Complex JOIN - Customers who ordered multiple products")

print("-" * 60)

sql = """

SELECT DISTINCT

c.name,

c.country

FROM toy.customers c

INNER JOIN toy.orders o1 ON c.customer_id = o1.customer_id

INNER JOIN toy.orders o2 ON c.customer_id = o2.customer_id

WHERE o1.order_id < o2.order_id

"""

try:

result = db.execute_sql(sql)

print(result)

except:

print("Complex multi-join query - showing simpler version:")

sql_simple = """

SELECT

c.name,

COUNT(*) as order_count

FROM toy.customers c

INNER JOIN toy.orders o ON c.customer_id = o.customer_id

GROUP BY c.customer_id, c.name

HAVING COUNT(*) > 1

"""

# Fallback to simple join

sql_fallback = """

SELECT

c.name,

o.product

FROM toy.customers c

INNER JOIN toy.orders o ON c.customer_id = o.customer_id

ORDER BY c.name

"""

result = db.execute_sql(sql_fallback)

print(result)

  

def  main():

"""Run the toy JOIN example"""

print("🚀 AlawymDB Toy JOIN Example")

print("="*80)

setup_toy_database()

demonstrate_joins()

print("\n" + "="*80)

print("✅ Toy JOIN demonstration complete!")

  

if  __name__ == "__main__":

main()

```

  

## 🎯 Working Example: Analytics with Pandas Integration

  

```python

import alawymdb as db

import numpy as np

import pandas as pd

  

# Setup

db.create_database()

db.create_schema("test_schema")

  

# Create employees table

db.create_table(

"test_schema",

"employees",

[

("id", "UINT64", False),

("name", "STRING", False),

("age", "UINT64", True),

("salary", "FLOAT64", True),

("department", "STRING", True)

]

)

  

# Insert test data

employees = [

(1, "Alice Johnson", 28, 75000.0, "Engineering"),

(2, "Bob Smith", 35, 85000.0, "Sales"),

(3, "Charlie Brown", 42, 95000.0, "Engineering"),

(4, "Diana Prince", 31, 78000.0, "Marketing"),

(5, "Eve Adams", 26, 72000.0, "Sales"),

]

  

for emp in employees:

db.insert_row("test_schema", "employees", [

("id", emp[0]),

("name", emp[1]),

("age", emp[2]),

("salary", emp[3]),

("department", emp[4])

])

  

# Convert to Pandas DataFrame

df = db.to_pandas("test_schema", "employees")

print("DataFrame shape:", df.shape)

print("\nDataFrame head:")

print(df.head())

  

# Pandas operations

print(f"\nAverage salary: ${df['salary'].mean():,.2f}")

print("\nSalary by department:")

print(df.groupby('department')['salary'].agg(['mean', 'count']))

  

# Get as NumPy array

ages = db.to_numpy("test_schema", "employees", "age")

print(f"\nAges array: {ages}")

print(f"Mean age: {np.mean(ages):.1f}")

  

# Get data as dictionary

data_dict = db.select_as_dict("test_schema", "employees")

print(f"\nColumns available: {list(data_dict.keys())}")

```

  

## 📊 Create Table from Pandas DataFrame

  

```python

import alawymdb as db

import pandas as pd

import numpy as np

  

db.create_database()

db.create_schema("data")

  

# Create a DataFrame

df = pd.DataFrame({

'product_id': np.arange(1, 101),

'product_name': [f'Product_{i}'  for i in  range(1, 101)],

'price': np.random.uniform(10, 100, 100).round(2),

'quantity': np.random.randint(1, 100, 100),

'in_stock': np.random.choice([0, 1], 100) # Use 0/1 instead of True/False

})

  

# Import DataFrame to AlawymDB

result = db.from_pandas(df, "data", "products")

print(result)

  

# Verify by reading back

df_verify = db.to_pandas("data", "products")

print(f"Imported {len(df_verify)} rows with {len(df_verify.columns)} columns")

print(df_verify.head())

  

# Query the imported data

result = db.execute_sql("SELECT * FROM data.products WHERE price > 50.0")

print(f"Products with price > 50: {result}")

```

  

## 📈 Wide Table Example (Working Version)

  

```python

import alawymdb as db

  

db.create_database()

db.create_schema("wide")

  

# Create table with many columns

num_columns = 100

columns = [("id", "UINT64", False)]

columns += [(f"metric_{i}", "FLOAT64", True) for i in  range(num_columns)]

  

db.create_table("wide", "metrics", columns)

  

# Insert data

for row_id in  range(100):

values = [("id", row_id)]

values += [(f"metric_{i}", float(row_id * 0.1 + i)) for i in  range(num_columns)]

db.insert_row("wide", "metrics", values)

  

# Query using direct API (more reliable for wide tables)

all_data = db.select_all("wide", "metrics")

print(f"Inserted {len(all_data)} rows")

  

# Convert to Pandas for analysis

df = db.to_pandas("wide", "metrics")

print(f"DataFrame shape: {df.shape}")

print(f"Columns: {df.columns[:5].tolist()} ... {df.columns[-5:].tolist()}")

  

# Get specific column as NumPy array

metric_0 = db.to_numpy("wide", "metrics", "metric_0")

print(f"Metric_0 stats: mean={metric_0.mean():.2f}, std={metric_0.std():.2f}")

```

  

## 🚀 Performance Test

  

```python

import alawymdb as db

import pandas as pd

import numpy as np

import time

  

db.create_database()

db.create_schema("perf")

  

# Create a large DataFrame

n_rows = 10000

df_large = pd.DataFrame({

'id': np.arange(n_rows),

'value1': np.random.randn(n_rows),

'value2': np.random.randn(n_rows) * 100,

'category': np.random.choice(['A', 'B', 'C', 'D', 'E'], n_rows),

'flag': np.random.choice([0, 1], n_rows)

})

  

# Time the import

start = time.time()

db.from_pandas(df_large, "perf", "large_table")

import_time = time.time() - start

print(f"Import {n_rows} rows: {import_time:.3f}s ({n_rows/import_time:.0f} rows/sec)")

  

# Time the export

start = time.time()

df_export = db.to_pandas("perf", "large_table")

export_time = time.time() - start

print(f"Export to Pandas: {export_time:.3f}s ({n_rows/export_time:.0f} rows/sec)")

  

# Verify

print(f"Shape verification: {df_export.shape}")

```

  

## 🏗️ Why "Almost Linear Any Way You Measure"?

  

The name AlawymDB reflects our core achievement:

-  **Column scaling**: O(n) with tiny logarithmic factor (log₂₅₆)

-  **Row scaling**: Pure O(n) for scans

-  **Memory usage**: Linear with data size

-  **Wide tables**: Tested up to 5000 columns with maintained performance

  

## 📊 Current SQL Support

  

### ✅ Working SQL Features

-  `SELECT * FROM table`

-  `SELECT column1, column2 FROM table`

-  `SELECT * FROM table WHERE column = value`

-  `SELECT * FROM table WHERE column > value` (for INT64 columns)

-  `SELECT * FROM table WHERE text_column = 'string'`

  

### ⚠️ SQL Limitations

- Type matching is strict (use 50.0 for FLOAT64, 50 for INT64)

- Aggregations and set operations SQL coming soon 

  

## 🎨 API Reference

  

```python

# Core operations

db.create_database()

db.create_schema(schema_name)

db.create_table(schema, table, columns)

db.insert_row(schema, table, values)

  

# Query operations

db.select_all(schema, table)

db.select_where(schema, table, columns, where_col, where_val)

db.count_rows(schema, table)

db.execute_sql(sql_query) # Basic SQL support

  

# Data science integrations

db.to_pandas(schema, table) # Export to DataFrame

db.to_numpy(schema, table, column) # Export column to NumPy

db.from_pandas(df, schema, table) # Import from DataFrame

db.select_as_dict(schema, table) # Get as Python dict

```

  

## 🚦 Performance Characteristics

  

| Operation | Complexity | Verified Scale |

|-----------|------------|----------------|

| INSERT | O(1) | 2M rows × 2K columns |

| SELECT * | O(n) | 10K rows × 5K columns |

| WHERE clause | O(n) | 1M rows tested |

| to_pandas() | O(n) | 100K rows tested |

| from_pandas() | O(n) | 100K rows tested |

| Column scaling | ~O(n) | Up to 5000 columns |

  

## 📜 License

  

MIT License

  

---

  

**AlawymDB**: Almost Linear Any Way You Measure - because performance should scale with your data, not against it.
