Metadata-Version: 2.1
Name: JSONSki
Version: 0.0.29
Summary: Python binding port for JSONSki
Home-page: https://github.com/your_username/your_repository
Author: AutomataLab
Author-email: zhijia@cs.ucr.edu
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: test
Requires-Dist: pytest (>=6.0) ; extra == 'test'

# JSONSki_python
JSONSki_python is the Python binding port for JSONSki

# JSONSki

JSONSki is **a streaming JSONPath processor** with **fast-forward** functionality. During the streaming, it can automatically fast-forward over certain JSON substructures that are irrelavent to the query evaluation, without parsing them in detail. To make the fast-forward efficient, JSONSki features a highly bit-parallel solution that intensively utilizes bitwise and SIMD operations that are prevelent on modern CPUs to implement the fast-forward APIs. 

## Installation
```
pip install JSONSki
```

## Quick Start

```
import jsonski as jski
print(jski.loadSingleRecord("$.features[150].actor.login", "datasets/test.json"))
```

- We interface the following method:
```
jski.loadSingleRecord(args1, args2)    //args1 - String(query) and args2 - String(file_location)
```

## Requirements

### Hardware requirements

- CPUs: 64-bit ALU instructions, 256-bit SIMD instruction set, and the carry-less multiplication instruction (pclmulqdq)
- Operating System: Linux, MacOs (Intel Chips only) 
- C++ Compiler: g++ (7.4.0 or higher)


### Software requirements

Before starting to use JSONSki-API you need to assure you have the following prerequisites:

- Python (v3.7) see: [Installing Python](https://www.python.org/downloads/release/python-3100/)

- C++ : g++ (v7.4.0 and above) see: [Installing C++](https://gcc.gnu.org/install/)


## Getting Started with Querying using JSONSki
### JSONPath
JSONPath is the basic query language of JSON data. It refers to substructures of JSON data in a similar way as XPath queries are used for XML data. For the details of JSONPath syntax, please refer to [Stefan Goessner's article](https://goessner.net/articles/JsonPath/index.html#e2). 

#### JSONSki Queries Operators
| Operator                  |   Description     |
| :-----------------------: |:-----------------:|
| `$`                       | root object              |
| `.`                       | child object      |
| `[]`                       | child array      |
| `*`                       | wildcard, all objects or array members          |
| `[index]`             | array index      |
| `[start:end]`             | array slice operator      |


#### Path Examples
Consider a piece of geo-referenced tweet in JSON
```python
{
    "coordinates": [
        40.74118764, -73.9998279
    ],
    "user": {
        "id": 6253282
    },
    "place": {
        "name": "Manhattan",
        "bounding_box": {
            "type": "Ploygon",
            "pos": [
                [-74.026675, 40.683935],
                ......
            ]
        }
    }
}
```
| JsonPath | Result |
| :------- | :----- |
| `$.coordinates[*]` | all coordinates     |
| `$.place.name` | place name   |
| `$.place.bounding_box.pos[0]`| first position of the bounding box in place                      |
| `$.place.bounding_box.pos[0:2]`| first two positions of the bounding box in place                      |


### APIs
#### Records Loading 
#### JSONski API: Simplifying Data Handling

JSONski is a powerful and user-friendly API designed to streamline data handling and processing tasks, particularly when dealing with JSON-based data. It offers a range of functions to manipulate and manage data, making complex operations effortless. Two key functions within the JSONski API are jski.loadSingleRecord and jski.loadRecords:


- `loadSingleRecord(args1, args2)    //args1 - String(query) and args2 - String(file_location)`: loads the whole input file as one single record (allow newlines in strings and other legal places). 

- `loadRecords(args1, args2)    //args1 - String(query) and args2 - String(file_location)`: loads multiple records from the input file (all newlines are treated as delimiters; no newlines (except for `\n` and `\r` in JSON strings) are allowed within a record);.


## Performance Comparison with Python Parsing
Below is an example usage of Jsonski pip package. 
```
#JSONSki
import jsonski as jski
import time

start_time = time.time()
print(jski.loadSingleRecord("$[*].entities.urls[*].url","./JSONSki/dataset/twitter_sample_large_record.json"))
end_time = time.time()
elapsed_time = end_time - start_time
print("Elapsed jsonski time:", elapsed_time, "seconds")


#Python`s inbuilt JSON parser
import json as j

start_time = time.time()
def parse_json_file(file_path):
    with open(file_path, 'r') as file:
        json_data = j.load(file)
        return json_data
json_file_path = './JSONSki/dataset/twitter_sample_large_record.json'
print(parse_json_file(json_file_path))
end_time = time.time()
elapsed_time = end_time - start_time
print("Elapsed default_python_json time:", elapsed_time, "seconds")
```
- Note: The code snippet above benchmarks performance for JSONSki parsing VS Python in-built parsing.


## Publication
[1] Lin Jiang and Zhijia Zhao. [JSONSki: Streaming Semi-structured Data with Bit-Parallel Fast-Forwarding](https://dl.acm.org/doi/10.1145/3503222.3507719). In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022.
```
@inproceedings{jsonski,
  title={JSONSki: Streaming Semi-structured Data with Bit-Parallel Fast-Forwarding},
  author={Lin Jiang and Zhijia Zhao},
  booktitle={Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  year={2022}
}
```

## Performance

![image](https://i.imgur.com/qJOMY02.pngs)







