Metadata-Version: 2.1
Name: blobfile
Version: 0.10.0
Summary: Read GCS and local paths with the same interface, clone of tensorflow.io.gfile
Home-page: https://github.com/cshesse/blobfile
Author: Christopher Hesse
License: Public Domain
Platform: UNKNOWN
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
Requires-Dist: pycryptodomex (~=3.8)
Requires-Dist: urllib3 (~=1.25)
Requires-Dist: xmltodict (~=0.12.0)
Requires-Dist: filelock (~=3.0)
Requires-Dist: typing-extensions (>=3.7.4.1)
Provides-Extra: dev
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: tensorflow ; extra == 'dev'
Requires-Dist: imageio ; extra == 'dev'
Requires-Dist: imageio-ffmpeg ; extra == 'dev'
Requires-Dist: azure-cli ; extra == 'dev'
Requires-Dist: google-cloud-storage ; extra == 'dev'
Requires-Dist: typeguard ; extra == 'dev'

# blobfile

This is a standalone clone of TensorFlow's [`gfile`](https://www.tensorflow.org/api_docs/python/tf/io/gfile/GFile), supporting both local paths and `gs://` (Google Cloud Storage) paths.

The main function is `BlobFile`, a replacement for `GFile`. There are also a few additional functions, `basename`, `dirname`, and `join`, which behave mostly like their `os.path` namesakes but also support `gs://` paths.
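Plain `os.path` logic treats `gs://bucket/...` as an ordinary relative path, which breaks in edge cases. For example, `posixpath.dirname("gs://my-bucket-name")` returns `"gs:"`. The sketch below shows one way to avoid this: split off the bucket and apply POSIX path logic only to the object name. The helpers `split_gcs_path`, `gcs_basename`, and `gcs_dirname` are hypothetical illustrations, not blobfile's implementation, and blobfile's exact results (for example, around trailing slashes) may differ.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlparse


def split_gcs_path(path: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/blob' into ('bucket', 'some/blob').

    Assumes object names contain no '?' or '#', which urlparse would
    treat as a query string or fragment.
    """
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// path: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def gcs_basename(path: str) -> str:
    """Final component of the object name (empty for 'gs://bucket/dir/')."""
    _, blob = split_gcs_path(path)
    return posixpath.basename(blob)


def gcs_dirname(path: str) -> str:
    """Parent 'directory' of the object, never stripping the bucket itself."""
    bucket, blob = split_gcs_path(path)
    parent = posixpath.dirname(blob)
    return f"gs://{bucket}/{parent}" if parent else f"gs://{bucket}"


# Plain POSIX logic mangles a bare bucket path; the bucket-aware helper does not.
print(posixpath.dirname("gs://my-bucket-name"))
print(gcs_dirname("gs://my-bucket-name/cats"))
```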

Installation:

```sh
pip install blobfile
```

Usage:

```py
import blobfile as bf

with bf.BlobFile("gs://my-bucket-name/cats", "wb") as w:
    w.write(b"meow!")
```

Here are the functions:

* `BlobFile` - like `open()` but works with `gs://` paths too
* `LocalBlobFile` - like `BlobFile()`, but operations take place on a local copy of the file. When reading, the file is downloaded in the constructor. When writing, the file is uploaded on `close()` or when the object is destroyed. You can pass a `cache_dir` parameter to cache files for reading; you are responsible for cleaning up the cache directory.
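The write side of `LocalBlobFile` follows a common pattern: buffer all writes in a local file, then transfer the complete file in one step when it is closed. Below is a minimal sketch of that pattern, assuming a caller-supplied `upload(local_path, remote_path)` function. `LocalCopyForWrite` and `upload` are hypothetical names, not part of blobfile's API, and the sketch does not attempt the upload-on-destruction behavior.

```python
import os
import shutil
import tempfile
from typing import Callable


class LocalCopyForWrite:
    """Hypothetical sketch: buffer writes in a local temp file, upload on close()."""

    def __init__(self, remote_path: str, upload: Callable[[str, str], None]) -> None:
        self.remote_path = remote_path
        self._upload = upload  # stand-in for a real blob-storage upload call
        self._tmpdir = tempfile.mkdtemp()
        self.local_path = os.path.join(self._tmpdir, "data")
        self._f = open(self.local_path, "wb")
        self._closed = False

    def write(self, data: bytes) -> int:
        # Writes only touch the local file; nothing is sent yet.
        return self._f.write(data)

    def close(self) -> None:
        if self._closed:
            return
        self._closed = True
        self._f.close()
        try:
            # The complete file is transferred in a single step.
            self._upload(self.local_path, self.remote_path)
        finally:
            # Remove the local copy whether or not the upload succeeded.
            shutil.rmtree(self._tmpdir, ignore_errors=True)

    def __enter__(self) -> "LocalCopyForWrite":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()
```

One consequence of this design is that the whole file must fit on local disk before anything is uploaded. A real implementation would also need to decide what happens when an exception is raised inside the `with` block; this sketch uploads whatever was written regardless.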

Some are inspired by existing `os.path` and `shutil` functions:

* `copy` - copy a file from one path to another; between two remote paths on the same blob storage service, this is done as a remote copy
* `exists` - returns `True` if the file or directory exists
* `glob` - return files matching a pattern. On GCS, only a single `*` operator is supported, and matching can be slow if the `*` appears early in the pattern: GCS can only filter by prefix, so all additional filtering must happen locally
* `isdir` - returns `True` if the path is a directory
* `listdir` - list contents of a directory as a generator
* `makedirs` - ensure that a directory and all parent directories exist
* `remove` - remove a file
* `rmdir` - remove an empty directory
* `rmtree` - remove a directory tree
* `stat` - get the size and modification time of a file
* `walk` - walk a directory tree with a generator that yields `(dirpath, dirnames, filenames)` tuples
* `basename` - get the final component of a path
* `dirname` - get the path except for the final component
* `join` - join 2 or more paths together, inserting directory separators between each component
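The note on `glob` follows from how prefix-only listing works: only the literal text before the `*` can narrow the server-side listing, and the rest of the pattern must be checked client-side. The sketch below illustrates this with a hypothetical `list_prefix(prefix)` callable standing in for a GCS listing request. `glob_with_prefix` is not blobfile's implementation, and it assumes (possibly unlike blobfile) that `*` may match any characters, including `/`.

```python
import re
from typing import Callable, List


def glob_with_prefix(pattern: str, list_prefix: Callable[[str], List[str]]) -> List[str]:
    """Return names matching `pattern`, which may contain at most one '*'.

    `list_prefix(prefix)` is a stand-in for a server-side listing call that
    can only filter by prefix; everything after the '*' is checked locally.
    """
    if pattern.count("*") > 1:
        raise ValueError("this sketch supports only a single '*'")
    prefix, star, suffix = pattern.partition("*")
    # Server side: only the literal text before '*' narrows the listing.
    candidates = list_prefix(prefix)
    if not star:
        # No wildcard: the pattern is an exact object name.
        return [name for name in candidates if name == pattern]
    # Client side: '*' is assumed to match any characters, including '/'.
    regex = re.compile(re.escape(prefix) + ".*" + re.escape(suffix) + r"\Z")
    return [name for name in candidates if regex.match(name)]


# Hypothetical in-memory "bucket" used in place of a real listing request.
OBJECTS = ["cats/a.png", "cats/b.jpg", "dogs/c.png"]


def fake_list_prefix(prefix: str) -> List[str]:
    return [name for name in OBJECTS if name.startswith(prefix)]


print(glob_with_prefix("cats/*.png", fake_list_prefix))  # server lists only cats/ objects
print(glob_with_prefix("*.png", fake_list_prefix))  # empty prefix: every object is listed
```

With a pattern like `*.png`, the prefix is empty, so every object in the bucket is listed and then filtered locally, which is why an early `*` is slow.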

There are a few bonus functions:

* `get_url` - returns a URL for a path, along with the expiration time for that URL (or `None`)
* `md5` - get the MD5 hash for a path; this is fast for GCS but may be slow for other backends
* `set_log_callback` - set a log callback function `log(msg: str)` to use instead of printing to stdout
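For backends that do not store a hash alongside the data, an MD5 can only be computed by reading every byte, which is why `md5` may be slow there; GCS, by contrast, typically records an MD5 in each object's metadata. Here is a minimal sketch of the streaming computation using only `hashlib`; `md5_by_streaming` is a hypothetical helper, not blobfile's API.

```python
import hashlib
import io
from typing import BinaryIO


def md5_by_streaming(f: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hex MD5 of all bytes remaining in `f`, read in fixed-size chunks.

    Memory use stays bounded by `chunk_size`, but every byte must still be
    read, so the cost grows with file size (and, remotely, with download time).
    """
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()


# An in-memory stream stands in for a file opened for reading.
print(md5_by_streaming(io.BytesIO(b"meow!")))
```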

