Metadata-Version: 2.1
Name: big-graph-dataset
Version: 0.0.8.post4
Summary: A collection of graph datasets in torch_geometric format.
Home-page: https://github.com/neutralpronoun/big-graph-dataset
Author: Alex O. Davies
Author-email: alexander.davies@bristol.ac.uk
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
Requires-Dist: sphinxcontrib-bibtex==2.6.2
Requires-Dist: sphinx-readme==1.2.1
Requires-Dist: nbsphinx==0.9.4
Requires-Dist: aiohttp==3.9.5
Requires-Dist: aiosignal==1.3.1
Requires-Dist: alabaster==0.7.16
Requires-Dist: anyio==4.4.0
Requires-Dist: appdirs==1.4.4
Requires-Dist: appnope==0.1.4
Requires-Dist: argon2-cffi==23.1.0
Requires-Dist: argon2-cffi-bindings==21.2.0
Requires-Dist: arrow==1.3.0
Requires-Dist: asttokens==2.4.1
Requires-Dist: async-lru==2.0.4
Requires-Dist: async-timeout==4.0.3
Requires-Dist: attrs==23.2.0
Requires-Dist: Babel==2.15.0
Requires-Dist: beautifulsoup4==4.12.3
Requires-Dist: bleach==6.1.0
Requires-Dist: certifi==2024.2.2
Requires-Dist: cffi==1.16.0
Requires-Dist: charset-normalizer==3.3.2
Requires-Dist: click==8.1.7
Requires-Dist: cmake==3.29.2
Requires-Dist: comm==0.2.2
Requires-Dist: contourpy==1.2.1
Requires-Dist: cycler==0.12.1
Requires-Dist: Cython==3.0.10
Requires-Dist: debugpy==1.8.1
Requires-Dist: decorator==5.1.1
Requires-Dist: defusedxml==0.7.1
Requires-Dist: docker-pycreds==0.4.0
Requires-Dist: docutils==0.20.1
Requires-Dist: exceptiongroup==1.2.1
Requires-Dist: executing==2.0.1
Requires-Dist: fastjsonschema==2.19.1
Requires-Dist: filelock==3.14.0
Requires-Dist: fonttools==4.51.0
Requires-Dist: fqdn==1.5.1
Requires-Dist: frozenlist==1.4.1
Requires-Dist: fsspec==2024.3.1
Requires-Dist: gitdb==4.0.11
Requires-Dist: GitPython==3.1.43
Requires-Dist: h11==0.14.0
Requires-Dist: httpcore==1.0.5
Requires-Dist: httpx==0.27.0
Requires-Dist: idna==3.7
Requires-Dist: imagesize==1.4.1
Requires-Dist: ipykernel==6.29.4
Requires-Dist: ipython==8.25.0
Requires-Dist: ipywidgets==8.1.3
Requires-Dist: isoduration==20.11.0
Requires-Dist: jedi==0.19.1
Requires-Dist: Jinja2==3.1.4
Requires-Dist: joblib==1.4.2
Requires-Dist: json5==0.9.25
Requires-Dist: jsonpointer==2.4
Requires-Dist: jsonschema==4.22.0
Requires-Dist: jsonschema-specifications==2023.12.1
Requires-Dist: jupyter==1.0.0
Requires-Dist: jupyter-console==6.6.3
Requires-Dist: jupyter-events==0.10.0
Requires-Dist: jupyter-lsp==2.2.5
Requires-Dist: jupyter_client==8.6.2
Requires-Dist: jupyter_core==5.7.2
Requires-Dist: jupyter_server==2.14.1
Requires-Dist: jupyter_server_terminals==0.5.3
Requires-Dist: jupyterlab==4.2.1
Requires-Dist: jupyterlab_pygments==0.3.0
Requires-Dist: jupyterlab_server==2.27.2
Requires-Dist: jupyterlab_widgets==3.0.11
Requires-Dist: kiwisolver==1.4.5
Requires-Dist: littleballoffur==2.3.1
Requires-Dist: littleutils==0.2.2
Requires-Dist: llvmlite==0.42.0
Requires-Dist: MarkupSafe==2.1.5
Requires-Dist: matplotlib==3.8.4
Requires-Dist: matplotlib-inline==0.1.7
Requires-Dist: mistune==3.0.2
Requires-Dist: mpmath==1.3.0
Requires-Dist: multidict==6.0.5
Requires-Dist: nbclient==0.10.0
Requires-Dist: nbconvert==7.16.4
Requires-Dist: nbformat==5.10.4
Requires-Dist: nest-asyncio==1.6.0
Requires-Dist: networkit==11.0
Requires-Dist: networkx==3.3
Requires-Dist: notebook==7.2.0
Requires-Dist: notebook_shim==0.2.4
Requires-Dist: numba==0.59.1
Requires-Dist: numpy==1.26.4
Requires-Dist: ogb==1.3.6
Requires-Dist: outdated==0.2.2
Requires-Dist: overrides==7.7.0
Requires-Dist: packaging==24.0
Requires-Dist: pandas==1.5.3
Requires-Dist: pandocfilters==1.5.1
Requires-Dist: parso==0.8.4
Requires-Dist: pexpect==4.9.0
Requires-Dist: pillow==10.3.0
Requires-Dist: platformdirs==4.2.2
Requires-Dist: prometheus_client==0.20.0
Requires-Dist: prompt_toolkit==3.0.45
Requires-Dist: protobuf==4.25.3
Requires-Dist: psutil==5.9.8
Requires-Dist: ptyprocess==0.7.0
Requires-Dist: pure-eval==0.2.2
Requires-Dist: pycparser==2.22
Requires-Dist: Pygments==2.18.0
Requires-Dist: pynndescent==0.5.12
Requires-Dist: pyparsing==3.1.2
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: python-json-logger==2.0.7
Requires-Dist: python-louvain==0.16
Requires-Dist: pytz==2024.1
Requires-Dist: PyYAML==6.0.1
Requires-Dist: pyzmq==26.0.3
Requires-Dist: qtconsole==5.5.2
Requires-Dist: QtPy==2.4.1
Requires-Dist: rdkit==2023.9.6
Requires-Dist: referencing==0.35.1
Requires-Dist: requests==2.31.0
Requires-Dist: rfc3339-validator==0.1.4
Requires-Dist: rfc3986-validator==0.1.1
Requires-Dist: rpds-py==0.18.1
Requires-Dist: scikit-learn==1.4.2
Requires-Dist: scipy==1.13.0
Requires-Dist: Send2Trash==1.8.3
Requires-Dist: sentry-sdk==2.1.1
Requires-Dist: setproctitle==1.3.3
Requires-Dist: six==1.16.0
Requires-Dist: smmap==5.0.1
Requires-Dist: sniffio==1.3.1
Requires-Dist: snowballstemmer==2.2.0
Requires-Dist: soupsieve==2.5
Requires-Dist: Sphinx==7.3.7
Requires-Dist: sphinx-autodoc-typehints==2.1.1
Requires-Dist: sphinx-rtd-theme==2.0.0
Requires-Dist: sphinxcontrib-applehelp==1.0.8
Requires-Dist: sphinxcontrib-devhelp==1.0.6
Requires-Dist: sphinxcontrib-htmlhelp==2.0.5
Requires-Dist: sphinxcontrib-jquery==4.1
Requires-Dist: sphinxcontrib-jsmath==1.0.1
Requires-Dist: sphinxcontrib-qthelp==1.0.7
Requires-Dist: sphinxcontrib-serializinghtml==1.1.10
Requires-Dist: stack-data==0.6.3
Requires-Dist: sympy==1.12
Requires-Dist: terminado==0.18.1
Requires-Dist: threadpoolctl==3.5.0
Requires-Dist: tinycss2==1.3.0
Requires-Dist: tomli==2.0.1
Requires-Dist: torch==2.3.0
Requires-Dist: torch_geometric==2.3.1
Requires-Dist: torchaudio==2.3.0
Requires-Dist: torchvision==0.18.0
Requires-Dist: tornado==6.4
Requires-Dist: tqdm==4.66.4
Requires-Dist: traitlets==5.14.3
Requires-Dist: types-python-dateutil==2.9.0.20240316
Requires-Dist: typing_extensions==4.11.0
Requires-Dist: tzdata==2024.1
Requires-Dist: umap-learn==0.5.6
Requires-Dist: uri-template==1.3.0
Requires-Dist: urllib3==2.2.1
Requires-Dist: wandb==0.16.6
Requires-Dist: wcwidth==0.2.13
Requires-Dist: webcolors==1.13
Requires-Dist: webencodings==0.5.1
Requires-Dist: websocket-client==1.8.0
Requires-Dist: wget==3.2
Requires-Dist: widgetsnbextension==4.0.11
Requires-Dist: yarl==1.9.4

.. |CommunityDataset| replace:: ``CommunityDataset``
.. _CommunityDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/synthetic.html#bgd.synthetic.CommunityDataset
.. |compute_top_scores()| replace:: ``compute_top_scores()``
.. _compute_top_scores(): https://big-graph-dataset.readthedocs.io/en/latest/top.html#top.compute_top_scores
.. |CoraDataset| replace:: ``CoraDataset``
.. _CoraDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html#bgd.real.CoraDataset
.. |EgoDataset| replace:: ``EgoDataset``
.. _EgoDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html#bgd.real.EgoDataset
.. |FacebookDataset| replace:: ``FacebookDataset``
.. _FacebookDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html#bgd.real.FacebookDataset
.. |GeneralEmbeddingEvaluation| replace:: ``GeneralEmbeddingEvaluation``
.. _GeneralEmbeddingEvaluation: https://big-graph-dataset.readthedocs.io/en/latest/top.html#top.GeneralEmbeddingEvaluation
.. |.genindex| replace:: Index
.. _.genindex: https://big-graph-dataset.readthedocs.io/en/latest/genindex.html
.. |get_all_datasets()| replace:: ``get_all_datasets()``
.. _get_all_datasets(): https://big-graph-dataset.readthedocs.io/en/latest/bgd/loaders.html#bgd.loaders.get_all_datasets
.. |get_test_datasets()| replace:: ``get_test_datasets()``
.. _get_test_datasets(): https://big-graph-dataset.readthedocs.io/en/latest/bgd/loaders.html#bgd.loaders.get_test_datasets
.. |get_train_datasets()| replace:: ``get_train_datasets()``
.. _get_train_datasets(): https://big-graph-dataset.readthedocs.io/en/latest/bgd/loaders.html#bgd.loaders.get_train_datasets
.. |get_val_datasets()| replace:: ``get_val_datasets()``
.. _get_val_datasets(): https://big-graph-dataset.readthedocs.io/en/latest/bgd/loaders.html#bgd.loaders.get_val_datasets
.. |.modindex| replace:: Module Index
.. _.modindex: https://big-graph-dataset.readthedocs.io/en/latest/py-modindex.html
.. |NeuralDataset| replace:: ``NeuralDataset``
.. _NeuralDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html#bgd.real.NeuralDataset
.. |RandomDataset| replace:: ``RandomDataset``
.. _RandomDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/synthetic.html#bgd.synthetic.RandomDataset
.. |RedditDataset| replace:: ``RedditDataset``
.. _RedditDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html#bgd.real.RedditDataset
.. |RoadDataset| replace:: ``RoadDataset``
.. _RoadDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html#bgd.real.RoadDataset
.. |.search| replace:: Search Page
.. _.search: https://big-graph-dataset.readthedocs.io/en/latest/search.html
.. |ToPDataset| replace:: ``ToPDataset``
.. _ToPDataset: https://big-graph-dataset.readthedocs.io/en/latest/top.html#top.ToPDataset
.. |TreeDataset| replace:: ``TreeDataset``
.. _TreeDataset: https://big-graph-dataset.readthedocs.io/en/latest/bgd/synthetic.html#bgd.synthetic.TreeDataset



* `Big Graph Dataset <https://big-graph-dataset.readthedocs.io/en/latest/index.html>`_

  |



Big Graph Dataset
=================

This is a collaboration project to build a large, multi-domain set of graph datasets.
Each dataset comprises many small graphs.

The aim of this project is to provide a large set of graph datasets for use in machine learning research.
Currently, graph datasets are distributed across individual repositories, which increases workload because researchers have to search for relevant resources.
Once these datasets are found, there is additional labour in formatting the data for use in deep learning.

We aim to provide datasets that are:

- Composed of many small graphs
- Diverse in domain
- Diverse in tasks
- Well-documented
- Formatted uniformly across datasets for PyTorch Geometric

What we're looking for
======================

In short: anything! The idea behind this being a collaboration is that we cast a wide net over different domains and tasks.

There are a few rules for this first phase (see below) but the quick brief is that we're looking for datasets of small static graphs with well-defined tasks.
Static just means that the structure of the graphs doesn't vary over time.

If your data is a bit more funky, for example multi-graphs or time-series on graphs, please get in touch and we can discuss how to include it.

In the examples I've provided, datasets are mostly sampled from one large graph, but this is not compulsory.

Contributing
============

The source can be found in the `GitHub repository <https://github.com/neutralpronoun/big-graph-dataset>`_, and documentation on the `Read the Docs page <https://big-graph-dataset.readthedocs.io/en/latest/>`_.

The basics:

- Create your own git branch
- Copy ``bgd/example_dataset.py``
- Have a look through it
- Re-tool it for your own dataset

 See more in Getting Started.

 * `Set Up & Contributing <https://big-graph-dataset.readthedocs.io/en/latest/get-started.html>`_



  |



I've provided code for sub-sampling graphs and producing statistics.
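
To illustrate what sub-sampling means here, the following is a minimal sketch, not the project's own sampling code. It assumes a breadth-first sampler, and the names ``sample_subgraph`` and ``graph_statistics`` as well as the toy grid graph are hypothetical, chosen only for this example. It uses only the Python standard library.

.. code-block:: python

   import random
   from collections import deque


   def sample_subgraph(adjacency, max_nodes, rng):
       """Breadth-first sample of at most max_nodes nodes from a random start node.

       Hypothetical helper for illustration; not part of the bgd package.
       """
       start = rng.choice(sorted(adjacency))
       visited = {start}
       queue = deque([start])
       while queue and len(visited) < max_nodes:
           node = queue.popleft()
           neighbours = sorted(adjacency[node])
           rng.shuffle(neighbours)  # randomise which neighbours are kept first
           for neighbour in neighbours:
               if neighbour not in visited and len(visited) < max_nodes:
                   visited.add(neighbour)
                   queue.append(neighbour)
       # Keep every edge of the large graph whose endpoints were both sampled.
       edges = sorted(
           (u, v) for u in visited for v in adjacency[u] if v in visited and u < v
       )
       return sorted(visited), edges


   def graph_statistics(nodes, edges):
       """Basic size statistics for one sampled graph."""
       return {
           "num_nodes": len(nodes),
           "num_edges": len(edges),
           "mean_degree": 2 * len(edges) / len(nodes),
       }


   # Toy "large" graph: a 30 x 30 grid (900 nodes) standing in for real data.
   side = 30
   adjacency = {(r, c): set() for r in range(side) for c in range(side)}
   for r in range(side):
       for c in range(side):
           for dr, dc in ((0, 1), (1, 0)):
               rr, cc = r + dr, c + dc
               if rr < side and cc < side:
                   adjacency[(r, c)].add((rr, cc))
                   adjacency[(rr, cc)].add((r, c))

   rng = random.Random(0)
   samples = [sample_subgraph(adjacency, max_nodes=50, rng=rng) for _ in range(5)]
   for nodes, edges in samples:
       print(graph_statistics(nodes, edges))

Each sample is an induced subgraph of at most ``max_nodes`` nodes, so a single large graph can yield as many small graphs as needed.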

A few rules, demonstrated in ``bgd/real/example_dataset.py``:

- Each dataset needs at least a train/val/test split
- Datasets should consist of many small graphs (fewer than 400 nodes each)
- Ideally the number of graphs in each dataset should be controllable
- Data should be downloaded in-code to keep the repo small. If this isn't possible, let me know.
- Please cite your data sources in the documentation; see the existing datasets for example documentation
- Where possible, start from existing datasets that have been used in the literature or, if using generators, use generators that are well understood (for example Erdős-Rényi graphs)

Please document your dataset files with your name and contact information at the top. I'll check code and merge your branches all at once at the end of the project.
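
The size, count, and split rules above can be sketched as follows. This is an illustrative example using only the Python standard library, not the contents of the example dataset file: the function names, the 80/10/10 split, the minimum graph size, and the edge probability are all assumptions made for the sketch.

.. code-block:: python

   import random


   def erdos_renyi_edges(num_nodes, edge_prob, rng):
       """Sample an undirected G(n, p) graph as a list of (u, v) edges with u < v."""
       return [
           (u, v)
           for u in range(num_nodes)
           for v in range(u + 1, num_nodes)
           if rng.random() < edge_prob
       ]


   def make_split_datasets(num_graphs, max_nodes=399, edge_prob=0.05, seed=0):
       """Generate num_graphs small graphs and split them 80/10/10.

       Hypothetical helper for illustration; not part of the bgd package.
       """
       rng = random.Random(seed)
       graphs = []
       for _ in range(num_graphs):
           num_nodes = rng.randint(10, max_nodes)  # keeps every graph under 400 nodes
           graphs.append(
               {
                   "num_nodes": num_nodes,
                   "edges": erdos_renyi_edges(num_nodes, edge_prob, rng),
               }
           )
       n_train = num_graphs * 8 // 10
       n_val = num_graphs // 10
       return {
           "train": graphs[:n_train],
           "val": graphs[n_train:n_train + n_val],
           "test": graphs[n_train + n_val:],
       }


   splits = make_split_datasets(num_graphs=50)
   print({name: len(part) for name, part in splits.items()})

Here ``num_graphs`` makes the dataset size controllable, and a fixed ``seed`` keeps the split reproducible.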

Getting Started
===============

Check out the Reddit dataset example notebook for a quick start guide, then have a look at the source code for the datasets.

My environment is listed in ``docs/requirements.txt``. Run ``pip install -r requirements.txt`` within a virtual environment (Conda etc.) to get everything installed.

* `Reddit Example Dataset <https://big-graph-dataset.readthedocs.io/en/latest/reddit-dataset-example.html>`_

  * `A walkthrough of the dataset code for the Big Graph Dataset project <https://big-graph-dataset.readthedocs.io/en/latest/reddit-dataset-example.html#A-walkthrough-of-the-dataset-code-for-the-Big-Graph-Dataset-project>`_


  * `Sample to make a dataset of smaller graphs <https://big-graph-dataset.readthedocs.io/en/latest/reddit-dataset-example.html#Sample-to-make-a-dataset-of-smaller-graphs>`_
  * `The final dataset <https://big-graph-dataset.readthedocs.io/en/latest/reddit-dataset-example.html#The-final-dataset>`_
  * `Other datasets <https://big-graph-dataset.readthedocs.io/en/latest/reddit-dataset-example.html#Other-datsets>`_


    |



Datasets
========

Documentation for the datasets currently in the Big Graph Dataset project.

* `Many-Graph Datasets <https://big-graph-dataset.readthedocs.io/en/latest/bgd.html>`_

  * `From Real Data <https://big-graph-dataset.readthedocs.io/en/latest/bgd/real.html>`_

    * |CoraDataset|_


    * |EgoDataset|_


    * |FacebookDataset|_


    * |NeuralDataset|_


    * |RedditDataset|_


    * |RoadDataset|_



  * `Synthetic <https://big-graph-dataset.readthedocs.io/en/latest/bgd/synthetic.html>`_

    * |CommunityDataset|_


    * |RandomDataset|_


    * |TreeDataset|_



  * `Functions & Loaders <https://big-graph-dataset.readthedocs.io/en/latest/bgd/loaders.html>`_

    * |get_all_datasets()|_
    * |get_test_datasets()|_
    * |get_train_datasets()|_
    * |get_val_datasets()|_



      |



ToP (Topology Only Pre-Training)
================================

Documentation for the Topology Only Pre-Training component of the project.
We are using a pre-trained model to generate embeddings of the graphs in the datasets, hopefully to get some measure of how diverse the datasets are.
Very much a work-in-progress!

* `ToP (Topology only Pre-training) <https://big-graph-dataset.readthedocs.io/en/latest/top.html>`_

  * |GeneralEmbeddingEvaluation|_


  * |ToPDataset|_


  * |compute_top_scores()|_


    |



Credits
=======

This project is maintained by Alex O. Davies, a PhD student at the University of Bristol.
Contributors, by default, will be given fair credit upon initial release of the project.

Should you wish your authorship to be anonymous, or if you have any further questions, please contact me at alexander.davies@bristol.ac.uk.

* `Credits <https://big-graph-dataset.readthedocs.io/en/latest/credits.html>`_

  |




**Citing**

.. code-block:: bibtex

   @misc{big-graph-dataset,
   title = {{Big Graph Dataset} Documentation},
   howpublished = {https://big-graph-dataset.readthedocs.io/}}


Indices and tables
==================

* |.genindex|_
* |.modindex|_
* |.search|_

