Metadata-Version: 1.1
Name: block.bootstrap.pytorch
Version: 0.1.0
Summary: BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
Home-page: https://github.com/cadene/block.bootstrap.pytorch
Author: Remi Cadene
Author-email: remi.cadene@icloud.com
License: UNKNOWN
Description-Content-Type: UNKNOWN
Description: # BLOCK: Bilinear Superdiagonal Fusion for VQA and VRD
        
        In machine learning, an important question is how to embed two modalities in the same space.
        For instance, in Visual Question Answering, one must embed the image and the question in the same bi-modal space, which is then classified to provide the answer.
        
        <p align="center">
            <img src="https://github.com/Cadene/block.bootstrap.pytorch/blob/master/assets/VQA_block.png?raw=true" width="600"/>
        </p>
        
        We introduce a novel module (BLOCK) to fuse two representations together. First, we experimentally demonstrate that it performs better than the other available fusions. Secondly, we provide a theoretically-grounded analysis based on the notion of tensor complexity. For further details, please see [our AAAI 2019 paper](https://arxiv.org/abs/TODO) and [poster](http://remicadene.com/pdfs/poster_aaai2019.pdf).
        
        In this repo, we make our BLOCK fusion available via pip install, along with several powerful fusions from the state of the art (MLB, MUTAN, MCB, MFB, MFH, etc.). We also provide pretrained models and all the code needed to reproduce our experiments.
        
        
        #### Summary
        
        * [Installation](#installation)
            * [Python 3 & Anaconda](#1-python-3--anaconda)
            * [As standalone project](#2-as-standalone-project)
            * [Download datasets](#3-download-datasets)
            * [As a python library](#2-as-a-python-library)
        * [Quick start](#quick-start)
            * [Train a model](#train-a-model)
            * [Evaluate a model](#evaluate-a-model)
        * [Reproduce results](#reproduce-results)
        * [Pretrained models](#pretrained-models)
        * [Fusions](#fusions)
            * [Block](#block)
            * [ConcatMLP](#concatmlp)
            * [LinearSum](#linearsum)
            * [MLB](#mlb)
            * [Tucker](#tucker)
            * [Mutan](#mutan)
            * [BlockTucker](#blocktucker)
            * [MFB](#mfb)
            * [MFH](#mfh)
            * [MCB](#mcb)
        * [Useful commands](#useful-commands)
        * [Citation](#citation)
        * [Poster](#poster)
        * [Authors](#authors)
        * [Acknowledgment](#acknowledgment)
        
        
        ## Installation
        
        ### 1. Python 3 & Anaconda
        
        We don't provide support for Python 2. We advise you to install Python 3 with [Anaconda](https://www.continuum.io/downloads). Then, you can create a conda environment.
        
        ### 2. As standalone project
        
        ```
        conda create --name block python=3
        source activate block
        git clone --recursive https://github.com/Cadene/block.bootstrap.pytorch.git
        cd block.bootstrap.pytorch
        pip install -r requirements.txt
        ```
        
        ### 3. Download datasets
        
        Download annotations, images and features for VRD experiments:
        ```
        bash block/datasets/scripts/download_vrd.sh
        ```
        
        Download annotations, images and features for VQA experiments:
        ```
        bash block/datasets/scripts/download_vqa2.sh
        bash block/datasets/scripts/download_vgenome.sh
        bash block/datasets/scripts/download_tdiuc.sh
        ```
        
        **Note:** The features have been extracted from a pretrained Faster-RCNN with Caffe. For now, we don't provide the code for pretraining or feature extraction.
        
        ### (2. As a python library)
        
        By importing the `block` python module, you can access every fusion, dataset, and model in a simple way:
        ```python
        from block.fusions import Block
        from block.fusions import Mutan
        from block.fusions import MLB
        from block.fusions import MCB
        ...
        from block.datasets.vqa2 import VQA2
        from block.datasets.tdiuc import TDIUC
        ...
        from block.models.networks.vqa import VQA
        ...
        ```
        
        To do so, you can use pip:
        ```
        pip install block.bootstrap.pytorch
        ```
        
        Or install from source:
        ```
        git clone https://github.com/Cadene/block.bootstrap.pytorch.git
        cd block.bootstrap.pytorch
        python setup.py install
        ```
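
        Once installed, a fusion can be used like any `torch.nn.Module`. Below is a minimal sketch, assuming that `Block` takes the list of input dimensions followed by the output dimension, and that it is called on a list of two batched tensors (all sizes are illustrative):
        ```python
        import torch
        from block.fusions import Block

        # Two batched mono-modal representations to fuse, e.g. a question
        # embedding and an image embedding (sizes are illustrative).
        x = torch.randn(3, 100)
        y = torch.randn(3, 100)

        # List of input dimensions, then the output dimension.
        fusion = Block([100, 100], 300)

        # The fusion is called on the list of inputs.
        z = fusion([x, y])
        print(z.shape)  # torch.Size([3, 300])
        ```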
        
        
        ## Quick start
        
        ### Train a model
        
        The [bootstrap/run.py](https://github.com/Cadene/bootstrap.pytorch/blob/master/bootstrap/run.py) file loads the options contained in a yaml file, creates the corresponding experiment directory, and starts the training procedure. For instance, you can train our best model on VRD by running:
        ```
        python -m bootstrap.run -o block/options/vrd/block.yaml
        ```
        Then, several files are going to be created in `logs/vrd/block`:
        - [options.yaml](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/assets/logs/vrd/block/options.yaml) (copy of options)
        - [logs.txt](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/assets/logs/vrd/block/logs.txt) (history of printed messages)
        - [logs.json](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/assets/logs/vrd/block/logs.json) (batch and epoch statistics)
        - [view.html](http://htmlpreview.github.io/?https://raw.githubusercontent.com/Cadene/block.bootstrap.pytorch/master/assets/logs/vrd/block/view.html?token=AEdvLlDSYaSn3Hsr7gO5sDBxeyuKNQhEks5cTF6-wA%3D%3D) (learning curves)
        - ckpt_last_engine.pth.tar (checkpoints of last epoch)
        - ckpt_last_model.pth.tar
        - ckpt_last_optimizer.pth.tar
        - ckpt_best_eval_epoch.predicate.R_50_engine.pth.tar (checkpoints of best epoch)
        - ckpt_best_eval_epoch.predicate.R_50_model.pth.tar
        - ckpt_best_eval_epoch.predicate.R_50_optimizer.pth.tar
        
        Many options are available in the [`options` directory](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/options).
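
        Since `logs.json` stores the raw statistics, it can also be inspected programmatically. Here is a minimal sketch, assuming the file is a flat dictionary mapping statistic names to per-epoch lists; the key used below is a guess modeled on the metric names in this README, so print the keys first to discover the real ones:
        ```python
        import json

        with open('logs/vrd/block/logs.json') as f:
            logs = json.load(f)

        # Discover which statistics were actually logged.
        print(sorted(logs.keys()))

        # Hypothetical key, modeled on the metric names used in this README.
        scores = logs.get('eval_epoch.predicate.R_50', [])
        if scores:
            best = max(range(len(scores)), key=lambda i: scores[i])
            print('best epoch: {}, R_50: {:.4f}'.format(best, scores[best]))
        ```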
        
        ### Evaluate a model
        
        At the end of the training procedure, you can evaluate your model on the testing set. In this example, [bootstrap/run.py](https://github.com/Cadene/bootstrap.pytorch/blob/master/bootstrap/run.py) loads the options from your experiment directory, resumes the best checkpoint on the validation set, and starts an evaluation on the testing set instead of the validation set, while skipping the training set (`train_split` is empty). Thanks to `--misc.logs_name`, the logs will be written in the new `logs_predicate.txt` and `logs_predicate.json` files, instead of being appended to the `logs.txt` and `logs.json` files.
        ```
        python -m bootstrap.run \
        -o logs/vrd/block/options.yaml \
        --exp.resume best_eval_epoch.predicate.R_50 \
        --dataset.train_split \
        --dataset.eval_split test \
        --misc.logs_name predicate
        ```
        
        ## Reproduce results
        
        ### VRD dataset
        
        #### Train and evaluate on VRD
        
        1. Train block on trainset with early stopping on valset
        2. Evaluate the best checkpoint on testset (Predicate Prediction)
        3. Evaluate the best checkpoint on testset (Relationship and Phrase Detection)
        
        ```
        python -m bootstrap.run \
        -o block/options/vrd/block.yaml \
        --exp.dir logs/vrd/block
        
        python -m bootstrap.run \
        -o logs/vrd/block/options.yaml \
        --dataset.train_split \
        --dataset.eval_split test \
        --exp.resume best_eval_epoch.predicate.R_50 \
        --misc.logs_name predicate
        
        python -m bootstrap.run \
        -o logs/vrd/block/options.yaml \
        --dataset.train_split \
        --dataset.eval_split test \
        --dataset.mode rel_phrase \
        --model.metric.name vrd_rel_phrase \
        --exp.resume best_eval_epoch.predicate.R_50 \
        --misc.logs_name rel_phrase
        ```
        
        **Note:** You can copy-paste the three commands into the terminal at once to run them one after the other seamlessly.
        
        **Note:** Block is not the only option available. You can find several others [here](https://github.com/Cadene/block.bootstrap.pytorch/tree/master/options/vrd).
        
        **Note:** Learning curves can be viewed in the experiment directory (`logs/vrd/block/view.html`). An example is available [here](http://htmlpreview.github.io/?https://raw.githubusercontent.com/Cadene/block.bootstrap.pytorch/master/assets/logs/vrd/block/view.html?token=AEdvLlDSYaSn3Hsr7gO5sDBxeyuKNQhEks5cTF6-wA%3D%3D).
        
        **Note:** In our article, we report results for a negative sampling ratio of 0.5. Better results in *Predicate Prediction* can be achieved with a ratio of 0.0, and better results in *Phrase Detection* and *Relationship Detection* with a ratio of 0.8. You can change the ratio as follows:
        ```
        python -m bootstrap.run \
        -o block/options/vrd/block.yaml \
        --exp.dir logs/vrd/block_ratio,0.0 \
        --dataset.neg_ratio 0.0
        ```
        
        #### Compare experiments on VRD
        
        Finally, you can compare experiments on the valset or testset metrics:
        ```
        python -m block.compare_vrd_val -d \
        logs/vrd/block \
        logs/vrd/block_tucker \
        logs/vrd/mutan \
        logs/vrd/mfh \
        logs/vrd/mlb
        
        python -m block.compare_vrd_test -d \
        logs/vrd/block \
        logs/vrd/block_tucker
        ```
        
        Example:
        ```
        ## eval_epoch.predicate.R_50
          Place  Method          Score    Epoch
        -------  ------------  -------  -------
              1  block         86.3708       13
              2  block_tucker  86.2529        9
        
        ## eval_epoch.predicate.R_100
          Place  Method          Score    Epoch
        -------  ------------  -------  -------
              1  block         92.4588       13
              2  block_tucker  91.5816        9
        
        ## eval_epoch.phrase.R_50
          Place  Method          Score    Epoch
        -------  ------------  -------  -------
              1  block         25.4779       13
              2  block_tucker  23.7759        9
        
        ## eval_epoch.phrase.R_100
          Place  Method          Score    Epoch
        -------  ------------  -------  -------
              1  block         29.7198       13
              2  block_tucker  27.9131        9
        
        ## eval_epoch.rel.R_50
          Place  Method          Score    Epoch
        -------  ------------  -------  -------
              1  block         18.0806       13
              2  block_tucker  17.0856        9
        
        ## eval_epoch.rel.R_100
          Place  Method          Score    Epoch
        -------  ------------  -------  -------
              1  block         21.1181       13
              2  block_tucker  19.7565        9
        ```
        
        ### VQA2 dataset
        
        #### Training and evaluation (train/val)
        
        We use this simple setup to tune our hyperparameters on the valset.
        
        ```
        python -m bootstrap.run \
        -o block/options/vqa2/block.yaml \
        --exp.dir logs/vqa2/block
        ```
        
        #### Training and evaluation (train+val/val/test)
        
        This heavier setup allows us to train a model on 95% of the concatenation of the train and val sets, and to evaluate it on the remaining 5%. Then we extract the predictions of our best checkpoint on the testset. Finally, we submit a json file to the EvalAI website.
        
        ```
        python -m bootstrap.run \
        -o block/options/vqa2/block.yaml \
        --exp.dir logs/vqa2/block_trainval \
        --dataset.proc_split trainval
        
        python -m bootstrap.run \
        -o logs/vqa2/block_trainval/options.yaml \
        --exp.resume best_eval_epoch.accuracy_top1 \
        --dataset.train_split \
        --dataset.eval_split test \
        --misc.logs_name test
        ```
        
        #### Training and evaluation (train+val+vg/val/test)
        
        Same, but we add pairs from the VisualGenome dataset.
        
        ```
        python -m bootstrap.run \
        -o block/options/vqa2/block.yaml \
        --exp.dir logs/vqa2/block_trainval_vg \
        --dataset.proc_split trainval \
        --dataset.vg True
        
        python -m bootstrap.run \
        -o logs/vqa2/block_trainval_vg/options.yaml \
        --exp.resume best_eval_epoch.accuracy_top1 \
        --dataset.train_split \
        --dataset.eval_split test \
        --misc.logs_name test
        ```
        
        #### Compare experiments on valset
        
        You can compare experiments by displaying their best metrics on the valset.
        
        ```
        python -m block.compare_vqa_val -d logs/vqa2/block logs/vqa2/mutan
        ```
        
        #### Submit predictions on EvalAI
        
        It is not possible to automatically compute the accuracies on the testset: you need to submit a json file to the [EvalAI platform](http://evalai.cloudcv.org/web/challenges/challenge-page/80/my-submission). The evaluation step on the testset creates the json file that contains the predictions of your model on the full testset. For instance: `logs/vqa2/block_trainval_vg/results/test/epoch,19/OpenEnded_mscoco_test2015_model_results.json`. To get the accuracies on the test-dev or test sets, you must submit this file.
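
        Before uploading, you may want to sanity-check the generated file. Here is a small sketch, assuming the standard VQA submission format (a json list of `question_id`/`answer` entries); adapt the epoch directory to your own run:
        ```python
        import json

        path = ('logs/vqa2/block_trainval_vg/results/test/epoch,19/'
                'OpenEnded_mscoco_test2015_model_results.json')
        with open(path) as f:
            results = json.load(f)

        # Each entry should follow the standard VQA submission format.
        assert isinstance(results, list)
        for entry in results:
            assert set(entry) == {'question_id', 'answer'}
        print('{} predictions ready for submission'.format(len(results)))
        ```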
        
        
        ### TDIUC dataset
        
        #### Training and evaluation (train/val/test)
        
        The full training set is split into a trainset and a valset. At the end of the training, we evaluate our best checkpoint on the testset. The TDIUC metrics are computed and displayed at the end of each epoch. They are also stored in `logs.json` and `logs_test.json`.
        
        ```
        python -m bootstrap.run \
        -o block/options/tdiuc/block.yaml \
        --exp.dir logs/tdiuc/block
        
        python -m bootstrap.run \
        -o logs/tdiuc/block/options.yaml \
        --exp.resume best_eval_epoch.accuracy_top1 \
        --dataset.train_split \
        --dataset.eval_split test \
        --misc.logs_name test
        ```
        
        #### Compare experiments
        
        You can compare experiments by displaying their best metrics on the valset or testset.
        
        ```
        python -m block.compare_tdiuc_val -d logs/tdiuc/block logs/tdiuc/mutan
        python -m block.compare_tdiuc_test -d logs/tdiuc/block logs/tdiuc/mutan
        ```
        
        ## Pretrained models
        
        ### VRD
        
        Download **Block**:
        ```
        mkdir -p logs/vrd
        cd logs/vrd
        wget http://data.lip6.fr/cadene/block/vrd/block.tar.gz
        tar -xzvf block.tar.gz
        ```
        
        Results (`python -m block.compare_vrd_test -d logs/vrd/block`):
        - predicate.R_50: 86.3708
        - predicate.R_100: 92.4588
        - phrase.R_50: 25.4779
        - phrase.R_100: 29.7198
        - rel.R_50: 18.0806
        - rel.R_100: 21.1181
        
        ### VQA2
        
        Download **Block train/val**:
        ```
        mkdir -p logs/vqa2
        cd logs/vqa2
        wget http://data.lip6.fr/cadene/block/vqa2/block.tar.gz
        tar -xzvf block.tar.gz
        ```
        
        Results val (`python -m block.compare_vqa2_val -d logs/vqa2/block`):
        - overall (oe): 63.6
        - accuracy_top1: 54.4254
        
        
        Download **Block train+val/val/test**:
        ```
        mkdir -p logs/vqa2
        cd logs/vqa2
        wget http://data.lip6.fr/cadene/block/vqa2/block_trainval.tar.gz
        tar -xzvf block_trainval.tar.gz
        ```
        
        Results test-dev (EvalAI):
        - overall: 66.74
        - yes/no: 83.73
        - number: 46.51
        - other: 56.84
        
        
        Download **Block train+val+vg/val/test**:
        ```
        mkdir -p logs/vqa2
        cd logs/vqa2
        wget http://data.lip6.fr/cadene/block/vqa2/block_trainval_vg.tar.gz
        tar -xzvf block_trainval_vg.tar.gz
        ```
        
        Results test-dev (EvalAI):
        - overall: 67.41
        - yes/no: 83.89
        - number: 46.22
        - other: 58.18
        
        
        ### TDIUC
        
        Download **Block train+val/val/test**:
        ```
        mkdir -p logs/tdiuc
        cd logs/tdiuc
        wget http://data.lip6.fr/cadene/block/tdiuc/block_trainval.tar.gz
        tar -xzvf block_trainval.tar.gz
        ```
        
        Results val (`python -m block.compare_tdiuc_val -d logs/tdiuc/block`):
        - accuracy_top1: 88.0195
        - acc_mpt_a: 72.2555
        - acc_mpt_h: 59.9484
        - acc_mpt_a_norm: 60.9635
        - acc_mpt_h_norm: 44.7724
        
        Results test (`python -m block.compare_tdiuc_test -d logs/tdiuc/block`):
        - accuracy_top1: 86.3242
        - acc_mpt_a: 72.4447
        - acc_mpt_h: 66.15
        - acc_mpt_a_norm: 58.5728
        - acc_mpt_h_norm: 38.8279
        
        
        ## Fusions
        
        ### Block
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L30)
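
        As a rough sketch of the underlying idea (see the paper for the exact formulation), BLOCK constrains the bilinear interaction tensor with a block-term decomposition: instead of a single global Tucker decomposition, the tensor is expressed as a sum of smaller Tucker terms, each with its own core tensor and factor matrices.
        ```latex
        % Sketch of the block-term decomposition behind BLOCK; the rank and
        % dimension constraints are detailed in the paper.
        \mathcal{T} = \sum_{r=1}^{R} \mathcal{D}_r \times_1 \mathbf{A}_r \times_2 \mathbf{B}_r \times_3 \mathbf{C}_r
        ```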
        
        
        ### ConcatMLP
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L590)
        
        ### LinearSum
        
        <img src="http://latex2png.com/output//latex_4ddc1e548ad7573a1d3a898226a29c95.png" width="200"/>
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L531)
        
        ### MLB
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L284)
        
        ### Tucker
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L233)
        
        ### Mutan
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L175)
        
        ### BlockTucker
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L104)
        
        ### MFB
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L343)
        
        ### MFH
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L407)
        
        ### MCB
        
        /!\ Not available in pytorch 1.0 - available in pytorch 0.3 and 0.4
        
        [code](https://github.com/Cadene/block.bootstrap.pytorch/blob/master/models/networks/fusions/fusions.py#L500)
        
        
        
        ## Useful commands
        
        ### Compare experiments
        
        ```
        python -m bootstrap.compare -d \
        logs/recipe1m/adamine \
        logs/recipe1m/avg \
        -k eval_epoch.metric.recall_at_1_im2recipe_mean max
        ```
        
        Results:
        ```
        ## eval_epoch.metric.recall_at_1_im2recipe_mean
        
          Place  Method      Score    Epoch
        -------  --------  -------  -------
              1  adamine    0.3827       76
              2  avg        0.3201       51
        ```
        
        ### Use a specific GPU
        
        For a specific experiment:
        ```
        CUDA_VISIBLE_DEVICES=0 python -m bootstrap.run -o block/options/vqa2/block.yaml
        ```
        
        For the current terminal session:
        ```
        export CUDA_VISIBLE_DEVICES=0
        ```
        
        ### Overwrite an option
        
        The bootstrap.pytorch framework makes it easy to overwrite a hyperparameter. In this example, we run an experiment with a non-default learning rate. Thus, we also overwrite the experiment directory path:
        ```
        python -m bootstrap.run -o block/options/vqa2/block.yaml \
        --optimizer.lr 0.0003 \
        --exp.dir logs/vqa2/block_lr,0.0003
        ```
        
        ### Resume training
        
        If a problem occurs, it is easy to resume the last epoch by specifying the options file from the experiment directory while overwriting the `exp.resume` option (default is None):
        ```
        python -m bootstrap.run -o logs/vqa2/block/options.yaml \
        --exp.resume last
        ```
        
        ### Web API
        
        ```
        TODO
        ```
        
        ### Extract your own image features
        
        ```
        TODO
        ```
        
        
        ## Citation
        
        ```
        @InProceedings{BenYounes_2019_AAAI,
            author = {Ben-Younes, Hedi and Cadene, Remi and Thome, Nicolas and Cord, Matthieu},
            title = {BLOCK: {B}ilinear {S}uperdiagonal {F}usion for {V}isual {Q}uestion {A}nswering and {V}isual {R}elationship {D}etection},
            booktitle = {The Thirty-Third AAAI Conference on Artificial Intelligence},
            year = {2019},
            url = {https://arxiv.org/abs/TODO}
        }
        ```
        
        ## Poster
        
        <p align="center">
            <a href="http://remicadene.com/pdfs/poster_aaai2019.pdf"><img src="https://github.com/Cadene/block.bootstrap.pytorch/blob/master/assets/poster_aaai2019.png?raw=true" width="300"/></a>
        </p>
        
        ## Authors
        
        This code was made available by [Hedi Ben-Younes](https://twitter.com/labegne) (Sorbonne-Heuritech), [Remi Cadene](http://remicadene.com) (Sorbonne), [Matthieu Cord](http://webia.lip6.fr/~cord) (Sorbonne) and [Nicolas Thome](http://webia.lip6.fr/~thomen) (CNAM). 
        
        ## Acknowledgment
        
        Special thanks to the authors of [VQA](TODO) and [VRD](TODO), the datasets used in this research project.
        
Keywords: pytorch block vqa vrd visual question answering visual relationship detection relation bootstrap deep learning aaai
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.7
