tf_vae.json empty after running vae_train.py #31

Open
@asolano

Greetings,

I am trying to reproduce the experiment on a DGX station I currently have access to, and the first two steps look all right, but the result of the command:

$ python vae_train.py
...
step 298000 35.82913 3.7688284 32.0603
step 298500 34.947067 2.9355032 32.011562
step 299000 35.83263 3.8249977 32.007633
step 299500 36.45114 4.418231 32.03291
step 300000 35.098816 3.0974069 32.001408
step 300500 35.483387 3.4664068 32.01698
step 301000 35.43274 3.4285662 32.004173

is an empty array:

$ cat tf_vae/vae.json 
[]

According to the documentation, the model should be saved to that file, so any hint about where to look for the problem is appreciated.
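
To rule out a display problem with `cat`, the file can also be inspected from Python. The following is only a minimal diagnostic sketch: the helper name `describe_saved_model` is hypothetical (it is not part of the repository), and it assumes nothing about the file beyond it being valid JSON.

```python
import json
import os


def describe_saved_model(path):
    """Summarize a JSON file: size on disk and number of top-level entries.

    Hypothetical helper for debugging; it does not depend on the
    repository's code or on any particular model layout.
    """
    size = os.path.getsize(path)
    with open(path) as f:
        data = json.load(f)
    # Lists and dicts have a length; anything else counts as one entry.
    entries = len(data) if isinstance(data, (list, dict)) else 1
    return {"bytes": size, "entries": entries, "empty": entries == 0}


if __name__ == "__main__":
    # Path taken from the `cat` command above.
    model_path = "tf_vae/vae.json"
    if os.path.exists(model_path):
        print(describe_saved_model(model_path))
    else:
        print("file not found:", model_path)
```

If `empty` is reported as true, the file really does contain an empty top-level list, which points at the step that writes it rather than at how it is displayed.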

Thanks,

Alfredo

PS: I am using the following Dockerfile to recreate the environment in the paper, in case it might be relevant:

FROM tensorflow/tensorflow:1.8.0-gpu-py3

# gym-doom requirements
RUN apt-get update && apt-get install -y --no-install-recommends \
        cmake \
        zlib1g-dev \
        libjpeg-dev \
        libboost-all-dev \
        gcc \
        libsdl2-dev \
        wget \
        unzip \
        python3-tk \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# make python3 the default
RUN update-alternatives --remove python /usr/bin/python2 && \
    update-alternatives --install /usr/bin/python python /usr/bin/python3 10

# NOTE overriding numpy version to match the paper's
# NOTE numpy==1.13.3 gives an error importing vizdoom
RUN pip install --upgrade pip && \
    pip install --no-cache-dir --user --upgrade \
        gym==0.9.4 \
        ppaquette-gym-doom==0.0.6 \
        cma==2.2.0 \
        mpi4py==2.0.0

ENTRYPOINT ["/bin/bash"]
