Checkpoint restoration for VAEGAN needs to account for global step #82

@indraastra

Description

The VAEGAN training code saves checkpoints using the value of the global training step, which produces checkpoints with names like 'vaegan.ckpt-800.index'. Any code that looks for an existing checkpoint needs to account for this naming scheme, but the current existence check doesn't, since it tests paths without the step suffix:

    if os.path.exists(ckpt_name + '.index') or os.path.exists(ckpt_name):
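
Concretely, after a save at global step 800 the checkpoint directory contains files like the following (assuming the standard TF saver layout), so neither 'vaegan.ckpt' nor 'vaegan.ckpt.index' ever exists:

    vaegan.ckpt-800.index
    vaegan.ckpt-800.data-00000-of-00001
    vaegan.ckpt-800.meta
    checkpoint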

I would suggest changing the check to something like this:

    # tf.train.latest_checkpoint reads the 'checkpoint' state file in the
    # directory and returns the path of the most recently saved checkpoint.
    latest_checkpoint = tf.train.latest_checkpoint(os.path.dirname(ckpt_name))
    if latest_checkpoint:
        saver.restore(sess, latest_checkpoint)
        print("Model restored from checkpoint {}.".format(latest_checkpoint))
    else:
        print("Model checkpoint not found.")

(This won't quite work if checkpoints from multiple models are saved to the same directory, since tf.train.latest_checkpoint relies on the single 'checkpoint' state file, which only records the most recently saved model. A prefix-aware fallback is sketched below.)
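
If several models do have to share a directory, one possible workaround is to glob for this model's step-suffixed index files instead of consulting the 'checkpoint' file. This is only a sketch; latest_checkpoint_for_prefix is a hypothetical helper, and ckpt_name is the same prefix used above:

    import glob
    import re

    def latest_checkpoint_for_prefix(ckpt_name):
        """Return the newest checkpoint path for this prefix, or None.

        Hypothetical helper: scans for files like 'vaegan.ckpt-800.index'
        and picks the one with the highest global step.
        """
        steps = []
        for path in glob.glob(ckpt_name + '-*.index'):
            match = re.search(r'-(\d+)\.index$', path)
            if match:
                steps.append(int(match.group(1)))
        if not steps:
            return None
        # Savers expect the bare checkpoint path, without the '.index' suffix.
        return '{}-{}'.format(ckpt_name, max(steps))

The returned path can then be passed to saver.restore() exactly like the result of tf.train.latest_checkpoint().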
