Description
System information
- What is the top-level directory of the model you are using: models/research/object_detection
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- TensorFlow installed from (source or binary): Source
- TensorFlow version (use command below): r1.15
- Bazel version (if compiling from source):
- CUDA/cuDNN version: 10.0
- GPU model and memory: K80/12GB
- Exact command to reproduce:
Describe the problem
I am using the ssd_mobilenet_v2_quantized_300x300_coco.config file from the Object Detection API for quantization-aware training, with the train.py script from legacy.
When I set num_clones to the number of GPUs and run quantization-aware training, training completes without errors, but export_tflite_ssd_graph.py fails with the following error:
I1104 20:35:37.946473 139919232743168 saver.py:1284] Restoring parameters from /home/ubuntu/dvs_human_detection/q_train_mc/model.ckpt-1000
2019-11-04 20:35:38.959815: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[{{node save/RestoreV2}}]]
(1) Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_711]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[node save/RestoreV2 (defined at /.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[node save/RestoreV2 (defined at /.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_711]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "/models/research/object_detection/export_tflite_ssd_graph.py", line 143, in <module>
tf.app.run(main)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/.local/lib/python3.5/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/.local/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/models/research/object_detection/export_tflite_ssd_graph.py", line 139, in main
FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)
File "/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 287, in export_tflite_graph
moving_average_checkpoint.name)
File "/models/research/object_detection/exporter.py", line 111, in replace_variable_values_with_moving_averages
read_saver = tf.train.Saver(ema_variables_to_restore)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
build_restore=build_restore)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/models/research/object_detection/export_tflite_ssd_graph.py", line 143, in <module>
tf.app.run(main)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/ubuntu/.local/lib/python3.5/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/ubuntu/.local/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/home/ubuntu/models/research/object_detection/export_tflite_ssd_graph.py", line 139, in main
FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)
File "/home/ubuntu/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 287, in export_tflite_graph
moving_average_checkpoint.name)
File "/home/ubuntu/models/research/object_detection/exporter.py", line 112, in replace_variable_values_with_moving_averages
read_saver.restore(sess, current_checkpoint_file)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 1306, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[node save/RestoreV2 (defined at /.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[node save/RestoreV2 (defined at /.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_711]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "/models/research/object_detection/export_tflite_ssd_graph.py", line 143, in <module>
tf.app.run(main)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/.local/lib/python3.5/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/.local/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/models/research/object_detection/export_tflite_ssd_graph.py", line 139, in main
FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)
File "/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 287, in export_tflite_graph
moving_average_checkpoint.name)
File "/models/research/object_detection/exporter.py", line 111, in replace_variable_values_with_moving_averages
read_saver = tf.train.Saver(ema_variables_to_restore)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
build_restore=build_restore)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/.local/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
But when I set num_clones=1, both training and export work.
However, single-GPU training is very slow.
How can I perform quantization-aware training on multiple GPUs?
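To narrow this down, it may help to list which quantization keys the multi-clone checkpoint actually contains, since the error only says that BoxPredictor_0/BoxEncodingPredictor/act_quant/max is missing. The snippet below is a minimal sketch, not a fix. The helper names are made up for illustration, and the sample variable names are hypothetical rather than taken from my checkpoint. It assumes the TF 1.x function tf.train.list_variables, which returns (name, shape) pairs for a checkpoint.

```python
def load_checkpoint_names(checkpoint_path):
    """Return the variable names stored in a TF1 checkpoint.

    TensorFlow is imported lazily so the pure-Python helper below
    can be used (and tested) without TensorFlow installed.
    """
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


def keys_containing(names, fragment):
    """Return, in sorted order, the names that contain `fragment`."""
    return sorted(name for name in names if fragment in name)


# Hypothetical names, for illustration only (not read from a real checkpoint).
sample_names = [
    "BoxPredictor_0/BoxEncodingPredictor/weights",
    "BoxPredictor_0/BoxEncodingPredictor/biases",
    "FeatureExtractor/MobilenetV2/Conv/act_quant/min",
]

# Show only the quantization range variables.
for key in keys_containing(sample_names, "act_quant"):
    print(key)
```

On a real run, sample_names would be replaced by load_checkpoint_names("/path/to/model.ckpt-1000"). If no BoxPredictor_0 act_quant keys are printed, or they appear under a different name, that would point at how the variables were created or saved during multi-clone training rather than at the export script.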
Config file I used
model {
ssd {
num_classes: 1
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
feature_extractor {
type: "ssd_mobilenet_v2"
depth_multiplier: 0.6
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.99999989895e-05
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.0299999993294
}
}
activation: RELU_6
batch_norm {
decay: 0.999700009823
center: true
scale: true
epsilon: 0.0010000000475
train: true
}
}
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.99999989895e-05
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.0299999993294
}
}
activation: RELU_6
batch_norm {
decay: 0.999700009823
center: true
scale: true
epsilon: 0.0010000000475
train: true
}
}
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.800000011921
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.20000000298
max_scale: 0.949999988079
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.333299994469
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 9.99999993923e-09
iou_threshold: 0.600000023842
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.990000009537
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 3
}
classification_weight: 1.0
localization_weight: 1.0
}
}
}
train_config {
batch_size: 24
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
rms_prop_optimizer {
learning_rate {
exponential_decay_learning_rate {
initial_learning_rate: 0.00400000018999
decay_steps: 800720
decay_factor: 0.949999988079
}
}
momentum_optimizer_value: 0.899999976158
decay: 0.899999976158
epsilon: 1.0
}
}
#fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
#from_detection_checkpoint: true
num_steps: 1000
}
train_input_reader {
label_map_path: "./label_map.pbtxt"
tf_record_input_reader {
input_path: "./train_pos_neg_v2/train_dataset.record-00000-of-00100"
}
}
eval_config {
num_examples: 6200
metrics_set: "coco_detection_metrics"
use_moving_averages: true
include_metrics_per_category: true
}
eval_input_reader {
label_map_path: "./label_map.pbtxt"
shuffle: false
num_readers: 1
tf_record_input_reader {
input_path: "./test_pos_neg_v2/test_dataset.record-00000-of-00010"
}
}
graph_rewriter {
quantization {
delay: 500
weight_bits: 8
activation_bits: 8
}
}
NOTE: When I download the pretrained model from the model zoo, the conversion works.
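Since the zoo checkpoint exports fine, one way to compare the two is to diff their variable names. The sketch below assumes both name lists have already been read (for example with tf.train.list_variables in TF 1.x). The function name and the example keys are hypothetical. For each key the exporter expects but the trained checkpoint lacks, it also reports any saved name that ends with that key, which would show whether the variable was saved under an extra scope prefix.

```python
def diff_checkpoint_keys(expected, saved):
    """Map each expected key missing from `saved` to near matches.

    A near match is any saved name that ends with "/" + key, i.e. the
    same variable stored under an additional leading scope.
    """
    saved_set = set(saved)
    report = {}
    for key in expected:
        if key in saved_set:
            continue  # present under the expected name, nothing to report
        report[key] = sorted(name for name in saved if name.endswith("/" + key))
    return report


# Hypothetical example: "some_scope" is a made-up prefix, not a name
# observed in my checkpoint.
expected_keys = ["layer/act_quant/max", "layer/weights"]
saved_keys = ["layer/weights", "some_scope/layer/act_quant/max"]

for key, candidates in diff_checkpoint_keys(expected_keys, saved_keys).items():
    print(key, "->", candidates)
```

An empty candidate list would mean the variable was not saved at all. A non-empty one would mean it exists under a different name.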