Notebook running on windows machine #17

Open
gteti opened this issue Jan 20, 2022 · 29 comments

@gteti

gteti commented Jan 20, 2022

Do you have a notebook for detection with custom data that works on a Windows system?

Also, some of the code in those notebooks requires the object_detection library, which works with numpy=1.18, while depthai requires numpy=1.21.

Thanks!

@Luxonis-Brandon
Contributor

Not sure on Windows. @tersekmatija may know though.

@gteti
Author

gteti commented Jan 22, 2022

@Luxonis-Brandon Thank you for your reply. I hope he can help me out 😃
Have a nice weekend everyone!

@tersekmatija
Contributor

Hey @gteti ,

All tutorials should theoretically be able to run on Windows, though some environments might be a bit hard to set up. For example, you can find how to set up darknet for local YoloV3 or V4 training [here](https://github.com/AlexeyAB/darknet#requirements-for-windows-linux-and-macos). You should also install all the libraries that we import in the tutorial, using either pip or conda. We provide the tutorials in Colab because it is easier to set up, and you can run them in a matter of a few clicks. Are you looking for a specific tutorial?

Some of the tutorials are pinned to older versions of libraries so that they stay executable and there are no compatibility issues. Environments for training should be independent of pipeline environments, so you can train (run the tutorial) in one environment, and then use the resulting blob to run it on a pipeline with DepthAI in another environment. In short, the version mismatch shouldn't be a problem :)
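
To make the "separate environments" idea concrete, here is a minimal sketch using Python's standard `venv` module. The environment names `train-env` and `depthai-env` and the helper `create_envs` are made up for illustration. Note that `venv` builds each environment from the interpreter that runs the script, so in practice the training environment would have to be created with a Python 3.7 interpreter (or with conda instead).

```python
import tempfile
import venv
from pathlib import Path


def create_envs(base_dir, names=("train-env", "depthai-env"), with_pip=False):
    """Create one isolated virtual environment per name under base_dir.

    The names are hypothetical; with_pip=False keeps the sketch fast,
    set it to True if you want pip bootstrapped into each environment.
    """
    base = Path(base_dir)
    created = {}
    for name in names:
        env_dir = base / name
        # Each environment has its own site-packages, so a numpy pinned
        # for training cannot clash with the numpy that depthai needs.
        venv.create(env_dir, with_pip=with_pip)
        created[name] = env_dir
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; pick a permanent location in practice.
    for name, path in create_envs(tempfile.mkdtemp()).items():
        print(name, "->", path)
```

After activating each environment you would install the training libraries in one and depthai in the other; only the exported blob moves between them.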

@gteti
Author

gteti commented Jan 26, 2022

I want to be able to run your notebook https://github.com/luxonis/depthai-ml-training/blob/master/colab-notebooks/Easy_Object_Detection_With_Custom_Data_Demo_Training.ipynb on a Windows system. If that can't be done, which Azure service do you suggest for running the same notebook? I'm not an expert in cloud computing, but I see options such as a Linux VM or a "machine learning and AI" offering.

Thank you

@tersekmatija
Contributor

Have you tried installing the required libraries? You should install Python (3.7 or lower, since the tutorial uses TF 1.x), and then install the required libraries (tensorflow 1, numpy, and others that appear throughout the notebook). You'd have to change some paths though. The easiest option would be to just run the notebook in Colab.
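
As a quick sanity check before running the notebook locally, something like the following could flag the two constraints mentioned above (Python 3.7 or lower, TensorFlow 1.x). This is only a sketch: `check_training_env` and `installed_tf_version` are hypothetical helper names, and the notebook may have further requirements that are not checked here.

```python
import sys


def check_training_env(python_version, tf_version):
    """Return a list of problems; an empty list means both checks passed.

    python_version: a tuple such as sys.version_info
    tf_version: a version string such as "1.15.0", or None if TF is missing
    """
    problems = []
    major, minor = tuple(python_version[:2])
    if (major, minor) > (3, 7):
        problems.append(
            f"Python {major}.{minor} found; the tutorial expects 3.7 or lower"
        )
    if tf_version is None:
        problems.append("TensorFlow is not installed")
    elif not tf_version.startswith("1."):
        problems.append(
            f"TensorFlow {tf_version} found; the tutorial expects 1.x"
        )
    return problems


def installed_tf_version():
    """Return the installed TensorFlow version string, or None if absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__
```

Calling `check_training_env(sys.version_info, installed_tf_version())` inside the training environment and printing the result should show at a glance whether the interpreter or TensorFlow needs to change.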

@gteti
Author

gteti commented Jan 26, 2022

> Have you tried installing the required libraries? You should install Python (3.7 or lower, since the tutorial uses TF 1.x), and then install the required libraries (tensorflow 1, numpy, and others that appear throughout the notebook). You'd have to change some paths though. The easiest option would be to just run the notebook in Colab.

Thank you for your fast reply. I'll try it as you suggested. I was using Python 3.8 or later and started having issues with depthai (I was using the same environment for both): depthai requires a more recent numpy, while the object_detection library uses an older version. I'll hopefully try this in a few hours and see how it goes on my Windows laptop. I was even upgrading the TensorFlow code to the more recent version 😄 (silly me...)

@gteti
Author

gteti commented Jan 28, 2022

I managed to get further with running the notebook on my Ubuntu VM. I got to the point of finally training the model, but the process ends (the notebook cell completes) without all the steps taking place. Do you, perhaps, know why? I'm using Python 3.7 and TensorFlow 1.15.0 on an Ubuntu VM on my Windows 10 system.
Thanks

This is the output and the point where it "stops". I'm trying now with TF 1.14.

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0128 09:17:15.309371 139650078273920 model_lib.py:717] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 5000
I0128 09:17:15.309661 139650078273920 config_util.py:552] Maybe overwriting train_steps: 5000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0128 09:17:15.309750 139650078273920 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0128 09:17:15.309824 139650078273920 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0128 09:17:15.309894 139650078273920 config_util.py:552] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I0128 09:17:15.309963 139650078273920 config_util.py:552] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I0128 09:17:15.310080 139650078273920 config_util.py:562] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0128 09:17:15.310296 139650078273920 model_lib.py:733] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0128 09:17:15.310383 139650078273920 model_lib.py:768] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': '/home/user/Desktop/Python/content/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0272d59250>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0128 09:17:15.311410 139650078273920 estimator.py:212] Using config: {'_model_dir': '/home/user/Desktop/Python/content/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0272d59250>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f0271c2c830>) includes params argument, but params are not passed to Estimator.
W0128 09:17:15.311612 139650078273920 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f0271c2c830>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0128 09:17:15.311988 139650078273920 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0128 09:17:15.312223 139650078273920 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0128 09:17:15.312546 139650078273920 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0128 09:17:15.329903 139650078273920 deprecation.py:323] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0128 09:17:15.360383 139650078273920 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0128 09:17:15.367554 139650078273920 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0128 09:17:15.397883 139650078273920 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:76: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0128 09:17:33.207298 139650078273920 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:76: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0128 09:17:33.373834 139650078273920 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0128 09:17:44.101332 139650078273920 api.py:332] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:258: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0128 09:17:49.850121 139650078273920 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:258: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Calling model_fn.
I0128 09:17:55.011837 139650078273920 estimator.py:1148] Calling model_fn.
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0128 09:17:55.433474 139650078273920 deprecation.py:323] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 09:17:58.581957 139650078273920 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 09:17:58.619893 139650078273920 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 09:17:58.656145 139650078273920 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 09:17:58.693111 139650078273920 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 09:17:58.728884 139650078273920 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 09:17:58.765338 139650078273920 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
W0128 09:17:58.814242 139650078273920 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
W0128 09:17:58.814424 139650078273920 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0128 09:17:58.814539 139650078273920 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0128 09:17:58.814652 139650078273920 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0128 09:18:07.349004 139650078273920 deprecation.py:506] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
I0128 09:18:14.915427 139650078273920 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0128 09:18:14.916795 139650078273920 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0128 09:18:21.947349 139650078273920 monitored_session.py:240] Graph was finalized.
2022-01-28 09:18:21.951748: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2022-01-28 09:18:21.986072: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2591995000 Hz
2022-01-28 09:18:21.986795: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55837f158e50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-01-28 09:18:21.986825: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-01-28 09:18:21.991117: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-01-28 09:18:21.991750: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2022-01-28 09:18:21.991788: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ubuntu1910vm): /proc/driver/nvidia/version does not exist
INFO:tensorflow:Restoring parameters from /home/user/Desktop/Python/content/training/model.ckpt-0
I0128 09:18:21.993026 139650078273920 saver.py:1284] Restoring parameters from /home/user/Desktop/Python/content/training/model.ckpt-0
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W0128 09:18:25.348032 139650078273920 deprecation.py:323] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I0128 09:18:27.341648 139650078273920 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0128 09:18:28.024017 139650078273920 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /home/user/Desktop/Python/content/training/model.ckpt.
I0128 09:18:48.071689 139650078273920 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /home/user/Desktop/Python/content/training/model.ckpt.
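
The log above ends without a Python traceback. One possible explanation (not confirmed by the log itself) is that the process was stopped for lack of memory, so it may be worth checking how much RAM the VM actually has free before training. A minimal sketch, assuming a Linux guest where `/proc/meminfo` exists; `parse_meminfo` and `available_gib` are hypothetical helper names:

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_gib(meminfo):
    """Convert the MemAvailable entry (reported in kB) to GiB."""
    return meminfo.get("MemAvailable", 0) / (1024 ** 2)


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    if meminfo_path.exists():
        info = parse_meminfo(meminfo_path.read_text())
        print(f"Available memory: {available_gib(info):.1f} GiB")
    else:
        print("/proc/meminfo not found; this check is Linux-only")
```

If the reported figure is small compared with what training needs, giving the VM more RAM is a cheap thing to try.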

@tersekmatija
Contributor

I'll CC @conorsim on this as he worked with this notebook recently.

@gteti
Author

gteti commented Jan 28, 2022

I managed to get a little further by increasing the amount of RAM allocated to the VM. Now I get this crash output:

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0128 14:47:00.409278 140083713515904 model_lib.py:717] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 1000
I0128 14:47:00.409551 140083713515904 config_util.py:552] Maybe overwriting train_steps: 1000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0128 14:47:00.409642 140083713515904 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0128 14:47:00.409811 140083713515904 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0128 14:47:00.409902 140083713515904 config_util.py:552] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I0128 14:47:00.409976 140083713515904 config_util.py:552] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I0128 14:47:00.410127 140083713515904 config_util.py:562] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0128 14:47:00.410353 140083713515904 model_lib.py:733] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0128 14:47:00.410442 140083713515904 model_lib.py:768] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': '/home/user/Desktop/Python/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f679d59e290>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0128 14:47:00.411472 140083713515904 estimator.py:212] Using config: {'_model_dir': '/home/user/Desktop/Python/training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f679d59e290>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f679c4009e0>) includes params argument, but params are not passed to Estimator.
W0128 14:47:00.411842 140083713515904 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f679c4009e0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0128 14:47:00.412261 140083713515904 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0128 14:47:00.412449 140083713515904 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0128 14:47:00.412909 140083713515904 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0128 14:47:00.423244 140083713515904 deprecation.py:323] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0128 14:47:00.459816 140083713515904 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0128 14:47:00.466567 140083713515904 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0128 14:47:00.497107 140083713515904 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:76: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0128 14:47:20.122996 140083713515904 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:76: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0128 14:47:20.301322 140083713515904 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0128 14:47:32.506782 140083713515904 api.py:332] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:258: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0128 14:47:39.211907 140083713515904 deprecation.py:323] From /home/user/Desktop/Python/content/models/research/object_detection/inputs.py:258: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Calling model_fn.
I0128 14:47:45.380594 140083713515904 estimator.py:1148] Calling model_fn.
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0128 14:47:45.933379 140083713515904 deprecation.py:323] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 14:47:49.544731 140083713515904 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 14:47:49.587774 140083713515904 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 14:47:49.625890 140083713515904 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 14:47:49.664865 140083713515904 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 14:47:49.710493 140083713515904 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0128 14:47:49.753618 140083713515904 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
W0128 14:47:49.813167 140083713515904 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
W0128 14:47:49.813374 140083713515904 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0128 14:47:49.813501 140083713515904 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0128 14:47:49.813614 140083713515904 variables_helper.py:153] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.
WARNING:tensorflow:From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0128 14:47:59.193702 140083713515904 deprecation.py:506] From /home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
I0128 14:48:07.704100 140083713515904 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0128 14:48:07.705442 140083713515904 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0128 14:48:15.565379 140083713515904 monitored_session.py:240] Graph was finalized.
2022-01-28 14:48:15.567888: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2022-01-28 14:48:15.592850: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2591995000 Hz
2022-01-28 14:48:15.593048: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556990790de0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-01-28 14:48:15.593065: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
INFO:tensorflow:Running local_init_op.
I0128 14:48:21.931502 140083713515904 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0128 14:48:22.609029 140083713515904 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /home/user/Desktop/Python/training/model.ckpt.
I0128 14:48:42.721936 140083713515904 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /home/user/Desktop/Python/training/model.ckpt.
2022-01-28 14:49:00.066397: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 25920000 exceeds 10% of system memory.
2022-01-28 14:49:00.172177: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 25920000 exceeds 10% of system memory.
2022-01-28 14:49:00.228564: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 25920000 exceeds 10% of system memory.
2022-01-28 14:49:00.285738: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 25920000 exceeds 10% of system memory.
2022-01-28 14:49:00.339917: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 25920000 exceeds 10% of system memory.
INFO:tensorflow:loss = 6.091243, step = 1
I0128 14:49:08.479160 140083713515904 basic_session_run_hooks.py:262] loss = 6.091243, step = 1
INFO:tensorflow:global_step/sec: 0.222944
I0128 14:56:36.980542 140083713515904 basic_session_run_hooks.py:692] global_step/sec: 0.222944
INFO:tensorflow:loss = 2.5264313, step = 101 (448.527 sec)
I0128 14:56:36.995469 140083713515904 basic_session_run_hooks.py:260] loss = 2.5264313, step = 101 (448.527 sec)
INFO:tensorflow:Saving checkpoints for 131 into /home/user/Desktop/Python/training/model.ckpt.
I0128 14:58:51.453722 140083713515904 basic_session_run_hooks.py:606] Saving checkpoints for 131 into /home/user/Desktop/Python/training/model.ckpt.
Traceback (most recent call last):
  File "/home/user/Desktop/Python/content/models/research/object_detection/model_main.py", line 114, in <module>
    tf.app.run()
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/user/Desktop/Python/content/models/research/object_detection/model_main.py", line 110, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
    run_metadata=run_metadata))
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 594, in after_run
    if self._save(run_context.session, global_step):
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value)  # updates self.eval_result
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 539, in _evaluate
    self._evaluator.evaluate_and_export())
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 920, in evaluate_and_export
    hooks=self._eval_spec.hooks)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 480, in evaluate
    name=name)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 522, in _actual_eval
    return _evaluate()
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 504, in _evaluate
    self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1511, in _evaluate_build_graph
    self._call_model_fn_eval(input_fn, self.config))
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1544, in _call_model_fn_eval
    input_fn, ModeKeys.EVAL)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1116, in _call_input_fn
    return input_fn(**kwargs)
  File "/home/user/Desktop/Python/content/models/research/object_detection/inputs.py", line 808, in _eval_input_fn
    params=params)
  File "/home/user/Desktop/Python/content/models/research/object_detection/inputs.py", line 931, in eval_input
    reduce_to_frame_fn=reduce_to_frame_fn)
  File "/home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py", line 184, in build
    config.input_path[:], input_reader_config, filename_shard_fn=shard_fn)
  File "/home/user/Desktop/Python/content/models/research/object_detection/builders/dataset_builder.py", line 75, in read_dataset
    filenames = tf.gfile.Glob(input_files)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 363, in get_matching_files
    return get_matching_files_v2(filename)
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 390, in get_matching_files_v2
    for single_filename in pattern
  File "/home/user/anaconda3/envs/trainenv/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 392, in <listcomp>
    compat.as_bytes(single_filename))
tensorflow.python.framework.errors_impl.NotFoundError: content; No such file or directory

@conorsim
Collaborator

The tensorflow.python.framework.errors_impl.NotFoundError: content; No such file or directory error might be referring to the /content directory in Colab. Have you made sure that none of your paths contain /content and that they point to your present working directory instead?
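
One way to check this is to scan the edited config text for leftover Colab paths. The sketch below is only an illustration: the names find_colab_paths and rebase_colab_paths and the regular expression are assumptions, not part of the tutorial, and it only looks for absolute /content paths.

```python
import re

# Matches Colab-style absolute paths such as "/content/train.record".
# The lookbehind skips paths where "content" is just a sub-folder,
# e.g. /home/user/Desktop/Python/content/. Illustrative pattern only.
COLAB_PATH = re.compile(r"(?<![\w/])/content(?![\w-])(?:/[^\s\"']*)?")


def find_colab_paths(text):
    """Return (line_number, path) pairs for every /content path in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in COLAB_PATH.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def rebase_colab_paths(text, new_root):
    """Replace the leading /content of every matched path with new_root."""
    new_root = new_root.rstrip("/")
    return COLAB_PATH.sub(
        lambda m: new_root + m.group(0)[len("/content"):], text
    )
```

Running find_colab_paths on the contents of pipeline.config (and on any notebook cell that builds paths) should come back empty once everything points at the local machine.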

@gteti
Author

gteti commented Jan 28, 2022

Thank you for the reply. I've changed EVERY path to match the machine where I'm running the notebook, e.g. /home/user/Desktop/Python/content/

That, plus increasing the VM's memory, got me a step further. Do I need to change the path in every .py file, like the ones in object_detection?

@conorsim
Collaborator

What are you using for model_dir? In the Colab version, model_dir = training/, which is a relative path. The PWD at that point is /content/models/research, so the full path for saving checkpoints is /content/models/research/training. From the stack trace, I can see you're trying to save to /home/user/Desktop/Python/training. Maybe that is causing the error?

So, maybe try changing model_dir or ensuring your PWD is /home/user/Desktop/Python/content/models/research?
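
To make the relative-path point concrete, here is a small sketch of how a relative model_dir combines with the working directory. The helper name resolve_model_dir is hypothetical, and posixpath is used so the result is the same on any host OS.

```python
import posixpath


def resolve_model_dir(model_dir, working_dir):
    """Return the directory that model_dir refers to when the process runs
    from working_dir. An absolute model_dir ignores working_dir."""
    return posixpath.normpath(posixpath.join(working_dir, model_dir))


# Colab: relative model_dir, PWD is /content/models/research
colab_dir = resolve_model_dir("training/", "/content/models/research")

# Local: same relative model_dir, different PWD
local_dir = resolve_model_dir(
    "training/", "/home/user/Desktop/Python/content/models/research"
)
```

Printing os.getcwd() in the notebook cell right before training is a quick way to see which working directory is actually in effect.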

@gteti
Author

gteti commented Jan 28, 2022

Yes, I'm using /home/user/Desktop/Python/content/training/. Since there was no training folder, I thought I could create one "elsewhere". I'll test it first thing tomorrow morning. Thank you again.

For now, I'm able to run the same thing on my home PC (a similar VM with Ubuntu), and I got as far as the model.ckpt-74 data. Of course, with the wrong training folder.

@gteti
Author

gteti commented Jan 30, 2022

I was able to run the training code on one of my VMs. I stopped the training after 1000 steps and reran it with fewer steps (1500 instead of 5000) because I wanted to see the rest of the notebook working. I think it doesn't matter where the training folder is; what matters is correctly configuring the paths in os.environ['PYTHONPATH'] += ':/content/models/research/:/content/models/research/slim/'.

I've now finished the 1500-step run, and I finally got the training/export folder containing saved_model.pb. Now I can go through the rest of the notebook, and I HOPE I can reproduce this working environment on a real machine instead of a VM.
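
For anyone repeating this outside Colab, the PYTHONPATH line can be built from the local checkout instead of hard-coding /content. This is a sketch under assumptions: extend_pythonpath is a hypothetical helper, and research_dir must be whatever folder holds models/research on your machine.

```python
import os


def extend_pythonpath(research_dir, environ=None):
    """Append research_dir and research_dir/slim to PYTHONPATH once.

    Unlike `os.environ['PYTHONPATH'] += ...`, this also works when
    PYTHONPATH is not set yet, and it does not add duplicates.
    """
    environ = os.environ if environ is None else environ
    entries = [research_dir, os.path.join(research_dir, "slim")]
    parts = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    for entry in entries:
        if entry not in parts:
            parts.append(entry)
    environ["PYTHONPATH"] = os.pathsep.join(parts)
    return environ["PYTHONPATH"]
```

Note that changing os.environ only affects processes started afterwards (such as the model_main.py call), not the imports of the interpreter that is already running.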

@gteti
Author

gteti commented Jan 30, 2022

I'm stuck now because the OpenVINO installation is not supported on my Ubuntu. The proposed l_openvino_toolkit_p_2021.3.394 is not supported on non-LTS Ubuntu releases.

So, looking at the OpenVINO documentation (https://docs.openvino.ai/latest/openvino_docs_install_guides_installing_openvino_apt.html), I was able to install it manually using sudo apt install intel-openvino-runtime-ubuntu20-2021.2.220. This doesn't install the model_optimizer/extensions/front/tf/ module required by the notebook code.

So I thought about installing the dependencies, which for me are located at /opt/intel/openvino_2021.2.200/install_dependencies/install_openvino_dependencies.sh. I managed to run the script (working around some errors in it), but content is still missing from /opt/intel/openvino_2021.2.200/deployment_tools/: the model_optimizer/extensions/front/tf/ folder is not present. I will retry with an LTS version.

@conorsim
Collaborator

Just FYI, on my Ubuntu 20.04 install I am using openvino_2021.4.582 and I checked and I have that directory. I also haven't had any problems with it.

@gteti
Author

gteti commented Jan 31, 2022

Is your Ubuntu an LTS version or a regular one? I'll try with that version too. Thanks!

@conorsim
Collaborator

It's an LTS version.

@gteti
Author

gteti commented Feb 1, 2022

I managed to install Ubuntu 18.04 LTS and was finally able to install OpenVINO (all or in part). Running the notebook, I reached this point, where I get a permission error:
image

I created this VM with Vagrant, but I've added the user to sudoers. Do you know why this is happening?

@conorsim
Collaborator

conorsim commented Feb 1, 2022

Have you tried sudo chmod 777 ssd_v2_support.json? This should allow write permissions on the file.
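
If 777 feels too broad, a narrower option is to add only the owner-write bit. This is a minimal sketch, assuming the notebook user owns the file; grant_owner_write is a hypothetical helper, and for a root-owned file under /opt/intel it would still need to run with elevated privileges.

```python
import os
import stat


def grant_owner_write(path):
    """Add the owner-write bit to path and return the resulting mode.

    All other permission bits are left as they were.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

os.access(path, os.W_OK) can be used beforehand to check whether the current user can already write to the file.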

@gteti
Author

gteti commented Feb 1, 2022

I did the whole folder, hoping it would be enough 😄

@gteti
Author

gteti commented Feb 1, 2022

I've tried on the same VM created on a different computer; the Vagrant file is the same. I just changed Python from 3.7 to 3.6 because I saw that the OpenVINO installation on the other machine was installing Python 3.6.*. I had already applied chmod 777 to the /opt/intel/ folder, but now I get this error:
image

Also, when I try to use /opt/intel/openvino_2021/, it automatically moves to openvino_2021.3.394.

Do I have to go through OpenVINO to get my TensorFlow model converted to a .blob for use on the DepthAI module?

@gteti
Author

gteti commented Feb 1, 2022

I just found http://blobconverter.luxonis.com/ 😄 You are awesome, guys!!
I just have to figure out how to use it to convert this
image
into
image

@gteti
Author

gteti commented Feb 2, 2022

I was finally able to create a blob export of the model. Following the guide, I created a folder for a custom_mobilenet. I also downloaded the ssd_mobilenet_v2_coco.config file (ssd_mobilenet_v2_coco.config.txt) and modified it, changing PATH_TO_BE_CONFIGURED to (in my case) the path of the custom_mobilenet folder, where I've added:

  • label_map.pbtxt
  • train.record
  • test.record
  • pretrained_model folder, which I thought was the only possible output containing a model.ckpt (if not, where can I find model.ckpt?)

Running the command

python3 depthai_demo.py -cnn custom_mobilenet

gives me the following error. What am I doing wrong?

[I'm training and creating the blob on two separate Ubuntu VMs, and I'm running python depthai_demo on my main Windows host.]
[I had to reinstall the depthai version listed in install_requirements because I had upgraded as described in https://github.com/luxonis/depthai-experiments/issues/280]

image

Thanks 😄

@conorsim
Collaborator

conorsim commented Feb 2, 2022

I don't know the demo very well but I think you may want -cnnp and full path to your blob instead of -cnn. I believe -cnn refers to an existing model in a model zoo instead of a custom model. However, I would recommend using the code here and changing nnPathDefault to be the path to your custom model. You may have to change other things like the label map to fit the needs of your custom model.

@gteti
Author

gteti commented Feb 3, 2022

I tried -cnnp as you suggested and gave ~/Python\depthai\custom_mobilenet\frozen_inference_graph_openvino_2021.4_5shave.blob as the path, but I get this error:
image

The downloaded blob came with a .config.json file, which can't be used. I modified the ssd_mobilenet_v2 config as requested and even copied the depthai/resources/nn/mobilenet-ssd JSON file into the same folder. The PB file is in the same folder but not shown.
image

I've been running the code as explained in the notebook:
image

@gteti
Author

gteti commented Feb 3, 2022

After redoing the model conversion with the full path to pipeline.config added
image

I relaunched the demo as you suggested

python depthai_demo.py -cnnp ~\Python\depthai\custom_mobilenet

And the demo finally started. When I chose custom_model from the dropdown list, I got a new error saying that custom_model.json was missing from my folder. I added that file by copying mobilenet-ssd.json under a different name, and now I get this error:
image

@conorsim
Copy link
Collaborator

conorsim commented Feb 8, 2022

Hi, sorry for the delay. Could you check OpenVINO version compatibility by getting the required OpenVINO version of your pipeline with getRequiredOpenVINOVersion() and setting the pipeline to that version with setOpenVINOVersion()?

https://docs.luxonis.com/projects/api/en/latest/references/python/?highlight=python%20api#depthai.Pipeline.setOpenVINOVersion
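
Roughly, the check could look like the sketch below. It is written against any object that exposes the two methods named above, so it does not import depthai itself; align_openvino_version and fallback_version are hypothetical names, and the fallback covers the case where the pipeline reports no required version.

```python
def align_openvino_version(pipeline, fallback_version):
    """Set the pipeline's OpenVINO version to the one it reports as required.

    `pipeline` must provide getRequiredOpenVINOVersion() and
    setOpenVINOVersion(). `fallback_version` is used when the pipeline
    reports no requirement (None).
    """
    required = pipeline.getRequiredOpenVINOVersion()
    target = fallback_version if required is None else required
    pipeline.setOpenVINOVersion(target)
    return target
```

With depthai installed, the fallback would presumably be the version the blob was compiled for, e.g. dai.OpenVINO.Version.VERSION_2021_4 for a blob named ..._openvino_2021.4_5shave.blob.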

@gteti
Author

gteti commented Feb 17, 2022

Sorry for the late reply, I've been very busy.
I've printed the results of getOpenVINOVersion() and getRequiredOpenVINOVersion():

print("Version: ", self._pm.pipeline.getOpenVINOVersion())
print("Required: ", self._pm.pipeline.getRequiredOpenVINOVersion())

Output:

Version:  Version.VERSION_2021_4
Required:  None
