Overview
Welcome to the tator_yolo_scripts wiki!
2023-07-05 Major Update: yolov5 has been removed from the project. Scripts for working with user-installed versions of yolo are implemented instead.
Clone the git repo and set up the venv environment:

```shell
git clone git@github.com:WHOIGit/tator_yolo_scripts.git
cd tator_yolo_scripts
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp bin/mp4dump venv/bin/
```
Note: If you're setting this up on the HPC, make sure you are using the python3 module when creating the venv, i.e. `module load python3/3.9.10`. Enabling modules is otherwise already handled by the sbatch scripts.
Note: to use the tator api you will need to update `tator_token.txt` with an appropriate tator user-access token. See API Token. On scratch/IFCB/tator, a token for user "hpc" is already installed, as is the python venv environment.
For all of the following scripts, you may invoke the `--help` flag to get a list of available command line arguments.
- `api_util.py` - This tool helps you explore the content available on tator.
- `download_localizations.py` - Downloads specified localizations to a CSV. It can optionally download video frames and localization clippings too.
- `convert_localization_csv_to_yolo_training.py` - Converts a localization csv to a directory suitable for use with yolo training. The CSV must contain a column of paths to local video frame files.
- `convert_yolo_labels_to_localization_csv.py` - Converts a directory of yolo labels to a csv which can then be uploaded to tator.
- `upload_localizations.py` - Uploads localizations from a csv, including custom attributes, to tator.
See `train_pipeline_demo.sh` for an example of how the above scripts can be used to train a yolo model and upload the validation results to tator.
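For orientation, yolo label files store one object per line as space-separated, image-normalized values: `class x_center y_center width height`. A minimal sketch of the kind of pixel-space conversion the csv conversion scripts perform might look like this (illustrative only, not the repo's actual code; the function name is an assumption):

```python
def yolo_to_pixel_bbox(label_line, img_width, img_height):
    """Convert one yolo label line (class, normalized x/y center, width,
    height) to a pixel-space (class_id, x, y, w, h) top-left box."""
    cls, xc, yc, w, h = label_line.split()
    w_px = float(w) * img_width
    h_px = float(h) * img_height
    x_px = float(xc) * img_width - w_px / 2   # top-left corner x
    y_px = float(yc) * img_height - h_px / 2  # top-left corner y
    return int(cls), x_px, y_px, w_px, h_px
```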
Scripts that end in `.sbatch` are designed to be used with slurm's sbatch command and can be run directly from a login-node. They furthermore accept additional positional arguments, which are described for each script in the next sections. If you intend to use the python scripts directly, first use `srun` to access an appropriate node, then use `source venv/bin/activate` to activate the installed environment.
Some sbatch scripts have a sister-script that runs an array of jobs. These scripts typically have the word "list" in their filename and accept as input a newline-delimited text file of entries. Each entry in the list text file must follow the same formatting as the single-use-case script. Typically these are full paths to video files or tiff directories.
When using sbatch arrays, the user must set the sbatch `--array` variable manually. The script uses the array task-id to reference the corresponding line number (entry) in the list file. The value can be entered in the sbatch file itself, or specified at runtime (directly after the sbatch command, but before the sbatch script file).
Here are some array examples:

- `--array=1-10` will launch 10 tasks, one for each of the first 10 entries of the list file.
- `--array=3,6,9` will launch 3 tasks using as input the entries from lines three, six, and nine of the list file. This is useful for reprocessing certain entries from a given list.
- `--array=1-999` There is a 1000 queued-job limit per user, so very large lists will need multiple subsequent submissions.
- `--array=1000-1999` Array values higher than 1000 are possible, though.
- `--array=11,22,50-60%1` Comma-separated values and ranges can be mixed. Also NOTE the `%1` at the end: this notation limits the number of concurrent jobs. `%1` ensures that only one job runs at a time. This can be important for uploading media, so as not to overwhelm the server with simultaneous requests.
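The task-id-to-entry mapping used by the list scripts can be sketched as follows (an illustration, not the scripts' actual code; task ids are assumed to be 1-indexed, matching the line numbers and `--array` examples above):

```python
import os

def entry_for_task(listfile_path, task_id=None):
    """Return the listfile entry for this array task.
    Task ids are 1-indexed, matching line numbers in the listfile."""
    if task_id is None:
        # Slurm exports the task's index into the job's environment
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    with open(listfile_path) as f:
        lines = [line.strip() for line in f]
    return lines[task_id - 1]
```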
Creating a list file is most easily achieved using the `find` command. To create a list of video files, for instance, would look something like this:

```shell
find /full/path/to/video/directory -type f -name "*.mp4" | sort > listfiles/video_list.txt
```

A list of tiff directories would look something like this:

```shell
find /full/path/to/tiff_dir/parent/directory -type d -name "*DEPLOYMENT-ID*" | sort > listfiles/tiffdir_list.txt
```
Finally, you may need to edit the scripts to get the intended effect. For instance, the default mail-to person in these scripts needs to be entered/specified. These scripts expect uploads to go to a particular tator instance and project: HOST, PROJ_ID, and MEDIA_ID may need to be changed in some files to target the appropriate endpoint. Some automation, like the determination of the cruise-id used as a "section" or folder on tator, MAY NOT BE APPROPRIATE for all video filename formats. It is important to be aware of these details when processing jobs, and it is always recommended to review a listfile before running an array of jobs.
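One quick way to review a listfile before submitting an array is to check that every entry actually exists on disk. A minimal sketch (not part of the repo; the function name is an assumption):

```python
from pathlib import Path

def review_listfile(listfile_path):
    """Print any listfile entries that do not exist on disk;
    return the number of entries that do."""
    valid = 0
    for lineno, entry in enumerate(
            Path(listfile_path).read_text().splitlines(), 1):
        entry = entry.strip()
        if not entry:
            continue  # skip blank lines
        if Path(entry).exists():
            valid += 1
        else:
            print(f"line {lineno}: missing path: {entry}")
    return valid
```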
The ISIIS project uses tiff images which need to be compiled to video before being uploaded to tator. In `isiis_tiff_scripts` are the following scripts, designed to help with that process.
- `check_tiff_dir.sbatch` - Accepts a single parameter: a directory containing tiff frames. This script sequentially checks that all tiff frames in a directory are accounted for. Any missing frames get listed in a file under the "tiffcheck_output" directory; the .txt file of missing frames bears the name of the tiffs' parent directory. Missing frames, and frames with differing names (besides the frame number), halt the processing of tiff frames to video, so running this check before running convert_tiff_dir_to_video.sbatch is important. Slurm log files for this script can be found under `slogs/tiffcheck`.
- `check_tiff_dir_list.sbatch` - As above, but accepts a listfile of directories.
- `convert_tiff_dir_to_video.sbatch` - Accepts a single parameter: a directory containing tiff frames. This script creates a .mp4 video file based on the tiff files in the input directory. Additionally, a title frame is added at the start of the video (tator is 0-indexed and the raw tiff frames are 1-indexed; an extra title frame helps with this discrepancy). The video file is created in the "tiff2video_output" directory and bears the name of the tiffs' parent directory. Slurm log files for this script can be found under `slogs/tiff2video`.
- `convert_tiff_dir_list_to_videos.sbatch` - As above, but accepts a listfile of directories. The same listfile and array values as `check_tiff_dir_list.sbatch` should be used.
```shell
sbatch --array=3,5-11 check_tiff_dir_list.sbatch listfiles/tiff2vid_EN644_v1.txt
sbatch convert_tiff_dir_to_video.sbatch /vortexfs1/share/nes-lter/Stingray/data/NESLTER_EN644/NESLTER_EN644_23Aug2019_005
sbatch --array=3,5,7-11 convert_tiff_dir_list_to_videos.sbatch listfiles/tiff2vid_EN644_v1.txt
```
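The frame-accounting step that check_tiff_dir.sbatch performs can be sketched roughly like this (illustrative only, not the repo's actual code; the exact tiff filename convention is an assumption):

```python
import re
from pathlib import Path

def missing_tiff_frames(tiff_dir):
    """Report frame numbers absent from a directory of tiff frames.
    Assumes filenames end in a frame number, e.g. prefix-000123.tif
    (the naming convention here is an assumption)."""
    frames = set()
    for p in Path(tiff_dir).glob("*.tif*"):
        m = re.search(r"(\d+)\.tiff?$", p.name)
        if m:
            frames.add(int(m.group(1)))
    if not frames:
        return []
    # any number in the observed range with no matching file is missing
    return [n for n in range(min(frames), max(frames) + 1) if n not in frames]
```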
- `upload_video.sbatch` - Accepts a single parameter: a video file. This script uploads a video file to tator. The project-id and media-type are pre-set in the script to correspond with the "ISIIS" project and "Shadowgraph" media-type of the current instance of tator at time of writing. This script handles the transcoding of the video file to streaming formats of different resolutions, as well as the uploading of those transcodes to tator. You must have "tator_token.txt" set to a proper user access token. Transcode files are saved under `tator_transcode_workspace`. Slurm log files for this script can be found under `slogs/video2tator`; check the log for errors if your upload fails. On tator, videos will be added to a "section" or Folder derived from the video's filename: specifically, the characters between the first and second "_" underscore are used (this typically corresponds with a cruise id, but not always!).
- `upload_video_list.sbatch` - As above, but accepts a listfile of video filepaths.
```shell
sbatch --array=1-$(wc -l < listfiles/videos2tator_batch2.txt)%1 upload_video_list.sbatch listfiles/videos2tator_batch2.txt
```
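The "section" derivation described for upload_video.sbatch (the characters between the first and second underscore) can be sketched like this, which also makes it easy to see where it would misbehave for filenames that don't follow the expected pattern (a sketch, not the script's actual code):

```python
def section_from_filename(video_filename):
    """Derive the tator 'section' (folder) from a video filename:
    the characters between the first and second underscore.
    Typically, but not always, this is a cruise id."""
    parts = video_filename.split("_")
    if len(parts) < 3:
        raise ValueError("expected at least two underscores in filename")
    return parts[1]
```

For example, a filename like `NESLTER_EN644_23Aug2019_005.mp4` would land in section `EN644`, while a filename without two underscores would not fit the pattern at all.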