I'm currently using the 17K-Graffiti dataset for my research, and I've encountered an issue where a significant number of image files are missing from the training set. Specifically, 1131 images referenced in the annotations file could not be found in the dataset I downloaded. Below is the code snippet I used to identify the missing files:
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from tqdm import tqdm

# Define paths
base_path = '../Data/17kGraffiti'
test_path = os.path.join(base_path, 'test/graffiti')
train_path = os.path.join(base_path, 'train/graffiti')
test_labels_path = os.path.join(base_path, 'test_bboxes.pkl')
train_labels_path = os.path.join(base_path, 'train_bboxes.pkl')

# Load labels
def load_labels_pandas(pkl_path):
    return pd.read_pickle(pkl_path)

# Convert bounding boxes to YOLO format
# size is (width, height); bbox is (x_min, y_min, x_max, y_max) in pixels
def convert_bbox_to_yolo(size, bbox):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (bbox[0] + bbox[2]) / 2.0
    y = (bbox[1] + bbox[3]) / 2.0
    w = bbox[2] - bbox[0]
    h = bbox[3] - bbox[1]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)

# Save YOLO labels
def save_yolo_labels(labels, img_dir, output_dir):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    for idx, row in tqdm(labels.iterrows(), total=labels.shape[0], desc="Processing Labels"):
        img_file = row['FileName'] + '.jpg'
        img_path = os.path.join(img_dir, img_file)

        try:
            img = mpimg.imread(img_path)
        except FileNotFoundError:
            print(f"Image not found: {img_path}")
            continue

        # Colour images have three dimensions, grayscale images have two
        if len(img.shape) == 3:
            h, w, _ = img.shape
        elif len(img.shape) == 2:
            h, w = img.shape
        else:
            print(f"Unexpected image shape: {img.shape} for image {img_path}")
            continue

        bboxes = row['bbox']
        with open(os.path.join(output_dir, row['FileName'] + '.txt'), 'w') as f:
            for bbox in bboxes:
                yolo_bbox = convert_bbox_to_yolo((w, h), bbox)
                # One line per box: class id, then space-separated normalized values
                f.write(f"0 {yolo_bbox[0]} {yolo_bbox[1]} {yolo_bbox[2]} {yolo_bbox[3]}\n")

        if idx < 5:
            print(f"File: {row['FileName']}")
            print(f"YOLO bbox: {yolo_bbox}\n")

# Load labels
print("Loading labels...")
test_labels = load_labels_pandas(test_labels_path)
train_labels = load_labels_pandas(train_labels_path)

# Print sample labels
print("Test labels sample:")
print(test_labels.head())
print("\nTrain labels sample:")
print(train_labels.head())

# Check for missing files
train_files = set(os.listdir(train_path))
missing_files = []
for filename in train_labels['FileName']:
    if f"{filename}.jpg" not in train_files:
        missing_files.append(filename)

print(f"Number of missing files in training set: {len(missing_files)}")
print("Sample missing files:", missing_files[:5])

# Drop annotations whose image could not be found
if missing_files:
    updated_train_labels = train_labels[~train_labels['FileName'].isin(missing_files)]
else:
    updated_train_labels = train_labels

print(f"Updated number of training labels: {len(updated_train_labels)}")

# Convert and save labels in YOLO format
print("Converting and saving labels to YOLO format...")
save_yolo_labels(updated_train_labels, train_path, os.path.join(base_path, 'train/labels'))
save_yolo_labels(test_labels, test_path, os.path.join(base_path, 'test/labels'))
print("Conversion and saving complete.")
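As a sanity check on the conversion step, here is a small worked example. It is only a sketch: it assumes each box is stored as (x_min, y_min, x_max, y_max) in pixels, which is how the script above treats it but which I have not confirmed against the dataset documentation. The helper `format_yolo_line` and the 640x480 image size are made up for illustration.

```python
def convert_bbox_to_yolo(size, bbox):
    """Convert a pixel box to normalized YOLO (x_center, y_center, width, height).

    size is (image_width, image_height).
    bbox is assumed to be (x_min, y_min, x_max, y_max) in pixels.
    """
    img_w, img_h = size
    x_min, y_min, x_max, y_max = bbox
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return (x_center, y_center, width, height)


def format_yolo_line(class_id, yolo_bbox):
    """Hypothetical helper: class id followed by space-separated values."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in yolo_bbox)


# A centred box covering half of a 640x480 image in each dimension.
example = convert_bbox_to_yolo((640, 480), (160, 120, 480, 360))
line = format_yolo_line(0, example)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```

The values in each label line need to be separated by spaces, otherwise the four numbers run together and the label file cannot be parsed.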
Sample Output:
Number of missing files in the training set: 1131
Sample missing files: ['10008971653_d32f09b87b_c', '10034339546_8a2486cbc9_c', '10112431913_06b2dfb89a_c', '10121326145_df091e3dd8_c', '10185187695_39e7589395_c']
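To rule out a problem on my side (a different extension, upper-case `.JPG`, or images sitting in another subfolder), a check along the following lines could separate files that are merely misplaced from files that are absent altogether. This is a rough sketch, not something taken from the dataset repository: `index_images` and `classify_missing` are hypothetical helper names, and the demo runs on a throwaway directory with made-up file names rather than on the real data.

```python
import os
import tempfile
from collections import defaultdict


def index_images(root):
    """Map lower-cased file stem -> list of paths found anywhere under root."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            stem, _ext = os.path.splitext(name)
            index[stem.lower()].append(os.path.join(dirpath, name))
    return index


def classify_missing(expected_stems, image_dir, search_root):
    """Split stems into present, misplaced (other folder or extension), and absent."""
    present = set(os.listdir(image_dir))
    index = index_images(search_root)
    report = {"present": [], "misplaced": {}, "absent": []}
    for stem in expected_stems:
        if f"{stem}.jpg" in present:
            report["present"].append(stem)
        elif stem.lower() in index:
            report["misplaced"][stem] = index[stem.lower()]
        else:
            report["absent"].append(stem)
    return report


# Tiny self-contained demo on a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "train", "graffiti")
    other_dir = os.path.join(root, "test", "graffiti")
    os.makedirs(train_dir)
    os.makedirs(other_dir)
    for path in (
        os.path.join(train_dir, "a.jpg"),
        os.path.join(train_dir, "b.JPG"),  # same stem, different extension case
        os.path.join(other_dir, "c.jpg"),  # right name, wrong folder
    ):
        open(path, "w").close()

    report = classify_missing(["a", "b", "c", "d"], train_dir, root)
    demo_summary = (report["present"], sorted(report["misplaced"]), report["absent"])

print(demo_summary)  # (['a'], ['b', 'c'], ['d'])
```

On the real data this would be called with `train_labels['FileName']`, `train_path`, and `base_path`. Anything that ends up under `"absent"` is not in the download at all.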
I also retrieved the annotations from this repository, but it seems that my dataset might be incomplete or incorrectly structured. Could you please verify whether there might be an issue with the provided dataset or advise on how to resolve this?
Thank you in advance for your help!