I would like to convert my COCO JSON file as follows:
The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is:
path/to/image.jpg,x1,y1,x2,y2,class_name
A full example:
/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
/data/imgs/img_002.jpg,22,5,89,84,bird
This defines a dataset with 3 images: img_001.jpg contains a cow, img_002.jpg contains a cat and a bird, and img_003.jpg contains no interesting objects/animals.
How can I do that?
This is the function I have so far:
def convert_coco_json_to_csv(filename):
    import pandas as pd
    import json

    # COCO2017/annotations/instances_val2017.json
    s = json.load(open(filename, 'r'))
    out_file = filename[:-5] + '.csv'
    out = open(out_file, 'w')
    out.write('id,x1,y1,x2,y2,label\n')

    all_ids = []
    for im in s['images']:
        all_ids.append(im['id'])

    all_ids_ann = []
    for ann in s['annotations']:
        image_id = ann['image_id']
        all_ids_ann.append(image_id)
        x1 = ann['bbox'][0]
        x2 = ann['bbox'][0] + ann['bbox'][2]
        y1 = ann['bbox'][1]
        y2 = ann['bbox'][1] + ann['bbox'][3]
        label = ann['category_id']
        out.write('{},{},{},{},{},{}\n'.format(image_id, x1, y1, x2, y2, label))

    all_ids = set(all_ids)
    all_ids_ann = set(all_ids_ann)
    no_annotations = list(all_ids - all_ids_ann)
    # Output images without any annotations
    for image_id in no_annotations:
        out.write('{},{},{},{},{},{}\n'.format(image_id, -1, -1, -1, -1, -1))
    out.close()

    # Sort file by image id
    s1 = pd.read_csv(out_file)
    s1.sort_values('id', inplace=True)
    s1.to_csv(out_file, index=False)
Here is a function I use to convert COCO format to the AutoML CSV format for annotated image object detection data:
def convert_coco_json_to_csv(filename, bucket):
    import json

    with open(filename, 'r') as f:
        s = json.load(f)
    out_file = filename[:-5] + '.csv'

    # Index images and categories by their COCO id; ids are not guaranteed
    # to match positions in the 'images' and 'categories' lists.
    images = {image['id']: image for image in s['images']}
    categories = {cat['id']: cat['name'] for cat in s['categories']}

    with open(out_file, 'w') as out:
        out.write('GCS_FILE_PATH,label,X_MIN,Y_MIN,,,X_MAX,Y_MAX,,\n')
        for label in s['annotations']:
            # The COCO bounding box format is [top left x position, top left y position, width, height].
            # AutoML uses coordinates normalized by image size. For example, a bounding box for the
            # entire image is expressed as (0.0,0.0,,,1.0,1.0,,), or (0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0).
            image = images[label['image_id']]
            HEIGHT = image['height']
            WIDTH = image['width']
            X_MIN = label['bbox'][0] / WIDTH
            X_MAX = (label['bbox'][0] + label['bbox'][2]) / WIDTH
            Y_MIN = label['bbox'][1] / HEIGHT
            Y_MAX = (label['bbox'][1] + label['bbox'][3]) / HEIGHT
            out.write(f"{bucket}/{image['file_name']},{categories[label['category_id']]},{X_MIN},{Y_MIN},,,{X_MAX},{Y_MAX},,\n")
To use it, call the function with the annotation file name and the Google Cloud Storage bucket where the images were uploaded:
convert_coco_json_to_csv("/content/train_annotations.coco.json", "gs://[bucket name]")
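If you need the exact path,x1,y1,x2,y2,class_name layout from the question rather than the AutoML one, the same idea can be sketched as below. This is only a minimal sketch under some assumptions: the function name coco_json_to_path_csv and the image_dir parameter are made up for illustration, box coordinates are rounded to integers, and images without annotations are written with empty box and class fields because the question does not say how they should appear.

```python
import csv
import json
import posixpath


def coco_json_to_path_csv(json_path, csv_path, image_dir=""):
    """Write one path,x1,y1,x2,y2,class_name row per COCO annotation."""
    with open(json_path, "r") as f:
        coco = json.load(f)

    # Look images and categories up by COCO id, not by list position.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    class_names = {cat["id"]: cat["name"] for cat in coco["categories"]}

    rows = []
    annotated = set()
    for ann in coco["annotations"]:
        # COCO boxes are [top-left x, top-left y, width, height].
        x, y, w, h = ann["bbox"]
        path = posixpath.join(image_dir, file_names[ann["image_id"]])
        # Rounding is an assumption; drop it to keep sub-pixel values.
        rows.append([path, round(x), round(y), round(x + w), round(y + h),
                     class_names[ann["category_id"]]])
        annotated.add(ann["image_id"])

    # Assumed convention: images with no boxes get a row with empty fields.
    for image_id, name in file_names.items():
        if image_id not in annotated:
            rows.append([posixpath.join(image_dir, name), "", "", "", "", ""])

    # Stable sort by path keeps each image's boxes together.
    rows.sort(key=lambda row: row[0])
    with open(csv_path, "w", newline="") as out:
        csv.writer(out).writerows(rows)
    return rows
```

Calling coco_json_to_path_csv("/content/train_annotations.coco.json", "/content/train.csv", "/data/imgs") would write the CSV and also return the rows, which is handy for a quick sanity check. The output path and image directory in that call are placeholders.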