This project is inspired by Jacques Mattheij's blog entry: Sorting 2 Metric Tons of Lego who uses a Deep Learning approach to sort Lego bricks. Our first goal is to improve the software side, especially creating a generic dataset for the neural network training. Due to the high number of bricks (10000+) and the related similarities, we plan a meaningful combination of categories and single bricks as classification task.
We would be happy if you would contribute to this project! We could then provide all necessary resource files, e.g. real test images or trained models.
- Dataset Generation
- Render a Single Brick
- Data Analysis (Brick Similarity) and Preprocessing
- Render Images
- Classification
- Preliminaries
- Training
- Evaluation
- Inference
- Requirements / Installation
In order to generate images from a single 3d model we use Blender's Python interface and the ImportLDraw module. All 3d models come from a collection of LDraw™ an open standard for LEGO CAD programs that allow the user to create virtual LEGO models. Make sure you have successfully installed the module and set the ldraw path correctly.
Control of the blender script as stand-alone:
blender -b -P dataset/blender/render.py -- -i 9.dat --save ./ --images_per_brick 1 --config thumbnail-example.json
Hint for Mac users to access blender: /Applications/Blender/blender.app/Contents/MacOS/blender -b -P ...
To view the data for the first time, we generate thumbnails for each 3d file with fixed lightning conditions, background, camera settings, etc.. Furthermore, we extract some metadata such as category and label. The category distribution of all available bricks is shown below.
To generate thumbnails i.e. for a the category 'Technic' run:
python dataset/generate_thumbnails.py --dataset ./ldraw/parts/ --category Technic --output ./technic
It is known that some parts have both different identifiers and are very similar to each other. For instance, a part differs only by small imprint on the front. The question is how to deal with these two cases. For the first one it is essential to identify these visual equal parts that only differ in the part number. The second case can be ignored if all parts are to be distinguished. However, it can be useful to consider bricks with the same shape as one class, which reduces the number of classes and the associated classification complexity.
Based on the generated thumbnails, a brick similarity can be calculated by comparing two images using the structural similarity image metric (SSIM). By setting thresholds, we can identify identical parts as well as shape-similar parts. In future, we plan further experiments and improvements e.g. by measuring the shape similarity.
The 10 most similar bricks with the part id '6549' (included itself) of a subset of all 'Technic' parts:
Similarities to all other parts:When the script is finished with creating thumbnails, a final CSV is written that contains lists of IDs for identical parts and shape-like parts.
id | label | category | thumbnail_rendered | identical | shape_identical |
---|---|---|---|---|---|
6549 | Technic 4 Speed Shift Lever | Technic | True | 6549 | 2699 3705c01 3737c01 6549 73485 |
2698c01 | Technic Action Figure (Complete) | Technic | True | 2698c01 | 13670 24316 2698c01 3705 370526 4211815 4519 6587 87083 99008 u9011 |
... | ... | ... | ... | ... | ... |
- render images 2000 for each id
- image size: 224x224x3
- brick augmentation:
- 38 official brick colors
- rotation: in x direction due to no standardized origin: 0, 90, 180, 270
- scaling: normalized one dimension and size variation via zooming factor
[ ] surface texture and reflexion
- setting background:
- random images from IndoorCVPR09 dataset vs. noise
- camera position: random on the surface on the upper half of a sphere
- exposure: random spot on the sphere surface, radius and energy fixed
[ ] varying exposure
Script to render all images:
python dataset/generate_dataset.py
TODO: Parameter via argparse
In a first experiment, we focus on a small subset in order to test the capability of the synthetically generated training material. For this reason we manually selected 15 classes and created a real-world test set. At this point, the knowledge of the identical and form-identical parts is irrelevant and will be considered searately in future.
- Common fastai image classification workflow as baseline
- Fine-tuning ResNeXt50 (and ResNet34), 5 epochs, fit_one_cycle 1e-3, batch size 42 (128)
Evaluate on real-world testset!
- created for 15 bricks. 20 images for each
- crop and resize images manually
- train object detection model with squared bounding boxes
Results not satisfactory to some extent! The Difference between the synthetic generated training set and real-world validation set is to high!
Possible Reasons (further investigation needed):
- brick size differ
- lighting conditions differ
- resolution differ
- background is relevant for generalization
- viewing angle differ
- reduce generalization gap by improving rendering process
- multi-instance training i.e. stack three images to prevent confusion in classification
- object detection model for preprocessing: crop squared images from video stream
- inference script for one-camera setting (object detection, ...) and multi-camera setting
- Blender 2.79 + ImportLDraw
conda env create -f environment.yml