I plan to implement a CNN that estimates depth from single images using the NYU Depth V2 dataset. Going through the tutorial has shown me that it is easy to implement a CNN for a classification problem in Caffe. I'm curious whether Caffe is also suited to a task that involves multidimensional ground truths (e.g. depth images) and regression (depth estimation).
What I want to achieve is to use depth images as ground truths to train a CNN that can estimate depth images. I need to load labels as single channel image data.
I could only find this answer by Shelhamer that is related to my problem https://groups.google.com/d/msg/caffe-users/JXmZrz4cCMU/mBTU1__ohg4J
I understand that I should define two data layers, one for the input images and the other for the depth data used as ground truth. Then I can use a loss layer (like EuclideanLoss) to calculate the loss. I've added a model below.
Is this model going to work as intended? If not, is there any other way to do it on Caffe?
layer {
name: "data"
type: "ImageData"
top: "data"
image_data_param {
source: "input_set.txt"
batch_size: 50
}
}
layer {
name: "label"
type: "ImageData"
top: "label"
image_data_param {
source: "depth_set.txt"
batch_size: 50
}
is_color: false
}
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "some_output_layer_name"
bottom: "label"
top: "loss"
}
Yes, the above model should work the way you expect. Just make sure that the dimensions of the some_output_layer_name blob are the same as those of the label blob.
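To make the dimension requirement concrete, here is a minimal NumPy sketch of the quantity EuclideanLoss computes, sum((pred - label)^2) / (2N), where N is the batch size. The function name euclidean_loss and the example shapes are illustrative assumptions, not part of Caffe. The sketch insists on identical shapes, which is a stricter check than Caffe needs but makes a mismatch between the network output and the depth map obvious.

```python
import numpy as np


def euclidean_loss(pred, label):
    """Return sum((pred - label)^2) / (2 * N), with N the batch size.

    Hypothetical helper for illustration only; it mirrors the formula of
    Caffe's EuclideanLoss layer but is not Caffe code.
    """
    pred = np.asarray(pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    # Stricter than necessary: require the two blobs to have the same shape.
    if pred.shape != label.shape:
        raise ValueError(
            "shape mismatch: prediction %s vs label %s" % (pred.shape, label.shape)
        )
    batch_size = pred.shape[0]
    return float(np.sum((pred - label) ** 2) / (2.0 * batch_size))


if __name__ == "__main__":
    # Assumed blob layout: (batch, channels, height, width).
    pred = np.zeros((2, 1, 55, 74))
    label = np.ones((2, 1, 55, 74))
    print(euclidean_loss(pred, label))
```

If the last layer of the network produced, say, a 1 x 228 x 304 map while the label was 1 x 55 x 74, this check would fail, which is the situation to avoid when wiring up the loss layer.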
It turns out my model above was the right way to start, but it had some problems. If your labels are images, you can use the ImageData layer provided in Caffe. ImageData has two top blobs: the first is the actual image data and the second is its "label", a single number (used for simple classification problems). In the source file for your depth images you can list the path to each image, give it an arbitrary "label", and simply ignore that value. ignored1 and ignored2 below correspond to these ignored labels.
layer {
name: "data"
type: "ImageData"
top: "data"
top: "ignored1"
image_data_param {
source: "path/to/data/data.txt"
batch_size: 32
new_height: 228
new_width: 304
}
}
# Label data
layer {
name: "depth"
type: "ImageData"
top: "depth"
top: "ignored2"
image_data_param {
is_color: false
source: "path/to/data/labels.txt"
batch_size: 32
new_height: 55
new_width: 74
}
}
data.txt sample:
/path/to/your/data/1.png 0
/path/to/your/data/2.png 0
/path/to/your/data/3.png 0
...
labels.txt sample:
/path/to/your/labels/1.png 0
/path/to/your/labels/2.png 0
/path/to/your/labels/3.png 0
...
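The two list files have to stay aligned line by line, since each ImageData layer reads its own file independently. A minimal sketch for generating them, assuming that every input image has a depth image with the same file name in a separate directory (the function name write_paired_lists, the directory layout, and the .png extension are assumptions for illustration):

```python
import os


def write_paired_lists(data_dir, label_dir, data_list, label_list,
                       extension=".png", dummy_label=0):
    """Write two ImageData source files whose lines refer to matching images.

    Hypothetical helper: assumes data_dir and label_dir contain files with
    identical names, e.g. data/1.png pairs with labels/1.png.
    """
    data_names = sorted(f for f in os.listdir(data_dir) if f.endswith(extension))
    label_names = sorted(f for f in os.listdir(label_dir) if f.endswith(extension))
    # Refuse to write anything if the two directories do not pair up exactly.
    if data_names != label_names:
        raise ValueError("input and depth directories contain different file names")

    with open(data_list, "w") as data_out, open(label_list, "w") as label_out:
        for name in data_names:
            # Each line is "<path> <integer label>"; the integer is ignored later.
            data_out.write("%s %d\n" % (os.path.join(data_dir, name), dummy_label))
            label_out.write("%s %d\n" % (os.path.join(label_dir, name), dummy_label))
    return len(data_names)
```

The names are sorted as strings, so 10.png comes before 2.png; that is harmless here because both files use the same order. Both layers should also read their lists in the same order (for example, neither one shuffling independently), otherwise inputs and depth maps would no longer match.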
Alternatively, you can write your own Python layer to read your image and label data. Here's an example layer to read NYUDv2 data.