
Missing dataset and checkpoints for activity-anticipation #7

Open
tmanh opened this issue Dec 3, 2016 · 5 comments

Comments

tmanh commented Dec 3, 2016

Hi,

Could you please provide the dataset and checkpoints (or, at least, instructions) for the RNN implementation of activity-anticipation?

I can't find the dataset folders in the repository or any instructions in the README file.

path_to_dataset = '/scr/ashesh/activity-anticipation/dataset/{0}'.format(fold)
path_to_checkpoints = '/scr/ashesh/activity-anticipation/checkpoints/{0}'.format(fold)

Thank you so much.


pschydlo commented Apr 3, 2017

I'm having the same problem. Are you using the CAD-120 dataset directly? How do you preprocess it? Would it be possible to release the prepared dataset for training?

Thank you very much in advance!


tmanh commented Apr 3, 2017

Oh, I read his code and found out that he used the SVM binary features from this link: http://pr.cs.cornell.edu/humanactivities/data/features.tar

Once you have downloaded it, edit and run readData.py to convert the SVM binary features into node and edge features (which are described in his paper). After that, you can run his model :)
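For anyone who prefers scripting this step, a minimal Python sketch for fetching and unpacking the archive could look like the following (only the URL above is from the thread; the local file and folder names are placeholders):

import tarfile
import urllib.request

# Download the CAD-120 feature archive linked above and unpack it locally.
url = "http://pr.cs.cornell.edu/humanactivities/data/features.tar"
archive = "features.tar"

urllib.request.urlretrieve(url, archive)
with tarfile.open(archive) as tar:
    # Later comments suggest this yields features_cad120_ground_truth_segmentation/...
    tar.extractall("features")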


pschydlo commented Apr 3, 2017

Thank you so much for the tip! :)

At times the code does not make it easy to figure out what is happening; some comments would be really useful! Nevertheless, it's already nice to have the source code to replicate the experiments!

Now I have the following directory structure:

/features_cad120_ground_truth_segmentation

  • /features_binary_svm_format
  • /segments_svm_format

Am I right to assume that this directory:
/scail/scratch/group/cvgl/ashesh/activity-anticipation/features_ground_truth

corresponds to the main directory
/features_cad120_ground_truth_segmentation

or to
/features_cad120_ground_truth_segmentation/features_binary_svm_format?

The author also references this cryptic file:
'/scail/scratch/group/cvgl/ashesh/activity-anticipation/activityids_fold{0}.txt'

Have you figured out where I can find or how to create this file?

Thank you for the help!

EDIT: Managed to generate the pik files; in case anyone else has the same problem:

  • Follow Ahn's link, download the features, and extract them into the folders features_binary_svm_format and segments_svm_format.
  • Substitute the file names in the readData file: where it says s='', just plug in s = path to the feature folder/features_binary_svm.
  • The fold files are just newline-separated lists of the activities divided into N sets. You can easily generate them by executing "ls | tr -d '.txt' | split -l 32 - fold" in the /segments_svm_format folder; this command just lists all the activities and stores the list in 4 files (32 activities each). A Python equivalent is sketched after this list.
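For those who would rather do the fold split in Python, here is a minimal sketch under the same assumptions (4 folds of 32 activities, ids taken from the .txt file names in segments_svm_format; the paths and output names are placeholders):

import os

# List activity ids (file names without .txt) and split them into folds of 32.
seg_dir = "features_cad120_ground_truth_segmentation/segments_svm_format"
ids = sorted(os.path.splitext(f)[0] for f in os.listdir(seg_dir) if f.endswith(".txt"))

fold_size = 32
for i in range(0, len(ids), fold_size):
    # Rename the output files to whatever readData.py expects
    # (e.g. activityids_fold{0}.txt, as mentioned above).
    with open("fold{0}".format(i // fold_size + 1), "w") as out:
        out.write("\n".join(ids[i:i + fold_size]) + "\n")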

Good luck!


pschydlo commented Apr 10, 2017

After reading through the readData code it's really hard to decipher what the feature arrays represent; have you managed to figure it out?

For example, X_tr_human_disjoint is an array with dimensions 25x93x790, where 790 is the dimension of the feature vector. Do you know what the other two dimensions are?

The same goes for X_tr_objects_disjoint, whose dimensions are 25x226x620, where 620 is the dimension of the object feature vector.

In the human feature structure, as far as I understood, the 25 stands for the maximum number of segments and the 93 for the number of training examples (segments), but this is not consistent with the dimensions of the object structure. What does the 226 stand for?

Thanks in advance for your time and attention!

EDIT: In case anyone has the same question: the mysterious 226 is a dimension that represents the concatenation of the objects. To avoid a variable-sized frame, the author simply concatenates the objects along the activity dimension. The 93 corresponds to the activities and the 226 to the sum of the objects over all activities; the average number of objects per activity is 2.43, and 2.43 * 93 ≈ 226 = the total number of objects ever seen across all segments (not distinct!).
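A toy numpy illustration of that layout (this is not the repo's code; the per-activity object counts below are made up):

import numpy as np

# Per-object feature matrices are stacked along the sample axis, so the sample
# count of the object tensor is the sum of objects over all activities.
T, D_obj = 25, 620                     # time steps, object feature dimension
objects_per_activity = [3, 2, 2]       # toy counts for 3 activities

per_object = [np.zeros((T, D_obj)) for n in objects_per_activity for _ in range(n)]
X_objects = np.stack(per_object, axis=1)
print(X_objects.shape)                 # (25, 7, 620); on the real folds this becomes (25, 226, 620)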

Since the author never stores which object corresponds to which activity, I now wonder how he is able to reconstruct the original structure in the end?


tmanh commented Apr 11, 2017

  1. The dimensions of the features are T x N x D,
    where T is the number of time steps (segments), N is the number of training (or testing) samples, and D is the dimension of the feature vector.

  2. Why are the numbers of training samples for objects and humans different?

  • CAD-120 is a human-object interaction dataset (one activity can involve 2-3 objects). The dataset is used not only for activity recognition; the authors also use it for object affordance recognition. Briefly, an affordance is the possibility of an action on an object or environment, so at any one time each object has exactly one affordance. Because X_tr_objects_disjoint is used for both object affordance detection and anticipation, it has many more training examples than X_tr_human_disjoint, which is only used for human activity labelling.
  • So you just misunderstood the purpose of X_tr_objects_disjoint.

  3. How is the author able to reconstruct the original structure in the end?

  • I think you have not fully understood the S-RNN paper. He does mention the parameter-sharing mechanism there: he trains human activity recognition and affordance recognition at the same time.
loss_layer_1 = self.train_layer_1(X_shared_1_minibatch,X_1_minibatch,Y_1_minibatch)
loss_layer_2 = self.train_layer_2(X_shared_2_minibatch,X_2_minibatch,Y_2_minibatch)
  • The human-object relation features (for human activity recognition) and the object-human relation features (for object affordance recognition) are fed into the same RNN node (shared_layers), while X_tr_human_disjoint is fed into layer_1 and X_tr_objects_disjoint into layer_2.
self.X = shared_layers[0].input
self.X_1 = layer_1[0].input
self.X_2 = layer_2[0].input
  • You can find the above code in the sharedRNN file. To see how he uses sharedRNN, you can read this code in activity-rnn-full-model:
shared_input_layer = TemporalInputFeatures(inputJointFeatures)
shared_hidden_layer = LSTM('tanh', 'sigmoid', lstm_init, 4, 128, rng=rng)
shared_layers = [shared_input_layer, shared_hidden_layer]

human_layers = [ConcatenateFeatures(inputHumanFeatures),
                LSTM('tanh', 'sigmoid', lstm_init, 4, 256, rng=rng),
                softmax(num_sub_activities, softmax_init, rng=rng)]

object_layers = [ConcatenateFeatures(inputObjectFeatures),
                 LSTM('tanh', 'sigmoid', lstm_init, 4, 256, rng=rng),
                 softmax(num_affordances, softmax_init, rng=rng)]

trY_1 = T.lmatrix()
trY_2 = T.lmatrix()
sharedrnn = SharedRNN(shared_layers, human_layers, object_layers, softmax_loss, trY_1, trY_2, 1e-3)
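For readers who find the Theano/NeuralModels code hard to follow, here is a rough sketch of the same parameter-sharing idea (one shared recurrent layer whose weights are reused by two task-specific branches), written with PyTorch purely as an illustration; the class, names, and layer sizes are placeholders and this is not the author's code:

import torch
import torch.nn as nn

class SharedRNNSketch(nn.Module):
    # One LSTM is shared by both tasks; each task then has its own LSTM and output head.
    def __init__(self, d_shared, d_human, d_object, n_sub_activities, n_affordances):
        super().__init__()
        self.shared = nn.LSTM(d_shared, 128)                 # shared human-object / object-human edge features
        self.human = nn.LSTM(128 + d_human, 256)             # shared output concatenated with human node features
        self.object = nn.LSTM(128 + d_object, 256)           # shared output concatenated with object node features
        self.human_head = nn.Linear(256, n_sub_activities)   # sub-activity scores
        self.object_head = nn.Linear(256, n_affordances)     # affordance scores

    def forward(self, x_shared_h, x_human, x_shared_o, x_object):
        h_shared, _ = self.shared(x_shared_h)   # (T, N_human, 128)
        o_shared, _ = self.shared(x_shared_o)   # same weights reused, (T, N_objects, 128)
        h, _ = self.human(torch.cat([h_shared, x_human], dim=-1))
        o, _ = self.object(torch.cat([o_shared, x_object], dim=-1))
        return self.human_head(h), self.object_head(o)

The key point, as in the snippet above, is that the human and object branches see different numbers of samples (93 vs. 226 here) but share the weights of the first recurrent layer.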

Good luck!!!

tmanh closed this as completed Apr 11, 2017
tmanh reopened this Apr 11, 2017