
Missing dataset and checkpoints for activity-anticipation #7

Open
tmanh opened this issue Dec 3, 2016 · 5 comments

Comments

tmanh commented Dec 3, 2016

Hi,

Could you please provide the dataset and checkpoints (or, at least, instructions) for the RNN implementation of activity-anticipation?

I can't find the dataset folders in the repository or any instructions in the README file.

path_to_dataset = '/scr/ashesh/activity-anticipation/dataset/{0}'.format(fold)
path_to_checkpoints = '/scr/ashesh/activity-anticipation/checkpoints/{0}'.format(fold)

Thank you so much.


pschydlo commented Apr 3, 2017

I'm having the same problem. Are you using the CAD-120 dataset directly? How do you preprocess it? Would it be possible to release the prepared dataset for training?

Thank you very much in advance!


tmanh commented Apr 3, 2017

Oh, I read his code and found out that he used the SVM binary features from this link: http://pr.cs.cornell.edu/humanactivities/data/features.tar

Once you have downloaded it, edit and run readData.py to convert the SVM binary features into node and edge features (which are described in his paper). After that, you can run his model :)
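For anyone who prefers scripting this step, a minimal Python sketch for fetching and unpacking the archive could look like the following (only the URL above is from the thread; the local file and folder names are placeholders):

import tarfile
import urllib.request

# Download the CAD-120 feature archive linked above and unpack it locally.
url = "http://pr.cs.cornell.edu/humanactivities/data/features.tar"
archive = "features.tar"

urllib.request.urlretrieve(url, archive)
with tarfile.open(archive) as tar:
    # Later comments suggest this yields features_cad120_ground_truth_segmentation/...
    tar.extractall("features")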


pschydlo commented Apr 3, 2017

Thank you so much for the tip! :)

At times the code does not make it easy to figure out what is happening; some comments would be really useful! Nevertheless, it's already nice to have the source code to replicate the experiments!

Now I have the following directory structure:

/features_cad120_ground_truth_segmentation

  • /features_binary_svm_format
  • /segments_svm_format

Am I right to assume that this directory:
/scail/scratch/group/cvgl/ashesh/activity-anticipation/features_ground_truth

corresponds to the main directory
/features_cad120_ground_truth_segmentation

or to
/features_cad120_ground_truth_segmentation/features_binary_svm_format?

The author also references this cryptic file:
'/scail/scratch/group/cvgl/ashesh/activity-anticipation/activityids_fold{0}.txt'

Have you figured out where I can find or how to create this file?

Thank you for the help!

EDIT: Managed to generate the pik files; in case anyone else has the same problem:

  • Follow Ahn's link, download the features, and extract them into the folders features_binary_svm_format and segments_svm_format.
  • Substitute the file names in the readData file: where it says s='', just plug in s = path to the feature folder/features_binary_svm.
  • The fold files are just newline-separated lists of the activities divided into N sets. You can easily generate them by executing "ls | tr -d '.txt' | split -l 32 - fold" in the /segments_svm_format folder; this command just lists all the activities and stores the list in 4 files (32 activities each). A Python equivalent is sketched after this list.
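For those who would rather do the fold split in Python, here is a minimal sketch under the same assumptions (4 folds of 32 activities, ids taken from the .txt file names in segments_svm_format; the paths and output names are placeholders):

import os

# List activity ids (file names without .txt) and split them into folds of 32.
seg_dir = "features_cad120_ground_truth_segmentation/segments_svm_format"
ids = sorted(os.path.splitext(f)[0] for f in os.listdir(seg_dir) if f.endswith(".txt"))

fold_size = 32
for i in range(0, len(ids), fold_size):
    # Rename the output files to whatever readData.py expects
    # (e.g. activityids_fold{0}.txt, as mentioned above).
    with open("fold{0}".format(i // fold_size + 1), "w") as out:
        out.write("\n".join(ids[i:i + fold_size]) + "\n")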

Good luck!


pschydlo commented Apr 10, 2017

After reading through the readData code it's really hard to decipher what the feature arrays represent; have you managed to figure it out?

For example, X_tr_human_disjoint is an array with dimensions 25x93x790, where 790 is the dimension of the feature vector. Do you know what the other two dimensions are?

The same goes for X_tr_objects_disjoint, whose dimensions are 25x226x620, where 620 is the dimension of the object feature vector.

In the human feature structure, as far as I understood, the 25 stands for the maximum number of segments and the 93 for the number of training examples (segments), but this is not consistent with the dimensions of the object structure. What does the 226 stand for?

Thanks in advance for your time and attention!

EDIT: In case anyone has the same question: the mysterious 226 is a dimension that represents the concatenation of the objects. To avoid a variable-sized frame, the author simply concatenates the objects along the activity dimension. The 93 corresponds to the activities and the 226 to the sum of the objects over all activities; the average number of objects per activity is 2.43, and 2.43 * 93 ≈ 226 = the total number of objects ever seen across all segments (not distinct!).
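A toy numpy illustration of that layout (this is not the repo's code; the per-activity object counts below are made up):

import numpy as np

# Per-object feature matrices are stacked along the sample axis, so the sample
# count of the object tensor is the sum of objects over all activities.
T, D_obj = 25, 620                     # time steps, object feature dimension
objects_per_activity = [3, 2, 2]       # toy counts for 3 activities

per_object = [np.zeros((T, D_obj)) for n in objects_per_activity for _ in range(n)]
X_objects = np.stack(per_object, axis=1)
print(X_objects.shape)                 # (25, 7, 620); on the real folds this becomes (25, 226, 620)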

Since the author never stores which object corresponds to which activity, I now wonder how he is able to reconstruct the original structure in the end?


tmanh commented Apr 11, 2017

  1. The dimensions of the features are T x N x D,
    where T is the number of time steps (segments), N is the number of training (or testing) samples, and D is the dimension of the feature vector.

  2. Why are the numbers of training samples for objects and humans different?

  • CAD-120 is a human-object interaction dataset (one activity can involve 2-3 objects). The dataset is used not only for activity recognition; the authors also use it for object affordance recognition. Briefly, an affordance is the possibility of an action on an object or environment, so at any one time each object has exactly one affordance. Because X_tr_objects_disjoint is used for both object affordance detection and anticipation, it has many more training examples than X_tr_human_disjoint, which is only used for human activity labelling.
  • So you just misunderstood the purpose of X_tr_objects_disjoint.

  3. How is the author able to reconstruct the original structure in the end?

  • I think you have not fully understood the S-RNN paper. He does mention the parameter-sharing mechanism there: he trains human activity recognition and affordance recognition at the same time.
loss_layer_1 = self.train_layer_1(X_shared_1_minibatch,X_1_minibatch,Y_1_minibatch)
loss_layer_2 = self.train_layer_2(X_shared_2_minibatch,X_2_minibatch,Y_2_minibatch)
  • The human-object relation features (for human activity recognition) and the object-human relation features (for object affordance recognition) are fed into the same RNN node (shared_layers), while X_tr_human_disjoint is fed into layer_1 and X_tr_objects_disjoint into layer_2.
self.X = shared_layers[0].input
self.X_1 = layer_1[0].input
self.X_2 = layer_2[0].input
  • You can find the above code in the sharedRNN file. To see how he uses sharedRNN, you can read this code in activity-rnn-full-model:
shared_input_layer = TemporalInputFeatures(inputJointFeatures)
shared_hidden_layer = LSTM('tanh', 'sigmoid', lstm_init, 4, 128, rng=rng)
shared_layers = [shared_input_layer, shared_hidden_layer]

human_layers = [ConcatenateFeatures(inputHumanFeatures),
                LSTM('tanh', 'sigmoid', lstm_init, 4, 256, rng=rng),
                softmax(num_sub_activities, softmax_init, rng=rng)]

object_layers = [ConcatenateFeatures(inputObjectFeatures),
                 LSTM('tanh', 'sigmoid', lstm_init, 4, 256, rng=rng),
                 softmax(num_affordances, softmax_init, rng=rng)]

trY_1 = T.lmatrix()
trY_2 = T.lmatrix()
sharedrnn = SharedRNN(shared_layers, human_layers, object_layers, softmax_loss, trY_1, trY_2, 1e-3)
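For readers who find the Theano/NeuralModels code hard to follow, here is a rough sketch of the same parameter-sharing idea (one shared recurrent layer whose weights are reused by two task-specific branches), written with PyTorch purely as an illustration; the class, names, and layer sizes are placeholders and this is not the author's code:

import torch
import torch.nn as nn

class SharedRNNSketch(nn.Module):
    # One LSTM is shared by both tasks; each task then has its own LSTM and output head.
    def __init__(self, d_shared, d_human, d_object, n_sub_activities, n_affordances):
        super().__init__()
        self.shared = nn.LSTM(d_shared, 128)                 # shared human-object / object-human edge features
        self.human = nn.LSTM(128 + d_human, 256)             # shared output concatenated with human node features
        self.object = nn.LSTM(128 + d_object, 256)           # shared output concatenated with object node features
        self.human_head = nn.Linear(256, n_sub_activities)   # sub-activity scores
        self.object_head = nn.Linear(256, n_affordances)     # affordance scores

    def forward(self, x_shared_h, x_human, x_shared_o, x_object):
        h_shared, _ = self.shared(x_shared_h)   # (T, N_human, 128)
        o_shared, _ = self.shared(x_shared_o)   # same weights reused, (T, N_objects, 128)
        h, _ = self.human(torch.cat([h_shared, x_human], dim=-1))
        o, _ = self.object(torch.cat([o_shared, x_object], dim=-1))
        return self.human_head(h), self.object_head(o)

The key point, as in the snippet above, is that the human and object branches see different numbers of samples (93 vs. 226 here) but share the weights of the first recurrent layer.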

Good luck!!!

tmanh closed this as completed Apr 11, 2017
tmanh reopened this Apr 11, 2017