On 'v' action rewards #50

aletd · 2020-02-28T12:50:28Z

Hello dear mr. Yuhang Song,

In the paper, it is mentioned that the rewards for action v are given by

And the parameters θv are optimized according to the rule:

In the code,

DHP/envs.py

Lines 488 to 502 in 73ddec2

    
           '''get direction reward and ground-truth v from data_base in last state''' 
        
           last_prob, distance_per_data = suppor_lib.get_prob( 
        
               lon=self.last_lon, 
        
               lat=self.last_lat, 
        
               theta=action * 45.0, 
        
               subjects=self.subjects, 
        
               subjects_total=self.subjects_total, 
        
               cur_data=self.last_data, 
        
           ) 
        
           '''rescale''' 
        
           distance_per_step = distance_per_data * self.data_per_step 
        
           '''convert v to degree''' 
        
           degree_per_step = distance_per_step / math.pi * 180.0 
        
           '''set v_lable''' 
        
           v_lable = degree_per_step

there seems to be no reward for v calculated, instead v_lable is estimated as a "weighted" target value (sum of subject_i_v * similarity),

DHP/suppor_lib.py

Lines 154 to 159 in 73ddec2

    
           distance_per_data = 0.0 
        
           for i in range(subjects_total): 
        
               if config.if_normalize_v_lable: 
        
                   distance_per_data += prob_dic_normalized[i] * subjects[i].data_frame[cur_data].v 
        
               else: 
        
                   distance_per_data += prob_dic[i] * subjects[i].data_frame[cur_data].v

which then contributes another term (v-v_lable)^2 in the loss function:

DHP/a3c.py

Lines 238 to 239 in 73ddec2

    
           # v loss 
        
           v_loss = 0.5 * tf.reduce_sum(tf.square(pi.v - self.v_lable))

Is there any particular reason why the direct sum of rewards is not calculated, and instead the above approach is considered?

aletd · 2020-03-12T15:36:40Z

Bump!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

On 'v' action rewards #50

On 'v' action rewards #50

aletd commented Feb 28, 2020

aletd commented Mar 12, 2020

On 'v' action rewards #50

On 'v' action rewards #50

Comments

aletd commented Feb 28, 2020

aletd commented Mar 12, 2020