PolicyDist

Policy distribution interface.

*batch_shape depends on the type of input batch used by the algorithm:

  • simple batch: *batch_shape = (batch_size,)
  • sequence batch: *batch_shape = (seq_batch_size, seq_len)
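
For illustration (the sizes are made up), the parameters a distribution is built from carry these batch dimensions:

import torch

# Hypothetical sizes with num_branches = 2.
simple_mean = torch.zeros(64, 2)     # *batch_shape = (64,)
seq_mean = torch.zeros(8, 16, 2)     # *batch_shape = (8, 16)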

Module: aine_drl.policy_dist

class PolicyDist(ABC)
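
To make the interface concrete, here is a minimal sketch of a diagonal Gaussian subclass built on torch.distributions.Normal. The Action constructor and its continuous_action field are assumptions made for illustration; consult the library for the actual Action API.

import torch
from torch.distributions import Normal

from aine_drl.policy_dist import PolicyDist

class GaussianDist(PolicyDist):
    # Sketch only: a factorized Gaussian over num_branches action branches.
    # joint_log_prob() and joint_entropy() are inherited from PolicyDist.
    def __init__(self, mean: torch.Tensor, std: torch.Tensor) -> None:
        # mean, std: (*batch_shape, num_branches)
        self._dist = Normal(mean, std)

    def sample(self, reparam_trick: bool = False) -> "Action":
        # rsample() draws a sample that stays differentiable w.r.t. mean
        # and std (reparameterization trick); sample() does not.
        tensor = self._dist.rsample() if reparam_trick else self._dist.sample()
        return Action(continuous_action=tensor)  # assumed Action constructor

    def log_prob(self, action: "Action") -> torch.Tensor:
        # -> (*batch_shape, num_branches); continuous_action is assumed.
        return self._dist.log_prob(action.continuous_action)

    def entropy(self) -> torch.Tensor:
        # -> (*batch_shape, num_branches)
        return self._dist.entropy()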

Methods

sample()

Sample actions from the policy distribution.

@abstractmethod
def sample(self, reparam_trick: bool = False) -> Action

Parameters:

Name                  Description
reparam_trick (bool)  (default = False) Whether to use the reparameterization trick.

Returns:

Name             Shape
action (Action)  Depends on the constructor arguments.
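
A usage sketch, assuming policy_dist is some concrete PolicyDist instance produced by a policy network:

action = policy_dist.sample()
# Keep the sample differentiable for pathwise gradients (e.g. SAC-style
# objectives), where the underlying distribution supports it:
action = policy_dist.sample(reparam_trick=True)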

log_prob()

Returns the log of the probability mass/density function evaluated at the given action.

@abstractmethod
def log_prob(self, action: Action) -> torch.Tensor

Returns:

Name               Shape
log_prob (Tensor)  (*batch_shape, num_branches)
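
A common use is a PPO-style probability ratio; a sketch, assuming old_policy_dist, new_policy_dist, and a sampled action already exist:

import torch

old_log_prob = old_policy_dist.log_prob(action).detach()
new_log_prob = new_policy_dist.log_prob(action)
ratio = torch.exp(new_log_prob - old_log_prob)  # (*batch_shape, num_branches)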

joint_log_prob()

Returns the joint log of the probability mass/density function evaluated at the given action, combined across all action branches.

def joint_log_prob(self, action: Action) -> torch.Tensor

Returns:

Name                     Shape
joint_log_prob (Tensor)  (*batch_shape, 1)
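
Given the return shapes above, for independent action branches the joint log-probability is presumably equivalent to summing log_prob over the branch dimension; a sketch:

joint_log_prob = policy_dist.log_prob(action).sum(dim=-1, keepdim=True)
# -> (*batch_shape, 1), matching joint_log_prob(action)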

entropy()

Returns the entropy of the policy distribution.

@abstractmethod
def entropy(self) -> torch.Tensor

Returns:

Name              Shape
entropy (Tensor)  (*batch_shape, num_branches)
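
Entropy is typically used as an exploration bonus; a sketch, assuming policy_loss is computed elsewhere and the coefficient is a hypothetical value:

entropy_coef = 0.01  # hypothetical coefficient
loss = policy_loss - entropy_coef * policy_dist.entropy().mean()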

joint_entropy()

Returns the joint entropy of the policy distribution.

def joint_entropy(self) -> torch.Tensor

Returns:

Name                    Shape
joint_entropy (Tensor)  (*batch_shape, 1)
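
As with joint_log_prob(), for independent branches the joint entropy presumably reduces to the per-branch entropies summed over the last dimension:

joint_entropy = policy_dist.entropy().sum(dim=-1, keepdim=True)
# -> (*batch_shape, 1), matching joint_entropy()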