uint8 action
bool init
---
float32[] state
float32 reward
bool done