Python API¶

`machine_common_sense.create_controller`([...])	Creates and returns a new MCS Controller object.
`machine_common_sense.load_scene_json_file`(...)	Loads the given JSON scene config file and returns its data.
`machine_common_sense.Action`(value)	The actions available in the MCS simulation environment.
`machine_common_sense.Controller`(...)	MCS Controller class implementation for the MCS wrapper of the AI2-THOR library.
`machine_common_sense.GoalCategory`(value)	Each goal dict will have a "category" string that describes the type of scene (or, the type of task within the scene) being run.
`machine_common_sense.GoalMetadata`([...])	Defines metadata for a goal in the MCS 3D environment.
`machine_common_sense.Material`(value)	Possible materials of objects.
`machine_common_sense.ObjectMetadata`([uuid, ...])	Defines metadata for an object in the MCS 3D environment.
`machine_common_sense.ReturnStatus`(value)	An enumeration.
`machine_common_sense.Reward`()	Reward utility class
`machine_common_sense.SerializerMsgPack`()	Serializer to (de)serialize StepMetadata into/from MsgPack format.
`machine_common_sense.StepMetadata`([...])	Defines output metadata from an action step in the MCS 3D environment.
`machine_common_sense.Stringifier`()	Defines functions to turn objects into strings for debugging and human readable output.

class machine_common_sense.Action(value)[source]¶

The actions available in the MCS simulation environment.

For actions requiring objectImageCoords or receptacleObjectImageCoords, note that (0,0) represents the top left corner of the viewport, and that inputs must be greater than (0,0).

CLOSE_OBJECT = 'CloseObject'¶

Close a nearby object.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
amount (float) – The amount to close the object between 0 (completely opened) and 1 (completely closed). Default: 1

Returns:

“SUCCESSFUL” – Action successful.
”IS_CLOSED_COMPLETELY” – If the object is completely closed.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object. This includes structural objects like the room’s walls.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_OPENABLE” – If the object itself cannot be closed.
”NOT_RECEPTACLE” – If the object corresponding to the “objectImageCoords” vector is not a receptacle object.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OUT_OF_REACH” – If you cannot close the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.
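
As a minimal sketch, the CloseObject parameters above could be assembled into keyword arguments for a step call as follows. The helper name `close_object_params` is hypothetical and not part of the MCS package; it only encodes the documented rule that either objectId or both image coordinates must be supplied, and that amount lies between 0 and 1.

```python
from typing import Any, Dict, Optional, Tuple


def close_object_params(
    object_id: Optional[str] = None,
    image_coords: Optional[Tuple[float, float]] = None,
    amount: float = 1.0,
) -> Dict[str, Any]:
    """Build keyword arguments for a CloseObject step (hypothetical helper)."""
    if object_id is None and image_coords is None:
        raise ValueError("Provide objectId or objectImageCoords")
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    params: Dict[str, Any] = {"amount": amount}
    if object_id is not None:
        # The uuid takes the place of the image coordinates.
        params["objectId"] = object_id
    else:
        # Pixel coordinates are relative to the top left corner of the viewport.
        x, y = image_coords
        params["objectImageCoordsX"] = float(x)
        params["objectImageCoordsY"] = float(y)
    return params
```

The resulting dict could then be unpacked into a call such as `controller.step("CloseObject", **close_object_params(object_id="drawer_1"))`, where "drawer_1" is a placeholder id.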

DROP_OBJECT = 'DropObject'¶

Drop an object you are holding.

Parameters:

objectId (string, optional) – The “uuid” of the held object. Defaults to the first held object.

Returns:

“SUCCESSFUL” – Action successful.
”NOT_HELD” – If you cannot put down the object corresponding to the “objectId” because you are not holding it.
”NOT_OBJECT” – If the object corresponding to the “objectId” is not an object.
”FAILED” – Unexpected error; please report immediately to development team.

END_HABITUATION = 'EndHabituation'¶

Ends a habituation trial for the scene by blanking the screen for one action (and teleporting the agent if needed). Sometimes needed depending on the task type.

Note that we currently plan to use the starting position/rotation as teleport parameters here for applicable cases. We cannot currently guarantee that using a position intersecting another object or outside the room won’t cause issues or errors.

Returns:

“SUCCESSFUL” – Action successful.
”FAILED” – Unexpected error; please report immediately to development team.

END_SCENE = 'EndScene'¶

Signals that end_scene should be called now because there are no actions available. The action itself does nothing.

Returns:

“SUCCESSFUL” – Action successful.
”FAILED” – Unexpected error; please report immediately to development team.

INITIALIZE = 'Initialize'¶: Initialize the scene. Intended only for internal use.

INTERACT_WITH_AGENT = 'InteractWithAgent'¶

Interact with an agent. If that agent has an object, it will hold out the object for you to pickup; otherwise, the agent will look sad. If the agent was pointing at an object, the agent will resume pointing afterward.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_AGENT” – If the object being interacted with is not a simulation agent.
”AGENT_CURRENTLY_INTERACTING_WTIH_PERFORMER” – If the object being interacted with is a simulation agent already interacting with the performer.
”AGENT_IS_BUSY” – If the object being interacted with is a simulation agent that is currently rotating to face an object or beginning its point animation.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OUT_OF_REACH” – If you cannot interact with the agent because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

LOOK_DOWN = 'LookDown'¶

Rotate your viewport down (add 10 degrees to head tilt).

Returns:

“SUCCESSFUL” – Action successful.
”CANNOT_ROTATE” – Failed because you cannot look down/up more than +/- 90 degrees.
”FAILED” – Unexpected error; please report immediately to development team.

LOOK_UP = 'LookUp'¶

Rotate your view up (subtract 10 degrees from head tilt).

Returns:

“SUCCESSFUL” – Action successful.
”CANNOT_ROTATE” – Failed because you cannot look down/up more than +/- 90 degrees.
”FAILED” – Unexpected error; please report immediately to development team.

MOVE_AHEAD = 'MoveAhead'¶

Move yourself forward based on your current viewport.

Returns:

“SUCCESSFUL” – Action successful.
”OBSTRUCTED” – If you cannot move forward because your path is obstructed.
”FAILED” – Unexpected error; please report immediately to development team.

MOVE_BACK = 'MoveBack'¶

Move yourself backward based on your current viewport.

Returns:

“SUCCESSFUL” – Action successful.
”OBSTRUCTED” – If you cannot move backward because your path is obstructed.
”FAILED” – Unexpected error; please report immediately to development team.

MOVE_LEFT = 'MoveLeft'¶

Move yourself left based on your current viewport.

Returns:

“SUCCESSFUL” – Action successful.
”OBSTRUCTED” – If you cannot move left because your path is obstructed.
”FAILED” – Unexpected error; please report immediately to development team.

MOVE_OBJECT = 'MoveObject'¶

Apply a movement of 0.1 meters to a nearby object. If the object would come into contact with another object, and the other object is small and moveable, the other object will also move 0.1 meters in the same direction. This movement does not attempt to simulate realistic physics in regard to collisions with other object(s). If you wish to simulate realistic physical movement, please use the PullObject and PushObject actions instead, which apply a force using the environment’s physics simulation engine.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
lateral (int) – The x axis direction of movement on the object relative to the agent. Can be -1, 0, 1. If only lateral is given, straight will default to 0. Default: 0
straight (int) – The z axis direction of movement on the object relative to the agent. Can be -1, 0, 1. If only straight is given, lateral will default to 0. Default: 1

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_MOVEABLE” – If the object itself cannot be moved by a baby.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OBSTRUCTED” – If you cannot move the object because the path of movement is obstructed.
”OUT_OF_REACH” – If you cannot move the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.
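
The defaulting rule for lateral and straight can be sketched as below. The helper `move_object_params` is hypothetical (not part of the MCS package); it assumes that omitting both values yields the documented defaults (lateral 0, straight 1), while supplying only one of them sets the other to 0.

```python
from typing import Any, Dict, Optional


def move_object_params(
    object_id: str,
    lateral: Optional[int] = None,
    straight: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a MoveObject step (hypothetical helper)."""
    for name, value in (("lateral", lateral), ("straight", straight)):
        if value is not None and value not in (-1, 0, 1):
            raise ValueError(f"{name} must be -1, 0, or 1")
    if lateral is None and straight is None:
        # Neither given: fall back to the documented defaults.
        lateral, straight = 0, 1
    else:
        # Only one given: the other defaults to 0.
        lateral = 0 if lateral is None else lateral
        straight = 0 if straight is None else straight
    return {"objectId": object_id, "lateral": lateral, "straight": straight}
```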

MOVE_RIGHT = 'MoveRight'¶

Move yourself right based on your current viewport.

Returns:

“SUCCESSFUL” – Action successful.
”OBSTRUCTED” – If you cannot move right because your path is obstructed.
”FAILED” – Unexpected error; please report immediately to development team.

OPEN_OBJECT = 'OpenObject'¶

Open a nearby object.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
amount (float) – The amount to open the object between 0 (completely closed) and 1 (completely opened). Default: 1

Returns:

“SUCCESSFUL” – Action successful.
”IS_OPENED_COMPLETELY” – If the object is completely opened.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object. This includes structural objects like the room’s walls.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_OPENABLE” – If the object itself cannot be opened.
”NOT_RECEPTACLE” – If the object corresponding to the “objectImageCoords” vector is not a receptacle object.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OBSTRUCTED” – If you cannot open the object because you will be in the way of the object when it’s opened.
”OUT_OF_REACH” – If you cannot open the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

PASS = 'Pass'¶

Do nothing.

Returns:

“SUCCESSFUL” – Action successful.
”FAILED” – Unexpected error; please report immediately to development team.

PICKUP_OBJECT = 'PickupObject'¶

Pick up a nearby object and hold it in your hand. This action incorporates reaching out your hand in front of you, opening your fingers, and grabbing the object. You may hold multiple objects simultaneously.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_PICKUPABLE” – If the object itself cannot be picked up.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OUT_OF_REACH” – If you cannot pick up the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

PULL_OBJECT = 'PullObject'¶

Pull a nearby object by applying a physical force directly toward you on the X/Z axis to the center point of the object (note that this means it does not matter where you Pull on the object, since the force is always applied to the center point).

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
force (float) – The amount of force, from 0 to 1, used to move the target object. Default: 0.5

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_MOVEABLE” – If the object itself cannot be moved by a baby.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OUT_OF_REACH” – If you cannot move the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

PUSH_OBJECT = 'PushObject'¶

Push a nearby object by applying a physical force directly away from you on the X/Z axis to the center point of the object (note that this means it does not matter where you Push on the object, since the force is always applied to the center point).

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
force (float) – The amount of force, from 0 to 1, used to move the target object. Default: 0.5

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_MOVEABLE” – If the object itself cannot be moved by a baby.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OUT_OF_REACH” – If you cannot move the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

PUT_OBJECT = 'PutObject'¶

Put down an object you are holding into/onto a nearby receptacle object. A receptacle is an object that can hold other objects, like a block, box, drawer, shelf, or table.

Parameters:

objectId (string, optional) – The “uuid” of the held object. Defaults to the first held object.
receptacleObjectId (string, optional) – The “uuid” of the target receptacle. Required unless the “receptacleObjectImageCoords” properties are given.
receptacleObjectImageCoordsX (float, optional) – The X of a pixel coordinate on the target receptacle based on your current viewport. Can be used in place of the “receptacleObjectId” property. (See note under “Action” header regarding image coordinates.)
receptacleObjectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target receptacle based on your current viewport. Can be used in place of the “receptacleObjectId” property. (See note under “Action” header regarding image coordinates.)

Returns:

“SUCCESSFUL” – Action successful.
”NOT_HELD” – If you cannot put down the object corresponding to the “objectId” because you are not holding it.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” or “receptacleObjectImageCoords” vector is not an interactable object. This includes structural objects like the room’s walls.
”NOT_OBJECT” – If the object corresponding to the “objectId” and/or “receptacleObjectId” (or object corresponding to the “receptacleObjectImageCoords” vector) is not an object.
”NOT_RECEPTACLE” – If the object corresponding to the “receptacleObjectId” (or object corresponding to the “receptacleObjectImageCoords” vector) is not a receptacle.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OBSTRUCTED” – If you cannot put down the object because your path is obstructed.
”OUT_OF_REACH” – If you cannot put down the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

ROTATE_LEFT = 'RotateLeft'¶

Rotate your viewport left by 10 degrees.

Returns:

“SUCCESSFUL” – Action successful.
”FAILED” – Unexpected error; please report immediately to development team.

ROTATE_OBJECT = 'RotateObject'¶

Apply a rotation of 5 degrees to a nearby object. Will fail if rotating the object would cause it to collide with another object or the performer agent, returning OBSTRUCTED.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
clockwise (bool) – If the rotation should be clockwise. Default: True

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_MOVEABLE” – If the object itself cannot be moved by a baby.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OBSTRUCTED” – If you cannot rotate the object because the path of rotation is obstructed.
”OUT_OF_REACH” – If you cannot rotate the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

ROTATE_RIGHT = 'RotateRight'¶

Rotate your viewport right by 10 degrees.

Returns:

“SUCCESSFUL” – Action successful.
”FAILED” – Unexpected error; please report immediately to development team.
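
Because RotateLeft and RotateRight each turn the viewport by 10 degrees, a desired turn can be expressed as a short list of action strings. The sketch below uses a hypothetical helper, `plan_rotation`, and assumes by convention that a positive angle means turning right; that sign convention is not taken from the MCS package.

```python
from typing import List

DEGREES_PER_ROTATE = 10  # per the RotateLeft / RotateRight descriptions


def plan_rotation(delta_degrees: float) -> List[str]:
    """Return the action strings for turning by roughly delta_degrees."""
    steps = round(abs(delta_degrees) / DEGREES_PER_ROTATE)
    # Assumption: positive angles turn right, negative angles turn left.
    action = "RotateRight" if delta_degrees > 0 else "RotateLeft"
    return [action] * steps
```

Angles that are not multiples of 10 are rounded to the nearest whole number of actions, so the achieved heading may differ from the request by a few degrees.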

TORQUE_OBJECT = 'TorqueObject'¶

Apply torque to a nearby object.

Parameters:

objectId (string, optional) – The “uuid” of the target object. Required unless the “objectImageCoords” properties are given.
objectImageCoordsX (float, optional) – The X of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
objectImageCoordsY (float, optional) – The Y of a pixel coordinate on the target object based on your current viewport. Can be used in place of the “objectId” property. (See note under “Action” header regarding image coordinates.)
force (float) – The amount of force, from -1 to 1, used to move the target object. Default: 0.5

Returns:

“SUCCESSFUL” – Action successful.
”NOT_INTERACTABLE” – If the object corresponding to the “objectImageCoords” vector is not an interactable object.
”NOT_OBJECT” – If the object corresponding to the “objectId” (or object corresponding to the “objectImageCoords” vector) is not an object.
”NOT_MOVEABLE” – If the object itself cannot be moved by a baby.
”NOT_VISIBLE” – If the object corresponding to the “objectId” is not in the viewport.
”OUT_OF_REACH” – If you cannot move the object because you are out of reach.
”FAILED” – Unexpected error; please report immediately to development team.

static input_to_action_and_params(input_str: str) → Tuple[source]¶

Transforms the given input string into an action string and parameter dict.

Parameters:

input_str (string) – The input string.

Returns:

string – The action string, or None if the given input had an error transforming the action string.
dict – The parameter dict, or None if the given input had an error transforming parameters.
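
The input format is not specified here, so the following is only an illustrative stand-in, not the library's implementation. It assumes a comma-separated string such as "PickupObject,objectId=ball" (the action first, then key=value pairs) and mirrors the documented return contract: None for the action when it is not recognized, and None for the parameters when they cannot be parsed. The names `parse_action_input` and `KNOWN_ACTIONS` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# A small subset of the action strings documented above.
KNOWN_ACTIONS = {"MoveAhead", "Pass", "PickupObject", "PushObject"}


def parse_action_input(
    input_str: str,
) -> Tuple[Optional[str], Optional[Dict[str, str]]]:
    """Split an assumed 'Action,key=value,...' string into (action, params)."""
    parts = [part.strip() for part in input_str.split(",")]
    # Unknown action strings are reported as None.
    action = parts[0] if parts[0] in KNOWN_ACTIONS else None
    params: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep or not key:
            # A malformed parameter invalidates the whole parameter dict.
            return action, None
        params[key.strip()] = value.strip()
    return action, params
```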

class machine_common_sense.Controller(unity_app_file_path: str, config: ConfigManager)[source]¶

MCS Controller class implementation for the MCS wrapper of the AI2-THOR library.

https://ai2thor.allenai.org/ithor/documentation/

Parameters:

unity_app_file_path (str) –
config (ConfigManager) –

end_scene(rating: float | None = None, score: float | None = None, report: Dict[int, object] | None = None) → None[source]¶

Ends the current scene. Calling end_scene() before calling start_scene() will do nothing. Calling end_scene() twice with the same scene will throw an exception.

Parameters:

rating (float, optional) – The plausibility rating to classify a passive / VoE scene as either plausible or implausible. Not used for any interactive scenes. For passive agent scenes, this rating should be continuous, from 0.0 (completely implausible) to 1.0 (completely plausible). For other passive scenes, this rating must be binary, either 0 (implausible) or 1 (plausible). End-of-scene ratings are required for all passive / VoE scenes. (default None)
score (float, optional) –
The continuous plausibility score between 0.0 (completely implausible) and 1.0 (completely plausible). End-of-scene scores are required for all passive / VoE scenes except agent scenes. Not used for any interactive scenes or passive agent scenes. (default None)

Note: when an issue causes the program to exit prematurely or end_scene isn’t properly called but history_enabled is true, this value will be written to file as -1.
report (Dict[int, object], optional) –
Variable for retrospective per frame reporting for passive / VoE scenes. Not used for any interactive scenes or passive agent scenes. (default None)

Key is an int representing a step/frame number from output step metadata, starting at 1. Value or payload contains:
- rating (float or int, optional) – The plausibility rating to classify a passive / VoE scene as either plausible or implausible. Not used for any interactive scenes. For passive agent scenes, this rating should be continuous, from 0.0 (completely implausible) to 1.0 (completely plausible). For other passive scenes, this rating must be binary, either 0 (implausible) or 1 (plausible). Frame-by-frame ratings are no longer required for any scenes (but end-of-scene ratings are). (default None)
- score (float, optional) – The continuous plausibility score between 0.0 (completely implausible) and 1.0 (completely plausible). Frame-by-frame scores are required for all passive / VoE scenes except agent scenes. Not used for any interactive scenes or passive agent scenes. (default None)
- violations_xy_list (List[Dict[str, float]], optional) – A list of one or more (x, y) locations (ex: [{“x”: 1, “y”: 3.4}]), each representing a potential violation-of-expectation. These locations are required for all passive / VoE scenes except agent scenes. Not used for any interactive scenes or passive agent scenes. (default None)
- internal_state (object, optional) – A properly formatted json object representing various kinds of internal states at a particular moment. Examples include the estimated position of the agent, current map of the world, etc. (default None)
Example report:

{
    1: {
        "rating": 1,
        "score": 0.75,
        "violations_xy_list": [{"x": 1, "y": 1}],
        "internal_state": {"test": "some state"}
    }
}
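
A report like the example can be built programmatically. The sketch below uses a hypothetical helper, `report_entry`, that checks the documented 0.0 to 1.0 range and omits any key left at None, on the assumption that an absent key is treated the same as its default of None.

```python
from typing import Any, Dict, List, Optional


def report_entry(
    rating: Optional[float] = None,
    score: Optional[float] = None,
    violations_xy_list: Optional[List[Dict[str, float]]] = None,
    internal_state: Optional[Any] = None,
) -> Dict[str, Any]:
    """Build the payload for one step/frame of a report (hypothetical helper)."""
    for name, value in (("rating", rating), ("score", score)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    candidates = {
        "rating": rating,
        "score": score,
        "violations_xy_list": violations_xy_list,
        "internal_state": internal_state,
    }
    # Assumption: keys left at None can simply be omitted.
    return {key: value for key, value in candidates.items() if value is not None}


# Keys are step/frame numbers, starting at 1.
report = {
    1: report_entry(
        rating=1,
        score=0.75,
        violations_xy_list=[{"x": 1, "y": 1}],
        internal_state={"test": "some state"},
    )
}
```

The resulting dict would then be passed as the report argument of end_scene.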

get_metadata_level() → str[source]¶

Returns the current metadata level set in the config. If none specified, returns ‘default’.

Returns: A string containing the current metadata level.
Return type: string

retrieve_object_states(object_id: str) → List[source]¶: Return the state list at the current step for the object with the given ID from the scene configuration data, if any.

start_scene(config_data: SceneConfiguration | Dict) → StepMetadata[source]¶

Starts a new scene using the given scene configuration data dict and returns the scene output data object.

Parameters: config_data (SceneConfiguration, or a dict that can be serialized to SceneConfiguration) – The MCS scene configuration data for the scene to start.
Returns: The output data object from the start of the scene (the output from an “Initialize” action).
Return type: StepMetadata

step(action: str, **kwargs: str) → StepMetadata | None[source]¶

Runs the given action within the current scene.

Parameters:

action (string) – A selected action string from the list of available actions.
**kwargs – Zero or more key-and-value parameters for the action.

Returns:

The MCS output data object from after the selected action and the physics simulation were run. Returns None if you have passed the “last_step” of this scene.

Return type:

StepMetadata

Raises:

ValueError – If values are outside acceptable ranges or cannot be converted to a number.
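
Since step returns None once the scene's “last_step” has been passed, a driver loop can stop on that value. The following is a minimal sketch: `run_actions` is a hypothetical helper that relies only on the `step(action, **kwargs)` signature shown above, and `FakeController` is a stand-in used to exercise the loop without the simulator, not a class from the MCS package.

```python
from typing import Any, Dict, Iterable, List, Tuple


def run_actions(
    controller: Any,
    actions: Iterable[Tuple[str, Dict[str, Any]]],
) -> List[Any]:
    """Run actions in order, stopping when step() returns None."""
    outputs: List[Any] = []
    for action, params in actions:
        output = controller.step(action, **params)
        if output is None:
            # The scene is past its last step; no further actions are run.
            break
        outputs.append(output)
    return outputs


class FakeController:
    """Stand-in that allows a fixed number of steps, then returns None."""

    def __init__(self, last_step: int) -> None:
        self.last_step = last_step
        self.count = 0

    def step(self, action: str, **kwargs: Any) -> Any:
        if self.count >= self.last_step:
            return None
        self.count += 1
        return {"action": action, "params": kwargs, "step_number": self.count}


outputs = run_actions(
    FakeController(last_step=2),
    [("MoveAhead", {}), ("Pass", {}), ("Pass", {})],
)
```

With a real controller, each element of the returned list would be a StepMetadata object rather than the plain dict used by the stand-in.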

stop_simulation() → None[source]¶: Stop the 3D simulation environment. This controller won’t work any more.

class machine_common_sense.GoalCategory(value)[source]¶

Each goal dict will have a “category” string that describes the type of scene (or, the type of task within the scene) being run. Each goal dict will also have a “metadata” dict containing one or more properties depending on the “category”.

AGENTS = 'agents'¶

In a trial that has an Agents goal, you must sit and observe a scene as one or more simulation-controlled agents act in predefined ways within your camera’s viewport, and then decide whether the scene is “expected” or “unexpected”. The camera will always be positioned at an isometric perspective, like you’re standing on an elevated platform looking down at the scene. Each scene will consist of eight sequential habituation trials, depicting expected agent behaviors and separated by EndHabituation actions (each of which generates a black frame image when called), immediately followed by the test trial, depicting either an expected or unexpected agent behavior. All nine of these trials happen within the same “scene”. These trials will demand a “common sense” understanding of agents, their behaviors, and their interactions with objects in the environment.

This goal category is only used for the passive/VoE agent tasks. All interactive agent tasks will use either the retrieval or multi retrieval goal category.

Notes

You are required to call controller.end_scene() at the end of each scene with a continuous plausibility rating, from 0.0 (completely implausible) to 1.0 (completely plausible). You are not required to also pass it a score.

IMITATION = 'imitation'¶

In a trial that has an imitation goal, you must imitate the actions of another agent in the scene to find and pickup a target object. Executing the same actions, on the same objects, in the same order, is of critical importance; if you do not imitate the actions correctly, you will be forced to end the scene (by calling end_scene, or using the END_SCENE action), without achieving the reward. In MCS Evaluation 4 and onward, the target object will always be a soccer ball (football), and, in MCS Evaluation 6, the imitated actions will always be opening containers of various colors and shapes (using the normal OpenObject action).

Notes

At oracle metadata level, the metadata dict property of this GoalMetadata object will contain a target property, which is a dict containing the following parameters:

Parameters: id (string) – The unique objectId of the target object to retrieve.

INTUITIVE_PHYSICS = 'intuitive physics'¶

In a trial that has an Intuitive Physics goal, you must sit and observe a scene as objects move across your camera’s viewport, and then decide whether the scene is “plausible” or “implausible”. These trials will demand a “common sense” understanding of basic (“intuitive”) physics, like object permanence or shape constancy. Inspired by Emmanuel Dupoux’s “IntPhys: A Benchmark for Visual Intuitive Physics Reasoning” (http://intphys.com).

Notes

You are required to call controller.end_scene() at the end of each scene with a binary plausibility rating – either 0 (implausible) or 1 (plausible) – and a continuous plausibility score – from 0.0 (completely implausible) to 1.0 (completely plausible). This is also where you would submit any retrospective reporting on a per step basis via report.

MULTI_RETRIEVAL = 'multi retrieval'¶

In a trial that has a multi retrieval goal, you must find and pickup one or more target objects. In MCS Evaluation 4 and onward, the target object will always be a soccer ball (football).

This may involve exploring the scene, avoiding obstacles, interacting with objects (like closed containers) or agents, and tracking moving objects. These trials will demand a “common sense” understanding of self navigation (how to move and rotate yourself within a scene and around obstacles), object interaction (how objects work, including opening containers), the basic physics of movement (kinematics, gravity, friction, etc.), and agency (identifying people and using them to achieve a goal).

Notes

At oracle metadata level, the metadata dict property of this GoalMetadata object will contain a targets property, which is a list of dicts that each contain the following parameters:

Parameters: id (string) – The unique objectId of one of the target objects to retrieve.

PASSIVE = 'passive'¶

In a trial that has a Passive goal, you must sit and observe a scene as action unfolds in your camera’s viewport, and then decide whether the scene is “plausible” or “implausible”. These trials will demand a “common sense” understanding of places, objects, or agency. This goal category covers all passive scenes that do not fall under the “agents” or “intuitive physics” categories.

Notes

You are required to call controller.end_scene() at the end of each scene with a binary plausibility rating – either 0 (implausible) or 1 (plausible) – and a continuous plausibility score – from 0.0 (completely implausible) to 1.0 (completely plausible). This is also where you would submit any retrospective reporting on a per step basis via report.
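
The rating and score requirements stated in the Notes for the agents, intuitive physics, and passive categories can be summarized in a small helper. This is a sketch only: `end_scene_kwargs` is a hypothetical name, and the 0.5 threshold used to derive the binary rating from a continuous estimate is an assumption, not something the MCS package prescribes.

```python
from typing import Dict


def end_scene_kwargs(category: str, plausibility: float) -> Dict[str, float]:
    """Choose end_scene() arguments from a plausibility estimate in [0, 1]."""
    if not 0.0 <= plausibility <= 1.0:
        raise ValueError("plausibility must be between 0.0 and 1.0")
    if category == "agents":
        # Passive agent scenes: continuous rating, no score required.
        return {"rating": plausibility}
    if category in ("intuitive physics", "passive"):
        # Other passive scenes: binary rating plus continuous score.
        # Assumption: 0.5 is used as the cut-off for the binary rating.
        return {"rating": 1 if plausibility >= 0.5 else 0, "score": plausibility}
    # Interactive categories do not use rating or score.
    return {}
```

A call might then look like `controller.end_scene(**end_scene_kwargs("passive", 0.8))`.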

RETRIEVAL = 'retrieval'¶

In a trial that has a retrieval goal, you must find and pick up a target object. In MCS Evaluation 4 and onward, the target object will always be a soccer ball (football).

This may involve exploring the scene, avoiding obstacles, interacting with objects (like closed containers) or agents, and tracking moving objects. These trials will demand a “common sense” understanding of self navigation (how to move and rotate yourself within a scene and around obstacles), object interaction (how objects work, including opening containers), the basic physics of movement (kinematics, gravity, friction, etc.), and agency (identifying people and using them to achieve a goal).

Notes

At oracle metadata level, the metadata dict property of this GoalMetadata object will contain a target property, which is a dict containing the following parameters:

Parameters:: id (string) – The unique objectId of the target object to retrieve.
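The two metadata layouts described for the retrieval and multi retrieval categories can be read with one small helper. This is a sketch only: the function name is hypothetical, plain values stand in for the category and metadata properties of a GoalMetadata object, and the sample IDs are made up. The target and targets properties are only present at the oracle metadata level, so the helper returns an empty list when they are missing.

```python
def target_ids(category, metadata):
    """Collect target object IDs from a goal's metadata dict."""
    metadata = metadata or {}
    if category == "retrieval":
        # Single dict under "target".
        target = metadata.get("target")
        return [target["id"]] if target else []
    if category == "multi retrieval":
        # List of dicts under "targets".
        return [target["id"] for target in metadata.get("targets", [])]
    return []


# Made-up sample data shaped like the descriptions above.
single = target_ids("retrieval", {"target": {"id": "ball_1"}})
multiple = target_ids("multi retrieval", {"targets": [{"id": "ball_1"}, {"id": "ball_2"}]})
```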

class machine_common_sense.GoalMetadata(action_list=None, category='', description='', habituation_total=0, last_preview_phase_step=0, last_step=None, metadata=None, steps_allowed_in_lava=0, triggered_by_target_sequence=None)[source]¶

Defines metadata for a goal in the MCS 3D environment.

Variables:

action_list (list of lists of (string, dict) tuples, or None) –
The list of all actions that are available for the scene at each step (outer list). Each inner list is the list of all actions that are available for the single step corresponding to the inner list’s index within the outer list. Each action is returned as a tuple containing the action string and the action’s restricted parameters, if any.

For example: (“Pass”, {}) forces a Pass action; (“PickupObject”, {}) forces a PickupObject action with any parameters; and (“PickupObject”, {“objectId”: “a”}) forces a PickupObject action with the specific parameters objectId=a.

An action_list of None means that all actions are always available. An empty inner list means that all actions will be available on that specific step.

See StepMetadata.action_list for the available actions of the current step. May be a subset of all possible actions. See Action.
category (string) – The category that describes this goal and the properties in its metadata. See Goal.
description (string) –
A human-readable sentence describing this goal and containing the target task(s) and object(s).

Sizes:
- tiny: near the size of a baseball
- small: near the size of a baby
- medium: near the size of a child
- large: near the size of an adult
- huge: near the size of a sofa

Weights:
- light: can be held by a baby
- heavy: cannot be held by a baby, but can be pushed or pulled
- massive: cannot be moved by a baby

Colors: black, blue, brown, green, grey, orange, purple, red, white, yellow

Materials: See Material.
habituation_total (int) – The total count of habituation trials that will be in this scene.
last_preview_phase_step (integer) – The last step of the Preview Phase of this scene, if a Preview Phase is scripted in the scene configuration. Each step of a Preview Phase normally has a single specific action defined in this goal’s action_list property for the performer to choose, like [‘Pass’]. Default: 0 (no Preview Phase)
last_step (integer) – The last step of this scene. This scene will automatically end following this step.
metadata (dict) – The metadata specific to this goal. See Goal.
steps_allowed_in_lava (integer) – The number of steps allowed in lava before the scene ends
triggered_by_target_sequence (list of strings) – For imitation tasks, the sequence in which the performer must open containers to trigger a placer to place the target.

retrieve_action_list_at_step(step_number: int, steps_in_lava: int | None = 0, triggered_by_sequence_incorrect: bool | None = False, is_passive_scene: bool = False) → List[source]¶: Return the action list from the given goal at the given step as a list of action tuples by default.
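The action_list rules documented above (None means no restrictions, an empty inner list means no restrictions on that step) can be sketched in plain Python. This is not the library's implementation: the function name is hypothetical, it ignores the lava and sequence arguments that retrieve_action_list_at_step accepts, and treating steps beyond the end of the list as unrestricted is an assumption.

```python
def allowed_actions(action_list, step_index):
    """Return the restricted (action, params) tuples for a step.

    Returns None when every action is available on that step.
    """
    if action_list is None:
        return None  # no restrictions anywhere in the scene
    if not 0 <= step_index < len(action_list):
        return None  # assumption: steps outside the list are unrestricted
    inner = action_list[step_index]
    return list(inner) if inner else None


# Made-up example: step 0 forces Pass, step 1 is unrestricted,
# step 2 forces PickupObject on objectId "a".
example = [[("Pass", {})], [], [("PickupObject", {"objectId": "a"})]]
first = allowed_actions(example, 0)
second = allowed_actions(example, 1)
```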

class machine_common_sense.Material(value)[source]¶

Possible materials of objects. An object can have one or more materials.

CERAMIC = 'CERAMIC'¶

FABRIC = 'FABRIC'¶

FOOD = 'FOOD'¶

GLASS = 'GLASS'¶

METAL = 'METAL'¶

ORGANIC = 'ORGANIC'¶

PAPER = 'PAPER'¶

PLASTIC = 'PLASTIC'¶

RUBBER = 'RUBBER'¶

SOAP = 'SOAP'¶

SPONGE = 'SPONGE'¶

STONE = 'STONE'¶

UNDEFINED = 'UNDEFINED'¶

WAX = 'WAX'¶

WOOD = 'WOOD'¶

static verify_material_enum_string(enum_string)[source]¶

Returns whether the given string can be successfully converted into a Material enum.

Parameters:: enum_string – The string to be converted into a Material enum.
Return type:: boolean
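To illustrate the kind of check this method performs, the sketch below uses a stand-in enum with three of the members listed above rather than the real Material class. The names MaterialSketch and is_material_string are hypothetical, and the real method may handle letter case differently; this version is case-sensitive.

```python
from enum import Enum


class MaterialSketch(Enum):
    """Stand-in enum with three of the Material members listed above."""
    CERAMIC = "CERAMIC"
    METAL = "METAL"
    WOOD = "WOOD"


def is_material_string(enum_string):
    """Return True if enum_string converts to a MaterialSketch member."""
    try:
        MaterialSketch(enum_string)
        return True
    except ValueError:
        # Enum lookup by value raises ValueError for unknown values.
        return False


known = is_material_string("WOOD")
unknown = is_material_string("LAVA")
```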

class machine_common_sense.ObjectMetadata(uuid='', dimensions=None, direction=None, distance=-1.0, distance_in_steps=-1.0, distance_in_world=-1.0, held=False, mass=0.0, material_list=None, position=None, rotation=None, segment_color=None, shape='', state_list=None, texture_color_list=None, visible=False, is_open=False, openable=False, locked=False, associated_with_agent='', simulation_agent_held_object='', simulation_agent_is_holding_held_object=False)[source]¶

Defines metadata for an object in the MCS 3D environment.

Variables:

uuid (string) – The unique ID of this object, used with some actions.
associated_with_agent (str) – The agent holding this object.
dimensions (list of dicts) – The dimensions of this object in the environment’s 3D global coordinate system as a list of 8 points (dicts with “x”, “y”, and “z”).
direction (dict) – The direction vector of “x”, “y”, and “z” degrees between your position and this object’s position (the difference in the two positions), normalized to 1. You can use the “x” and “y” as the “rotation” and “horizon” parameters (respectively) in a “RotateLook” action to face this object.
distance (float) – DEPRECATED. Same as distance_in_steps. Please use distance_in_steps or distance_in_world.
distance_in_steps (float) – The distance from you to this object in number of steps (“Move” actions) on the 2D X/Z movement grid.
distance_in_world (float) – The distance from you to this object in the environment’s 3D global coordinate system.
held (boolean) – Whether you are holding this object.
is_open (boolean) – Whether the object is open or not
locked (boolean) – Whether the object is locked
mass (float) – Haptic feedback. The mass of this object.
material_list (list of strings) – Haptic feedback. The material(s) of this object. See Material.
openable (boolean) – Whether the object can be opened
position (dict) – The “x”, “y”, and “z” coordinates for the global position of the center of this object’s 3D model.
rotation (dict) – This object’s rotation angles around the “x”, “y”, and “z” axes in degrees.
segment_color (dict) – The “r”, “g”, and “b” pixel values of this object in images from the StepMetadata’s “object_mask_list”.
shape (string) – This object’s shape in plain English.
simulation_agent_held_object (str) – The object held by this simulation agent, which the performer can retrieve through the InteractWithAgent action.
simulation_agent_is_holding_held_object (boolean) – Whether this simulation agent is currently holding its associated held object.
state_list (list of strings) – This object’s state(s) from the current step in the scene. Sometimes used by objects with scripted behavior in passive scenes.
texture_color_list (list of strings) – This object’s colors, derived from its textures, in plain English.
visible (boolean) – Whether you can see this object in your camera viewport.
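Since dimensions is described above as a list of 8 points with “x”, “y”, and “z” keys, one simple use is to reduce it to axis-aligned minimum and maximum corners. The helper below is a hypothetical sketch operating on plain dicts, and the sample points are made up. For a rotated object, the axis-aligned extents will be larger than the object itself.

```python
def bounding_extents(dimensions):
    """Return (min_corner, max_corner) dicts over a list of x/y/z point dicts."""
    axes = ("x", "y", "z")
    low = {axis: min(point[axis] for point in dimensions) for axis in axes}
    high = {axis: max(point[axis] for point in dimensions) for axis in axes}
    return low, high


# Made-up box with 8 corner points.
box = [
    {"x": x, "y": y, "z": z}
    for x in (0.0, 1.0)
    for y in (0.0, 0.5)
    for z in (2.0, 3.0)
]
low, high = bounding_extents(box)
```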

class machine_common_sense.ReturnStatus(value)[source]¶: An enumeration.

class machine_common_sense.Reward[source]¶

Reward utility class

Determine if the agent achieved the goal.

Parameters:

goal – GoalMetadata
objects – Dict
agent – Dict
reach – float
steps_on_lava – int
lava_penalty – float
step_penalty – float
goal_reward – float

Returns:

reward is 1 if goal achieved, 0 otherwise

Return type:

int

class machine_common_sense.SerializerMsgPack[source]¶

Serializer to (de)serialize StepMetadata into/from MsgPack format.

static serialize(step_metadata: StepMetadata)[source]¶

Serializes step metadata into MsgPack.

You can use
```
object_to_persist = {
    'payload': step_metadata,
    'additional_info': 'info'
}
```
to add extra data.

Parameters:: step_metadata – MCS step metadata output
Returns:: Serialized version of step metadata in MsgPack format.

class machine_common_sense.StepMetadata(action_list=None, camera_aspect_ratio=None, camera_clipping_planes=None, camera_field_of_view=0.0, camera_height=0.0, depth_map_list=None, goal=None, habituation_trial=None, haptic_feedback=None, head_tilt=0.0, holes=None, image_list=None, lava=None, object_list=None, object_mask_list=None, performer_radius=0.0, performer_reach=0.0, physics_frames_per_second=0, position=None, resolved_object='', resolved_receptacle='', return_status='UNDEFINED', reward=0, room_dimensions=None, rotation=0.0, segmentation_colors=None, step_number=0, steps_on_lava=0, structural_object_list=None, triggered_by_sequence_incorrect=False)[source]¶

Defines output metadata from an action step in the MCS 3D environment.

Variables:

action_list (list of (string, dict) tuples) –
The list of all actions that are available for the next step. Each action is returned as a tuple containing the action string and the action’s restricted parameters, if any. For example: (“Pass”, {}) forces a Pass action; (“PickupObject”, {}) forces a PickupObject action with any parameters; and (“PickupObject”, {“objectId”: “a”}) forces a PickupObject action with the specific parameters objectId=a. EndHabituation is a special case of the action_list where its parameters will always be empty. When taking the EndHabituation action, the MCS system may apply hidden displacement parameters to the robot. An action_list of None or an empty list means that all actions will be available for the next step.

To “step” using the first action from the action_list:
```
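# Assumes controller came from create_controller() and scene_data from load_scene_json_file().
step_metadata = controller.start_scene(scene_data)
# Assumes action_list is neither None nor empty; each entry is an (action, params) tuple.
action, params = step_metadata.action_list[0]
step_metadata = controller.step(action, **params)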
```
Derived from GoalMetadata.action_list[step_number]. May be a subset of all possible actions. See Action.
camera_aspect_ratio ((float, float)) – The player camera’s aspect ratio. This will remain constant for the whole scene.
camera_clipping_planes ((float, float)) – The player camera’s near and far clipping planes, in meters. This will remain constant for the whole scene. Default (0.01, 150)
camera_field_of_view (float) – The player camera’s field of view. This will remain constant for the whole scene.
camera_height (float) – The player camera’s height, in meters.
depth_map_list (list of 2D numpy arrays) – The list of 2-dimensional numpy arrays of depth float data from the scene after the last action and physics simulation were run. This is usually a list with 1 array, except for the output from start_scene for a scene with a scripted Preview Phase (Preview Phase case details TBD). Each 32-bit depth float in the 2-dimensional numpy array is a value between the camera’s near clipping plane (default 0.01) and the camera’s far clipping plane (default 150) corresponding to the depth, in meters, at that pixel in the image. Note that this list will be empty if the metadata level is ‘none’.
goal (GoalMetadata or None) – The goal for the whole scene. Will be None in “Exploration” scenes.
haptic_feedback (dict) – Haptic feedback sources for the agent. Values are true or false depending on whether the agent is touching the haptic feedback source. The only currently supported contact is “on_lava”.
habituation_trial (int or None) – The current habituation trial (as a positive integer), or None if the scene is not currently in a habituation trial (meaning this scene is in a test trial).
head_tilt (float) – How far your head is tilted up/down in degrees (between 90 and -90). Changed by setting the “horizon” parameter in a “RotateLook” action.
holes (list of tuples) – Coordinates of holes as (X, Z) float tuples. Will be set to ‘None’ if using a metadata level below the ‘oracle’ level.
image_list (list of Pillow.Image objects) – The list of images from the scene after the last action and physics simulation were run. This is usually a list with 1 image, except for the output from start_scene for a scene with a scripted Preview Phase. (Preview Phase case details TBD).
lava (list of tuples) – Coordinates of pools of lava as (X1, Z1, X2, Z2) float tuples, where X1/Z1 is the top-left corner and X2/Z2 is the bottom-right corner. Will be set to ‘None’ if using a metadata level below the ‘oracle’ level.
object_list (list of ObjectMetadata objects) – The list of metadata for all the visible interactive objects in the scene. This list will be empty if using a metadata level below the ‘oracle’ level. For metadata on structural objects like walls, please see structural_object_list
object_mask_list (list of Pillow.Image objects) – The list of object mask (instance segmentation) images from the scene after the last action and physics simulation were run. This is usually a list with 1 image, except for the output from start_scene for a scene with a scripted Preview Phase (Preview Phase case details TBD). The color of each object in the mask corresponds to the “segment_color” property in its ObjectMetadata object. Note that this list will be empty if the metadata level is ‘none’ or ‘level1’.
performer_radius (float) – The radius of the performer, in meters.
performer_reach (float) – The max reach of the performer, in meters.
physics_frames_per_second (float) – The frames per second of the physics engine
position (dict) – The “x”, “y”, and “z” coordinates for your global position. Will be set to ‘None’ if using a metadata level below the ‘oracle’ level.
resolved_object (string) – The object that was selected based on objectImageCoords
resolved_receptacle (string) – The receptacle that was selected based on receptacleObjectImageCoords
return_status (string) – The return status from your last action. See Action.
reward (integer) – Reward is 1 on successful completion of a task, 0 otherwise.
room_dimensions (dict) – The “x”, “y”, and “z” dimensions of the current scene. Will be set to ‘None’ if using a metadata level below the ‘oracle’ level.
rotation (float) – Your current rotation angle in degrees. Will be set to ‘None’ if using a metadata level below the ‘oracle’ level.
segmentation_colors (list of dicts) – The colors for all objects in the instance segmentation images (in object_mask_list), each represented as a dict containing an “objectId” string property and “r”, “g”, and “b” int properties for the corresponding red, green, and blue values. The ceiling has an objectId of “ceiling”; exterior room walls have objectIds of “wall_back”, “wall_front”, “wall_left”, and “wall_right”; floor sections have objectIds starting with “floor “ and then the texture name (since different areas of the floor can have different textures); holes have objectIds of “hole”; hole walls have objectIds of “hole wall”; and lava areas have objectIds of “lava”. Will be empty if using a metadata level below the ‘oracle’ level.
step_number (integer) – The step number of your last action, recorded since you started the current scene.
steps_on_lava (integer) – The number of steps during which the agent has touched lava
structural_object_list (list of ObjectMetadata objects) – The list of metadata for all the visible structural objects (like walls, occluders, and ramps) in the scene. This list will be empty if using a metadata level below the ‘oracle’ level. Occluders are composed of two separate objects, the “wall” and the “pole”, with corresponding object IDs (occluder_wall_<uuid> and occluder_pole_<uuid>), and ramps are composed of between one and three objects (depending on the type of ramp), with corresponding object IDs.
triggered_by_sequence_incorrect (bool) – Whether the sequence used to trigger a placer holding the target was incorrect
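The segmentation_colors entries described above can be turned into a lookup table from pixel color to objectId, which is convenient when inspecting pixels of an image in object_mask_list. This is a sketch with a hypothetical helper name and made-up color values; it works on plain dicts and does not touch the images themselves.

```python
def color_to_object_id(segmentation_colors):
    """Map (r, g, b) tuples to objectId strings."""
    return {
        (entry["r"], entry["g"], entry["b"]): entry["objectId"]
        for entry in segmentation_colors
    }


# Made-up entries shaped like the segmentation_colors description.
sample_colors = [
    {"objectId": "wall_back", "r": 10, "g": 20, "b": 30},
    {"objectId": "lava", "r": 200, "g": 0, "b": 0},
]
lookup = color_to_object_id(sample_colors)
# A pixel read from a mask image as an (r, g, b) tuple can then be resolved:
object_id = lookup.get((200, 0, 0))
```

Using dict.get returns None for colors that are not listed, for example when segmentation_colors is empty below the ‘oracle’ metadata level.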

copy_without_depth_or_images()[source]¶: Return a deep copy of this StepMetadata with default depth_map_list, image_list, and object_mask_list properties.

class machine_common_sense.Stringifier[source]¶

Defines functions to turn objects into strings for debugging and human readable output. The output is not intended to be reconstructed into objects, which is why this is separate from serialization.

static class_to_str(input_class, depth=0)[source]¶

Transforms the given class into a string.

Parameters:

input_class – The input class.
depth (int, optional) – The indent depth (default 0).

Return type:

string

static generate_pretty_object_output(object_list)[source]¶

Transforms the given list of ObjectMetadata objects into a list of strings.

Parameters:: object_list (list of ObjectMetadata objects) – The input list.
Return type:: list of strings

static value_to_str(input_value, depth=0)[source]¶

Transforms the given value into a string.

Parameters:

input_value – The input value.
depth (int, optional) – The indent depth (default 0).

Return type:

string

static vector_to_string(vector)[source]¶

Transforms the given vector into a string.

Parameters:: vector (dict) – The input vector.
Return type:: string

class machine_common_sense.UnityExecutableProvider[source]¶

Automatically provides MCS AI2-THOR Unity executable for the MCS package. Will check a cache and download if necessary

clear_cache(version=None)[source]¶: Clears the entire cache if no version is passed in; otherwise clears the specified version.

get_executable(version=None, download_if_missing=True, force_download=False) → Path[source]¶: For a given version, this will return the path to that executable or throw an exception if it cannot be found.

machine_common_sense.change_config(controller: Controller, config_file_or_dict: Dict | str | None = None)[source]¶

Changes the configuration of the given MCS Controller object. Should only be called after a run and before a scene is changed.

Parameters:

controller (Controller) – The currently used controller that the config should be changed on.
config_file_or_dict (str or dict, optional) – Can be a path to configuration file to read in or a dictionary of various properties, such as metadata level and whether or not to save history files (default None)

machine_common_sense.create_controller(config_file_or_dict: Dict | str | None = None, unity_app_file_path: str | None = None, unity_cache_version: str | None = None)[source]¶

Creates and returns a new MCS Controller object.

Parameters:

config_file_or_dict (str or dict, required) –
Can be a path to configuration file to read in or a dictionary of various properties, such as metadata level and whether or not to save history files (default None)
- Note the order of precedence for config options, in case more than one is given:
1. MCS_CONFIG_FILE_PATH environment variable (meant for internal TA2 use during evaluation)
2. If no environment variable given, use config_file_or_dict parameter. The value can be a string file path or a dictionary.
3. Raises FileNotFoundError if no config found.
unity_app_file_path (str, optional) – The file path to your MCS Unity application. If not provided, the internal cache and downloader will attempt to locate and use the current version. (default None)
unity_cache_version (str, optional) – If no file path is provided for the MCS Unity application, the version provided will be found via cache and internal downloader. If not provided, the version matching the MCS code will be used. (default None)

Returns:

The MCS Controller object.

Return type:

Controller
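The order of precedence listed for config_file_or_dict can be sketched as a small stand-alone function. This only illustrates the documented rules and is not how the library resolves its configuration internally; the function name, the environ parameter, and the example dictionary key are hypothetical.

```python
import os


def resolve_config_source(config_file_or_dict=None, environ=None):
    """Pick the config source following the documented precedence."""
    environ = os.environ if environ is None else environ
    # 1. The environment variable wins when it is set.
    env_path = environ.get("MCS_CONFIG_FILE_PATH")
    if env_path:
        return env_path
    # 2. Otherwise use the parameter (a file path string or a dict).
    if config_file_or_dict is not None:
        return config_file_or_dict
    # 3. Nothing was given.
    raise FileNotFoundError("No MCS configuration was provided.")


# "example_property" is a placeholder key, not a real MCS config option.
from_param = resolve_config_source({"example_property": "value"}, environ={})
from_env = resolve_config_source(
    {"example_property": "value"},
    environ={"MCS_CONFIG_FILE_PATH": "/tmp/config.ini"},
)
```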

machine_common_sense.get_controller(unity_exec: str, config: ConfigManager)[source]¶: Function to get the controller, pulled into its own function so we can time it.

machine_common_sense.get_controller_with_timeout(unity_exec: str, config: ConfigManager)[source]¶: Wrapper function that sets a timeout for the controller creation. If getting the controller times out, None is returned.

machine_common_sense.init_logging(log_config: Dict | None = None, log_config_file: str = 'log.config.user.py')[source]¶

Initializes logging system. If no parameters are provided, a default configuration will be applied. See python logging documentation for details.

https://docs.python.org/3/library/logging.config.html#logging-config-dictschema

Parameters:

log_config (dict, optional) – A dictionary that contains the logging configuration. If None, a default configuration will be used.
log_config_file (str, optional) – Path to an override configuration file. The file will contain a Python dictionary for the logging configuration. This file is typically not used, but allows a user to change the logging configuration without code changes. Default: log.config.user.py
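As a sketch of what a log_config dictionary might look like, the example below follows the standard dictionary schema linked above. The formatter, handler, and logger names are arbitrary choices. The dictionary is applied with the standard library so that the example runs on its own; passing it to init_logging as shown in the final comment is assumed from the signature above.

```python
import logging
import logging.config

log_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "brief": {"format": "%(levelname)s %(name)s: %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "brief",
            "level": "INFO",
        }
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

# Applied directly with the standard library so the sketch is self-contained.
logging.config.dictConfig(log_config)
logging.getLogger("example").info("logging configured")
# Assumed MCS usage, based on the signature above:
# machine_common_sense.init_logging(log_config=log_config)
```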

machine_common_sense.load_scene_json_file(scene_json_file_path: str) → Dict[source]¶

Loads the given JSON scene config file and returns its data.

Parameters:

scene_json_file_path (str) – The file path to your MCS JSON scene configuration file.

Returns:

The MCS scene configuration data from the given JSON file.

Return type:

dict

Raises:

FileNotFoundError –
ValueError –
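Because the function is documented as raising FileNotFoundError and ValueError, callers may want to handle both. The sketch below wraps a loader in a hypothetical try_load_scene helper. To stay runnable without the MCS package, it defaults to a stand-in loader built on the json module, which raises the same two exception types; in practice load_scene_json_file could be passed as the loader instead.

```python
import json


def load_json_scene(scene_json_file_path):
    """Stand-in loader: open() raises FileNotFoundError, json raises ValueError."""
    with open(scene_json_file_path, encoding="utf-8") as scene_file:
        return json.load(scene_file)


def try_load_scene(scene_json_file_path, loader=load_json_scene):
    """Return (scene_data, error_message); exactly one of the two is None."""
    try:
        return loader(scene_json_file_path), None
    except FileNotFoundError:
        return None, f"scene file not found: {scene_json_file_path}"
    except ValueError as error:
        # json.JSONDecodeError is a subclass of ValueError.
        return None, f"invalid scene JSON: {error}"
```

Returning an error message instead of re-raising lets a batch run skip a bad scene file and continue with the next one.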
