planner-inference

Last updated Oct 12, 2017

Infer the ways in which suboptimal agents plan suboptimally, for example by discounting future rewards hyperbolically. Use the inferred biases to improve inverse reinforcement learning.
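To illustrate why a hyperbolic time discounter counts as a suboptimal planner, the sketch below compares exponential and hyperbolic discounting on a choice between a smaller, sooner reward and a larger, later one. It is not taken from this repository: the function names, the discount parameters (gamma=0.9, k=1.0), and the reward values are assumptions chosen for illustration.

```python
def exponential_discount(delay, gamma=0.9):
    """Exponential discount factor: gamma ** delay."""
    return gamma ** delay


def hyperbolic_discount(delay, k=1.0):
    """Hyperbolic discount factor: 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def prefers_larger_later(discount, small, large, t_small, t_large):
    """True if the discounted later reward beats the discounted sooner one."""
    return large * discount(t_large) > small * discount(t_small)


# Choice: reward 5 after `t` steps, or reward 10 after `t + 3` steps.
# (Illustrative numbers, not values used by the repository.)
for t in (0, 10):
    exp_choice = prefers_larger_later(exponential_discount, 5, 10, t, t + 3)
    hyp_choice = prefers_larger_later(hyperbolic_discount, 5, 10, t, t + 3)
    print(f"t={t}: exponential waits={exp_choice}, hyperbolic waits={hyp_choice}")
```

Under these assumed numbers the exponential discounter makes the same choice at both horizons, while the hyperbolic discounter plans to wait when both rewards are far away but switches to the smaller reward once it is immediately available. This kind of preference reversal is the sort of planning bias the project aims to infer.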

Code

Note that the SIMPLE baseline only works with grid sizes of 8 and 14.

To run the benchmarks, run python run_benchmarks.py --low LOW --high HIGH, followed by any further options.

Gridworlds

gridworld.py: Implements the Gridworld MDP, which is used for simple experiments.

gridworld_data.py: Generates example gridworlds, runs agents on them to produce trajectories, and collects the trajectories into the training and test sets used for learning.

Agents

agent_interface.py: Defines the interface that agents should follow.

agent_runner.py: Defines run_agent, which, given an agent and an environment, runs the agent in the environment and produces a trajectory.
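As a rough picture of what running an agent in an environment to produce a trajectory involves, here is a minimal, self-contained sketch. It does not reproduce the actual interfaces in agent_interface.py or agent_runner.py: the classes ChainEnv and AlwaysRightAgent, the method names reset, step, and get_action, and the function run_agent_sketch are all hypothetical stand-ins.

```python
class ChainEnv:
    """Hypothetical 1-D environment with states 0..n-1.

    Reaching the last state gives reward 1 and ends the episode.
    """

    def __init__(self, n=4):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); moves are clipped at the ends.
        self.state = min(max(self.state + action, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class AlwaysRightAgent:
    """Hypothetical agent that ignores the state and always moves right."""

    def get_action(self, state):
        return 1


def run_agent_sketch(agent, env, episode_length=10):
    """Roll out `agent` in `env`, returning (state, action, next_state, reward) tuples."""
    state = env.reset()
    trajectory = []
    for _ in range(episode_length):
        action = agent.get_action(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, next_state, reward))
        state = next_state
        if done:
            break
    return trajectory


trajectory = run_agent_sketch(AlwaysRightAgent(), ChainEnv(n=4))
print(trajectory)
```

Storing full (state, action, next_state, reward) tuples is one possible design choice; the trajectory format used by the real run_agent may differ.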

agents.py: Defines many different agents that can play tabular MDPs. Currently, the agents use value-iteration-like approaches.
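For readers unfamiliar with value iteration, the sketch below shows the textbook algorithm on a tiny tabular MDP. It is not the code in agents.py: it assumes deterministic transitions, and the function name value_iteration, the argument layout (transitions[s][a], rewards[s][a]), and the example MDP are all made up for illustration.

```python
def value_iteration(transitions, rewards, gamma=0.9, num_iters=100):
    """Textbook value iteration for a deterministic tabular MDP.

    transitions[s][a] is the next state after taking action a in state s.
    rewards[s][a] is the reward for taking action a in state s.
    Returns (values, greedy_policy).
    """
    num_states = len(transitions)

    def q_value(values, s, a):
        return rewards[s][a] + gamma * values[transitions[s][a]]

    values = [0.0] * num_states
    for _ in range(num_iters):
        # Bellman backup: best one-step lookahead from every state.
        values = [
            max(q_value(values, s, a) for a in range(len(transitions[s])))
            for s in range(num_states)
        ]

    # Greedy policy with respect to the final values (ties go to the lowest action).
    policy = [
        max(range(len(transitions[s])), key=lambda a, s=s: q_value(values, s, a))
        for s in range(num_states)
    ]
    return values, policy


# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
# State 2 is absorbing; entering it from state 1 yields reward 1.
transitions = [[0, 1], [0, 2], [2, 2]]
rewards = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values, policy = value_iteration(transitions, rewards)
print(values, policy)
```

A biased planner can be thought of as a variation on this backup, for example one that swaps the exponential factor gamma for a different discounting rule, although the specific variants implemented in agents.py are not described here.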

Value Iteration Networks

The code here is taken from TensorFlow VINs, with a few edits.

model.py: Implementation of VIN and VIN with untied weights.
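The difference between a VIN with tied weights and one with untied weights can be sketched in NumPy. This is not the TensorFlow code in model.py: it assumes, following the usual VIN formulation, that each iteration convolves the stacked reward and value maps into one Q channel per action and then takes a max over channels, and that "untied" means a separate kernel per iteration. The helper names and the hand-built kernel are hypothetical; in a real VIN the kernels are learned.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 cross-correlation.

    x has shape (C_in, H, W); w has shape (C_out, C_in, 3, 3).
    Returns an array of shape (C_out, H, W).
    """
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for di in range(3):
        for dj in range(3):
            patch = padded[:, di:di + h, dj:dj + wd]
            out += np.einsum('oc,chw->ohw', w[:, :, di, dj], patch)
    return out


def vi_module(reward, weights_per_iter):
    """Run one VI step per kernel: Q = conv([reward, value]); value = max_a Q."""
    value = np.zeros_like(reward)
    for w in weights_per_iter:
        q = conv3x3(np.stack([reward, value]), w)
        value = q.max(axis=0)
    return value


def hand_built_weights(gamma=0.9):
    """Kernel that mimics value iteration on a grid (illustrative, not learned)."""
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # stay, up, down, left, right
    w = np.zeros((len(offsets), 2, 3, 3))
    for a, (dy, dx) in enumerate(offsets):
        w[a, 0, 1, 1] = 1.0               # reward at the current cell
        w[a, 1, dy + 1, dx + 1] = gamma   # discounted value of the neighbor
    return w


reward = np.array([[0.0, 0.0, 1.0]])
tied = [hand_built_weights()] * 2                             # same kernel reused
untied = [hand_built_weights(0.9), hand_built_weights(0.5)]   # one kernel per step
print(vi_module(reward, tied))
print(vi_module(reward, untied))
```

With tied weights the module is a fixed recurrence applied repeatedly; untying the weights gives each iteration its own parameters at the cost of a larger model.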

train.py: Trains a VIN using gridworld data.

Other

disjoint_sets.py: An implementation of the disjoint sets data structure, used in gridworld_data.py to generate interesting grid worlds.
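For reference, a disjoint sets (union-find) structure can be sketched as follows. This is a generic version with path compression and union by rank, not the contents of disjoint_sets.py; the class and method names are assumptions, and the connectivity check on grid cells is only one plausible way such a structure could help when generating grid worlds.

```python
class DisjointSets:
    """Generic union-find with path compression and union by rank."""

    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.rank = {e: 0 for e in elements}

    def find(self, x):
        """Return the representative of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        """Merge the sets containing a and b. Returns False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True

    def connected(self, a, b):
        return self.find(a) == self.find(b)


# Hypothetical usage: track which cells of a 2x2 grid are joined by open paths.
cells = [(r, c) for r in range(2) for c in range(2)]
ds = DisjointSets(cells)
ds.union((0, 0), (0, 1))
ds.union((0, 1), (1, 1))
print(ds.connected((0, 0), (1, 1)), ds.connected((0, 0), (1, 0)))
```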

utils.py: Utility functions.

Testing

Each *_test.py file contains tests for the corresponding *.py file. All of the tests can be run using:

./run_tests.sh

You may need to first give it execute permissions:

chmod +x run_tests.sh


HumanCompatibleAI/learning_biases