Hilbert foundation policies (HILPs) train general-purpose policies from offline data without supervision: analyze a dataset of trajectories, learn a metric space of behaviors, then solve new tasks zero-shot or few-shot! https://seohong.me/projects/hilp/ A thread 👇