Robocasa Language Embedding CUDA Out of Memory Error #191

@JacobB33

Description

On line 222 of the Robocasa branch of robomimic/utils/train_utils.py, the dataset kwargs are deep-copied during dataset creation. Since the language embedding model is one of the dataset kwargs, every copy duplicates the model as well, which causes a CUDA out-of-memory error when training on a large number of dataset files. For example, with 90 Libero datasets there end up being 90 copies of the language embedding model in CUDA memory.
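
For reference, the duplication is easy to reproduce with any CUDA module stored in a deep-copied dict (a minimal sketch; nn.Linear here is just a stand-in for the actual language encoder):

from copy import deepcopy
import torch
import torch.nn as nn

# Stand-in for the language embedding model held in the dataset kwargs
ds_kwargs = {"lang_encoder": nn.Linear(4096, 4096).cuda()}

# deepcopy clones the module's CUDA tensors, so 90 datasets
# allocate roughly 90x the encoder's memory footprint
copies = [deepcopy(ds_kwargs) for _ in range(90)]
print(f"{torch.cuda.memory_allocated() / 1e9:.1f} GB allocated")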
I made a quick modification that fixed this problem:

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    # Point back at the original encoder so the deep-copied one is freed
    # and we do not run out of CUDA memory
    if "lang_encoder" in ds_kwargs:
        ds_kwargs_copy["lang_encoder"] = ds_kwargs["lang_encoder"]

    keys = ["hdf5_path", "filter_by_attribute"]
    for k in keys:
        ds_kwargs_copy[k] = ds_kwargs[k][i]

    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))

Should I make this a PR? It might be more efficient to pop the lang_encoder before the deepcopy so it is never copied at all (with the fix above, the extra copy is created and then immediately discarded); see the sketch below.
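
Concretely, the pop variant could look something like this (just a sketch against the same loop, with ds_kwargs, ds_weights, ds_langs, ds_class, and ds_list as above):

from copy import deepcopy

# Remove the encoder once, before the loop, so deepcopy never sees the model
lang_encoder = ds_kwargs.pop("lang_encoder", None)

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)  # no model left to copy here
    if lang_encoder is not None:
        # All datasets share the single encoder instance
        ds_kwargs_copy["lang_encoder"] = lang_encoder

    for k in ["hdf5_path", "filter_by_attribute"]:
        ds_kwargs_copy[k] = ds_kwargs[k][i]

    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))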
