Skip to content

Integrating a language model with ULTRA #9

@daniel4x

Description

@daniel4x

Hi @migalkin,
First of all, Kudus for your work!!!! (both ULTRA and nodepiece 😄 ) .

I'm curious to hear your thoughts about integrating a language model (LM) with ULTRA.
Previously, with other KG models such as nodepiece, it was straightforward to integrate a language model to enrich the graph embeddings with textual embeddings.
I used to concat both the entity textual and graph representations and maybe apply additional layers to match the desired dimensions.

example:

# code from pykeen framework + modification
x_e, x_r = entity_representations[0](), self.relation_representations[0]()
indicies = torch.arange(self.text_representation.weight.data.shape[0])
x_e = self.merge_model(self.text_representation(indicies), x_e)  # Concat + linear layer

# Perform message passing and get updated states
for layer in self.gnn_encoder:
        x_e, x_r = layer(
            x_e=x_e,
            x_r=x_r,
            edge_index=getattr(self, f"{mode}_edge_index"),
            edge_type=getattr(self, f"{mode}_edge_type"),
        )

So far, it worked well and boosted the model's performance from ~50% when used with transE and up to ~30% with nodepiece on my datasets.

With ULTRA I guess that I have some additional work to do :)...
I started with understanding how the entity representation is "generated" on the fly:
https://github.com/DeepGraphLearning/ULTRA/blob/33c6e6b8e522aed3d33f6ce5d3a1883ca9284718/ultra/models.py#L166-L174C4

I understand that from that point only the tail representations are used to feed the MLP.

I replaced the MLP with my own MLP - to match the dim to the concatenation of both representations. Then, I tried to contact both, output from ULTRA with the textual entity representation. As far as I understand, due to this "late" concatenation only the tail entity textual representation will be used.
When tested, I got (almost) the same results with/without the textual representation.

Not sure what I expect to hear :), but I hope you may have an idea for combining both representations.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions