Skip to content

Comments

MSC4242: State DAGs#841

Open
kegsay wants to merge 5 commits intomainfrom
kegan/4242
Open

MSC4242: State DAGs#841
kegsay wants to merge 5 commits intomainfrom
kegan/4242

Conversation

@kegsay
Copy link
Member

@kegsay kegsay commented Feb 2, 2026

MSC4242: State DAGs

Pull Request Checklist

Copy link
Collaborator

@MadLittleMods MadLittleMods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

Reviewed up to TestMSC4242OnSendJoinSJ04

Haven't looked at the utils in tests/msc4242/msc4242_api_test.go yet

PrevEvents: []string{room.CurrentState(spec.MRoomMember, sender).EventID()},
},
// no event in the state DAG has this event ID.
PrevStateEvents: []string{"$4PRgaFIMcD9z4vzgkUUm0YI5CZHYORUPzWGJac6guAo"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can name this more obviously

Suggested change
PrevStateEvents: []string{"$4PRgaFIMcD9z4vzgkUUm0YI5CZHYORUPzWGJac6guAo"},
PrevStateEvents: []string{"$eventIdDoesNotExist"},

(also applies below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the same because it isn't a valid event ID.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's not valid about it?

$opaque_id

However, the precise format depends upon the room version specification: early room versions included a domain component, whereas more recent versions omit the domain and use a base64-encoded hash instead.

-- https://spec.matrix.org/v1.16/appendices/#event-ids

It seems like the only requirement maybe is that it should be base64 decodable but there isn't any hash comparison to do so I don't think that's a requirement either.

3.3 Event IDs

The event ID is the reference hash of the event encoded using a variation of Unpadded Base64 which replaces the 62nd and 63rd characters with - and _ instead of using + and /. This matches RFC4648’s definition of URL-safe base64.

Event IDs are still prefixed with $ and might result in looking like $Rqnc-F-dvnEYJTyHq_iKxU2bZ1CI92-kuZq3a5lr5Zg.

-- https://spec.matrix.org/v1.16/rooms/v12/#event-ids

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I want this to look like a normal event ID, else HSes may reject early because it isn't a valid SHA256 (length).

-- @kegsay, #841 (comment)

We can pad it out however long it needs to be. But seems like servers should treat these opaquely.

Suggested change
PrevStateEvents: []string{"$4PRgaFIMcD9z4vzgkUUm0YI5CZHYORUPzWGJac6guAo"},
PrevStateEvents: []string{"$eventIdDoesNotExist000000000000000000000000"},

}

/*
func TestMSC4242SendJoinFasterSJ03Inbound(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the Synapse implementation does not support faster room joins yet.

defer cancel()
bob := srv.UserID("bob")
testCases := faultyEventTestCases
var mu sync.Mutex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this lock? (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added:

	var mu sync.Mutex // protects faultyEvents

We should name it more obviously

if endIndex > len(eventIDsPerRequest) {
endIndex = len(eventIDsPerRequest)
}
wantEventIDs := eventIDsPerRequest[startIndex:endIndex]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well move this lower down to the actual comparison

}
// Repeatedly calls /get_missing_events on hs1 starting from fromEvent, in batches of batchSize.
// If stateDAG is set, asks the server to walk the state DAG.
// Asserts that events are returned (over multiple requests) in the order provided by eventIDsPerRequest.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document which order eventIDsPerRequest should be in

Comment on lines +121 to +124
if !slices.Equal(got, wantEventIDs) {
ct.Errorf(t, "failed to see correct event IDs in test '%s'. \nGot %v \nWant %v", name, got, wantEventIDs)
return
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get a nicer diff, it would be nice to re-use the assertEventsInOrder utility from tests/csapi/room_messages_test.go

ct.Errorf(t, "failed to see correct event IDs in test '%s'. \nGot %v \nWant %v", name, got, wantEventIDs)
return
}
if endIndex == len(eventIDsPerRequest) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the explicit endIndex = len(eventIDsPerRequest) above to prevent over-runs but seems sketchy as code evolves over time.

Perhaps we should adjust the condition to be more forgiving or have an additional assert(endIndex <= eventIDsPerRequest, "programming error")

Suggested change
if endIndex == len(eventIDsPerRequest) {
if endIndex >= len(eventIDsPerRequest) {

must.Equal(t, lastEvent.Exists(), true, "last timeline entry for room does not exist")
expectedJoinPrevStateEvents := prevStateEvents(t, lastEvent)
bob := srv.UserID("bob")
_, sendJoinResp := MustJoinRoom(t, srv, deployment, spec.ServerName("hs1"), roomID, bob)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_, sendJoinResp := MustJoinRoom(t, srv, deployment, spec.ServerName("hs1"), roomID, bob)
_, sendJoinResp := MustJoinRoom(t, srv, deployment, deployment.GetFullyQualifiedHomeserverName(t, "hs1"), roomID, bob)

return room.Timeline[i].EventID()
}
}
t.Logf("%s: failed to find any state event in %d timeline events, no prev_state_events will be set!", room.RoomID, len(room.Timeline))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever desired? (or programming error)

Comment on lines +164 to +167
serversInRoom := room.ServersInRoom()
for _, srvName := range serversInRoom {
res.ServersInRoom = append(res.ServersInRoom, string(srvName))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this should be gated by omitServersInRoom

Comment on lines +411 to +430
// LoadStateDAG loads all the events under state_dag and puts their event_id: prev_state_events into a map.
// Performs no verification that the DAG is connected.
func LoadStateDAG(t ct.TestLike, sendJoinResp fclient.RespSendJoin) (joinEventID string, stateDAG map[string][]string) {
roomVer := gomatrixserverlib.MustGetRoomVersion(roomVersion)
joinEvent, err := roomVer.NewEventFromTrustedJSON(sendJoinResp.Event, false)
must.NotError(t, "failed to load join event", err)
stateDAG = make(map[string][]string) // event_id => prev_state_events
for _, stateEventJSON := range sendJoinResp.StateDAG {
stateEvent, err := roomVer.NewEventFromTrustedJSON(stateEventJSON, false)
must.NotError(t, "failed to load event json", err)
stateEventGJSON := gjson.ParseBytes(stateEventJSON)
prevStateEvents := prevStateEvents(t, stateEventGJSON)
stateDAG[stateEvent.EventID()] = prevStateEvents
}

// Include the prev state events of the join event itself as that is what we care about
joinEventPrevStateEvents := prevStateEvents(t, gjson.ParseBytes(sendJoinResp.Event))
stateDAG[joinEvent.EventID()] = joinEventPrevStateEvents
return joinEvent.EventID(), stateDAG
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadStateDAG is unused

return joinEvent.EventID(), stateDAG
}

// We have to implement our own /make|send_join dance here as gmsl is unaware of state DAGs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gmsl ?


func TestMain(m *testing.M) {
complement.TestMain(m, "msc4242")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: While I have read the entire MSC and this PR, I don't think I've onboarded enough to recognize if there are missing test cases. It's my first time with my own hand in the fire when it comes to State DAG's.

This PR is so large, it's probably worth having an LLM also sanity check this against the MSC for completeness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants