Skip to content

RFC: pubOp for creating and updating pubs more easily#965

Merged
3mcd merged 27 commits intomainfrom
tfk/pub-op
Feb 12, 2025
Merged

RFC: pubOp for creating and updating pubs more easily#965
3mcd merged 27 commits intomainfrom
tfk/pub-op

Conversation

@tefkah
Copy link
Copy Markdown
Member

@tefkah tefkah commented Feb 11, 2025

Issue(s) Resolved

Adds a new PubOp system for creating, updating and managing relationships between Pubs using a fluent-like API/

High-level Explanation of PR

Note

This PR is an RFC! Meaning I'm totally happy not merging it!
I'm more interested in your general thoughts on this approach, please nitpick it to hell!

This PR introduces a new PubOp system that provides a fluent, builder-style API for managing Pub operations. The system handles complex scenarios like creating multiple related pubs, managing nested relationships, and handling pub value updates in a single transaction.

API

The API follows a builder pattern, allowing for chaining operations:

const pub = await PubOp.create({
  communityId: community.id,
  pubTypeId: pubType.id,
  lastModifiedBy: createLastModifiedBy({ userId: user.id }),
})
  .set('title', 'My Pub')
  .relate('someRelation', 'relation value', relatedPubId)
  .execute();

All builders

Currently there's 3 kinds of operations you can do:

  • create
  • update
  • upsert

For create, there's also an explicit createWithId.
For both update and upsert, there's an updateByValue and an upsertByValue. These allow you to, instead of selecting a pub by id, select a pub by a specific value, eg a google drive id.

Basic capabilities

Currently, the PubOp allows you to do a bunch of things, here's a small table that shows what you can do with each of the operations:

Capability Create Update Upsert
Set values
Connect/create related pubs
Set stage/move pub
Explicitly disconnect relations - -
Explicitly disconnect values - -
Override existing values -
Replace existing relations -
Set values
const pub = await PubOp.createWithId(pubId, {
  /*...*/
})
  .set('community-slug:title', 'My Pub')
  .execute();
Set many values in one go
const pub = await PubOp.upsert(pubId, {
  /*...*/
})
  .set({
    'community-slug:title': 'My Pub',
    'community-slug:description': 'My Pub description',
  })
  .execute();
Relate pubs
const pub = await PubOp.create({
  /*...*/
})
  .relate('community-slug:someRelation', 'relation value 1', relatedPubId1)
  .relate('community-slug:someRelation', 'relation value 2', relatedPubId2)
  .execute();
Relate multiple pubs in one go
const pub = await PubOp.create({
  /*...*/
})
  .relate('community-slug:someRelation', [
    {
      value: 'relation value 1',
      target: relatedPubId1,
    },
    {
      value: 'relation value 2',
      target: relatedPubId2,
    },
  ])
  .execute();
Nested relate

You can nest pub operations, allowing you to create, relate, and/or update multiple pubs in a single operation:

const pub = await PubOp.upsert(pubId, {
  /*...*/
})
  .relate(
    'someRelation',
    'value',
    PubOp.upsert(relatedPubId, {
      /*...*/
    })
      .set('title', 'Related Pub')
      .relate(
        'anotherRelation',
        'value2',
        PubOp.upsert(relatedPubId2, {
          /*...*/
        }).set('title', 'Related Pub 2')
      )
  )
  .execute();

If you don't want to keep repeating { communityId: ..., lastModifiedBy: ... }, you also define a function instead
This just returns a new PubOp with communityId etc already set.

You'll still need to provide the pubTypeId for upsert and create operations.

const pub = await PubOp.create({
  communityId,
  pubTypeId,
  lastModifiedBy: createLastModifiedBy({ userId }),
})
  .set('community-slug:title', 'Community Title')
  .relate('community-slug:someRelation', 'relation value', (pubOp) =>
    pubOp
      .create({
        pubTypeId,
      })
      .set('pub-type-slug:title', 'Nested Pub')
  )
  .execute();

Test Plan

  1. Look at the tests in
    https://github.com/pubpub/platform/blob/64ebece2eab27b80e4c1b7d8c3c6a988d863f193/core/lib/server/pub-op.db.test.ts#L1-L1329

  2. Write one extra test to get a feel for it (don't need to push it, but would be appreciated!)

Screenshots (if applicable)

Notes

Note

This new API isn't used anywhere yet, that comes later!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are helper methods to make matching pubvalues easier.

eg instead of doing

expect(pub.values).toHaveLength(3);
pub.values.sort(/* sorting bc you want the order of the values to be stable over different tests */)
expect(pub.values).toMatchObject([ 
	...
]);

you just do

expect(pub).toHaveValues([...])

Similarly for

expect(pubId).toExist()

just easier than manually looking up the pub yourself

Comment on lines +796 to +826
it("should handle complex nested relation scenarios", async () => {
const trx = getTrx();
// manual rollback try/catch bc we are manually setting pubIds, so a failure in the middle of this will leave the db in a weird state
try {
// Create all pubs with meaningful IDs
const pubA = "aaaaaaaa-0000-0000-0000-000000000000" as PubsId;
const pubB = "bbbbbbbb-0000-0000-0000-000000000000" as PubsId;
const pubC = "cccccccc-0000-0000-0000-000000000000" as PubsId;
const pubD = "dddddddd-0000-0000-0000-000000000000" as PubsId;
const pubE = "eeeeeeee-0000-0000-0000-000000000000" as PubsId;
const pubF = "ffffffff-0000-0000-0000-000000000000" as PubsId;
const pubG = "11111111-0000-0000-0000-000000000000" as PubsId;
const pubH = "22222222-0000-0000-0000-000000000000" as PubsId;
const pubI = "33333333-0000-0000-0000-000000000000" as PubsId;
const pubJ = "44444444-0000-0000-0000-000000000000" as PubsId;
const pubK = "55555555-0000-0000-0000-000000000000" as PubsId;
const pubL = "66666666-0000-0000-0000-000000000000" as PubsId;

// create the graph structure:
// A J
// / \ |
// / \ |
// B C --> I
// | / \
// G --> E D
// / \
// F H
// / \
// K --> L

// create leaf nodes first
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is by far the most complex test, and very curious whether you agree with whether this should be the behavior.
i spent a bit too long trying to figure this out haha

see my comment in pub-op.ts for a more in depth explanation about the "algorithm" for determining orphans

Comment on lines +833 to +999
/**
* remove pubs that have been disconnected/their value removed,
* has `deleteOrphaned` set to true for their relevant relation operation,
* AND have no other relations
*
* curently it's not possible to forcibly remove pubs if they are related to other pubs
* perhaps this could be yet another setting
*
* ### Brief explanation
*
* Say we have the following graph of pubs,
* where `A --> C` indicates the existence of a `pub_value`
* ```ts
* {
* pubId: "A",
* relatedPubId: "C",
* }
* ```
*
* ```
* A J
* ┌──┴───┐ │
* ▼ ▼ ▼
* B C ────────► I
* │ ┌─┴────┐
* ▼ ▼ ▼
* G ─► E D
* │ │
* ▼ ▼
* F H
* ┌─┴──┐
* ▼ ▼
* K ──► L
* ```
*
* Say we now disconnect `C` from `A`, i.e. we remove the `pub_value` where `pubId = "A"` and `relatedPubId = "C"`
*
*
* Now we disrelate C from A, which should
* orphan everything from D down,
* but should not orphan I, bc J still points to it
* and should not orphan G, bc B still points to it
* it orphans L, even though K points to it, because K is itself an orphan
* ```
* A J
* ┌──┴ │
* ▼ ▼
* B C ────────► I
* │ ┌─┴────┐
* ▼ ▼ ▼
* G ─► E D
* │ │
* ▼ ▼
* F H
* ┌─┴──┐
* ▼ ▼
* K ──► L
* ```
*
* Then by using the following rules, we can determine which pubs should be deleted:
*
* 1. All pubs down from the disconnected pub
* 2. Which are not reachable from any other pub not in the tree
*
* Using these two rules, we can determine which pubs should be deleted:
* 1. C, as C is disconnected is not the target of any other relation
* 2. D, F, H, K, and L, as they are only reachable from C, which is being deleted
*
* Notably, E and I are not deleted, because
* 1. E is the target of a relation from G, which, while still a relation itself, is not reachable from the C-tree
* 2. I is the target of a relation from J, which, while still a relation itself, is not reachable from the C-tree
*
* So this should be the resulting graph:
*
* ```
* A J
* ┌──┴ │
* ▼ ▼
* B I
* │
* ▼
* G ─► E
* │
* ▼
* F
* ```
*
*
*/
private async cleanupOrphanedPubs(
trx: Transaction<Database>,
orphanedPubIds: PubsId[]
): Promise<void> {
if (orphanedPubIds.length === 0) {
return;
}

const pubsToDelete = await trx
.withRecursive("affected_pubs", (db) => {
// Base case: direct connections from the to-be-removed-pubs down
const initial = db
.selectFrom("pub_values")
.select(["pubId as id", sql<string[]>`array["pubId"]`.as("path")])
.where("pubId", "in", orphanedPubIds);

// Recursive case: keep traversing outward
const recursive = db
.selectFrom("pub_values")
.select([
"relatedPubId as id",
sql<string[]>`affected_pubs.path || array["relatedPubId"]`.as("path"),
])
.innerJoin("affected_pubs", "pub_values.pubId", "affected_pubs.id")
.where((eb) => eb.not(eb("relatedPubId", "=", eb.fn.any("affected_pubs.path")))) // Prevent cycles
.$narrowType<{ id: PubsId }>();

return initial.union(recursive);
})
// pubs in the affected_pubs table but which should not be deleted because they are still related to other pubs
.with("safe_pubs", (db) => {
return (
db
.selectFrom("pub_values")
.select(["relatedPubId as id"])
.distinct()
// crucial part:
// find all the pub_values which
// - point to a node in the affected_pubs
// - but are not themselves affected
// these are the "safe" nodes
.innerJoin("affected_pubs", "pub_values.relatedPubId", "affected_pubs.id")
.where((eb) =>
eb.not(
eb.exists((eb) =>
eb
.selectFrom("affected_pubs")
.select("id")
.whereRef("id", "=", "pub_values.pubId")
)
)
)
);
})
.selectFrom("affected_pubs")
.select(["id", "path"])
.distinctOn("id")
.where((eb) =>
eb.not(
eb.exists((eb) =>
eb
.selectFrom("safe_pubs")
.select("id")
.where(sql<boolean>`safe_pubs.id = any(affected_pubs.path)`)
)
)
)
.execute();

if (pubsToDelete.length > 0) {
await deletePub({
pubId: pubsToDelete.map((p) => p.id),
communityId: this.options.communityId,
lastModifiedBy: this.options.lastModifiedBy,
trx,
});
}
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is by far the most complex part of the PR, am curious if you think the general approach is correct! obviously very annoying to check the actual sql, but im more curious whether you think my definition of orphans here tracks

Comment on lines +631 to +640
protected async executeWithTrx(trx: Transaction<Database>): Promise<PubsId> {
const operations = this.collectOperations();

await this.createAllPubs(trx, operations);
await this.processStages(trx, operations);
await this.processRelations(trx, operations);
await this.processValues(trx, operations);

return this.id;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is basically entirely what a pubOp can do. First create all pubs, add them to stages, do relation stuff, then do value stuff (those last two might be able to be done at the same time, or changed)

notably, all of these things happen at the same time for all pubs. so no more recursive calls, we first collect all operations and then do these things one by one. much more easy to reason about that way i think

Comment on lines +1098 to +1130
if (nullStages.length > 0) {
await autoRevalidate(
trx.deleteFrom("PubsInStages").where(
"pubId",
"in",
nullStages.map(({ pubId }) => pubId)
)
).execute();
}

const nonNullStages = stagesToUpdate.filter(({ stageId }) => stageId !== null);

if (nonNullStages.length > 0) {
await autoRevalidate(
trx
.with("deletedStages", (db) =>
db
.deleteFrom("PubsInStages")
.where((eb) =>
eb.or(
nonNullStages.map((stageOp) => eb("pubId", "=", stageOp.pubId))
)
)
)
.insertInto("PubsInStages")
.values(
nonNullStages.map((stageOp) => ({
pubId: stageOp.pubId,
stageId: stageOp.stageId,
}))
)
).execute();
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully thisll be a bit simpler once #967 lands!

Copy link
Copy Markdown
Member Author

@tefkah tefkah Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seperated this out into a separate file so you can createSeed while still having the automatic rollbacks work

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah missed this earlier, awesome!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please look here for usage!
im really curious what you all think. please comment any ideas/feedback you have for better naming or anything else regarding the api! really interested in anything, especially "i hate this" or something!

@tefkah tefkah changed the title dev: pubOp for creating and updating pubs more easily RFC: pubOp for creating and updating pubs more easily Feb 11, 2025
@tefkah tefkah marked this pull request as ready for review February 11, 2025 18:40
import { CoreSchemaType, MemberRole } from "db/public";

import type { Seed } from "~/prisma/seed/seedCommunity";
import { createSeed } from "~/prisma/seed/createSeed";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc importing createSeed up here did make data persist whereas importing just the type { Seed } didn't?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but i think that was bc they were both imported from /seedCommunity. I separated out the createSeed to a different file, which (I did not explicitly check) i think should fix that issue.
i do think using createSeed is slightly better as that preserves the type info once you pass it to seedCommunity, eg if you have pubTypes: { 'SomepubType': ... } in your seed, the output of seedcommunity will have pubTypes.SomePubType as well

expected: Partial<ProcessedPub["values"][number]>[]
) {
if (typeof received === "string") {
throw new Error("toHaveValues() can only be called with a ProcessedPub");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come received: PubsId | ProcessedPub up above instead of received: ProcessedPub then?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! it was left over bc i think i wanted to be able to pass in both in either case, and was having some trouble with the types. i later decided to just use PubsId for toExist and ProcessedPub for toHaveValues, but never updated it! I'll fix it

Comment on lines +172 to +178
const pub2 = await PubOp.upsert(crypto.randomUUID() as PubsId, {
communityId: seededCommunity.community.id,
pubTypeId: seededCommunity.pubTypes["Basic Pub"].id,
lastModifiedBy: createLastModifiedBy("system"),
})
.relate(seededCommunity.pubFields["Some relation"].slug, "test relations value", pub.id)
.execute();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite nice! just so I understand, if pub2 were already created, would we also have to call .upsert(...).relate(...)? i.e. there's nothing like PubOp.relate

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah correct!
You'd more likely do PubOp.update(pub2Id).relate() if you did not want to override the existing relations, but yes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for context, initially i did only have upsert, not create and update, but i thought it would be best to have separate ones bc

  • pubOp.upsert().unrelate() is kind of weird I think
  • I wanted to have upsert more act like a PUT than a PATCH if the pub already exists, ie get rid of the existing values/relations by default if they are not mentioned in the upsert. But i ofc did want to keep the functionality to only update a subset of values, so we would need an update (or force consumers of this api to do a bunch of unnatural configuration, which is what i was doing before in feat: add proper pub upsert functions #914 )
  • it's good to get an error if you are explicitly trying to create a pub that already exists, rather than silently update it when that's maybe not what you wanted.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks for the context, makes sense!

Comment on lines +787 to +790
/**
* remove pubs that have been disconnected/their value removed,
* has `deleteOrphaned` set to true for their relevant relation operation,
* AND have no other relations
Copy link
Copy Markdown
Member Author

@tefkah tefkah Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's a better idea to have this be a configuration setting on the PubType (or the pubField?)
like, for some relations this could be desireable, like versions and discussions in the current arcadia community. you probably want to delete all the versions of the pub if that pub is deleted.

but for others, like tag or something, you would not want this. you wouldn't want to delete a Tag pub if the you delete the last Pub that is relating to it, you probably want to keep it around.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the UI i think we can just force the user to manually delete all the relations first, but i think it would be good in certain places to automatically decide this (import actions)

lastModifiedBy: createLastModifiedBy("system"),
})
.set(seededCommunity.pubFields["Title"].slug, "Test")
.relate(seededCommunity.pubFields["Some relation"].slug, "relation1", (pubOp) =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's really nice! 🙌

Copy link
Copy Markdown
Collaborator

@3mcd 3mcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look like they cover most of the PubOp behavior and the logic for pub orphans seems sound. Looking forward to tomorrow's chat about the recursive CTE.

@3mcd 3mcd merged commit 012772d into main Feb 12, 2025
6 checks passed
@3mcd 3mcd deleted the tfk/pub-op branch February 12, 2025 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants