Skip to content

Conversation

@zeroshade
Copy link
Member

Extracted from #3607 with influence by the comments there and main...CurtHagenlocher:arrow-adbc:MoreResults, this contains a proposal for handling multi-result set query execution via ADBC by adding a new function for drivers, AdbcStatementNextResultSet.

This also includes the necessary changes for an ADBC API Revision 1.2.0 (macro defines and so on). The comment above the function includes all the semantic definitions of the behavior.

@zeroshade
Copy link
Member Author

After discussion and agreement on the proposed function, I'll update the relevant implementations here of the other driver managers to leverage and expose this (python, rust, Go, etc.)

@lidavidm
Copy link
Member

lidavidm commented Jan 9, 2026

Should we make a branch for spec changes?

@zeroshade
Copy link
Member Author

Should we make a branch for spec changes?

That makes sense, gather the spec updates in the branch and then eventually decide to push it as one thing.

@lidavidm
Copy link
Member

lidavidm commented Jan 9, 2026

I made https://github.com/apache/arrow-adbc/tree/spec-1.2.0

@zeroshade zeroshade changed the base branch from main to spec-1.2.0 January 9, 2026 15:39
@zeroshade
Copy link
Member Author

I've changed the base for this PR to point to the spec-1.2.0 branch.

Now we just need people to weigh in on the proposal 😄

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

The previous PR also had:

ADBC_EXPORT
AdbcStatusCode AdbcStatementNextResultSetSchema(struct AdbcStatement* statement,
                                                struct ArrowSchema* schema,
                                                struct AdbcError* error);

Is that no longer planned or no longer needed?

Comment on lines 2117 to 2122
ADBC_EXPORT
AdbcStatusCode AdbcStatementNextResultSet(struct AdbcStatement* statement,
struct ArrowArrayStream* out,
struct AdbcPartitions* partitions,
int64_t* rows_affected,
struct AdbcError* error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to have this just be:

ADBC_EXPORT
AdbcStatusCode AdbcStatementNextResultSet(struct AdbcStatement* statement,
                                          struct AdbcError* error);

...and then subsequent calls to AdbcStatementExecute(), AdbcStatementExecuteSchema(), AdbcStatementExecutePartitions(), and future async versions of all of those don't have to be redefined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a reasonable idea. Maybe we can specify that the query must be NULL in those cases? (Though, that would translate weirdly into many of the language bindings, albeit they also don't necessarily have to be 1:1.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think we need some way to differentiate, in that case, "I want to start a new query" vs "I want to continue executing the query". And ideally we remain backwards-compatible, so that calling ExecuteQuery again would start a new query as it currently does.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to essentially handle all of the cases with the one function here (minus async) by specifying particular inputs as null or not. I'm don't particularly like having to figure out how to differentiate between the Execute* methods starting a new execution vs continuing after a call to NextResultSet, though that might just be bias from existing APIs I've used.

My thoughts were that the Execute* methods return a stream/schema/list of partitions, so calling NextResultSet should also do the same.

The other pattern I've seen would be that we'd create an AdbcResult struct which would have a NextResultSet callback on it (which IMO we should have done in the first place), but this would be a breaking change or at least require an entirely new set of functions.

@CurtHagenlocher @davidhcoe do either of you have any opinions on this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can specify that the query must be NULL in those cases?

Does that mean the user has to explicitly set it back to NULL? Or that the statement should have internally set it to NULL when it was executed? That's kind of annoying if you're trying to re-use the same Statement instance to execute the same query multiple times if you have to re-set the query to execute it again. Some users might expect that they need to hold on to the Statement to execute it without paying the cost of preparing it again. They might even try to cache Statements that they've already prepared alongside the parent connection.


Including the partitions API makes this method weirdly modal, I would say if you wanted to lean into the partitions API then you should commit to it and just return a partition from here. Or just split it into two separate methods.

But it's already hard to understand from the API design how you're supposed to use partitions, much less how you're supposed to implement them in the driver. Having to go back to the connection to read the partition doesn't make a ton of sense to me, unless it's a separate query or protocol command to read from it, like fetching from a named cursor.


I suggested privately that there should be a new ExecuteMulti method that explicitly returns a multi-result-set stream. That encodes the user's expectation up-front that there will be multiple result sets from the query, and doesn't require Statement itself to become any more stateful than it is now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paleolimbot and @davidhcoe / @CurtHagenlocher?

If everyone's okay with this idea I'll take a stab at updating this proposal accordingly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a great point that there's already a lot of state to keep track of in a statement, although that bookeeping still has to happen somewhere (both for the driver, ensuring it doesn't call something that will crash, and the caller, ensuring it doesn't violate any constraint about holding multiple results). Possibly there is just no getting around a lot of new C functions (whether they're on some new AdbcResultSet or the existing AdbcStatement).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I struggle with is how do you know when additional results are expected? Is there something that would split the SQL statements to know there are multiple? Do you always call AdbcStatementNextResultSet just to check if there is a second one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only familiar with the libpq way of doing this where you call PQgetResult() until it returns NULL (although some libraries probably expose a way to access the number of statements in something that is prepared).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ODBC, JDBC and ADO.NET APIs all work in this fashion. One thing to consider for databases which support stored procedures is that it may not be possible to know in advance how many result sets they will return. I know this is true for MSSQL and think it might be true for PostgreSQL as well.

@zeroshade
Copy link
Member Author

I've updated the PR based on the comments and discussion. Everyone please take a look and let me know what you think!

/// with the result set.
///
/// \since ADBC API revision 1.2.0
struct ADBC_EXPORT AdbcResultSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd vaguely prefer these as free functions to make it so that there's only one struct that needs possible extension later.

/// \since ADBC API revision 1.2.0
struct ADBC_EXPORT AdbcResultSet {
/// \brief opaque implementation-defined state
void* private_data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also have a pointer to the AdbcDriver as with the other structs?

/// \return ADBC_STATUS_NOT_IMPLEMENTED if the driver only supports fetching results
/// as partitions, ADBC_STATUS_INVALID_STATE if called at an inappropriate time,
/// and ADBC_STATUS_OK (or an appropriate error code) otherwise.
AdbcStatusCode (*NextResultSet)(struct AdbcResultSet* self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we also want a NextSchema callback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, it'd probably be better to indicate the desire for that up front so that the driver can choose what to do properly. (And actually, I'm not sure if database clients generally support this in the first place...)

/// \brief opaque implementation-defined state
void* private_data;

/// \brief Release the result set and any associated resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can say this also cancels the query if not completed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants