Skip to content

Conversation

@psteinroe
Copy link
Contributor

@psteinroe psteinroe commented May 17, 2025

EDIT: Removed the tests - this PR is just the xtask now :) can't tag you as a reviewer for some reason, but this is ready @sbdchd

adds the postgress regression test suite to stress-test the parser:

  • a xtask that downloads and preprocesses the sql files from the postgres repo
  • a test in the squawk_parser crate that runs all of them through the parser

In the preprocessing step, we remove everything that we do not want to support (yet), mainly plpgsql commands such as \gset.

Right now, all test fail. I peeked through some of the tests and it's mainly due to :name vars. I think it makes sense to lex them as IDENT tokens? Happy to add support for this.

Overall I think this test suite can help stabilise the parser. My hope is that it becomes stable enough to replace libpg_query in the postgres language server.

Also: not sure where to put the tests since currently they all are part of the crate itself. For this kind of tests, I usually integrate them as integration tests. Let me know what you prefer! :)

also happy to not add them at all, or maybe not do snapshot testing, or maybe download the files and run the tests on-demand to not pollute the codebase! maybe a separate xtask that can be executed on-demand in CI / locally makes sense too - like https://areweturboyet.com

@netlify
Copy link

netlify bot commented May 17, 2025

👷 Deploy request for squawkhq pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 7f1cd0a

@sbdchd
Copy link
Owner

sbdchd commented May 17, 2025

Ah I really like this idea!! Checking against postgres is a solid plan

I was also thinking in the future, a similar thing where we check against some open source projects to make sure we're parsing things properly!

I wonder if there's a way to make it not save the 1.7 million lines in the repo? Maybe it doesn't matter though -- if postgres is doing it it's probably fine?

@sbdchd
Copy link
Owner

sbdchd commented May 17, 2025

Yeah putting it in the repo is good, easier that way

I think doing it in two PRs would be easier to review, one for the rust stuff to download and run the tests and then one that actually saves all the SQL & snapshots!

@sbdchd
Copy link
Owner

sbdchd commented May 17, 2025

Are :name style placeholders part of the postgres grammar?

@psteinroe
Copy link
Contributor Author

Are :name style placeholders part of the postgres grammar?

It's supported in psql and used oftentimes when executing sql files. Reference here: https://www.postgresql.org/docs/current/app-psql.html

Relevant part:
image

@psteinroe
Copy link
Contributor Author

psteinroe commented May 17, 2025

Yeah putting it in the repo is good, easier that way

I think doing it in two PRs would be easier to review, one for the rust stuff to download and run the tests and then one that actually saves all the SQL & snapshots!

Yeah will clean it up! Was thinking maybe we don't want to store snapshots of successful attempts for now? This would reduce the loc count significantly. The failing ones should also not fail the test (for now). I would use their output as a reference when fixing compatibility issues. wdyt?

I believe the snapshots tests are not of much value since we are only checking whether the parser can parse valid sql and not whether the output is actually correct?

@psteinroe
Copy link
Contributor Author

I think we should support template parameters in different variations too: supabase-community/postgres-language-server#396

@sbdchd
Copy link
Owner

sbdchd commented May 17, 2025

Yeah skipping snapshots for the postgres' SQL makes sense, we can always add to the existing test suite if necessary!

And yeah we don't want to fail CI with the current incompatibility against the postgres test suite

covering those params sounds good, in the future, I want to support parsing SQL inside string literals in other languages, like python, so we'd need to support things like:

We might not even need to update the lexer for all of them, can do it piecemeal

@psteinroe
Copy link
Contributor Author

Sounds good! Will open another pr with just the script then. If you add support for one of these params, I can also take over the rest. Would then also go through the remaining failing tests and fix what needs to be fixed.

@psteinroe psteinroe changed the title chore: add regression test suite chore: add xtask to download regression tests May 19, 2025
@psteinroe psteinroe marked this pull request as ready for review May 19, 2025 05:54
.collect();

if urls.is_empty() {
return Err(anyhow::anyhow!("No SQL files found"));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Err(anyhow::anyhow!("No SQL files found"));
bail!("No SQL files found");

Comment on lines 71 to 74
return Err(anyhow::anyhow!(
"Failed to fetch SQL files: {}",
String::from_utf8_lossy(&output.stderr)
));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Err(anyhow::anyhow!(
"Failed to fetch SQL files: {}",
String::from_utf8_lossy(&output.stderr)
));
bail!(
"Failed to fetch SQL files: {}",
String::from_utf8_lossy(&output.stderr)
);

Comment on lines 35 to 39
return Err(anyhow::anyhow!(
"Failed to download '{}': {}",
url,
error_msg
));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Err(anyhow::anyhow!(
"Failed to download '{}': {}",
url,
error_msg
));
bail!(
"Failed to download '{}': {}",
url,
error_msg
);

use anyhow::Result;
use std::fs::{create_dir_all, remove_dir_all, File};
use std::io::{BufRead, Cursor, Write};
use std::path::PathBuf;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use use camino::Utf8PathBuf; instead of PathBuf you'll avoid some of the utf8 nonsense but doesn't matter that much

for line in source.lines() {
let mut line = line?;

// Detect the start of the COPY block
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does a copy block look like? curious what it has that we don't support parsing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the regression test is executed via psql, and they use COPY to load data into test tables FROM STDIN

COPY bitwise_test FROM STDIN NULL 'null';
1	1	1	1	1	B0101
3	3	3	null	2	B0100
7	7	7	3	4	B1100
\.

@sbdchd
Copy link
Owner

sbdchd commented May 20, 2025

Looks good, left some minor style nits!

@psteinroe psteinroe requested a review from sbdchd May 20, 2025 07:02
@psteinroe
Copy link
Contributor Author

Hey @sbdchd, I am not able to merge or add the automerge label, but this is ready

Copy link
Owner

@sbdchd sbdchd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!!

@sbdchd sbdchd added the automerge automerge with kodiak label May 20, 2025
@kodiakhq kodiakhq bot merged commit 4394106 into sbdchd:master May 20, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge automerge with kodiak

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants