Performance improvements

Queries run via trino-lb take significantly longer than the same queries run against Trino directly.
One design choice of trino-lb is that it proxies not only the initial `POST /v1/statement`, but also all subsequent requests.

This has the following benefits:

1. Trino clients can be network-separated from the Trino cluster.
2. trino-lb needs to proxy all traffic so that it knows how many queries are running on each cluster. Otherwise it won't notice when a query completes or fails.
3. It only has an impact when transferring data from the Trino cluster to the Trino client. Simple `count(*)` queries are not really affected, as they don't transfer much data.
   a. In addition, work is ongoing on a [client spooling protocol](https://trino.io/docs/current/client/client-protocol.html), which would mean the data no longer flows through trino-lb.
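
To illustrate point 2, here is a minimal sketch of how a proxy that sees every response could keep per-cluster query counts. It relies on the Trino client protocol's behaviour that a response contains a `nextUri` while the client still has to poll, and omits it once the query has finished or failed. The `QueryTracker` class and its method names are hypothetical and are not taken from the trino-lb code base.

```python
from collections import defaultdict


class QueryTracker:
    """Hypothetical in-memory tracker; not trino-lb's actual implementation."""

    def __init__(self):
        # cluster name -> set of query ids currently considered running
        self.running = defaultdict(set)

    def observe(self, cluster, response):
        """Update the state from one proxied Trino response (parsed JSON)."""
        query_id = response["id"]
        if "nextUri" in response:
            # The client will poll again, so the query is still in flight.
            self.running[cluster].add(query_id)
        else:
            # No nextUri: the query finished, either successfully or with an error.
            self.running[cluster].discard(query_id)

    def count(self, cluster):
        """Number of queries currently tracked as running on the cluster."""
        return len(self.running[cluster])
```

If only the initial `POST /v1/statement` were proxied, `observe` would never see the final response without a `nextUri`, and the count for a cluster could only grow.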

Nevertheless, I spent some hours figuring out whether we can improve the performance, focusing especially on data transfers using `select *`. The command used was
`time echo "select * from tpch.sf1.customer;" | TRINO_PASSWORD=XXX java -jar ~/Downloads/trino-cli-*-executable.jar --server https://X.X.X.X:8443 --insecure --user admin --password --file /dev/stdin --http-proxy 127.0.0.1:1234`
in combination with `nix-shell -p mitmproxy --command 'mitmproxy --listen-port 1234 --ssl-insecure'` to see latency information in mitmproxy.
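
A single `time` run can be noisy, so it may help to repeat the measurement. The following is a small sketch of a timing helper, not something used in this investigation. `time_command` is a hypothetical name, and the command in the `__main__` block is a placeholder that would need to be replaced with the trino-cli invocation above (passing the query via `--file` or stdin as appropriate).

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout so that terminal rendering does not skew the timing.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command; substitute the real trino-cli call here.
    durations = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"median: {statistics.median(durations):.3f}s")
```

Running it once against trino-lb and once against Trino directly gives two medians that can be compared.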

The following PRs came out of this:

- [x] https://github.com/stackabletech/trino-lb/pull/70
- [x] https://github.com/stackabletech/trino-lb/pull/71
- [x] https://github.com/stackabletech/trino-lb/pull/73
- [x] https://github.com/stackabletech/trino-lb/pull/74

Performance improvements #72
