Performance improvements #72

@sbernauer

Queries running via trino-lb take significantly longer than queries run against Trino directly.
One design choice of trino-lb is that it proxies not only the initial POST /v1/statement, but all subsequent requests as well.

This has the following benefits and consequences:

  1. Trino clients can be network-separated from the Trino cluster.
  2. trino-lb needs to proxy all traffic so that it knows how many queries are running on each cluster. Otherwise it won't notice when a query completes or fails.
  3. The performance impact only shows up when data is transferred from the Trino cluster to the Trino client. Simple count(*) queries are not really affected, as they don't transfer much data.
    a. In addition, there is ongoing work on a client spooling protocol, which would mean the data no longer flows through trino-lb.
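
To illustrate point 2, here is a minimal Python sketch of why a proxy has to see every response. It is not trino-lb's actual implementation; `rewrite_next_uri`, `QueryCounter` and the host names are hypothetical. It relies only on the Trino client protocol behaviour that a response without a nextUri means the query has finished (successfully or not), and it assumes the proxy rewrites nextUri so that the client's follow-up requests come back to the proxy.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_next_uri(response: dict, lb_base: str) -> dict:
    """Return a copy of a statement response whose nextUri points at the proxy.

    Only scheme and host:port are swapped; path and query stay untouched.
    """
    next_uri = response.get("nextUri")
    if next_uri is None:
        return dict(response)
    lb = urlsplit(lb_base)
    parts = urlsplit(next_uri)
    rewritten = urlunsplit(
        (lb.scheme, lb.netloc, parts.path, parts.query, parts.fragment)
    )
    return {**response, "nextUri": rewritten}


class QueryCounter:
    """Tracks running queries per cluster from the responses passing through."""

    def __init__(self) -> None:
        self.running: dict[str, set[str]] = {}

    def observe(self, cluster: str, response: dict) -> None:
        ids = self.running.setdefault(cluster, set())
        if "nextUri" in response:
            # More results to fetch: the query is still in flight.
            ids.add(response["id"])
        else:
            # No nextUri: the query completed or failed.
            ids.discard(response["id"])

    def count(self, cluster: str) -> int:
        return len(self.running.get(cluster, ()))
```

If the client followed the cluster's original nextUri directly, the final response without a nextUri would never pass through the proxy, and the counter for that cluster would never go back down.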

Nevertheless, I spent some hours figuring out whether we can improve the performance, focusing especially on data transfers using select *. The command used is
time echo "select * from tpch.sf1.customer;" | TRINO_PASSWORD=XXX java -jar ~/Downloads/trino-cli-*-executable.jar --server https://X.X.X.X:8443 --insecure --user admin --password --file /dev/stdin --http-proxy 127.0.0.1:1234
in combination with nix-shell -p mitmproxy --command 'mitmproxy --listen-port 1234 --ssl-insecure' to see latency information in mitmproxy.

The following PR came out of this:

Status

Done