Description
Queries run via trino-lb take significantly longer than the same queries run against Trino directly.
One design choice of trino-lb is that it proxies not only the initial POST /v1/statement request, but also all subsequent requests.
This has the following benefits:
- Trino clients can be network-separated from the Trino cluster
- trino-lb needs to proxy all traffic, so that it knows how many queries are running on each cluster. Otherwise it won't notice when a query completes or fails.
- The slowdown only has an impact when transferring data from the Trino cluster to the Trino client. Simple count(*) queries are not really affected, as they don't transfer much data.
a. In addition, work is ongoing on a client spooling protocol, which would mean the data no longer flows through trino-lb.
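To illustrate why proxying every request lets trino-lb notice when a query completes or fails, here is a minimal sketch in Python. It is not trino-lb's actual implementation; the `QueryCounter` class, its methods, and the cluster names are hypothetical. It relies only on the Trino client protocol behaviour that each response body carries the query `id`, and carries a `nextUri` for as long as the client has more to fetch.

```python
from dataclasses import dataclass, field


@dataclass
class QueryCounter:
    """Hypothetical tracker of running queries per cluster, fed by proxied responses."""

    running: dict[str, set[str]] = field(default_factory=dict)

    def observe(self, cluster: str, response: dict) -> None:
        """Update the counter from one proxied Trino response body."""
        query_id = response["id"]
        queries = self.running.setdefault(cluster, set())
        if "nextUri" in response:
            # The client still has to poll: the query is queued or running.
            queries.add(query_id)
        else:
            # No nextUri: the query finished, failed, or was cancelled.
            queries.discard(query_id)

    def count(self, cluster: str) -> int:
        """Number of queries currently considered running on the cluster."""
        return len(self.running.get(cluster, set()))


counter = QueryCounter()
# Initial POST /v1/statement response, then a final page without nextUri.
counter.observe("cluster-a", {"id": "q1", "nextUri": "https://example.invalid/next"})
counter.observe("cluster-a", {"id": "q1"})
```

If the proxy only saw the initial POST, the second `observe` call would never happen and the query would be counted as running forever.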
Nevertheless, I spent some hours figuring out whether we can improve the performance, focusing especially on data transfers using select *. The command used is
time echo "select * from tpch.sf1.customer;" | TRINO_PASSWORD=XXX java -jar ~/Downloads/trino-cli-*-executable.jar --server https://X.X.X.X:8443 --insecure --user admin --password --file /dev/stdin --http-proxy 127.0.0.1:1234
in combination with nix-shell -p mitmproxy --command 'mitmproxy --listen-port 1234 --ssl-insecure' to see latency information in mitmproxy.
The following PR came out of this: