-
Notifications
You must be signed in to change notification settings - Fork 334
cope with distinct: true #655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks @dvv. I'm not sure this is actually an issue. When you don't provide a select statement it selects the I'm not sure it's something we should change because now we are adding fields to the query that cannot be put into the resulting elixir struct. |
|
@greg-rychlewski I see. |
|
Ecto.Repo.Queryable {%{__struct__: name}, %{}} ->
for {key, _} <- right, key != false and not Map.has_key?(left, key) do
raise ArgumentError, "struct #{inspect(name)} does not have the key #{inspect(key)}"
end
Map.merge(left, right |> Map.drop([false]))just does the trick |
|
@greg-rychlewski my gut feeling is that it makes sense to make it more convenient to the user and automatically include any order by in the select. The issue, however, is that we will be adding more fields to all queries written up to this day and in some cases, but I guess we could fix those by doing a Enum.uniq on the string values? |
|
The main potential issue I see is when getting the results from the db and trying to collect them at the end. We will have more fields than asked for and they will potentially be incompatible. For example a field that doesn't belong to a struct. If we try to de-dup I'm not 100% sure if that will cause complications as well. Because we generally iterate through a list of returned columns and assume they are in a certain order. |
|
@greg-rychlewski this PR is exactly for that -- just before composing SQL it analyzes order_bys and appends mentioned fields (if they refer to joins) to select fields. |
|
@dvv That is the query that goes to the database. Now the result comes back to Elixir and it has to be packed inside of the Elixir structure you requested. |
|
The extra fields appended at this stage do not spoil the result because packing instructions were collected before. |
|
I'm not personally sure if that is true in all cases. But if you and Jose have evaluated that thoroughly and know for sure it's not an issue then that is the main thing I was concerned about. |
|
@greg-rychlewski I'm not sure too. I just have no sufficient expertise. That is why I also proposed a lighter way: provide user with possibility to |
|
Another option is to add a However, it still sounds weird to return all of this over the wire only to discard it. I think ultimately the problem here is in the query. Do you really need to pass |
|
@josevalim I namely do need to filter over association while returning distinct set of primary relation. Do you see real problem pruning ephemeral keys from |
|
@dvv but if you have primary keys, you can group by the primary key instead, or anything that uniquely identity them. |
|
It forces to craft ad-hoc queries while the point was to fix (postgresql-bound only?) deficiency. Otherwise, you wrote "make it more convenient to the user and automatically include any order by in the select" and it sounds great. We can skip appending order-bys that come from primary schema, I believe. I drafted the patch and it passed tests AFAICS, but I can not state for sure I did it right. |
There is nothing artificial given it is coming from the database. Perhaps if you can argue to PostgreSQL folks that it is artificial, then they will remove the restriction from their side, and it will automatically work on Ecto too. But given the database is the one that is imposing it, we need to balance the trade-offs from the database point of view, and the current solution may cause existing apps to load unnecessary data from the database. :( Another option, if you want this to work transparently, is to use subqueries. If you use |
|
By artificial limitation I meant only that mentioned here: elixir-ecto/ecto#4589 (comment) |
|
Or, maybe a special kinda helper would fix:
|
|
Actually I am wrong sorry |
|
@dvv Is this a hypothetical query or is it something you ran into in real life? The reason I'm asking is because I'm struggling to identify a real use case for combining these:
Since you are selecting the main table and also de-duplicating by the entire row of the main table, I'm not sure what is gained by ordering by the join table. No matter which row you pick from the join table you will always return the same results. You will have something like this main_table_row1 join_table_row10 then you pick one of those and remove the join_table_rowX part. you always get main_table_row1 the only time you get a difference is if you do distinct on a subset of the main table or you include columns from the join table in the result |
Unfortunately the overfetching would still be there, you are just not doing it twice (which is indeed worse). That's why rewriting the query to a subquery or similar should be the best choice. Or, since you have a helper function that builds a query, you could also have a function that runs the query and maps the results, this way you could strip the additional columns added by distinct. I don't understand why the functionality has to be provided by Ecto, you should be able to build abstractions on top. In any case, it seems we are not going down this path, so we can close this PR. Thank you! I recommend exploring subqueries and rewriting the query for now. |
|
@greg-rychlewski master...dvv:ecto_sql:patch-1#diff-df64a1e98eb69a93a67e388be3fb937b476bc769705a2a682d3e8e1ecdc8a38cR603 SELECT DISTINCT s0."x", s1."id" FROM "schema" AS s0 INNER JOIN "schema3" AS s1 ON s1."id" = s0."y" ORDER BY s1."id"Indeed the order of So in your example's terms my situation is rather: Still thanks for the heads up |
|
That is being distinct on a subset of the row then, is it not? What's the reason for using distinct: true in this case? |
|
As I said I'm developing a helper, not trying to solve way particular problem. |
This might be way off topic, but I wonder if something like https://github.com/elixir-dbvisor/sql would be helpful in your situation, you can compose and use parameterized queries with your Ecto.Repo. Although there is currently no support for Ecto.Schema so you would have to tie that together yourself. But it gives you complete control over the query. |
An attempt to fix elixir-ecto/ecto#4589
Please consider applying