-
Notifications
You must be signed in to change notification settings - Fork 556
Description
Not really an issue for debugging, just some addition to the answer of the third question in the Try-It-Yourself challenges of chapter 09:
I use a shorter SELECT just returning the fscskeys of the libraries:
SELECT pls18.fscskey AS pls_key_18, pls17.fscskey AS pls_key_17, pls16.fscskey AS pls_key_16
FROM pls_fy2018_libraries pls18
FULL OUTER JOIN pls_fy2017_libraries pls17 ON pls18.fscskey = pls17.fscskey
FULL OUTER JOIN pls_fy2016_libraries pls16 ON pls18.fscskey = pls16.fscskey
WHERE pls16.fscskey IS NULL OR pls17.fscskey IS NULL;
This query returns "duplicates" as rows of the tables pls16 and pls17 sharing the same fscskey are not joined if the key is not present in pls18.
Example: The fscskey AK0053 is missing in the table pls_fy2018_libraries, but is available in the other two; so the query returns i.a. these two rows:
pls_key_18 | pls_key_17 | pls_key_16
null | AK0053 | null
null | null | AK0053
If we wanted to count the number of libraries only available in one or two surveys, this approach would return a wrong number.
My thoughts: After all rows of pls18 are examined, the rows of the other two tables are appended without further examination for possible joins, so null is returned for the pls_key_16 column of rows from pls17 and vice versa. This is also the reason why we don't need to filter for pls18.fscskey IS NULL: If pls_key_18 is null, one of the other two columns must be null also.
Sorry for the extensive post, I just couldn't wrap my head around the fact that we don't need to filter for pls18.fscskey IS NULL. Maybe you're able to condense my explanations and - if you see fit - add them to the respective file.