Skip to content

Chapter 09: Try-it-yourself Question 3 - Add note to answer regarding duplicates #38

@TassiloPitrasch

Description

@TassiloPitrasch

Not really an issue for debugging, just some addition to the answer of the third question in the Try-It-Yourself challenges of chapter 09:
I use a shorter SELECT just returning the fscskeys of the libraries:

SELECT pls18.fscskey AS pls_key_18, pls17.fscskey AS pls_key_17, pls16.fscskey AS pls_key_16
FROM pls_fy2018_libraries pls18
       FULL OUTER JOIN pls_fy2017_libraries pls17 ON pls18.fscskey = pls17.fscskey
       FULL OUTER JOIN pls_fy2016_libraries pls16 ON pls18.fscskey = pls16.fscskey
WHERE pls16.fscskey IS NULL OR pls17.fscskey IS NULL;

This query returns "duplicates" as rows of the tables pls16 and pls17 sharing the same fscskey are not joined if the key is not present in pls18.
Example: The fscskey AK0053 is missing in the table pls_fy2018_libraries, but is available in the other two; so the query returns i.a. these two rows:

pls_key_18 | pls_key_17 | pls_key_16
null | AK0053 | null
null | null | AK0053

If we wanted to count the number of libraries only available in one or two surveys, this approach would return a wrong number.

My thoughts: After all rows of pls18 are examined, the rows of the other two tables are appended without further examination for possible joins, so null is returned for the pls_key_16 column of rows from pls17 and vice versa. This is also the reason why we don't need to filter for pls18.fscskey IS NULL: If pls_key_18 is null, one of the other two columns must be null also.

Sorry for the extensive post, I just couldn't wrap my head around the fact that we don't need to filter for pls18.fscskey IS NULL. Maybe you're able to condense my explanations and - if you see fit - add them to the respective file.

Metadata

Metadata

Assignees

Labels

help wantedExtra attention is needed

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions