Advanced Filtering in SQL

Using the LIKE and IN keywords can filter out nulls

We had an issue a while back with oddly-formatted values in a column designed to capture the duration of an agreement. Typically it was in the format “1-Month” or “2 Year”, but after we changed the business rules to allow for custom term lengths, we saw fractional months come through. So we had values like 1.5-Month Until we had it fixed upstream, the solution the data engineer and I adopted was to add a filter to remove any records that were in decimal format. In other words, we added a filter like so:

where agreement_terms__c not like '%.%'

But the unexpected impact of that was to remove all the records in the opportunity table where that field was null altogether: in particular, all the pipeline opportunities were getting filtered out.

This also applies to the in keyword. If you’re applying a filter that says not in ('A', 'B', 'C') but you want it to count blanks or null values, then you can’t write it that way. You have to say: field is null or not field = 'A' for example.

IN keyword with a tuple

Something I saw in our interview with our current data engineer is that certain dialects of SQL in certain contexts allow you to use the IN keyword with a tuple, like below. I don’t know if this is truly recommended though. I think it might be computationally intensive, and it doesn’t result in particularly clear code.

select product_id from dim_products
where (product_group, product_family) in (select distinct product_group, product_family from fact_reservations)

Notes mentioning this note


Here are all the notes in this garden, along with their links, visualized as a graph.