Postgres RDBMS/Trickery/Select without elimination

From Wikibooks, open books for an open world
Jump to: navigation, search

One common problem faced with SQL is using boolean conditionals to narrow down a set without any desire of eliminating all possible results. In this example you seek a best match, if one can be given or NULL. This can be easily and cleanly achieved relationally with a slight abuse of UNION ALL. There are many ways to achieve this procedurally such as fuzzy matching. [1]


Description[edit]

The basic idea is to SELECT * eliminating nothing, UNION ALL against a SELECT on that same set (different instance) such that there is a possibility of tuples being eliminated in the UNION'ed set. Then you are going to count() the occurrence of all tuples, each tuple will have an occurrence of at least one (the LHS of the first UNION ALL) this will result in no eliminations. This resulting set you will select the <count>max() of the count().

Synopsis[edit]

Source table[edit]

db=# SELECT * FROM foo;
 bar 
-----
   5
   4
   3
   6
(4 rows)

SQL Non-elimination conditional[edit]

SELECT COUNT(foo.bar), bar
FROM (
     SELECT *
     FROM foo
 
     UNION ALL
 
     SELECT *
     FROM foo
     WHERE bar = 4 -- Conditional that will not eliminate
)
AS foo
GROUP BY bar
;

Result[edit]

The key here, is if the conditional bar = 50 was used, you would still be left with the original dataset.

 count | bar 
-------+-----
     1 |   5
     2 |   4
     1 |   3
     1 |   6
(3 rows)

Example[edit]

A third party data provider has given you a set of data that you wish to select the best match from. You're equipped with an array of things that might help you identify a best match, but you want to always return one tuple, that is to say you don't ever want to eliminate all tuples returned. Your data that is supposed to only trim down the superfluous matches.

Let's use for instance a vin 1GCEC14C57Z590788:

  • They call a trim "Work Truck" you call the trim "1WT".

You don't want to eliminate all vehicles because of this. So here using a conditional trim LIKE '%1WT%', will not work, because 1WT is not contained in "Work Truck".

Now let's assume that other information provided would help you better identify what style your vehicle matches up against. As is the case of the "MSRP", if is within a range of 1.00 USD of 18,123 you can successfully eliminate all but one style.

Observe one way we could do this:

 

Footnotes[edit]

  1. ^ Fuzzy matching often uses a simple Levenshtein edit distance formula. Using plperl (Perl), you can easily accomplish this using String::Approx module's adist().