"Be the Query Ninja" - A sub-selecting technique for selecting records from one table that are not in another
| ||OK. That is a pretty boring title but this information could be very useful to you. |
In last month's issue we introduced the concept of using Sub-selects. If you did not read that article you might find it to be helpful background.
To take the concept of using sub-selects further, we will introduce a technique to select records (or more technically speaking, records with particular values) from one table that are not in a second table. This will require the use of the In() keyword which was also introduced in a previous article.
Example one: Determining all of the regions without a customer in them.
Let's say we have a table of customers and a table of regions. In this example we will use counties for our regions. Our goal is to determine which counties do not have any customers in them. Of course this could be done outside of the SQL capabilities (visually might be one way) and this could be done with other techniques as well. In many cases these might require multiple steps. Using a sub-select we can get a list of the counties without customers in one query.
The map above may help to better explain this example. Which counties have no customers?
Here is the SQL Select dialog box filled in to run the query.
In somewhat "plain language" the above query is doing this; "select all of the counties from the UK_county table where the name of the county never appears in the county field from the list of customers". OK, I did say "somewhat plain language" - I hope that is better than SQL.
And if you are curious about the results, see below! There are twelve counties without any customers in them.
The map above helps to visualise what we are trying to achieve, but ...
...note that this result was achieved by comparing values in the underlying data. It would be possible to have customers map objects appear in a county but the data column is for a different county (maybe as a result of a data or a geocoding error). This example used the County field in the customer table. In future articles we will explore how map objects can be used in sub-selecting.
The following are more examples of this sub-selecting technique. These are written out as they would appear in the MapBasic window.
Comparing two tables of customers/citizens - selecting customers from one table that do not appear in the second
Let's say you have two sets of customer (or citizen) data. Perhaps one set comes in from e-mail based inquiries. We will call this table Email_cust. The second set of data comes from web based inquiries. We will call this table Web_cust. Let's also say that your customers or citizens have a unique identifying number so you can tell when the same customer has used more than one service. This field is called CUST_ID in both tables.
Our goal is to find out all of the customers who have used the Email service who do not use the web based service. As such we want to select all of the customers from the table of e-mail queries that are not also found in the table of customers using the web based service.
Select * from Email_cust where CUST_ID not in( Select CUST_ID from
Web_cust ) into Not_Using_Web
In the above query Not_Using_Web is the temporary table name.
Another useful technique - (sometimes you can) invert a selection
It is possible to use this sub-selecting technique to "select what is not selected". This is the SQL equivalent of the Invert Selection button (from the Main toolbar). You may wonder why one would want to do this in an SQL instead of just using the button - the idea here is to have something automated and easily repeatable in a workspace.
Here is an example query where the selection is inverted.
Select * from Customer_table where CUST_ID not in ( Select CUST_ID from Selection ) into Selection
The reason that the title of this example is "(Sometimes you can) invert a selection" is because the column that is used must contain unique values for this technique to be reliable. If the column being used does not contain unique values then it is possible for some records to be excluded that otherwise should not be. If the a value appears more than once in a column then it is possible for it to be in the set of selected records and in the set of unselected records. Inverting the selection would exclude the matching record from the set of unselected records.
An example of how this technique might be used is where a selection of customers within a circle is made. If it is useful to analyse the customers outside of the circle then this query technique can be used. In a previous article we covered how to select within a circle.
Making this efficient
I would like to wind up this article with a contribution from the community. Fawaz Bathiche, a very experienced MapInfo Professional and MapBasic user wrote a short post in our in our LinkedIn Group. This was in response to the article on using the In() operator that appeared in the February issue.
When we are using this technique it can be inefficient as the sub-select could be pulling many duplicate values. The first example in this article is a case in point, you can have many customers in a county. Fawaz also pointed out that if you have too many values when are using the In() keyword you may not be able to fit them all in.
In those situations where you have a set list of values, it may be worth the investment to create a special table of these to use in your sub-selects.
Here are Fawaz's comments on this technique.
If I may add, using the IN or = ANY is ok if there aren't too many values in the list as there is a limit on how many characters the SQL window can hold.
In case of too many values, then I would create a temp table(ListOfValues) which contains ONE column (LOV) and populated with the desired values and save. You can then use one the following queries to get the required records from the main table:
Select * from MainTable
where MyColum = ANY (select LOV from ListOfValues)
Select * from MainTable
where MyColum IN (select LOV from ListOfValues)
Select * from MainTable , ListOfValues
where MainTable.MyColum = ListOfValues.LOV
Thanks to Fawaz for sharing this. This does introduce the =Any() keyword which we've not covered yet. For the most part this is similar to the In() keyword.
Here again is a link to the MapInfo Professional Journal Group. Feel free to comment on the articles, ask questions or suggest topics.
Coming up next...
There is more to explore with sub-selecting. In the next article we will explore using sub-selecting to automate selecting all of the objects within a region. Think of this as a way to automate using the Boundary Select tool from the Main Toolbar.
Article by Tom Probert, Editor of "The MapInfo Professional" journal
When not writing articles for "The MapInfo Professional", Tom enjoys talking to MapInfo Professional users at conferences and events. When not working he likes to see movies with car chases, explosions and kung-fu fighting.