Solved: Default joins work differently in KQL vs. SQL

frithjof_v · ‎11-16-2024

When doing a default join in KQL (without specifying join kind), it seems that the left or right position of the tables affect the number of rows returned by the query.

This is not the case with the default joins in SQL.

KQL version

1. Below is one of the tables involved in the join:

2. Below is the other table involved in the join:

3. When the first table above is on the left side of the join, and the second table above is on the right side:

In this case (3.), why does it only return three rows? In SQL, such a default join would return all the matching rows (similar to 4. below)

4. When the tables switch sides (the second table above is now on the left side, and the first table above is now on the right side), it returns all the rows as is familiar from SQL:

What is the reason for the behaviour in step 3?

This means that the default join type in KQL give different results (behave differently) than the default join type in SQL (see below).

SQL version

1.

2.

3. Here we get all the matching rows, instead of just 3 rows (we only got 3 rows with the similar query in KQL)

4.

frithjof_v · ‎11-16-2024

It works similar like SQL's default join type (which is inner join) if I explicitly add kind=inner to the join in KQL.

let varStockMarketLast10 = 
StockMarketRaw
| project ['time'], symbol, bidPrice, bidSize, askPrice, askSize, volume
| order by ['time']
| take 10;

varStockMarketLast10;

let varDimStockMarket = 
StockMarketRaw
| distinct symbol, sector, securityType;

varDimStockMarket;

varStockMarketLast10
| join kind=inner varDimStockMarket on symbol;

varDimStockMarket
| join kind=inner varStockMarketLast10 on symbol;

So the default join kind in KQL is not an inner join, whereas in SQL, the default join kind is inner join.

The default join kind in KQL is:

innerunique (default)

Inner join with left side deduplication
Schema: All columns from both tables, including the matching keys
Rows: All deduplicated rows from the left table that match rows from the right table

https://learn.microsoft.com/en-us/training/modules/multi-table-queries-with-kusto-query-language/2-m...

To be honest, I find the results provided by KQL's default join kind confusing. What is the practical use case for those results? (Ref. step 3 in the original post).

What is the benefit of KQL's default join kind (innerunique)?

I think I will always need to specify join kind in KQL. (Or, be careful about which table is on the right side or left side of the join, when using the default join.)

In SQL, by contrast, it can many times make sense to use the default join kind (inner join), and the left/right position won't affect the number of rows returned.
Still, it is always a good practice to specify join kind explicitly also in SQL, for readability.

View solution in original post

v-yajiewan-msft · ‎11-17-2024

Hi @frithjof_v,

I agree with your insights.

Quote from Microsoft:

By default, the innerunique join flavor is used if the kind parameter isn't specified.

This default implementation is useful in log/trace analysis scenarios, where you aim to correlate two events based on a shared correlation ID.

It allows you to retrieve all instances of the phenomenon while disregarding duplicate trace records that contribute to the correlation.

More information for your reference:

innerunique join - Kusto | Microsoft Learn

Best regards,

Joyce

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

frithjof_v · ‎11-16-2024

It works similar like SQL's default join type (which is inner join) if I explicitly add kind=inner to the join in KQL.

let varStockMarketLast10 = 
StockMarketRaw
| project ['time'], symbol, bidPrice, bidSize, askPrice, askSize, volume
| order by ['time']
| take 10;

varStockMarketLast10;

let varDimStockMarket = 
StockMarketRaw
| distinct symbol, sector, securityType;

varDimStockMarket;

varStockMarketLast10
| join kind=inner varDimStockMarket on symbol;

varDimStockMarket
| join kind=inner varStockMarketLast10 on symbol;

So the default join kind in KQL is not an inner join, whereas in SQL, the default join kind is inner join.

The default join kind in KQL is:

innerunique (default)

Inner join with left side deduplication
Schema: All columns from both tables, including the matching keys
Rows: All deduplicated rows from the left table that match rows from the right table

https://learn.microsoft.com/en-us/training/modules/multi-table-queries-with-kusto-query-language/2-m...

To be honest, I find the results provided by KQL's default join kind confusing. What is the practical use case for those results? (Ref. step 3 in the original post).

What is the benefit of KQL's default join kind (innerunique)?

I think I will always need to specify join kind in KQL. (Or, be careful about which table is on the right side or left side of the join, when using the default join.)

In SQL, by contrast, it can many times make sense to use the default join kind (inner join), and the left/right position won't affect the number of rows returned.
Still, it is always a good practice to specify join kind explicitly also in SQL, for readability.

frithjof_v · ‎11-16-2024

After further reading, it seems that the reason for this different behaviour is performance optimization in KQL's default join type (inneruniqe).

Bottom line: it's important to know that the default join type in KQL is different than the default join type in SQL.

When using the default join in KQL, it seems to make sense to have the Dimension table on the left side, and the Fact table on the right side.

This returns all the matching rows (assuming a 1:many relationship between dim:fact).

It seems to generally be a good practice in KQL to have the Dim table on the left side of the join, and the fact table on the right side of the join.

Pseudo-code:

dimTable
| join factTable on keyColumn;

I am new to KQL, and curious what others think.

So - what do you think?
Does this align with your understanding?

v-yajiewan-msft · ‎11-17-2024