Pages

Monday, January 16, 2012

Oracle 11g denormalizing hierarchy with recursive subquery factoring and LISTAGG function

In Oracle 11g there are many new features. From the developer's point of view there are a bit less such features, but still there are some very impressive. Among these features there are recursive subquery factoring and LISTAGG grouping function. Here is how they can be combined together:

WITH plain_hierarchy AS ( 
  SELECT 1 id, NULL pid, 'Level 0' value FROM dual UNION ALL
  SELECT 2 id, 1 pid, 'Level 1' value FROM dual UNION ALL
  SELECT 3 id, 2 pid, 'Level 2' value FROM dual UNION ALL
  SELECT 4 id, 3 pid, 'Level 3' value FROM dual UNION ALL
  SELECT 5 id, 4 pid, 'Level 4' value FROM dual UNION ALL
  SELECT 6 id, 5 pid, 'Level 5' value FROM dual UNION ALL
  SELECT 11 id, 1 pid, 'FooBar 1' value FROM dual UNION ALL
  SELECT 12 id, 11 pid, 'FooBar 2' value FROM dual UNION ALL
  SELECT 13 id, 12 pid, 'FooBar 3' value FROM dual UNION ALL
  SELECT 21 id, 1 pid, 'BarFoo 1' value FROM dual
  ),
  combined_hierarchy (initial_id, id, pid, value, reportLevel) AS (
  SELECT id AS initial_id, id, pid, value, 0 AS reportLevel
  FROM plain_hierarchy
  UNION ALL
  SELECT comb.initial_id, plain.id, plain.pid, plain.value,  comb.reportLevel + 1
  FROM combined_hierarchy comb
  JOIN plain_hierarchy plain
  ON comb.pid = plain.id
  )
SELECT initial_id,
  LISTAGG(value,' -> ') within GROUP(ORDER BY reportlevel DESC) combined_value
FROM combined_hierarchy
GROUP BY initial_id
Here the first subquery in the "with" is just test data. The second is the recursive one and where all the interesting things happen. As far as I undestand sql, this was not possible in Oracle 10g version of sql (see UPDATE), even if we ignore absence of LISTAGG function. However if we need not all full paths in the hierarchy, but only for some exact node in it, there is more productive solutuion that simple adding "where initial_id=" as the but-last string. (Same testing data)

SELECT initial_id,  LISTAGG(value,' -> ') within GROUP(ORDER BY inner_level) combined_value
FROM
  (SELECT 6 AS initial_id, level inner_level, combined_hierarchy.*
  FROM
    ( SELECT id, pid, value FROM plain_hierarchy
    ) combined_hierarchy
    START WITH id       =6
    CONNECT BY prior pid=id
  )
GROUP BY initial_id
The cost of the first query for only one denormalized path is 92, while the cost of the second is 78. On the real data and real tables the difference is much bigger.

UPDATE:
Ignoring the absence of LISTAGG grouping function in the Oracle 10g we may write sql that will produce the same result for all items in the hieararchy:


WITH plain_hierarchy AS ( 
  SELECT 1 id, NULL pid, 'Level 0' value FROM dual UNION ALL
  SELECT 2 id, 1 pid, 'Level 1' value FROM dual UNION ALL
  SELECT 3 id, 2 pid, 'Level 2' value FROM dual UNION ALL
  SELECT 4 id, 3 pid, 'Level 3' value FROM dual UNION ALL
  SELECT 5 id, 4 pid, 'Level 4' value FROM dual UNION ALL
  SELECT 6 id, 5 pid, 'Level 5' value FROM dual UNION ALL
  SELECT 11 id, 1 pid, 'FooBar 1' value FROM dual UNION ALL
  SELECT 12 id, 11 pid, 'FooBar 2' value FROM dual UNION ALL
  SELECT 13 id, 12 pid, 'FooBar 3' value FROM dual UNION ALL
  SELECT 21 id, 1 pid, 'BarFoo 1' value FROM dual
  ),
combined_hierarchy AS (
  select CONNECT_BY_ROOT id initial_id, level inner_level, plain_hierarchy.* 
  from plain_hierarchy
  connect by prior pid=id
  )
SELECT initial_id,
  LISTAGG(value,' -> ') within GROUP(ORDER BY inner_level DESC) combined_value
FROM combined_hierarchy
GROUP BY initial_id;

The thing that does the right job is CONNECT_BY_ROOT function (or expression?) that just picks the value from the root of recursive query written with "connect by prior". It seems that this query in Oracle 11g (where it only can work due to LISTAGG) is much faster than the variant with recursive subquery factoring.
Update 2:
Same (or almost same) results with different techniques. First is compatible with Oracle 10g.

--Without both recursive subquery factoring and LISTAGG (one of variants - not very elegant)
WITH plain_hierarchy AS ( 
  SELECT 1 id, NULL pid, 'Level 0' value FROM dual UNION ALL
  SELECT 2 id, 1 pid, 'Level 1' value FROM dual UNION ALL
  SELECT 3 id, 2 pid, 'Level 2' value FROM dual UNION ALL
  SELECT 4 id, 3 pid, 'Level 3' value FROM dual UNION ALL
  SELECT 5 id, 4 pid, 'Level 4' value FROM dual UNION ALL
  SELECT 6 id, 5 pid, 'Level 5' value FROM dual UNION ALL
  SELECT 11 id, 1 pid, 'FooBar 1' value FROM dual UNION ALL
  SELECT 12 id, 11 pid, 'FooBar 2' value FROM dual UNION ALL
  SELECT 13 id, 12 pid, 'FooBar 3' value FROM dual UNION ALL
  SELECT 21 id, 1 pid, 'BarFoo 1' value FROM dual
  ),
combined_hierarchy AS (
  select CONNECT_BY_ROOT id initial_id,SYS_CONNECT_BY_PATH(reverse(value),' >- ') as combined_value2, level inner_level, plain_hierarchy.* 
  from plain_hierarchy
  connect by prior pid=id
  )
SELECT ch1.initial_id,reverse(ch1.combined_value2),ch1.inner_level
FROM combined_hierarchy ch1
join (
select initial_id, max(inner_level) as inner_level
from combined_hierarchy
group by initial_id
) ch2
on ch1.initial_id=ch2.initial_id
and ch1.inner_level=ch2.inner_level;

--Without LISTAGG function
WITH plain_hierarchy AS ( 
  SELECT 1 id, NULL pid, 'Level 0' value FROM dual UNION ALL
  SELECT 2 id, 1 pid, 'Level 1' value FROM dual UNION ALL
  SELECT 3 id, 2 pid, 'Level 2' value FROM dual UNION ALL
  SELECT 4 id, 3 pid, 'Level 3' value FROM dual UNION ALL
  SELECT 5 id, 4 pid, 'Level 4' value FROM dual UNION ALL
  SELECT 6 id, 5 pid, 'Level 5' value FROM dual UNION ALL
  SELECT 11 id, 1 pid, 'FooBar 1' value FROM dual UNION ALL
  SELECT 12 id, 11 pid, 'FooBar 2' value FROM dual UNION ALL
  SELECT 13 id, 12 pid, 'FooBar 3' value FROM dual UNION ALL
  SELECT 21 id, 1 pid, 'BarFoo 1' value FROM dual
  ),
  combined_hierarchy (initial_id, id, pid, value, reportLevel,combined_value) AS (
  SELECT id AS initial_id, id, pid, value, 0 AS reportLevel,CAST(value AS VARCHAR2(2000)) as combined_value
  FROM plain_hierarchy
  UNION ALL
  SELECT comb.initial_id, plain.id, plain.pid, plain.value,  comb.reportLevel + 1,
  CAST(plain.value|| ' -> '||comb.combined_value AS VARCHAR2(2000))
  FROM combined_hierarchy comb
  JOIN plain_hierarchy plain
  ON comb.pid = plain.id
  )
SELECT ch0.initial_id, ch0.combined_value
FROM combined_hierarchy ch0
join 
(select initial_id, max(reportLevel) as reportLevel
from combined_hierarchy GROUP BY initial_id
) ch1
on ch0.initial_id=ch1.initial_id 
and ch0.reportLevel=ch1.reportLevel;

No comments:

Post a Comment