Recursive (Hierarchical) queries in PostgreSQL

After the Oracle with his ‘connet by prior all other DBMS introduce their implementation of hierarchical queries (FROM). I would like to tell the audience how it's done in PostgreSQL.

the

History


It all started like this
From: Evgen Potemkin <evgent@ns.terminal.ru>
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to propose the patch for adding the hierarchical queries
posibility. It allows to construct queries a la Oracle for ex:
SELECT a,b FROM t CONNECT BY a PRIOR b START WITH cond;B
I've seen this type of queries often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows result tree to be sorted, i.e. subnodes of each node will
be sorted by ORDER BY clause.
with regards, evgen


then
From: Tatsuo Ishii <ishii@postgresql.org>
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.
The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).


Well, starting with the 8.4 version of Postgresql to support recursive query.
the

the Theory



OF in PostgreSQL is implemented on the basis of standard SQL WITH clause. Let's go a little further: not recursive, which allows you to adesivil WITH repeated subqueries to divide a complex query into a few smaller, is a convenient way to tell the shortcut to access the subquery itself posebe convenient in terms of saving time when writing code. as in the example below managed to avoid using a subquery in the WHERE clause through the use of time table top_regions formirovanii specifically for this request.

    WITH regional_sales AS (

    SELECT region SUM(amount) AS total_sales

    FROM orders

    GROUP BY region the

  1. ), top_regions AS (
  2. SELECT region

    FROM regional_sales

    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) the

  3. )
  4. SELECT region, the

  5. product
  6. SUM(quantity) AS product_units,

    SUM(amount) AS product_sales

    FROM orders

    WHERE region IN (SELECT region FROM top_regions)

    GROUP BY region, product;

* This source code was highlighted with Source Code Highlighter.


Adding optional operator RECURSEVE allows the query in Postgre apply to svoimu output. the algorithm of request must socoety of two parts. the first part is the basis, usually return a single row with a starting point of the hierarchy or part hierarchy. Ie place in the hierarchy where it will be marked ( for example root). and echoing the recursive part that will mess with the time table which we announced after WITH. combine the first and second part of the operator UNION or UNION ALL. what would this confused set of words to put in order privedem example.
We will create a table which will be described in structure one company
the CREATE TABLE ULS
(
"ID" character varying(55)
"DESCRIPTION" character varying(255),
"PARENT" character varying(55
) ;


after entering the data:
the Select * from kpo



the the the the the the the the the the the the
ID DESCRIPTION PARENT a
== ====== ================================ =======
KPO KARACHAGANAK PETROLEUM OPERATING {null}
AKSAY AKSAY KPO
URALSK KPO KPO
LONDON LONDON KPO
KPC KPC AKSAY
U2 UNIT 2 AKSAY
U3 UNIT 3 AKSAY
PROD PRODACTION KPC
MAINT MAINTENANCE AKSAY
CMMS CMMS TEAM MAINT


Now he recursive query:

    WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL ) AS (

    FROM KPO T1 WHERE T1."PARENT" IS NULL

    union

    select T2."ID", T2."PARENT", T2."DESCRIPTION" CAST ( temp1.PATH ||'->'|| T2."ID" AS VARCHAR(50)) LEVEL + 1

    FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") )

    select * from temp1 ORDER BY PATH LIMIT 100

* This source code was highlighted with Source Code Highlighter.


The first part (lines 2-3) returns in the time table the first row in this case, the root record naesa structure, which will start the countdown in our hierarchy. the second part( lines 4-5) adds to the same time table entry associated with the already soderjaschiesya in temp1 string using JOIN (ID = PARENT), and so on until the end, until all the leaves of our ROOTa will not be in temp1.
Also in this example was symitiroval Rakolovka function sys_connect_by_path.



the the the the the the the the the the the
ID "PARENT" DESCRIPTION path level
KPO KARACHAGANAK PETROLEUM OPERATING KPO 1
AKSAY KPO AKSAY KPO->AKSAY 2
KPC AKSAY KPC KPO->AKSAY- > KPC 3
PROD KPC PRODAUCTION KPO->AKSAY->KPC->PROD 4
MAINT AKSAY MAINTENANCE KPO->AKSAY- > MAINT 3
CMMS MAINT CMMS TEAM KPO->AKSAY- > MAINT- > CMMS 4
U2 AKSAY UNIT 2 KPO->AKSAY->U2 3
U3 AKSAY UNIT 3 KPO->AKSAY->U3 3
LONDON KPO LONDON KPO->LONDON 2
URALSK KPO URALSK KPO- > URALSK 2


In Postgre there are no built-in check for looping, so if the data we got from the guys who were involved in directly creating a structure in Excel, we need to check the structure integrity. sometimes it is enough to use UNION instead of UNION ALL but this is only sometimes. if you the first part asked the starting point in the hierarchy and, even if somewhere in the hierarchy there are breaks in the principle of kveri sustiv the above error will not be just the string "otschipentsy" will be ignored. But we also need to know where is the error, and this can be realized by implementing additional checks before performing the UNION.

    WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL cycle ) AS (

    SELECT T1."ID",T1."PARENT", T1."DESCRIPTION" cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE

    FROM KPO T1

    union all

    select T2."ID", T2."PARENT", T2."DESCRIPTION" cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 , the

  1. T2."ID" = any (temp1.PATH)
  2. FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE )

    select * from temp1 WHERE CYCLE IS TRUE LIMIT 100;

* This source code was highlighted with Source Code Highlighter.


As you can see here is created the same field Path but all previous parents are organized in an array, which gives us possible to compare us to each new ID to the duplicate well, if in the array there is already such entry then in the temporary table row is filled in with the flag in the next aisle we have not ispolzuyut the string to search for descendants, thanks to this avoided zatsiklivayas (union all... WHERE ... AND NOT CYCLE).

Tip: use the LIMIT operator as I did in the examples. this will help you keep your nerves.
the

Practical examples



The theory is right of course, but where all this can be put into practice, chambolle in light of the cost of such requests.
In addition that would be nice to draw the hierarchy in a single query, and find for example all the sheets, tarogato “ID” there are other problems.
You will often need to make changes, such as for example, change the phone code to all employees of a certain Department in the phone book. of course you can use a column in the employee table or even make the prefix to “ID”. but all this makes the database not flexible and not scalable. Much better to make a separate table Employees which will display a hierarchy with three columns “ID “, “PARENT “, “HIERARCHID”. the field “HIERARCHID” will make you more than one hierarchical structure. Table name for example ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. in the "ANCESTOR" will contain all the ancestors, remember the array "Path" of example 3. so to fill the table with the recursive query.

    insert into ancestor ( “ID”, "ANCESTOR", "HIERARCHID")

    WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL cycle ) AS (

    SELECT T1."ID",T1."PARENT", T1."DESCRIPTION" cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE

    FROM KPO T1

    union all

    select T2."ID", T2."PARENT", T2."DESCRIPTION" cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 , the

  1. T2."ID" = any (temp1.PATH)
  2. FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE the

  3. )
  4. select ID AS LOCATION PATH[1] AS "ANCESTOR" , 'DEPT' AS "DID" from temp1

* This source code was highlighted with Source Code Highlighter.


got this sign



the the the the the the the the the the the the the the the the the the the the the the the the the the the the the
LOCATION ANCESTOR HIERARCHID
========= ========= ==========
AKSAY AKSAY DEPT
AKSAY KPO DEPT
CMMS KPO DEPT
CMMS MAINT DEPT
CMMS CMMS DEPT
CMMS AKSAY DEPT
KPC AKSAY DEPT
KPC KPC DEPT
KPC KPO DEPT
KPO KPO DEPT
LONDON LONDON DEPT
LONDON KPO DEPT
MAINT AKSAY DEPT
MAINT MAINT DEPT
MAINT KPO DEPT
PROD KPO DEPT
PROD AKSAY DEPT
PROD KPC DEPT
PROD PROD DEPT
U2 AKSAY DEPT
U2 KPO DEPT
U2 U2 DEPT
U3 U3 DEPT
U3 AKSAY DEPT
U3 KPO DEPT
URALSK KPO DEPT
URALSK URALSK DEPT


Now all of our selects and updates will be done using this table. for example, tamie phone numbers it will look like this:

Update EMPLOYEE SET “TELNUM” = ‘545-454’ FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = ‘AKSAY’ AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND "EMPLOYEE".ISDPRT = ‘N’ ;

Yes, it will be necessary predusmotreny Triger to update the table when introducing new or deleted records in a table KPO.

Will continue

Let's say that you have a table which recorded the logs, imagine that you need to count how many records for the previous month was done and grouped by day. for this you will need a reference calendar for last month, for example, that would not miss a day when records fiction. here is a calendar ( a list of days in one column) can you get this request.

WITH RECURSIVE t(n) AS (

VALUES (1)

UNION ALL

SELECT n+1 FROM t WHERE n < (SELECT cast (extract ('day' from date_trunc('month'current_date) - interval '24 hour') as integer)) the

  • )
  • SELECT to_date ( n || '-'||extract ('month' from date_trunc('month'current_date) - interval '24 hour') the

  • || '-'||extract ('year' from date_trunc('month'current_date) - interval '24 hour') 'dd-mm-yyyy')
  • FROM t; * This source code was highlighted with Source Code Highlighter.

    Another very useful example for a snack. The original idea of Graham Jobe (Graeme Job). the result is better to look in the text file.

      WITH RECURSIVE z(ix, iy, cx, cy, x, y, i) AS (

      SELECT ix, iy, x::float, y::float, x::float, y::float 0

      FROM (select -1.88+0.016*i, i from generate_series(0,150) as i) as xgen(x,ix), the

    1. (select -1.11+0.060*i, i from generate_series(0,36) as i) as ygen(y,iy)
    2. UNION ALL

      SELECT ix, iy, cx, cy, x*x - y*y + cx, y*x*2 + cy, i+1

      FROM z

      WHERE x * x + y * y < 16::float

      AND i < 27 the

    3. )
    4. SELECT array_to_string(array_agg(substring(' .,,,-----++++%%%%@@@@#### ' the

    5. greatest(i,1), 1))")
    6. FROM (

      SELECT ix, iy max(i) AS i

      FROM z

      GROUP BY iy, ix

      ORDER BY iy, ix the

    7. ) AS zt
    8. GROUP BY iy

      ORDER BY iy

    * This source code was highlighted with Source Code Highlighter.


    well, here's a post to start ;-)
    Article based on information from habrahabr.ru

    Популярные сообщения из этого блога

    Approval of WSUS updates: import, export, copy

    The Hilbert curve vs. Z-order

    Kaspersky Security Center — the fight for automation