Recursive (Hierarchical) queries in PostgreSQL
the
History
It all started like this
From: Evgen Potemkin <evgent@ns.terminal.ru>
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to propose the patch for adding the hierarchical queries
posibility. It allows to construct queries a la Oracle for ex:
SELECT a,b FROM t CONNECT BY a PRIOR b START WITH cond;B
I've seen this type of queries often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows result tree to be sorted, i.e. subnodes of each node will
be sorted by ORDER BY clause.
with regards, evgen
then
From: Tatsuo Ishii <ishii@postgresql.org>
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.
The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).
Well, starting with the 8.4 version of Postgresql to support recursive query.
the
the Theory
OF in PostgreSQL is implemented on the basis of standard SQL WITH clause. Let's go a little further: not recursive, which allows you to adesivil WITH repeated subqueries to divide a complex query into a few smaller, is a convenient way to tell the shortcut to access the subquery itself posebe convenient in terms of saving time when writing code. as in the example below managed to avoid using a subquery in the WHERE clause through the use of time table top_regions formirovanii specifically for this request.
WITH regional_sales AS (
SELECT region SUM(amount) AS total_sales
FROM orders
GROUP BY region
the
- ), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
the
- )
SELECT region,
the
- product
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
* This source code was highlighted with Source Code Highlighter.
Adding optional operator RECURSEVE allows the query in Postgre apply to svoimu output. the algorithm of request must socoety of two parts. the first part is the basis, usually return a single row with a starting point of the hierarchy or part hierarchy. Ie place in the hierarchy where it will be marked ( for example root). and echoing the recursive part that will mess with the time table which we announced after WITH. combine the first and second part of the operator UNION or UNION ALL. what would this confused set of words to put in order privedem example.
We will create a table which will be described in structure one company
the
CREATE TABLE ULS
(
"ID" character varying(55)
"DESCRIPTION" character varying(255),
"PARENT" character varying(55
) ;
after entering the data:
the
Select * from kpo
ID | DESCRIPTION | PARENT a |
---|---|---|
== ====== | ================================ | ======= |
KPO | KARACHAGANAK PETROLEUM OPERATING | {null} |
AKSAY | AKSAY | KPO |
URALSK | KPO | KPO |
LONDON | LONDON | KPO |
KPC | KPC | AKSAY |
U2 | UNIT 2 | AKSAY |
U3 | UNIT 3 | AKSAY |
PROD | PRODACTION | KPC |
MAINT | MAINTENANCE | AKSAY |
CMMS | CMMS TEAM | MAINT |
Now he recursive query:
WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL ) AS (
FROM KPO T1 WHERE T1."PARENT" IS NULL
union
select T2."ID", T2."PARENT", T2."DESCRIPTION" CAST ( temp1.PATH ||'->'|| T2."ID" AS VARCHAR(50)) LEVEL + 1
FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") )
select * from temp1 ORDER BY PATH LIMIT 100
* This source code was highlighted with Source Code Highlighter.
The first part (lines 2-3) returns in the time table the first row in this case, the root record naesa structure, which will start the countdown in our hierarchy. the second part( lines 4-5) adds to the same time table entry associated with the already soderjaschiesya in temp1 string using JOIN (ID = PARENT), and so on until the end, until all the leaves of our ROOTa will not be in temp1.
Also in this example was symitiroval Rakolovka function sys_connect_by_path.
ID | "PARENT" | DESCRIPTION | path | level |
---|---|---|---|---|
KPO | KARACHAGANAK PETROLEUM OPERATING | KPO | 1 | |
AKSAY | KPO | AKSAY | KPO->AKSAY | 2 |
KPC | AKSAY | KPC | KPO->AKSAY- > KPC | 3 |
PROD | KPC | PRODAUCTION | KPO->AKSAY->KPC->PROD | 4 |
MAINT | AKSAY | MAINTENANCE | KPO->AKSAY- > MAINT | 3 |
CMMS | MAINT | CMMS TEAM | KPO->AKSAY- > MAINT- > CMMS | 4 |
U2 | AKSAY | UNIT 2 | KPO->AKSAY->U2 | 3 |
U3 | AKSAY | UNIT 3 | KPO->AKSAY->U3 | 3 |
LONDON | KPO | LONDON | KPO->LONDON | 2 |
URALSK | KPO | URALSK | KPO- > URALSK | 2 |
In Postgre there are no built-in check for looping, so if the data we got from the guys who were involved in directly creating a structure in Excel, we need to check the structure integrity. sometimes it is enough to use UNION instead of UNION ALL but this is only sometimes. if you the first part asked the starting point in the hierarchy and, even if somewhere in the hierarchy there are breaks in the principle of kveri sustiv the above error will not be just the string "otschipentsy" will be ignored. But we also need to know where is the error, and this can be realized by implementing additional checks before performing the UNION.
WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL cycle ) AS (
SELECT T1."ID",T1."PARENT", T1."DESCRIPTION" cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE
FROM KPO T1
union all
select T2."ID", T2."PARENT", T2."DESCRIPTION" cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 ,
the
- T2."ID" = any (temp1.PATH)
FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE )
select * from temp1 WHERE CYCLE IS TRUE LIMIT 100;
* This source code was highlighted with Source Code Highlighter.
As you can see here is created the same field Path but all previous parents are organized in an array, which gives us possible to compare us to each new ID to the duplicate well, if in the array there is already such entry then in the temporary table row is filled in with the flag in the next aisle we have not ispolzuyut the string to search for descendants, thanks to this avoided zatsiklivayas (union all... WHERE ... AND NOT CYCLE).
Tip: use the LIMIT operator as I did in the examples. this will help you keep your nerves.
the
Practical examples
The theory is right of course, but where all this can be put into practice, chambolle in light of the cost of such requests.
In addition that would be nice to draw the hierarchy in a single query, and find for example all the sheets, tarogato “ID” there are other problems.
You will often need to make changes, such as for example, change the phone code to all employees of a certain Department in the phone book. of course you can use a column in the employee table or even make the prefix to “ID”. but all this makes the database not flexible and not scalable. Much better to make a separate table Employees which will display a hierarchy with three columns “ID “, “PARENT “, “HIERARCHID”. the field “HIERARCHID” will make you more than one hierarchical structure. Table name for example ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. in the "ANCESTOR" will contain all the ancestors, remember the array "Path" of example 3. so to fill the table with the recursive query.
insert into ancestor ( “ID”, "ANCESTOR", "HIERARCHID")
WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL cycle ) AS (
SELECT T1."ID",T1."PARENT", T1."DESCRIPTION" cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE
FROM KPO T1
union all
select T2."ID", T2."PARENT", T2."DESCRIPTION" cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 ,
the
- T2."ID" = any (temp1.PATH)
FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE
the
- )
select ID AS LOCATION PATH[1] AS "ANCESTOR" , 'DEPT' AS "DID" from temp1
* This source code was highlighted with Source Code Highlighter.
got this sign
LOCATION | ANCESTOR | HIERARCHID |
---|---|---|
========= | ========= | ========== |
AKSAY | AKSAY | DEPT |
AKSAY | KPO | DEPT |
CMMS | KPO | DEPT |
CMMS | MAINT | DEPT |
CMMS | CMMS | DEPT |
CMMS | AKSAY | DEPT |
KPC | AKSAY | DEPT |
KPC | KPC | DEPT |
KPC | KPO | DEPT |
KPO | KPO | DEPT |
LONDON | LONDON | DEPT |
LONDON | KPO | DEPT |
MAINT | AKSAY | DEPT |
MAINT | MAINT | DEPT |
MAINT | KPO | DEPT |
PROD | KPO | DEPT |
PROD | AKSAY | DEPT |
PROD | KPC | DEPT |
PROD | PROD | DEPT |
U2 | AKSAY | DEPT |
U2 | KPO | DEPT |
U2 | U2 | DEPT |
U3 | U3 | DEPT |
U3 | AKSAY | DEPT |
U3 | KPO | DEPT |
URALSK | KPO | DEPT |
URALSK | URALSK | DEPT |
Now all of our selects and updates will be done using this table. for example, tamie phone numbers it will look like this:
Update EMPLOYEE SET “TELNUM” = ‘545-454’ FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = ‘AKSAY’ AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND "EMPLOYEE".ISDPRT = ‘N’ ;
Yes, it will be necessary predusmotreny Triger to update the table when introducing new or deleted records in a table KPO.
Will continue
Let's say that you have a table which recorded the logs, imagine that you need to count how many records for the previous month was done and grouped by day. for this you will need a reference calendar for last month, for example, that would not miss a day when records fiction. here is a calendar ( a list of days in one column) can you get this request.
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < (SELECT cast (extract ('day' from date_trunc('month'current_date) - interval '24 hour') as integer)) the
SELECT to_date ( n || '-'||extract ('month' from date_trunc('month'current_date) - interval '24 hour') the
FROM t;
* This source code was highlighted with Source Code Highlighter.
Another very useful example for a snack. The original idea of Graham Jobe (Graeme Job). the result is better to look in the text file.
WITH RECURSIVE z(ix, iy, cx, cy, x, y, i) AS (
SELECT ix, iy, x::float, y::float, x::float, y::float 0
FROM (select -1.88+0.016*i, i from generate_series(0,150) as i) as xgen(x,ix),
the
- (select -1.11+0.060*i, i from generate_series(0,36) as i) as ygen(y,iy)
UNION ALL
SELECT ix, iy, cx, cy, x*x - y*y + cx, y*x*2 + cy, i+1
FROM z
WHERE x * x + y * y < 16::float
AND i < 27
the
- )
SELECT array_to_string(array_agg(substring(' .,,,-----++++%%%%@@@@#### '
the
- greatest(i,1), 1))")
FROM (
SELECT ix, iy max(i) AS i
FROM z
GROUP BY iy, ix
ORDER BY iy, ix
the
- ) AS zt
GROUP BY iy
ORDER BY iy
* This source code was highlighted with Source Code Highlighter.
well, here's a post to start ;-)