Recursive (Hierarchical) queries in PostgreSQL

After the Oracle with his ‘connet by prior all other DBMS introduce their implementation of hierarchical queries (FROM). I would like to tell the audience how it's done in PostgreSQL.

the

History

It all started like this

From: Evgen Potemkin <evgent@ns.terminal.ru>
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to propose the patch for adding the hierarchical queries
posibility. It allows to construct queries a la Oracle for ex:
SELECT a,b FROM t CONNECT BY a PRIOR b START WITH cond;B
I've seen this type of queries often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows result tree to be sorted, i.e. subnodes of each node will
be sorted by ORDER BY clause.
with regards, evgen

then

From: Tatsuo Ishii <ishii@postgresql.org>
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.
The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).

Well, starting with the 8.4 version of Postgresql to support recursive query.
the

the Theory

OF in PostgreSQL is implemented on the basis of standard SQL WITH clause. Let's go a little further: not recursive, which allows you to adesivil WITH repeated subqueries to divide a complex query into a few smaller, is a convenient way to tell the shortcut to access the subquery itself posebe convenient in terms of saving time when writing code. as in the example below managed to avoid using a subquery in the WHERE clause through the use of time table top_regions formirovanii specifically for this request.

WITH regional_sales AS ( SELECT region SUM(amount) AS total_sales FROM orders GROUP BY region the ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) the ) SELECT region, the product SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter.

Adding optional operator RECURSEVE allows the query in Postgre apply to svoimu output. the algorithm of request must socoety of two parts. the first part is the basis, usually return a single row with a starting point of the hierarchy or part hierarchy. Ie place in the hierarchy where it will be marked ( for example root). and echoing the recursive part that will mess with the time table which we announced after WITH. combine the first and second part of the operator UNION or UNION ALL. what would this confused set of words to put in order privedem example.
We will create a table which will be described in structure one company
the

CREATE TABLE ULS
 
(
 
"ID" character varying(55)
 
"DESCRIPTION" character varying(255),
 
"PARENT" character varying(55
 
) ;

after entering the data:
the Select * from kpo

the the the the the the the the the the the the

ID	DESCRIPTION	PARENT a
== ======	================================	=======
KPO	KARACHAGANAK PETROLEUM OPERATING	{null}
AKSAY	AKSAY	KPO
URALSK	KPO	KPO
LONDON	LONDON	KPO
KPC	KPC	AKSAY
U2	UNIT 2	AKSAY
U3	UNIT 3	AKSAY
PROD	PRODACTION	KPC
MAINT	MAINTENANCE	AKSAY
CMMS	CMMS TEAM	MAINT

Now he recursive query:

WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL ) AS ( FROM KPO T1 WHERE T1."PARENT" IS NULL union select T2."ID", T2."PARENT", T2."DESCRIPTION" CAST ( temp1.PATH ||'->'|| T2."ID" AS VARCHAR(50)) LEVEL + 1 FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") ) select * from temp1 ORDER BY PATH LIMIT 100 * This source code was highlighted with Source Code Highlighter.

The first part (lines 2-3) returns in the time table the first row in this case, the root record naesa structure, which will start the countdown in our hierarchy. the second part( lines 4-5) adds to the same time table entry associated with the already soderjaschiesya in temp1 string using JOIN (ID = PARENT), and so on until the end, until all the leaves of our ROOTa will not be in temp1.
Also in this example was symitiroval Rakolovka function sys_connect_by_path.

the the the the the the the the the the the

ID	"PARENT"	DESCRIPTION	path	level
KPO		KARACHAGANAK PETROLEUM OPERATING	KPO	1
AKSAY	KPO	AKSAY	KPO->AKSAY	2
KPC	AKSAY	KPC	KPO->AKSAY- > KPC	3
PROD	KPC	PRODAUCTION	KPO->AKSAY->KPC->PROD	4
MAINT	AKSAY	MAINTENANCE	KPO->AKSAY- > MAINT	3
CMMS	MAINT	CMMS TEAM	KPO->AKSAY- > MAINT- > CMMS	4
U2	AKSAY	UNIT 2	KPO->AKSAY->U2	3
U3	AKSAY	UNIT 3	KPO->AKSAY->U3	3
LONDON	KPO	LONDON	KPO->LONDON	2
URALSK	KPO	URALSK	KPO- > URALSK	2

In Postgre there are no built-in check for looping, so if the data we got from the guys who were involved in directly creating a structure in Excel, we need to check the structure integrity. sometimes it is enough to use UNION instead of UNION ALL but this is only sometimes. if you the first part asked the starting point in the hierarchy and, even if somewhere in the hierarchy there are breaks in the principle of kveri sustiv the above error will not be just the string "otschipentsy" will be ignored. But we also need to know where is the error, and this can be realized by implementing additional checks before performing the UNION.

WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL cycle ) AS ( SELECT T1."ID",T1."PARENT", T1."DESCRIPTION" cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE FROM KPO T1 union all select T2."ID", T2."PARENT", T2."DESCRIPTION" cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 , the T2."ID" = any (temp1.PATH) FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE ) select * from temp1 WHERE CYCLE IS TRUE LIMIT 100; * This source code was highlighted with Source Code Highlighter.

As you can see here is created the same field Path but all previous parents are organized in an array, which gives us possible to compare us to each new ID to the duplicate well, if in the array there is already such entry then in the temporary table row is filled in with the flag in the next aisle we have not ispolzuyut the string to search for descendants, thanks to this avoided zatsiklivayas (union all... WHERE ... AND NOT CYCLE).

Tip: use the LIMIT operator as I did in the examples. this will help you keep your nerves.
the

Practical examples

The theory is right of course, but where all this can be put into practice, chambolle in light of the cost of such requests.
In addition that would be nice to draw the hierarchy in a single query, and find for example all the sheets, tarogato “ID” there are other problems.
You will often need to make changes, such as for example, change the phone code to all employees of a certain Department in the phone book. of course you can use a column in the employee table or even make the prefix to “ID”. but all this makes the database not flexible and not scalable. Much better to make a separate table Employees which will display a hierarchy with three columns “ID “, “PARENT “, “HIERARCHID”. the field “HIERARCHID” will make you more than one hierarchical structure. Table name for example ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. in the "ANCESTOR" will contain all the ancestors, remember the array "Path" of example 3. so to fill the table with the recursive query.

insert into ancestor ( “ID”, "ANCESTOR", "HIERARCHID") WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH LEVEL cycle ) AS ( SELECT T1."ID",T1."PARENT", T1."DESCRIPTION" cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE FROM KPO T1 union all select T2."ID", T2."PARENT", T2."DESCRIPTION" cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 , the T2."ID" = any (temp1.PATH) FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE the ) select ID AS LOCATION PATH[1] AS "ANCESTOR" , 'DEPT' AS "DID" from temp1 * This source code was highlighted with Source Code Highlighter.

got this sign

the the the the the the the the the the the the the the the the the the the the the the the the the the the the the

LOCATION	ANCESTOR	HIERARCHID
=========	=========	==========
AKSAY	AKSAY	DEPT
AKSAY	KPO	DEPT
CMMS	KPO	DEPT
CMMS	MAINT	DEPT
CMMS	CMMS	DEPT
CMMS	AKSAY	DEPT
KPC	AKSAY	DEPT
KPC	KPC	DEPT
KPC	KPO	DEPT
KPO	KPO	DEPT
LONDON	LONDON	DEPT
LONDON	KPO	DEPT
MAINT	AKSAY	DEPT
MAINT	MAINT	DEPT
MAINT	KPO	DEPT
PROD	KPO	DEPT
PROD	AKSAY	DEPT
PROD	KPC	DEPT
PROD	PROD	DEPT
U2	AKSAY	DEPT
U2	KPO	DEPT
U2	U2	DEPT
U3	U3	DEPT
U3	AKSAY	DEPT
U3	KPO	DEPT
URALSK	KPO	DEPT
URALSK	URALSK	DEPT

Now all of our selects and updates will be done using this table. for example, tamie phone numbers it will look like this:

Update EMPLOYEE SET “TELNUM” = ‘545-454’ FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = ‘AKSAY’ AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND "EMPLOYEE".ISDPRT = ‘N’ ;

Yes, it will be necessary predusmotreny Triger to update the table when introducing new or deleted records in a table KPO.

Will continue

Let's say that you have a table which recorded the logs, imagine that you need to count how many records for the previous month was done and grouped by day. for this you will need a reference calendar for last month, for example, that would not miss a day when records fiction. here is a calendar ( a list of days in one column) can you get this request.

WITH RECURSIVE t(n) AS (

VALUES (1)

UNION ALL

SELECT n+1 FROM t WHERE n < (SELECT cast (extract ('day' from date_trunc('month'current_date) - interval '24 hour') as integer)) the

)

SELECT to_date ( n || '-'||extract ('month' from date_trunc('month'current_date) - interval '24 hour') the

|| '-'||extract ('year' from date_trunc('month'current_date) - interval '24 hour') 'dd-mm-yyyy')

FROM t; * This source code was highlighted with Source Code Highlighter.

Another very useful example for a snack. The original idea of Graham Jobe (Graeme Job). the result is better to look in the text file.

WITH RECURSIVE z(ix, iy, cx, cy, x, y, i) AS ( SELECT ix, iy, x::float, y::float, x::float, y::float 0 FROM (select -1.88+0.016*i, i from generate_series(0,150) as i) as xgen(x,ix), the (select -1.11+0.060*i, i from generate_series(0,36) as i) as ygen(y,iy) UNION ALL SELECT ix, iy, cx, cy, x*x - y*y + cx, y*x*2 + cy, i+1 FROM z WHERE x * x + y * y < 16::float AND i < 27 the ) SELECT array_to_string(array_agg(substring(' .,,,-----++++%%%%@@@@#### ' the greatest(i,1), 1))") FROM ( SELECT ix, iy max(i) AS i FROM z GROUP BY iy, ix ORDER BY iy, ix the ) AS zt GROUP BY iy ORDER BY iy * This source code was highlighted with Source Code Highlighter.

well, here's a post to start ;-)

Article based on information from habrahabr.ru

Поиск по этому блогу

computer express