PostgreSQL 8.3

Josh Berkus (Josh Berkus) announced the release of PostgreSQL 8.3beta1 (see official Changelog). More than six months it took developers to complete work on the handling of patches (we will remind, feature freeze on 1 April 2007). It's time to tell what will please us this year the most developed from open DBMS in the world.

I break the whole list into four parts. First, for many the most important part I will list the changes that touches on the performance. Secondly — here is the list of new features for database programmers, designed to enhance further the already strong set of "features" of PostgreSQL. The third part discusses the innovations intended for database administrators. And finally, at the end I will mention some Open Source projects that are projects-companions of Postgres (in other words, have their own development cycle).

the

Performance


Let's start with the fact that today (currently stable branch is 8.2, the latest version is 8.2.5) PostgreSQL successfully compete in terms of performance not only with OpenSource alternatives, but also with leading commercial DBMSs. Such as Oracle. This is not an empty phrase — look at the results test spent in the Sun. Slow elephants are no more! A rich set of types of the indexes, the broadest possibilities of tuning of the system, work with very large volumes and loads, a good choice for replication and scaling — all it "on teeth" of modern elephants. Even the speed of development distinguishes Postgres compared to other DBMS: every year we always get a significant step forward.

What's new in PostgreSQL version 8.3 in terms of performance? Many changes are trivial. According to the coordinator of the development of PostgreSQL Bruce Momjian (Bruce Momjian), who inflicted not so long ago at the invitation of the company "Postgresmain" visit Moscow, the work on the optimization of the performance of the system, which employs developers of Postgres in recent years, is extremely complex. Each step requires more and more substantial work takes more time and effort of developers.

One of these is really non-trivial changes can safely be considered a "feature" called HOT (Heap Only Tuples). This is probably one of the most significant changes in terms of performance. To understand what is this change, you need to remember that PostgreSQL implements the so-called multiversion model of differentiation of access (MVCC Multi-Version Concurrency Control).



HOT essence in the following. Earlier, before implementing this approach, when a row is updated in the table, each new version has led to the emergence of new versions of all indexes, regardless affected whether these changes indexed columns or not (see Fig. "Update without HOT"). Now, if a new version of the row falls in the same memory page as the previous one, and the columns on which the index was created, has not changed, the index remains the same. But that's not all. If possible, the "instant reuse" places in the page Heap. That, of course, reduces the amount of work produced during the operation VACUUM. In Fig. "HOT-update" is schematically shown, how is now update the row.

The following new products have the taste, first of all, a large number of web developers. Starting with version 8.3 any transaction in PostgreSQL you can do asynchronous.

This means that when you execute transaction commit (COMMIT) the PostgreSQL server will not wait for an expensive operation synchronization transaction log (WAL fsync). In other words, the transaction will be considered successfully completed once all the logical conditions are met (checked all the necessary constraints). Physically, the entry in the transaction log will occur in a very small period of time (typically for a normally functioning system is the maximum 200-1000 MS).For the status of the transaction (synchronous/asynchronous) responds to the environment variable synchronous_commit. To switch to asynchronous mode just:
It should be noted that asynchronous transactions are not an alternative to server mode with disabled fsync operations. The fact that the mode fsync=off may result in an inconsistent database state (for example, in the event of unforeseen equipment failure or power loss) and is only recommended in cases where used equipment of high reliability (for example, a disk controller with a battery). The new opportunities can not result in misalignment of the data. The most that is possible, is the loss of small amounts of data (again, in the case of a hard server failure — error OS, hardware, a power failure). A typical example for asynchronous transactions can serve the task of maintaining large amounts of information in the table-a log (e.g. log of user actions) when the loss of a few rows is not critical. All important transactions can still be synchronous.

Another improvement in terms of performance refers to situations when PostgreSQL query sequentially through the table (the operation of SeqScan). If to version 8.3 in such cases, quite often there is a situation when different processes of Postgres at the same time doing the same job is viewed the same table — but now, thanks to the implementation of Synchronized Scans ("sinhronizovani views"), at the same time for one table may not be conducted more than one transaction-safe. This is achieved in the following way. If any session requires a SeqScan-and for some table that is already in progress SeqScan (for another session), there will be "jump on-the-go" to the results of the already running SeqScan. At the end of this process, if necessary, will be carried out "transoms" of the results using another partial SeqScan-a (see figure).

Working on reducing the stress effect produced by the implementation system processes checkpoint ("checkpoint") continues: now checkpoint s are not executed immediately, but gradually: the process as if "smeared" in time. Hence the name of this change checkpoint smoothing. It should be noted that during normal shutdown of the server and meeting a "clear" operation checkpoint (team CHECKPOINT) writing data to disk will still be as fast as possible.

At the end of the conversation about performance, here is a short list of other changes designed to improve performance of systems using PostgreSQL:
the
    the
  • now autovacuum is enabled by default;
  • the
  • in some situations now a possible running several processes autovacuum (for example, during prolonged cleaning process a large table, a small table now don't have to wait for the completion of the process);
  • the
  • notable reduced disk space occupied by databases: 1) due to the header varlena types (variable-length data types: text, arrays, etc.), which was formerly strictly 4 bytes, and now in some situations, only 1 byte; 2) by saving the row headers of the table (previously 27 bytes, now 24);
  • the
  • transactions not modifying the data does not increase the value of the counter of transactions (xid), which significantly reduces the probability of a situation of "overlapping" of the counter of transactions (xid wraparound); in addition, this change is a significant step forward in the realization of integrated Master-Slave-based replication transfers the transaction log — now will not have desync xid counter between Master and Slave nodes;
  • the
  • implemented the mechanism of autotune of parameters of process of bgwriter (background writer is a special process involved in writing a dirty shared buffers to disk);
  • the
  • optimized the mechanism to obtain results for queries using "...ORDER BY ... LIMIT..." (i.e., Top-N sorting): in some cases, the system holds the necessary page data in memory that provides very high speed of delivery of results;


the

database Developers


The most noticeable and important change that should be noted here, migration of the module for full-text search (contrib/tsearch2) in a kernel system. Developed by Russian developers Vladimir Bartunov and Fedor Sigaeva, tsearch2 has long been the most popular contrib module of Postgres. Patch to migrate the full-text search into core, which was adopted this summer as a result of hard and long work (the accepted version of the patch — 58!) several key developers of the PostgreSQL team is the biggest in the history of the project.

In addition, all the functions of the tsearch2 module will now be available by default and migration to a new version of PostgreSQL is significantly easier to configure the dictionaries and the rules of word processing will now be easier: all basic operations for configuring is carried out using SQL commands. So, for example, you can create a simple thesaurus dictionary:
СREATE TEXT SEARCH DICTIONARY thesaurus_astro ( 
TEMPLATE = thesaurus, 
DictFile = thesaurus_astro, 
Dictionary = english_stem 
); 
ALTER TEXT SEARCH CONFIGURATION russian 
ADD MAPPING FOR lword, lhword, lpart_hword 
WITH thesaurus_astro, english_stem;

Simplified the process of creating the index. An example of creating a GIN index on a text column (without creating additional columns and triggers):
CREATE INDEX pgweb_idx ON pgweb 
USING gin(to_tsvector('English', title || body)); 

Here is an example of a query with ranking by relevance, using to the special function plainto_tsquery for obtaining tsquery (allows you to forget about escaping characters and quickly and easily convert plain text to tsquery)
SELECT 
ts_rank_cd(textsearch_index, q) AS rank, title 
FROM 
pgweb, plainto_tsquery('supernova star') q 
WHERE 
q @@ textsearch_index 
ORDER BY 
rank DESC LIMIT 10;


Another noticeable change — XML support, which was attended by the author of this article. This functionality is implemented in accordance with the SQL:2003 standard (14-I of the standard, SQL/XML).

First of all, there is a special data type xml built into the kernel. When using this type, the server checks whether the generated data (check with well-formedness is a). And possible use cases in which allowed to work with the parts of the document (this allows you to ensure the property of "closure" of functions to work with XML data type xml).

In accordance with the SQL:2003 standard is implemented a set of functions to convert relational data to XML (i.e., functions of SQL/XML publishing). Here is a simple example of a request for generating XML data:
SELECT XMLROOT ( 
XMLELEMENT ( 
NAME 'some', 
XMLATTRIBUTES ( 
'val' AS 'name', 
1 + 1 AS 'num' 
), 
XMLELEMENT ( 
NAME 'more', 
'foo' 
) 
), 
VERSION '1.0', 
STANDALONE YES 
);


In addition, support DTD validation (the xmlvalidatedtd()), support the evaluation of XPath expressions (the xpath() that returns an array of xml data type ), and alternative functions for easier publishing relational data in XML (the tabletoxml(), querytoxml()).

To speed up query execution to XML data it is possible to use a functional btree-indexes and GIN indexes, and use full-text search for XML data. Here is an example of creating a btree index on the results of the evaluation of an XPath expression:
CREATE INDEX i_table1_xdata ON table1 USING btree( 
((xpath(’//person/@name’, xdata))[1]) 
);


With regard to data types, PostgreSQL 8.3 introduces a number of innovations: in addition to embedded in the core of the system types, tsquery/tsvector and xml, the following:
the

    enum (enumerated data types, user-defined) for the convenience of some users, including those migrating from TheirSQL; the

  • data types a GUID/UUID (contrib module);
  • the
  • arrays of composite types (for example, user-defined types).

  • And finally, a short list of other changes:
    the
      the
    • automatic invalidation of the cache query plan for PL/pgSQL-functions;
    • the
    • design "CREATE FUNCTION ... RETURNS TABLE" and "RETURN TABLE..." to create functions that result in a table;
    • the
    • support for update operations for cursors;
    • the
    • standard (ISO/ANSI SQL) "ORDER BY ... NULLS FIRST/LAST" to simplify the installation of the order of NULL values (also helps when migrating from other DBMS);
    • the
    • indexing of NULL values in GiST-index.


    the

    database Administrators


    This section turned out scanty because a lot of what is intended to improve the lives of DBA described above :-) However, will talk briefly about what's left.

    In the query plans (the EXPLAIN ANALYZE) now you can see what kind of sorting algorithm has been selected and how much memory was spent:
     QUERY PLAN 
    ------------------------------------------------------- 
    Sort (cost=34.42..34.38 rows=13 width=176) (actual time=0.946..0.948 rows=6 loops=1) 
    Sort Key: obj2tag.o2t_tag_name 
    Sort Method: quicksort Memory: 18kB <-- see here! 
    -> Hash Join (cost=19.19..34.14 rows=13 width=176) (actual time=0.812..0.835 rows=6 loops=1) 
    [...]

    Special contrib module pg_standby, written by Simon Riggs (Simon Riggs) will make it easier for the administrators setting up the server "warm backup" (Warm Standby) on the basis of the transfer of logs (WAL transfer). The module is written in pure C, so it is easily expandable and portable to new platforms (performance tested already, at least on Linux and Win32).

    If you define a function you can now override the environment variables that will be valid only within this function (bind variable values functions). For example, you can specify that the function log _data() switches transaction mode to asynchronous:
     ALTER FUNCTION log_data(text) 
    SET synchronous_commit TO OFF; 
    

    Well, according to tradition, a short list of other new products in this section:
    the
      the
    • interface support GSSAPI;
    • the
    • improved build on Win32 platform (you do not need MinGW, the build is in MS VC++, which among other things leads to performance improvements in Windows).
    • the
    • table creation in the likeness of the given indices (example: CREATE TABLE dict2 (LIKE dictionary INCLUDING INDEXES)).


    the

    Additional projects


    The company EnterpriseDB (the staff of which are active developers of PostgreSQL, a version change 8.3 in terms of performance are their credit) released the debugger pldebugger, which is a contrib module that allows you to debug PL/pgSQL functions in the standard administration tool pgAdminIII and carry out the profiling.

    The project currently exists as an independent contrib module (available at PgFoundry) and runs on many platforms (including Linux and Win32). It should be noted that this module works with version 8.2 of Postgres.

    As we he told me not so long ago, the company Skype (which uses widely known for the eponymous project of PostgreSQL) has released Open Source several products which can be useful for a large community of developers. Among them, most notably pseudo-language PL/Proxythat allows you to organize horizontal scaling with almost no restrictions (provided that if the entire application business logic implemented as stored procedures), extremely light, the connection Manager PgBouncer. Check out Skype Developers Zone you will find many interesting things!
    At the turn of spring and summer 2007, the year released version 1.0 simple and convenient tools for log analysis pgFouine. This program will help you learn what is involved your processor (CPU) database server. pgFoiune analyzes the logs of queries of Postgres (if you enable logging of queries it is recommended to introduce a time limit from below, see the description of parameter log_min_duration_statement)providing reports on slowest queries, errors, and General statistics (see examples). Thus, this tool allows the database developer to understand which queries can be improved to speed up your application using PostgreSQL.

    Finally, a brief about the other products:
    the
      the
    • project pgSNMP is an implementation of an SNMP agent for PostgreSQL (monitoring servers)
    • the
    • SEPostgres – the extension is based on a model of providing enhanced security SELinux;
    • the
    • created a tool that creates recommendations to the DBA to create indexes, and showing the possible query execution plans, subject to the availability of such indices (Index Advisor);
    • the
    • in the well-known tool for web-administration phppgadmin appeared (or will appear) capabilities settings Slony cluster, full text search, if so.


    the

    Conclusion


    Version 8.3 is the next step on the road to management system databases for corporate use. Non-trivial improvements in the areas of performance, appearance features, which are dictated by the needs of users, expanding the number of projects satellites — all this demonstrates the steady and rapid development of PostgreSQL.

    In writing this review, the author used the following sources:
    the
Article based on information from habrahabr.ru

Популярные сообщения из этого блога

Approval of WSUS updates: import, export, copy

Kaspersky Security Center — the fight for automation

The Hilbert curve vs. Z-order