MySQL vs PostgreSQL
MySQL vs PostgreSQL is a decision many must make when
approaching open-source relational databases management systems. Both
are time-proven solutions that compete strongly with proprietary
database software. MySQL has long been assumed to be the faster but less
full-featured of the two database systems, while PostgreSQL was assumed
to be a more densely featured database system often described as an
open-source version of Oracle. MySQL has been popular among various
software projects because of its speed and ease of use, while
PostgreSQL has had a close following from developers who come from an
Oracle or SQL Server background.
These assumptions, however, are mostly outdated and incorrect.
MySQL has come a long way in adding advanced functionality while
PostgreSQL dramatically improved its speed within the last few major
releases. Many, however, are unaware of the convergence and still hold
on to stereotypes based on MySQL 4.1 and PostgreSQL 7.4. The current
versions are MySQL 5.5.27 and PostgreSQL 9.2.2.
PostgreSQL is a unified database server with a single storage engine.
MySQL has two layers, an upper SQL layer and a set of storage engines.
When comparing the two it's typically necessary to specify which storage
engines are being used with MySQL because that greatly affects
suitability, performance and (even basic) feature availability. The most
commonly used storage engines in MySQL are InnoDB for almost full ACID
support and high performance on large workloads with lots of
concurrency and MyISAM for lower concurrency workloads or higher
concurrency read-mostly workloads that don't need ACID properties.
Applications can combine multiple storage engines as required to exploit
the advantages of each.
Database systems can be optimized according to the environment they
run in. Thus, it is very difficult to give an accurate comparison in
performance without paying attention to configuration and environment.
PostgreSQL and MySQL both employ various technologies to improve
MySQL began development with a focus on speed while PostgreSQL began
development with a focus on features and standards. Thus, MySQL was
often regarded as the faster of the two. The default configuration for
both is tuned to run on small systems, and it's common for people
performing benchmark tests to either not change the defaults, or
properly tune only the one they are most familiar with. Either action
will usually give misleading results. Furthermore, both DBMS's will do
better in benchmarks related to their original strengths (i.e. MySQL
fast in simple operations, PostgreSQL more reliable and faster in
 Raw Speed
PostgreSQL provides significant performance features
- efficient executor for both static SQL or parameterised SQL
- advanced cost-based optimizer, with many plan choices and adaptive statistics collection
- indexing: partial, functional, multiple-index-combining, index-only scans, 5 different kinds of index
- TOAST data compression
- improved cache management in versions 8.1 and 8.2
- huge scalability on write intensive workloads from 8.2+
- asynchronous commit ("MyISAM for Postgres")
- asynchronous Replication built-in from 9.0+
- synchronous Replication built-in from 9.1
The 8.x releases have added more than 75 new discrete performance
features. These have been added as a result of a multi-year project to
improve performance by steadily identifying and removing key bottlenecks
in scalability, as well as adding low-level tuning and architectural
 shows feature set added for the 8.3 release, for example.
PostgreSQL can compress and decompress its data on the fly with a
fast compression scheme to fit more data in an allotted disk space. The
advantage of compressed data, besides saving disk space, is that
reading data takes less IO, resulting in faster data reads.
PostgreSQL supports one storage engine, with tight integration
between that storage engine and the rest of the database. Options like
asynchronous commit can be set on a per-transaction, per-user or whole
system basis, allowing different transaction types to co-exist
efficiently without the need to select storage engine types once for
each table ahead of time.
By default, PostgreSQL comes tuned to run on a shared server, so
has low performance settings. When running on a dedicated server
performance can be improved by changes to a few key parameters.
MySQL 5.1 natively supports 9 storage engines :
- NDB Cluster
- MEMORY (HEAP)
However, the federated and blackhole engines are not actually
"storage" engines (for example, "blackhole" does not store anything).
There are several externally-developed storage engines, some of the most popular are:
- Aria (crash-safe MyISAM)
- XtraDB (feature enhanced InnoDB fork)
MySQL has several custom and community storage engines under development:
In some distributions, the default storage engine is MyISAM, which is
not transaction safe. Setting the default engine to a transactional
engine such as InnoDB is trivial. Beginning with MySQL 5.5.1 InnoDB is
the default storage engine.
MySQL has a query cache that does simple string matching before
the parser to see whether a query has been processed recently and
rapidly returns the result to the client application if it has, without
the need to do any of the traditional database work. This is of
considerable value to many read-mostly workloads. Cached queries are
removed whenever any table involved in the query is changed so its
usefulness declines as the rate of data changes increases.
The query cache runs on a single thread and must consider each
select, so it may eventually become a performance bottleneck at some
point beyond 8 cores, but that's not usually the case. It can be turned
off easily to check this and to see whether its small overhead is
worthwhile for the particular workload.
MySQL also supports network protocol-level compression which is
an option that can be turned on by the client if the server allows it.
This compresses everything to and from the server.
MyISAM is the traditional MySQL storage engine and is often better
for read-mostly workloads. MySQL's MyISAM engine performs faster than
PostgreSQL on simple queries and when concurrency is low or follows
certain patterns (e.g. count(*) is very fast). MyISAM's speed comes at
the cost of not supporting transactions, foreign keys, and not offering
guaranteed data durability.
In old versions, MyISAM required exclusive access for SELECT or
data-changing operations. Recent versions default to allowing updates at
the end concurrently if there are no deleted records and offer the
option to ignore deleted record free space and always operate
concurrently, or to disable concurrent inserts.
MyISAM supports leading prefix compression for keys and supports a read only compressed table form that compresses data as well.
System tables always use the MyISAM storage engine. This has been
used to criticise it for the potential of losing system table
information, but some claim that this doesn't happen "in practice" as it
is unlikely that such generally infrequently updated tables will be in
the process of being written to at the time of a crash.
InnoDB is an ACID compliant, transactional storage engine using MVCC technology. It's the normal choice for most modern applications using MySQL.
The InnoDB storage engine stores the data with the primary key,
so primary key lookups are fast. Good choice of primary key for physical
optimisation can be very useful; in cases where it's undesirable or
where the desired primary key produces poor physical performance a
simple integer can be used. An internal integer primary key is the
default if no primary key or unique column is present.
The InnoDB engine automatically generates hash index entries when
processing SELECTs. This feature can be turned off if necessary; some
workloads perform better without it.
The InnoDB engine has an insert buffer that caches updates to
secondary index entries and applies them in the background. This can
significantly speed up inserts, reducing the number of physical writes
required by combining many updates. If a secondary index page has
outstanding updates when it is needed for a query the updates will be
merged first. As of version 5.5 the insert buffer is also used as a
buffer for other types of writes, improving the performance of UPDATE
queries as well.
With the InnoDB installed via Plugin MySQL 5.1 supports on-the-fly compression of InnoDB tables.
Beginning with this release of the InnoDB Plugin, you can use the
attributes ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE in the CREATE TABLE
and ALTER TABLE commands to request InnoDB to compress each page to 1K,
2K, 4K, 8K, or 16K bytes.
Despite major changes in ownership in recent years, InnoDB performance has received continuing development attention.
With InnoDB, the secondary key lookup is slow, because the leaves
of its secondary indices are the primary keys. So it has to do 2
lookups when using a secondary index: one for the secondary index
itself, and one for the primary key.
 MySQL:NDB Cluster
NDB is a high performance, highly available, (mostly) in-memory
storage engine. Non-indexed attributes may be stored on disk. Data and
logs are automatically flushed to disk periodically to mitigate data
loss in the event of a cluster failure. NDB is used heavily in
Telecommunications applications where uptime and real time performance
is critical. NDB transparently organizes records into fragments which
are distributed evenly to all nodes in the cluster. NDB uses
synchronous replication internally to ensure records are committed to at
all nodes within a logical node group before returning a COMMIT. This
two phase commit gives NDB the ability to support sub-second failover.
Further to this NDB also supports automatic node recovery. NDB supports
the online introduction of additional nodes to the cluster without
impacting availability of the application.
Significant Limitation: The distributed nature of the tables
makes NDB perform poorly for complex JOIN operation as compared to
traditional storage engines. This has been resolved in version ndb-7.2.1
and later with Adaptive Query Localization (distributed pushed-down join, formerly SPJ)
PostgreSQL: PostgreSQL has no comparable solution.
MySQL supports on-the-fly compression since version 5.0 with the ARCHIVE storage engine.
Archive is a write-once, read-many storage engine, designed for
historical data. It compresses data up to 90%. It does not support
indexes. In version 5.1 Archive engine can be used with partitioning.
Aria is a fully ACID
compliant, storage engine being incrementally developed by Monty
Widenius and a team of other former MySQL developers. The incremental
development model means that the full ACID feature set is gradually
Historically MySQL has focused more on scale-out than scaleup and
PostgreSQL is often considered to scale better with large numbers of
cores or uncommonly high concurrency levels.
With the increased availability of multi-core systems MySQL has
been working to improve this, through both new development (5.5 and 5.6)
and the incremental development of 5.0 via bug fixes that remove
bottlenecks. The continuous incremental improvements without major or
minor version number changes makes it vital to compare with recent
builds of the server, not just looking at the main version number. A
partial list of versions with significant performance improvements is
5.0.30, 5.0.54 and 5.0.58 and any benchmark result older than those
versions should be considered to be particular to older installations,
especially in the areas of concurrency and scalability.
Along with MySQL's internal work on concurrency
many patches from Google and Percona hold promise and some have been incorporated into the standard MySQL server in version 5.4.
 Asynchronous I/O
PostgreSQL supports a full fledged asynchronous API for use by client applications. It is reported to increase performance by up to 40% in some cases. MySQL lacks Async support, though some drivers have been created to attempt to overcome this deficiency (perl ruby).
Transactional databases which implement MVCC
such as PostgreSQL and InnoDB perform COUNT(*) in a way that is very
slow compared to non-transactional databases like MyISAM. The MyISAM
engine in MySQL uses an index scan for COUNT(*) and also caches the
result of the count, thus it is much faster. PostgreSQL and InnoDB
require a table scan to locate all visible rows. These MVCC capable
engines implement COUNT(*) this way because MVCC stores transaction
visibility in the row data as opposed to the index. With MVCC capable
databases, caching the COUNT(*) would result in incorrect data being
returned. PostgreSQL 9.2 has index-only scan support which uses the
visibility map feature to determine whether a row is visible to the
current transaction rather than visiting the page. This means
dramatically faster COUNT(*) results. PostgreSQL 9.2 index-only scans PostgreSQL Slow Count() Workaround InnoDB COUNT(*)
Speed comparisons tend to depend on specific workloads and be used
long after the versions involved are out of date, for example, tweakers.net and jamonation tests that precede many improvements that were implemented in MySQL 5.0 and the subsequent MySQL and Oracle releases.
It's still true that in some benchmarks PostgreSQL may scale
better with large numbers of cores at high concurrency levels. Equally,
there are workloads where the same is true for MySQL, so the usual
advice to test your own application continues to apply. PostgreSQL
excels at workloads with a significant write workload.
A new benchmark
for MySQL 5.0.51 and MySQL 5.1.30 with InnoDB 1.0.3 (MySQL 5.4 contains
the patches from InnoDB 1.0.3) compared with PostgreSQL 8.3.7 suggests
that MySQL and PostgreSQL are almost equal in terms of scalability, by
at least one standard of measure.
While benchmarks can be an indicator of performance, MySQL
performance in many subquery / join statements is behind PostgreSQL due
to the inferior Query Optimizer (a much wider benchmark would show
PostgreSQL as a clear winner in this case).Recent enhancements to the MySQL optimizer provide large gains in the performance of subquery / join statements.
 ACID Compliance
ACID stands for Atomicity, Consistency, Isolation and Durability.
This model is used to judge data integrity across database management
systems. Most database systems achieve ACID compliance by using
transactions AND enforcing triggers/ foreign keys (it's forgetting about
this part that prevents consistency, as in InnoDB).
PostgreSQL is fully ACID compliant.
InnoDB engine is fully ACID compliant, but using InnoDB in MySQL
fails ACIDity, because MySQL doesn't propagate the triggers of foreign
keys for this engine. InnoDB is a storage engine, it does not execute
the queries. Until MySQL is ACID compliant, outside of the storage
engines, any storage engine running on it will not conform.
Some other MySQL engines are said to be ACID compliant, like Falcon / Cluster.
PostgreSQL is acknowledged as having a more rigorous approach to robustness and data integrity.
PostgreSQL and MySQL both have an impressive array of features that
increase data integrity, functionality, and performance. The features
included in a database may help improve performance, ease of use,
functionality, or stability.
 Ease of use
A "gotcha" is a feature or function which works as advertised - but not as expected.
PostgreSQL supporters claim that MySQL has more "gotchas" than PostgreSQL,
due to its deviation from the SQL standard, and its various functional
limitations which may not seem intuitively obvious to a new user. While
retaining backwards compatibility, MySQL has introduced various SQL
modes that increase standards compliance and produce results that may be
more expected by those unfamiliar with MySQL.
In certain aspects, such as case sensitivity in CHAR fields, MySQL's
behaviour [...] is the diametrical opposite of the default behaviour
of most other databases. (Tested: DB2 8.1, Firebird 1.5.1, Oracle 8.1.7
and PostgreSQL 7.4.3 [...])
 Insert Ignore / Replace
MySQL supports the statements, 'INSERT IGNORE', which inserts if a row doesn't exist, and 'REPLACE', which replaces the current row.
Currently PostgreSQL supports neither of these statements and
suggests using stored procedures to get around the lack of these
statements. However, there are major shortcomings:
it can only insert a single value at a time. This is a major
performance limitation, and also suffers concurrency issues. INSERT
IGNORE and REPLACE handle multi-valued inserted much more gracefully.
When appropriate, PostgreSQL can also use
RULEs to to produce any of these behaviors on a normal
UPDATE; this can also be used on a view.
A similar MySQL feature
INSERT ... ON DUPLICATE UPDATE is also missing from PostgreSQL and requires use of stored procedures (which can work only on one row at a time) or
RULEs, which are not so constrained.
PostgreSQL will feature the MERGE clause in a future version ,
which follows the SQL:2008 standard. This can do the same as MySQL's
non-standard 'INSERT IGNORE','REPLACE' and 'INSERT ... ON DUPLICATE
UPDATE' statements but with more granularity.
Both PostgreSQL and MySQL support Not-Null, Unique, and Primary Key
constraints. Foreign Key constraints are supported by PostgreSQL and by
MySQL's InnoDB storage engine, but not other engines. However MySQL
silently ignores the CHECK constraint which PostgreSQL has supported for a long time.
InnoDB tables support checking of foreign key constraints... For
other storage engines, MySQL Server parses and silently ignores the
FOREIGN KEY and REFERENCES syntax in CREATE TABLE statements.
For MySQL engines not supporting foreign key constraints or to implement constraints between tables in differing engines, triggers can be used to check the constraint, although this is not a guarantee as it's prone to race conditions. 
Also, the support of ForeignKeys in MySQL is not yet complete, as
cascade actions are not considered as SQL statements and thus do not
fire triggers etc. (they only cascade further)
 Default Values
PostgreSQL allows for any function marked as IMMUTABLE or STABLE to
be used as the default value for a column. Currently, NOW() is the only
function that can be used as a default value in a MySQL table -- and can
only be applied to one column per table, on TIMESTAMP columns only.
 Stored Procedures
MySQL supports stored procedures, per se; PostgreSQL supports stored functions, which are in practice very similar.
The first query language for PostgreSQL, PL/pgSQL, is similar to Oracle's PL/SQL. PostgreSQL supports SQL:2003
PSM stored procedures as well as many other general purpose programming
languages such as Perl (PL/Perl), Python (PL/Python), TCL (PL/Tcl),
Java (PL/Java) and C (PL/C).
MySQL follows the SQL:2003 syntax for stored routines, which is also used by IBM's DB2.
Via the plugin interface MySQL supports external language stored procedures in Java, Perl, XML-RPC with more language plugins in the works.
Both PostgreSQL and MySQL support triggers. A PostgreSQL trigger can
execute ANY user-defined function from any of its procedural languages,
not just PL/pgsql.
MySQL triggers are activated by SQL statements only. They are not
activated by changes in tables made by APIs that do not transmit SQL
statements to the MySQL Server; in particular, they are not activated by
updates made using the NDB API.
MySQL triggers are also not activated by cascading updates and
deletes even when caused by a SQL statement (this is against the
standard), since those are a feature of the InnoDB engine rather than
the database as a whole. In these cases, MySQL silently ignores the
triggers without issuing a warning (i.e. in other dbms's you must be
extra careful to not pop triggers accidentally, in MySQL you must be
extra careful that many of your statements will not activate the
triggers, such as cascades, etc.).
PostgreSQL also supports "rules," which allow operating on the
query syntax tree, and can do some operations more simply that are
traditionally done by triggers. However, the rule system is on the way
out in favour of (more powerful) triggers.
Syntax for definition of triggers in PostgreSQL isn't as
straightforward as in MySQL. PostgreSQL requires separate definition of a
function with specific data type returned (this is the general
behaviour of PostgreSQL = more strict vs MySQL = less strict. The syntax
in itself is as straightforward but the implementation is stricter and
much more powerful (any user function), thus a bit harder to learn). On
the upside, PostgreSQL supports multiple actions per trigger using OR
(e.g. BEFORE INSERT OR UPDATE).
MySQL does NOT support multiple triggers of the same type (i.e.
maximum one ON UPDATE BEFORE and one ON UPDATE AFTER trigger) on the
same table, whereas PostgreSQL does (If multiple triggers of the same
kind are defined for the same event, they will be fired in alphabetical
order by name. > from the manual).
PostgreSQL, as of 9.1, supports TRIGGERS on views, MySQL does not.
 Replication and High Availability
Replication is a database management system's ability to duplicate
its stored data for the purposes of backup safety and is one way to
prevent database downtime. PostgreSQL and MySQL both support
PostgreSQL has built-in asynchronous replication as of 9.0, consisting of streaming replication  and hot standby .
PostgreSQL streams the write-ahead log data which replays activity on
the database guaranteeing identical results. MySQL's older form of
statement based replication could introduce slave inconsistency when
executing non-deterministic statements. MySQL now warns when using
potentially unsafe statements, switching to a ROW image based log
There are several packages that also provide replication in PostgreSQL:
It is a common misconception that these "third-party packages" are
somehow less well integrated. Slony, for example, was designed and built
by Jan Wieck, a PostgreSQL core team member, and has a number of other
members of the PostgreSQL community involved in its ongoing design and
maintenance. However, Slony is considerably slower and uses more
resources than built-in replication, as it uses SQL and triggers rather
than binary log shipping to replicate the data across servers. That may
make it less suitable for larger cluster deployments with high
 Slony-I Replication Weakness
Slony-I, the most widely used PostgreSQL replication tool, is
inherently inferior to MySQL's built in replication for a number of
reasons. First, it uses SQL and triggers to replicate the data across
servers. This is considerably slower than MySQL's binary log shipping
and makes the communication costs much higher. Second, Slony-I's
communication costs grow quadratically in relation to the number of
servers in the replication pool (Order(n^2)). This makes it inherently
unusable for larger clusters. If we conservatively figure that
Slony-I's SQL/trigger method takes twice as much communication as
MySQL's binary log shipping, we can easily see how poorly this would
work for larger clusters in the real world.
With two servers:
MySQL: 2 = 2
PostgreSQL: 2*2^2 = 8
With 4 servers:
MySQL: 4 = 4
PostgreSQL: 2*4^2 = 32
With 12 servers:
MySQL: 12 = 12
PostgreSQL: 2*12^2 = 288.
While Slony-I is adequate for high availability with two servers,
its communication costs are simply prohibitive for scaling out.
Note from Jan Wieck: I don't quite understand how the author
of the above came up with those numbers. It seems to refer to the SYNC
events and their confirmations. Slony does create these internal
housekeeping messages on all nodes and they are transported to all other
nodes. But it doesn't transport all user data changes back and forth in
that fashion. These are small (maybe 30-100 byte messages) that occur
once every couple of seconds. Claiming that with 2 nodes it has 4 times
MySQL's communication volume or with 12 nodes 24 times that is outright
wrong. Also, Slony does allow cascading of nodes, where one replica will
act as a multiplier, serving more replicas in a star like cluster
configuration. There is no need to pull directly from the origin.
Slony-I is also difficult to administer.
PGCluster is not useful for situations where high-performance and
a decent amount of writes are to be expected. This is because it is a
synchronous replication system which waits until a write has happened on
all machines in the cluster rather. However, in situations that have
very few writes and require data to be absolutely consistent across each
database, PGCluster can be a good tool to use.
 PostgreSQL Synchronous Replication
PostgreSQL 9.1 comes with support for synchronous replication. It
thus allows for the popular star-topology configurations with one write
and multiple read-only slaves with slaves capable of stepping up in case
the primary goes down.
MySQL ships with support for asynchronous replication. In this form
of replication a log of events is transmitted to the slaves. These
slaves must apply the statements or rows in this log to each of the
slave servers independently. This has the limitation that each slave
may have a slightly different view of the data depending upon the length
of their lag in transferring and applying this log. Prior to 5.1
non-deterministic statements in this log can cause records to be
inserted or updated differently on each slave 
Starting with version 5.1, MySQL supports two forms of
replication; statement based replication (SBR) and row based replication
(RBR). SBR, used prior to 5.1, collects SQL queries which affect
changes to the database in a binary log which the slave servers
subscribe to for their changes. RBR instead records the incremental row
changes themselves in the binary log that are then applied to the
slave. RBR is used automatically when non-deterministic queries are
executed on the master. The storage engines NDB and InnoDB, in certain
cases, only support replication using this new row based binlog format.
Starting with version 5.5 MySQL supports semi-synchronous 
replication in which the master confirms receipt of the log by at least
one slave before returning from a COMMIT. This allows for greater data
integrity and simplified slave promotion in the event of a catastrophic
failure of the master.
Within the NDB storage engine MySQL supports scalable synchronous
replication between ndbd nodes using a two phase commit protocol which
does not rely upon a binary log. The two phase commit protocol still
exhibits excellent performance due to the in-memory nature of NDB. Once
an update has been committed to memory on two nodes it is considered
Replication between two NDB clusters or between NDB and tables in
another engine are possible via the asynchronous replication that comes
standard with MySQL.
In addition to the built-in asynchronous and semi-synchronous
replication methods provided by MySQL Server, additional 3rd party
solutions exists to provide synchronous replication.
PostgreSQL does not have an unsigned integer data type (which is not a
standard one and can be replaced by a SQL DOMAIN in PG), but it has a
much richer data type support in several aspects: standards compliance,
the logically fundamental data type BOOLEAN, IP address and networks,
user-defined data types mechanism, built-in and contributed data types.
PostgreSQL allows columns of a table to be defined as
variable-length multidimensional arrays. Arrays of any built-in or
user-defined base type, enum type, or composite type can be created.
Arrays of domains are not yet supported. 
MySQL has set and enum types.
MySQL does not have network IP address data types that PostgreSQL
has but does provide INET_ATON() and INET_NTOA() functions to convert
IPv4 addresses to and from integers, which are easily stored.
Also, PostgreSQL supports a range of IP-related functions like checking whether a range is part of another range, etc.
Both MySQL and PostgreSQL support subqueries. Support for them is more recent in MySQL and performance is still being improved for some types of subquery
that may have already been optimised in PostgreSQL. It should also be
noted that MySQL does not allow for a subquery in a view. This feature
was removed after 4.1 and as of 5.5 was still not available.
Workarounds include nested views, joins and hacking the source code.
However, it must be extremely clear that there remains a lot of basic
subqueries with major optimization problems in current MySQL versions
(5.5.x), which HAVE to be worked around..
Also, be careful as the way "NOT IN" works is not the same across DBMS's
(i.e. you need inner/outer value null checks in mysql, they're
integrated in postgresql, etc.).
Both MySQL and PostgreSQL support joins. As for subqueries, the
support is more recent in MySQL too, and the performance in many cases
is still behind that of PostgreSQL, also requires workarounds etc.
There are even cases where a MySQL (innodb) join will crashinstead of just being slow (maybe someone has that experience with a recent version of PostgreSQL, if yes please comment).