We followed the following core principles during the development of SPQR:
There are some good sharding solutions relying on the Postgres codebase for routing. This is a reliable and maintainable design decision. One of the obvious benefits of this approach is that the SQL grammar is always compatible between the same versions of Postgres.
Postgres, as any DBMS, solves the hard problem of state management. And the most important state is system catalog - metadata, data about your data. Postgres allows you to see a snapshot of the structure of your data in the past. To make the performance of the system catalog acceptable, it employs a sophisticated system of caches with rather tricky invalidations.
At the beginning of our journey to sharding solutions, we tried to implement FDW-based sharding and custom node based sharding. Eventually, we concluded that both catalog and caches are excessive elements for the task of query routing. At query routing time, Postgres core checks that column data types pose necessary casts, support functions, operators, etc. The “analyze” and “rewrite” phases of the query routing made latencies go unreasonably high.
That’s why we decided to build a query routing component that knows about data structure as little as possible. SPQR does not preserve any data besides routing rules.
It is a good idea to hack your favourite database driver and implement all the routing logic there. This will be fast, relatively cheap, and maintainable. However, as soon as you come to migrating data between shards or fully resharding, you will face serious problems. By the way, it is not always possible to modify the source code of an application. In such a situation, only a proxy can be used between the application and the database.
We followed the following core principles during the development of SPQR:
There are some good sharding solutions relying on the Postgres codebase for routing. This is a reliable and maintainable design decision. One of the obvious benefits of this approach is that the SQL grammar is always compatible between the same versions of Postgres.
Postgres, as any DBMS, solves the hard problem of state management. And the most important state is system catalog - metadata, data about your data. Postgres allows you to see a snapshot of the structure of your data in the past. To make the performance of the system catalog acceptable, it employs a sophisticated system of caches with rather tricky invalidations.
At the beginning of our journey to sharding solutions, we tried to implement FDW-based sharding and custom node based sharding. Eventually, we concluded that both catalog and caches are excessive elements for the task of query routing. At query routing time, Postgres core checks that column data types pose necessary casts, support functions, operators, etc. The “analyze” and “rewrite” phases of the query routing made latencies go unreasonably high.
That’s why we decided to build a query routing component that knows about data structure as little as possible. SPQR does not preserve any data besides routing rules.
It is a good idea to hack your favourite database driver and implement all the routing logic there. This will be fast, relatively cheap, and maintainable. However, as soon as you come to migrating data between shards or fully resharding, you will face serious problems. By the way, it is not always possible to modify the source code of an application. In such a situation, only a proxy can be used between the application and the database.