Minor Planet Center Annex

@ SBN

Postgres database replication

Note that the public database schema is a β release, and may be subject to change over the development period. Tables may be added, removed, or modified by the MPC.

If you decided to replicate a Postgres mpc_sbn copy of database from SBN, please, consider "Master" side paragraph FOR YOUR INFORMATION only. Start from the "Replica" side (from the scratch).

"Master" side:

  1. Make sure port 5432 is open.
  2. As postgres user check postgresql.conf:
listen_addresses should be listening to the replica address.
wal_level = logical
  1. In psql setup an account and give it at least read access to the database and tables to be replicated:
CREATE ROLE <username> WITH REPLICATION LOGIN PASSWORD '<password>';
GRANT CREATE ON DATABASE <dbname> TO <username>;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO <username>;
  1. As postgres user edit pg_hba.conf:
In the hosts section add a line like:
host <dbname> <username> <user_IP>/32 md5
  1. Restart the postgres server.
  2. In psql in the <dbname> create a publication:
[sudo -u postgres] psql <dbname>
CREATE PUBLICATION <pubname> FOR ALL TABLES;
or
CREATE PUBLICATION <pubname> FOR TABLES table1, table2, table3,...;
  1. Dump the schema of all tables in the publication and send them to the replica machine:
pg_dump -d <dbname> -t <table1> -s > <table1>_schema.sql
pg_dump -d <dbname> -t <table2> -s > <table2>_schema.sql
pg_dump -d <dbname> -t <table3> -s > <table3>_schema.sql
 
(can be done for all tables at once by removing the -t option, but above method makes it easier to troubleshoot any issues). pg_dump -d <dbname> > <dbname>_schemas.sql

Items 1,2,6 need to be done 1 time.
Items 3,4,5 need to be done with every new replication
Item 7 could be done once, but with every new replication the <username> should be updated in every *_schema.sql file:

GRANT SELECT ON TABLE public.neocp_events TO <username>;

"Replica" side (from the scratch):

  1. Install PostgresSQL v16.x
  2. Provide IP-address of the server to SBN (mamoutkine@astro.umd.edu)
  3. Wait for table-structures*.sql files and shell script from SBN
  4. Download tables_structure*.sql files into local server disk with reading permission for all.
  5. Create an empty database (mpc_sbn):
sudo -u postgres psql
CREATE DATABASE <dbname>
  1. Add empty tables using schemas from the "master" (Run those commands from the same directory where *.sql files):
[sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
[sudo -u postgres] psql -d <dbname> -a < <table2>_schema.sql
[sudo -u postgres] psql -d <dbname> -a < <table3>_schema.sql
or
[sudo -u postgres] psql <dbname> < <dbname>_schemas.sql
  1. Create a subscription to the master's publication:
sudo -u postgres psql <dbname>
CREATE SUBSCRIPTION <subname> CONNECTION "host=<master_IP_address> port=5432 user=<username> password=<password> dbname=<dbname>" PUBLICATION <pubname>;

<pubname> in "replica" and "master" sides must match, but the <subname> must not already exist. Multiple subscribers can all connect to the same publication <pubname>.

https://www.digitalocean.com/community/tutorials/how-to-set-up-logical-replication-with-postgresql-10-on-ubuntu-18-04

  1. How to check a replication:
sudo -u postgres psql mpc_sbn
SELECT * FROM neocp_obs_archive;

Another check:

sudo -u postgres psql mpc_sbn
SELECT * FROM pg_stat_replication;

Another check: View as a sudo latest log-file: [psql_path]/11/data/log/postgresql-[Wed].log

"Replica" side (with existing logical replication from SBN) - how to add new table(s):

  1. Download all structure_only.sql file into local disk with reading permission to all
  2. Add empty tables using schemas from "master":
[sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  1. Modify a subscription to the master's publication:
sudo -u postgres psql <dbname>
ALTER SUBSCRIPTION <subscription_name> REFRESH PUBLICATION;

How to drop a database for complete re-start or version upgrade:

  • If you have a publication and distribute data to somebody - drop the publication:
[sudo -u postgres] psql <dbname>
DROP PUBLICATION <pubname>;
  • If you subscribed to another database and have an active connection - drop the subscription:
[sudo -u postgres psql <dbname>]
DROP SUBSCRIPTION IF EXISTS <subname>;
  • If you subscribed to another database but connection is broken:
ALTER SUBSCRIPTION <subname> DISABLE;
ALTER SUBSCRIPTION <subname> SET (slot_name=NONE);
DROP SUBSCRIPTION <subname>;
  • Delete database:
[sudo -u postgres psql]
DROP DATABASE <dbname>;
  • If you would like to remove the old PosgreSQL version:
sudo systemctl stop postgresql-11
sudo systemctl disable postgresql-11
sudo yum remove postgresql11
  • To start a new version of database go to "Replica side (from the scratch)"

Best practices for SBN Postgres replication clients:

  • Postgres configuration tune-up.
As a minimum: https://pgtune.leopard.in.ua/#/
More detailed: https://pgconfigurator.cybertec-postgresql.com/
  • Hardware.
SSD storage for data-dir.
Based on experience, it's the most critical for better performance of the database and replication. Even more important, than network bandwidth.RAM memory.
Better to have more bad / slow memory than less good / fast memory.CPU.
Multi-core and fast processor(s) could make a small advantage for database and replication performance.
Example of hardware specifications for the server to replicate the MPC Postgres database:
CPU - quad core or more
RAM - 32-48 GB
SSD - 4 TB
Network - 1-10 Gbps would be a good compliment.
  • 24/7 service.
Please avoid any hibernating devices like desktop or especially laptop.
  • No new subscription / replication slot without SBN agreement.
If you have any issues with existing subscription / replication slot, please, try to resolve them first. In case of emergency, drop the existing subscription first, drop data from all tables, and then create a new subscription (with the same or different name).

Index(es) and Table(s)

  • In addition to the table structures and data, the logical replication of the current database includes index(es) created by MPC to support their processes.
  • You (the Subscriber) could create custom index(es) on your local copy of the database to improve the query performance specific to your processes. Please, pause the subscription, create new index, continue the subscription:
ALTER SUBSCRIPTION <your_sub> DISABLE;
CREATE INDEX idx_table_column ON table(column);
ALTER SUBSCRIPTION <your_sub> ENABLE;
  • You also could add custom table(s) to your local copy of the database. Similar, pause the subscription, create new table, continue subscription:
ALTER SUBSCRIPTION <your_sub> DISABLE;
CREATE TABLE distributors (
did integer PRIMARY KEY,
name varchar(40)
);
ALTER SUBSCRIPTION <your_sub> ENABLE;

MPC Distributed Postgres Database Terms Of Service

Version 1.0, 9/28/2023

These terms of service ("Terms") govern your replication, access, and use of the data distributed from PostgreSQL database ("Database", or alternatively, the MPCDB), a public copy of the Minor Planet Center's (MPC's) live database, hosted from the NASA Planetary Data System's Small Bodies Node (SBN), located at the University of Maryland. This Database contains all publicly distributed data collected and produced by the MPC, and has live copies of the central processing tables. The database design base is Postgres and uses the Postgres built-in subscription capability.

Descriptions of the database content and structure are provided elsewhere (see the schema posted on the MPC's website: https://data.minorplanetcenter.net/postgres-schema/schema.html). By replicating, accessing, or using the Database, you agree to be bound by these Terms. If you disagree with these Terms, you may not replicate, access, or use the Database.

  1. Description of Service

    We offer a service that allows entities to replicate, access, and use data from our Database for their internal purposes. SBN is distributing the Database from the Minor Planet Center (MPC) as a courtesy.

  2. License

    We grant you a non-exclusive, non-transferable, non-sublicensable license to replicate, access, and use our database's data for your internal purposes.

  3. Intellectual Property

    All intellectual property rights in the Database, including but not limited to copyrights, trademarks, patents, and trade secrets, are owned by us or our licensors. To prevent confusion, any modifications or derived data products must be explicitly and clearly marked as such if they are redistributed.

  4. Restrictions

    You agree not to use the Database for any illegal purpose. You may not replicate, access, or use the Database in violation of these Terms or any applicable law or regulation. You may not use the Database to harass, defame, or threaten any person or entity.

  5. Limitation of Liability

    SBN and MPC are not responsible for any damages, losses, or liabilities arising from your use of the Database, including but not limited to any errors, omissions, interruptions, or delays. We do not guarantee the accuracy or completeness of any data in the Database, and we are not liable for any reliance on such data. Users are responsible for their resource management.

  6. Termination

    This service can be negatively impacted by users at an individual endpoint. We may therefore find it necessary to terminate your access to the Database at any time for any reason. Some occurrences that may result in such termination are mentioned in Terms 11-13 below. This software should not be run on a machine that is not intended to be online 24 hours per day. Alternative means of acquiring specific information within the database will be provided; please contact SBN (see Term 10 below) for these services.

  7. Governing Law

    These Terms are governed by and construed in accordance with the laws of the jurisdiction in which we are located. Any dispute arising from or relating to these Terms shall be resolved through binding arbitration in accordance with the rules of the American Arbitration Association.

  8. Changes to Terms

    We reserve the right to modify or update these Terms at any time. If we make material changes to these Terms, we will notify you by email or by posting a notice on our website (https://sbnmpc.astro.umd.edu/MPC_database/statusDB.shtml). Your continued use of the Database following any such changes constitutes your acceptance of the revised Terms.

  9. User liability

    The User is responsible for responding reasonably to SBN requests related to the Database distribution service, safety, security, or integrity.

  10. Contact Information

    If you have any questions about these Terms or the Database, please contact us using the MPC helpdesk under the category of "Database Replication".

  11. Operating system and Postgres version requirements

    The live copy of the relational system is a Postgres database. To ensure integrity and longevity of the distribution system, SBN requires that the system to which the live copy is distributed (the user's system) runs an operating system that supports the earliest supported Postgres version and distribution.Currently (9/28/23) the SBN supports Postgres V14 and higher.

  12. Download time constraints

    The MPCDB tables may grow quickly in size with the advent of new surveys. Consequently, SBN requires the user's system to complete the initial download within 3 days. Furthermore, the user's full system must be able to keep up with the database throughput rate, requiring consideration of factors such as system specification, system configuration, and data transfer bandwidth. Notional thresholds regarding these factors will be provided in the Best Practices section, and may be upgraded depending on the average database volume growth from successive generational evolution of survey detection volumes.

  13. User's system downtime

    In order to avoid backlogs in distribution transactions, we require that the user's do not install the system on a transient server, such as a laptop that often travels with prolonged downtime. We require that the downtime be no more than an average of 14 days (cumulative) per year. The user is to notify SBN if a subscription is no longer being used.

  14. Maximum Institutional copies

    We request no more than 4 copies be distributed from SBN to the same institution without contacting the SBN regarding additional copies; SBN grants permissions to duplicate the database within the institution.

  15. User Feedback

    The MPC helpdesk shall be used for first reporting user related problems, rather than contacting SBN personnel directly. The MPCDB stewards at the SBN receive tickets through the MPC's helpdesk.

  16. Appendix A. Technical thresholds.

    Violation of any parameter from below list will result in (semi)-automatic termination of the distribution to the institution. Replication could be restored after fixing the source of violation. Technical thresholds are subject to periodic review and modification.

    • Inactive replication slot for longer than 7 days.
    • Difference between "normal" XMIN (fully synchronized replication slot) and "replication behind" XMIN > 5%.
    • Lag replication exceed 1 MB for longer than 3-5 days.