[guides] guides: Add postgres migration guidelines #6596

Merged
merged 11 commits into from
Aug 29, 2023
197 changes: 197 additions & 0 deletions source/guides/postgres-migration.rst
Migration guidelines from MySQL to PostgreSQL
=============================================

.. include:: ../_static/badges/allplans-selfhosted.rst
   :start-after: :nosearch:

From Mattermost v8.0, PostgreSQL is our database of choice for Mattermost, to enhance the platform’s performance and capabilities. To support community members who want to migrate from a MySQL database, we provide the following guidance and best practices.

These guidelines are intended to make the transition as smooth as possible, and they include recommendations for tools that simplify the migration.

.. note::

   These guidelines are in development, and we're working to streamline the migration process. We'll update this guide as new information becomes available. Note that it doesn't cover migration configurations for plugins such as Focalboard and Playbooks. If your system uses these plugins, we recommend waiting until we add configurations tailored to them. Use this guide as a starting point, and always back up your database before starting a migration.

.. contents:: On this page:
   :backlinks: top
   :local:
   :depth: 1

Contributor: I think a known issues table might be a good addition here.

Member Author: I tried to collect as many known issues as possible and addressed them in the "Schema diffs" section. I think we can add a section for known issues and troubleshooting once these accumulate.

Required tools
--------------

- Install ``pgLoader``. See the official `installation
  guide <https://pgloader.readthedocs.io/en/latest/install.html>`__.
- Install the morph CLI by running the following command:

  - ``go install github.com/mattermost/morph/cmd/morph@v1``

- Optionally, install ``dbcmp`` to compare the data after a migration:

  - ``go install github.com/mattermost/dbcmp/cmd/dbcmp@latest``
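
After installing the tools, you can confirm that each one is on your ``PATH``. ``go install`` places binaries in ``$(go env GOPATH)/bin``, or in ``$GOBIN`` if it is set, so that directory must be on your ``PATH`` for ``morph`` and ``dbcmp`` to be found. The following Bash sketch is one way to check; ``check_tools`` is a hypothetical helper and is not part of any of these tools.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: report which of the given commands are on PATH.
   # Returns non-zero if any command is missing.
   check_tools() {
     local missing=0 tool
     for tool in "$@"; do
       if command -v "$tool" >/dev/null 2>&1; then
         echo "found:   $tool"
       else
         echo "missing: $tool" >&2
         missing=1
       fi
     done
     return "$missing"
   }

   # Example usage:
   #   check_tools pgloader morph dbcmp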

Before the migration
--------------------

.. note::

   This guide requires a schema of v6.4 or later, because the migration relies on morph-based schema migrations. If you're running an earlier version, upgrade your Mattermost Server to at least v6.4 before migrating.

- Back up your MySQL data.
- Confirm your Mattermost version. See the **About** modal for details.

Contributor: @isacikgoz do we have a minimum MM version we would recommend? I would certainly like to call out being on a supported ESR.

Member Author: I think we can add that. For now, it looks like the minimum supported version is v6.4 ESR, as we require morph-based schema migrations. So if users have a version earlier than v6.4, they need to upgrade first and then conduct the migration. Shall we add that info?

Member Author: @nab-77 added a note 👍

- Plan a migration window. The Mattermost Server must be stopped for the duration of the migration.
- Review the `Schema diffs <#schema-diffs>`__ section to ensure your data is compatible with the PostgreSQL schema.
- Prepare your PostgreSQL environment by creating a database and user. See the `database </install/prepare-mattermost-database.html>`__ documentation for details.
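
If you want to script the version requirement from the note above, the following Bash sketch compares two dotted version strings, such as the version shown in the **About** modal and the v6.4 minimum. It assumes a ``sort`` that supports version sorting with ``-V``, as GNU coreutils does; ``version_at_least`` is a hypothetical helper.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: succeed if version $1 is greater than or equal to
   # version $2. A leading "v" (for example v7.8.1) is ignored.
   # Assumes `sort -V` (version sort) is available.
   version_at_least() {
     local have=${1#v} want=${2#v}
     # sort -V orders versions ascending; if the minimum sorts first (or the
     # two are equal), the current version meets the requirement.
     [ "$(printf '%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
   }

   # Example usage:
   #   version_at_least 7.8.1 6.4 && echo "Schema version is supported"
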

Contributor: @isacikgoz do we have a preference for the pgsql version here (other than ensuring it's a supported version)?

Member Author: I think the minimum version we support is fine, but the latest supported version would be our preference as far as I'm concerned. Do you think we should specify a version?


Prepare target database
-----------------------

- Clone the ``mattermost`` repository at the release branch for your current version:
  ``git clone -b <your current version (e.g. release-7.8)> git@github.com:mattermost/mattermost.git --depth=1``
- ``cd`` into the ``mattermost`` project directory\*.
- Create the Mattermost schema in your PostgreSQL database using the morph CLI with the following command:

  .. code:: bash

     morph apply up --driver postgres --dsn "postgres://user:pass@localhost:5432/<target_db_name>?sslmode=disable" --path ./db/migrations/postgres --number -1

\* For ``v8`` and later, due to a project reorganization, the migrations directory is ``./server/channels/db/migrations/postgres/`` relative to the project root. In that case, ``cd`` into ``mattermost/server/channels`` instead.
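
The directory choice described in the footnote above can also be scripted. The following Bash sketch prints the directory, relative to the repository root, from which to run the ``morph`` command. It assumes that every release from v8 onward uses the reorganized layout; ``morph_workdir`` and ``POSTGRES_DSN`` are hypothetical names.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: given a Mattermost version or release branch name
   # (for example 7.8, v8.1.0, or release-7.8), print the directory, relative
   # to the repository root, from which ./db/migrations/postgres resolves.
   morph_workdir() {
     local version=${1#release-}
     version=${version#v}
     local major=${version%%.*}
     case $major in
       ''|*[!0-9]*)
         echo "morph_workdir: unrecognized version: $1" >&2
         return 1
         ;;
     esac
     if [ "$major" -ge 8 ]; then
       echo "server/channels"   # v8 and later: reorganized project layout
     else
       echo "."                 # earlier versions: migrations under ./db
     fi
   }

   # Example usage (run from the directory that contains the clone):
   #   cd "mattermost/$(morph_workdir release-7.8)"
   #   morph apply up --driver postgres --dsn "$POSTGRES_DSN" --path ./db/migrations/postgres --number -1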

Schema diffs
------------

Because the MySQL and PostgreSQL schemas differ, some manual steps may be required before the migration to avoid errors.

Text to character varying
~~~~~~~~~~~~~~~~~~~~~~~~~

The Mattermost MySQL schema uses the ``text`` column type in several tables where the PostgreSQL schema uses ``varchar`` with a maximum length. Check that your existing data fits within the PostgreSQL limits listed below.

================ ================ =====================
Table            Column           Data type casting
================ ================ =====================
Audits           Action           text -> varchar(512)
Audits           ExtraInfo        text -> varchar(1024)
ClusterDiscovery HostName         text -> varchar(512)
Commands         IconURL          text -> varchar(1024)
Commands         AutoCompleteDesc text -> varchar(1024)
Commands         AutoCompleteHint text -> varchar(1024)
Compliances      Keywords         text -> varchar(512)
Compliances      Emails           text -> varchar(1024)
FileInfo         Path             text -> varchar(512)
FileInfo         ThumbnailPath    text -> varchar(512)
FileInfo         PreviewPath      text -> varchar(512)
FileInfo         Name             text -> varchar(256)
FileInfo         MimeType         text -> varchar(256)
LinkMetadata     URL              text -> varchar(2048)
RemoteClusters   SiteURL          text -> varchar(512)
RemoteClusters   Topics           text -> varchar(512)
Sessions         DeviceId         text -> varchar(512)
Systems          Value            text -> varchar(1024)
UploadSessions   FileName         text -> varchar(256)
UploadSessions   Path             text -> varchar(512)
================ ================ =====================

As the table shows, the PostgreSQL schema enforces length limits on several columns, and data that exceeds them can result in errors during the migration. Community members have reported overflows in the ``LinkMetadata`` and ``FileInfo`` tables in particular, so we recommend checking these tables first. If any of your MySQL data exceeds these limits, you may need to delete the affected rows before migrating. For example, to delete rows in the ``Audits`` table whose ``Action`` value is longer than 512 characters, run:

.. code:: sql

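   -- CHAR_LENGTH counts characters (LENGTH counts bytes), matching PostgreSQL's varchar(n) limit.
   DELETE FROM mattermost.Audits WHERE CHAR_LENGTH(Action) > 512;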
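
Before deleting anything, it can help to count the offending rows for every column in the table above. The following Bash sketch prints one MySQL ``SELECT`` per column, using ``CHAR_LENGTH`` for the same reason as the example above. ``print_length_checks`` is a hypothetical helper, and the generated queries assume you are connected to your Mattermost database; the user and database names in the usage comment are examples.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: print one MySQL query per column in the
   # "Text to character varying" table, counting rows whose value is longer
   # than the PostgreSQL varchar limit. CHAR_LENGTH counts characters, which
   # matches how PostgreSQL applies varchar(n) limits.
   print_length_checks() {
     local rows=(
       "Audits Action 512"
       "Audits ExtraInfo 1024"
       "ClusterDiscovery HostName 512"
       "Commands IconURL 1024"
       "Commands AutoCompleteDesc 1024"
       "Commands AutoCompleteHint 1024"
       "Compliances Keywords 512"
       "Compliances Emails 1024"
       "FileInfo Path 512"
       "FileInfo ThumbnailPath 512"
       "FileInfo PreviewPath 512"
       "FileInfo Name 256"
       "FileInfo MimeType 256"
       "LinkMetadata URL 2048"
       "RemoteClusters SiteURL 512"
       "RemoteClusters Topics 512"
       "Sessions DeviceId 512"
       "Systems Value 1024"
       "UploadSessions FileName 256"
       "UploadSessions Path 512"
     )
     local row table column limit
     for row in "${rows[@]}"; do
       read -r table column limit <<< "$row"
       printf "SELECT '%s.%s' AS col, COUNT(*) AS too_long FROM \`%s\` WHERE CHAR_LENGTH(\`%s\`) > %s;\n" \
         "$table" "$column" "$table" "$column" "$limit"
     done
   }

   # Example usage (replace the user and database names with your own):
   #   print_length_checks | mysql -u mmuser -p mattermost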

Full-text indexes
~~~~~~~~~~~~~~~~~

Some words in the ``Posts`` and ``FileInfo`` tables may exceed PostgreSQL's `maximum token length <https://www.postgresql.org/docs/11/textsearch-limitations.html>`__ for full-text search indexing. In these cases, we recommend dropping the ``idx_posts_message_txt`` and ``idx_fileinfo_content_txt`` indexes from the PostgreSQL schema before the migration and recreating them afterward. The ``AFTER LOAD DO`` clause of the pgLoader configuration in the `Migrate the data <#migrate-the-data>`__ section recreates them.

To drop the indexes, run the following commands before the migration. The pgLoader configuration also runs them in its ``BEFORE LOAD DO`` clause, so you may not need to run them manually:

.. code:: sql

   DROP INDEX IF EXISTS idx_posts_message_txt;
   DROP INDEX IF EXISTS idx_fileinfo_content_txt;
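
If you drop the indexes manually, or the ``AFTER LOAD DO`` clause of the pgLoader configuration doesn't run, recreate them once the data is loaded. The following Bash sketch prints the ``CREATE INDEX`` statements from that clause with the schema prefix removed, so it assumes the tables are in the ``public`` schema of the target database. ``print_fts_index_sql`` and ``POSTGRES_DSN`` are hypothetical names.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: print the statements that recreate the full-text
   # search indexes, matching the definitions in the pgLoader AFTER LOAD DO
   # clause but without the schema prefix (tables assumed to be in public).
   print_fts_index_sql() {
     printf '%s\n' \
       "CREATE INDEX IF NOT EXISTS idx_posts_message_txt ON posts USING gin(to_tsvector('english', message));" \
       "CREATE INDEX IF NOT EXISTS idx_fileinfo_content_txt ON fileinfo USING gin(to_tsvector('english', content));"
   }

   # Example usage (POSTGRES_DSN is a placeholder connection URI):
   #   print_fts_index_sql | psql "$POSTGRES_DSN"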

Migrate the data
----------------

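Once the schema is in the desired state, you can start migrating the **data** by running ``pgLoader``. Use the following configuration as a baseline for the data migration. A version of this template, used by the Mattermost test workflow, is available in the repository as `template.load <https://github.com/mattermost/mattermost/blob/master/server/tests/template.load>`__: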

Contributor: @isacikgoz what are your thoughts on us providing the migration.load template for users to download and edit? Maybe link it here?

Member Author: The template is actually in the repo, used by the workflow: https://github.com/mattermost/mattermost/blob/master/server/tests/template.load. We can add the link here.

.. code::

   LOAD DATABASE
        FROM mysql://{{ .mysql_user }}:{{ .mysql_password }}@mysql:3306/{{ .source_schema }}
        INTO pgsql://{{ .pg_user }}:{{ .pg_password }}@postgres:5432/{{ .target_schema }}

   WITH data only,
        workers = 8, concurrency = 1,
        multiple readers per thread, rows per range = 50000,
        create no tables, create no indexes,
        preserve index names

   SET PostgreSQL PARAMETERS
        maintenance_work_mem to '128MB',
        work_mem to '12MB'

   SET MySQL PARAMETERS
        net_read_timeout = '120',
        net_write_timeout = '120'

   CAST column Channels.Type to "channel_type" drop typemod,
        column Teams.Type to "team_type" drop typemod,
        column UploadSessions.Type to "upload_session_type" drop typemod,
        column Drafts.Priority to text,
        type int when (= precision 11) to integer drop typemod,
        type bigint when (= precision 20) to bigint drop typemod,
        type text to varchar drop typemod,
        type tinyint when (<= precision 4) to boolean using tinyint-to-boolean,
        type json to jsonb drop typemod

   EXCLUDING TABLE NAMES MATCHING ~<IR_>, ~<focalboard>

   BEFORE LOAD DO
        $$ ALTER SCHEMA public RENAME TO {{ .source_schema }}; $$,
        $$ DROP INDEX IF EXISTS {{ .source_schema }}.idx_posts_message_txt; $$,
        $$ DROP INDEX IF EXISTS {{ .source_schema }}.idx_fileinfo_content_txt; $$

   AFTER LOAD DO
        $$ UPDATE {{ .source_schema }}.db_migrations set name='add_createat_to_teamembers' where version=92; $$,
        $$ CREATE INDEX IF NOT EXISTS idx_posts_message_txt ON {{ .source_schema }}.posts USING gin(to_tsvector('english', message)); $$,
        $$ CREATE INDEX IF NOT EXISTS idx_fileinfo_content_txt ON {{ .source_schema }}.fileinfo USING gin(to_tsvector('english', content)); $$,
        $$ ALTER SCHEMA {{ .source_schema }} RENAME TO public; $$;
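
The configuration uses placeholders such as ``{{ .mysql_user }}`` that must be replaced with your own values before running ``pgLoader``. ``{{ .source_schema }}`` is the name of the MySQL database and ``{{ .target_schema }}`` the name of the PostgreSQL database, and the host names ``mysql`` and ``postgres`` will likely need to be changed to match your environment. The following Bash sketch fills in the placeholders from environment variables; ``render_load_template``, the variable names, and the ``migration.load.tpl`` file name are hypothetical. Credentials that contain characters such as ``@`` or ``:`` may need escaping in the connection strings; check the pgLoader documentation.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: read the pgLoader configuration template on stdin and
   # replace its {{ .name }} placeholders with values from environment variables.
   render_load_template() {
     local var tpl
     for var in MYSQL_USER MYSQL_PASSWORD SOURCE_SCHEMA PG_USER PG_PASSWORD TARGET_SCHEMA; do
       if [ -z "${!var:-}" ]; then
         echo "render_load_template: $var is not set" >&2
         return 1
       fi
     done
     tpl=$(cat)
     # Quoted patterns and replacements are matched and inserted literally.
     tpl=${tpl//"{{ .mysql_user }}"/"$MYSQL_USER"}
     tpl=${tpl//"{{ .mysql_password }}"/"$MYSQL_PASSWORD"}
     tpl=${tpl//"{{ .source_schema }}"/"$SOURCE_SCHEMA"}
     tpl=${tpl//"{{ .pg_user }}"/"$PG_USER"}
     tpl=${tpl//"{{ .pg_password }}"/"$PG_PASSWORD"}
     tpl=${tpl//"{{ .target_schema }}"/"$TARGET_SCHEMA"}
     printf '%s\n' "$tpl"
   }

   # Example usage (migration.load.tpl holds the configuration above):
   #   export MYSQL_USER=... MYSQL_PASSWORD=... SOURCE_SCHEMA=... \
   #          PG_USER=... PG_PASSWORD=... TARGET_SCHEMA=...
   #   render_load_template < migration.load.tpl > migration.load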

After saving this configuration to a file, for example ``migration.load``, run ``pgLoader`` with the following command:

.. code:: bash

   pgloader migration.load > migration.log

Contributor: Do we want to place a note here to remind users to run the commands to re-create the indexes as noted in lines 113/114?

Member Author: Sure, I can add that.

Feel free to contribute to this guide, or report the findings from your migration to us.

Compare the data
----------------

We developed the ``dbcmp`` tool internally to simplify comparing the contents of two databases. It compares every table and reports any differences between the two schemas.

The tool accepts the following flags:

.. code:: sh

   Usage:
     dbcmp [flags]

   Flags:
         --exclude strings   exclude tables from comparison, takes comma-separated values.
     -h, --help              help for dbcmp
         --source string     source database dsn
         --target string     target database dsn
     -v, --version           version for dbcmp

For this migration, run the following command:

.. code:: sh

   dbcmp --source "${MYSQL_DSN}" --target "${POSTGRES_DSN}" --exclude="db_migrations,ir_,focalboard,systems"

This migration guide covers only the tables for Mattermost Channels, which is why the command excludes the ``ir_`` (Playbooks) and ``focalboard`` tables. Support for other plugins, such as Playbooks, will be added in the future.

We also exclude the ``db_migrations`` table, because a typo in a single migration name creates a small difference between the two databases. Since the PostgreSQL schema was created with morph from the official ``mattermost`` source, this table can be safely skipped. The ``systems`` table may also contain additional differences if extra keys were added during some of the migrations. If you run into issues with it, exclude the ``systems`` table, as the command above does, and compare it manually instead; the table is relatively small.
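
For the manual comparison of the ``systems`` table, one approach is to export its ``Name`` and ``Value`` columns from both databases, sort the rows, and compare them; sorting in the shell avoids spurious differences caused by the two databases returning rows in a different order. The following Bash sketch assumes the standard ``mysql`` and ``psql`` clients. ``compare_sorted``, the file names, and the connection details are hypothetical, and values that contain tabs or newlines may still show differences because the two clients escape them differently.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: compare two exported files after sorting their lines,
   # so that differences in row order are ignored. Prints the differing lines
   # and returns non-zero if the files differ.
   compare_sorted() {
     diff <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
   }

   # Example usage (user, database, and POSTGRES_DSN are placeholders):
   #   mysql -u mmuser -p -N -B -e 'SELECT Name, Value FROM Systems' mattermost > mysql_systems.tsv
   #   psql -A -t -F $'\t' -c 'SELECT name, value FROM systems' "$POSTGRES_DSN" > pg_systems.tsv
   #   compare_sorted mysql_systems.tsv pg_systems.tsv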