Technical notes about GMT sessions #3398

Open
seisman opened this issue Aug 13, 2024 · 0 comments
Labels
discussions Need more discussion before taking further actions

seisman commented Aug 13, 2024

Sometimes we talk about GMT sessions in issues/PRs. It's important to know that there are two different kinds of sessions: (1) the GMT CLI session; (2) the GMT C API session. Here are some technical notes on the two kinds of GMT sessions to help understand how PyGMT works and where its potential flaws lie.

The GMT CLI session

Here is a simple GMT CLI script:

gmt begin map -V
    gmt basemap -R0/10/0/10 -JX10c -Baf
gmt end show

The gmt begin command creates the so-called GMT CLI session directory under the ~/.gmt/sessions directory. The session directory is named like ~/.gmt/sessions/gmt_session.XXXXX, in which XXXXX is the parent process ID (PPID) by default but can be changed via the environment variable GMT_SESSION_NAME. Subsequent GMT commands then read/write information from/to files in this directory. This is how different GMT module calls communicate in modern mode. See https://docs.generic-mapping-tools.org/dev/begin.html#note-on-unix-shells for the official explanations.
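As a rough illustration of the naming rule above, here is a minimal sketch that guesses the session directory path. The helper name expected_session_dir is hypothetical (it is not part of GMT or PyGMT), and it assumes exactly what is described above: the suffix is GMT_SESSION_NAME if set, otherwise the PPID. The real logic inside GMT may differ in details (e.g., across shells or platforms).

```python
import os
from pathlib import Path


def expected_session_dir(environ=None):
    """Guess the GMT CLI session directory (hypothetical helper).

    Assumption: the suffix is GMT_SESSION_NAME if it is set,
    otherwise the parent process ID (PPID) of the current process.
    """
    environ = os.environ if environ is None else environ
    suffix = environ.get("GMT_SESSION_NAME") or str(os.getppid())
    return Path.home() / ".gmt" / "sessions" / f"gmt_session.{suffix}"


# Default case: the suffix falls back to the PPID.
print(expected_session_dir())
# With GMT_SESSION_NAME defined, the suffix is that value instead.
print(expected_session_dir({"GMT_SESSION_NAME": "my-unique-name"}))
```

Note that processes sharing the same parent (and no GMT_SESSION_NAME) would map to the same directory under this rule.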

In the current PyGMT implementation, when we import the PyGMT library (i.e., import pygmt), we call gmt begin to create the GMT CLI session. This GMT CLI session is then used by all subsequent GMT calls. This is usually OK, but with multiprocessing, GMT module calls from different processes access this directory at the same time, which can corrupt the session files. This explains why PyGMT has trouble with multiprocessing (#217).

So, to make PyGMT support multiprocessing, the solution seems straightforward:

  1. Set the environment variable GMT_SESSION_NAME to a unique value (we already have the
    unique_name() function) so that each process has a unique session name
  2. Do not call gmt begin at import time so that each process has a unique GMT CLI session directory
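Step 1 could be sketched roughly as below. This is not what #3392 does, just an illustration: the helper set_unique_session_name is hypothetical, and it uses uuid from the standard library instead of PyGMT's unique_name() to stay self-contained.

```python
import os
import uuid


def set_unique_session_name(prefix="pygmt-session"):
    """Set GMT_SESSION_NAME to a per-process unique value (hypothetical helper).

    Assumption: combining the process ID with a random suffix is unique
    enough; PyGMT's own unique_name() could be used instead.
    """
    name = f"{prefix}-{os.getpid()}-{uuid.uuid4().hex[:8]}"
    os.environ["GMT_SESSION_NAME"] = name
    return name


# Must run before any GMT session is created in this process.
print(set_unique_session_name())
```

Such a function could be passed as the initializer argument of multiprocessing.Pool so that each worker sets its own name. It only helps together with step 2, because a session created at import time has already picked its name.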

A proof-of-concept PR is opened at #3392.

The GMT C API session

We also need to know a little about the GMT C API session. Here is a simplified C example that calls GMT C API functions (the original example is https://github.com/GenericMappingTools/gmt/blob/master/src/testapi_modern.c):

#include "gmt.h"

int main () {
	void *API = NULL;

	/* Initialize the GMT session */
	API = GMT_Create_Session ("testapi_modern", 2, GMT_SESSION_RUNMODE, NULL);
	GMT_Call_Module(API, "begin", GMT_MODULE_CMD, "apimodern png");
	GMT_Call_Module(API, "basemap", GMT_MODULE_CMD, "-BWESN -Bafg -JM16c -R5/41/9/43");
	GMT_Call_Module(API, "end", GMT_MODULE_CMD, "show");
	GMT_Destroy_Session (API);
}

The C API function GMT_Create_Session creates the so-called GMT C API session. This function does a lot of things, including allocating memory for internal variables, deciding the session name, loading gmt.conf settings, and more. The API function GMT_Destroy_Session is responsible for destroying the GMT C API session.

The equivalent PyGMT version should be:

from pygmt.clib import Session

with Session() as lib:
    lib.call_module("begin", "pygmt-session")
    lib.call_module("figure", "apimodern -")
    lib.call_module("basemap", "-BWESN -Bafg -JM16c -R5/41/9/43")
    lib.call_module("psconvert", "-A -Tg")
    lib.call_module("end")

However, in the current implementation, the PyGMT version effectively looks like this:

from pygmt.clib import Session

with Session() as lib:
    lib.call_module("begin", "pygmt-session")
with Session() as lib:
    lib.call_module("figure", "apimodern -")
with Session() as lib:
    lib.call_module("basemap", "-BWESN -Bafg -JM16c -R5/41/9/43")
with Session() as lib:
    lib.call_module("psconvert", "-A -Tg")
with Session() as lib:
    lib.call_module("end")

in which the GMT C API sessions are created/destroyed multiple times. We may gain some performance if we can use a single GMT C API session, but we also need to note that a GMT CLI script also creates/destroys GMT C API sessions repeatedly (once for each gmt command).

Extra notes

Here are some extra notes:

  1. The session name is decided in the C API function GMT_Create_Session. So, GMT_SESSION_NAME should be defined before calling GMT_Create_Session.
  2. Data processing modules can be called in either classic mode or modern mode (i.e., inside gmt begin or not), but some modules behave differently in classic and modern mode. For example, gmt makecpt writes the output to stdout in classic mode but to a hidden CPT file in modern mode. To make things as simple as possible, it's OK to always call gmt begin at the beginning.
  3. A GMT C API session is required when calling any GMT C API functions. However, a GMT CLI session is only required when calling GMT modules. For example, the following Python script works without a GMT CLI session:
from pygmt.clib import Session

with Session() as lib:
    lib.read_data("@earth_relief_01d_g", kind="grid")