Commit 1d5d044

Fetch and Verify clarifications (#18829)
1 parent d8787f3 commit 1d5d044

File tree

6 files changed: +13 -13 lines changed
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.
+Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.
+Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.
+Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.
+Use [`IMPORT INTO`]({% link {{ page.version.version }}/import-into.md %}) to migrate [CSV]({% link {{ page.version.version }}/migrate-from-csv.md %}), TSV, or [Avro]({% link {{ page.version.version }}/migrate-from-avro.md %}) data stored via [userfile]({% link {{ page.version.version }}/use-userfile-storage.md %}) or [cloud storage]({% link {{ page.version.version }}/use-cloud-storage.md %}) into pre-existing tables on CockroachDB. This option achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{ page.version.version }}/import-into.md %}#considerations) to achieve its import speed.
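For context, the statement behind the snippet above looks roughly like the following sketch. The table, columns, bucket path, and `AUTH=implicit` parameter are illustrative placeholders, and `$TARGET` stands for a CockroachDB connection string:

~~~ shell
# Hypothetical example: run IMPORT INTO against the target cluster to load a
# CSV file from cloud storage into a pre-existing table. The table stays
# offline until the import job completes.
cockroach sql --url "$TARGET" \
  --execute "IMPORT INTO employees (id, name, start_date) CSV DATA ('s3://molt-test/employees.csv?AUTH=implicit');"
~~~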

src/current/molt/molt-fetch.md

Lines changed: 8 additions & 9 deletions
@@ -108,8 +108,8 @@ Cockroach Labs **strongly** recommends the following:

 ~~~ shell
 molt fetch \
---source '$SOURCE' \
---target '$TARGET' \
+--source $SOURCE \
+--target $TARGET \
 --table-filter 'employees' \
 --bucket-path 's3://molt-test' \
 --table-handling truncate-if-exists
@@ -198,9 +198,9 @@ To verify that your connections and configuration work properly, run MOLT Fetch
 | `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. |
 | `--import-batch-size` | The number of files to be imported at a time to the target database. This applies only when using [`IMPORT INTO`](#data-movement) to load data into the target. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.<br><br>**Default:** `1000` |
 | `--local-path` | The path within the [local file server](#local-file-server) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. |
-| `--local-path-crdb-access-addr` | Address of a [local file server](#local-file-server) that is reachable by CockroachDB. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.<br><br>**Default:** Value of `--local-path-listen-addr`. |
+| `--local-path-crdb-access-addr` | Address of a [local file server](#local-file-server) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.<br><br>**Default:** Value of `--local-path-listen-addr`. |
 | `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-file-server) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. |
-| `--log-file` | Write messages to the specified log filename. If not specified, messages are only written to `stdout`. |
+| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. |
 | `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).<br><br>**Default:** `info` |
 | `--metrics-listen-addr` | Address of the metrics endpoint, which has the path `{address}/metrics`.<br><br>**Default:** `'127.0.0.1:3030'` |
 | `--mode` | Configure the MOLT Fetch behavior: `data-load`, `data-load-and-replication`, `replication-only`, `export-only`, or `import-only`. For details, refer to [Fetch mode](#fetch-mode).<br><br>**Default:** `data-load` |
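To illustrate the updated `--log-file` behavior, a minimal sketch follows; the connection strings and table filter are placeholders, and omitting the filename would write to the default `fetch-{datetime}.log`:

~~~ shell
# Write log messages to a named file instead of the default fetch-{datetime}.log:
molt fetch \
--source $SOURCE \
--target $TARGET \
--table-filter 'employees' \
--log-file 'fetch.log'

# Pass "stdout" to write log messages to standard output instead of a file:
molt fetch \
--source $SOURCE \
--target $TARGET \
--table-filter 'employees' \
--log-file 'stdout'
~~~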
@@ -222,7 +222,6 @@ To verify that your connections and configuration work properly, run MOLT Fetch
 | `--use-copy` | Use [`COPY FROM`](#data-movement) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data movement](#data-movement). |
 | `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#cloud-storage) URIs. |

-
 ### `tokens list` flags

 | Flag | Description |
@@ -380,7 +379,7 @@ MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}

 By default, MOLT Fetch uses `IMPORT INTO`:

-- `IMPORT INTO` achieves the highest throughput, but [requires taking the tables **offline**]({% link {{site.current_cloud_version}}/import-into.md %}#considerations) to achieve its import speed. Tables are taken back online once an [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. See [Best practices](#best-practices).
+- `IMPORT INTO` achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{site.current_cloud_version}}/import-into.md %}#considerations) to achieve its import speed. Tables are taken back online once an [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. See [Best practices](#best-practices).
 - `IMPORT INTO` supports compression using the `--compression` flag, which reduces the amount of storage used.

 `--use-copy` configures MOLT Fetch to use `COPY FROM`:
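For illustration, a sketch of the `COPY FROM` alternative described in this hunk; `$SOURCE`, `$TARGET`, the table filter, and the bucket path are placeholders:

~~~ shell
# Keep the target tables queryable during the data load by opting into
# COPY FROM with --use-copy; this trades import speed for availability.
molt fetch \
--source $SOURCE \
--target $TARGET \
--table-filter 'employees' \
--bucket-path 's3://molt-test' \
--use-copy
~~~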
@@ -402,7 +401,7 @@ MOLT Fetch can move the source data to CockroachDB via [cloud storage](#cloud-st
 Only the path specified in `--bucket-path` is used. Query parameters, such as credentials, are ignored. To authenticate cloud storage, follow the steps in [Secure cloud storage](#secure-cloud-storage).
 {{site.data.alerts.end}}

-`--bucket-path` specifies that MOLT Fetch should write intermediate files to a path within a [Google Cloud Storage](https://cloud.google.com/storage/docs/buckets) or [Amazon S3](https://aws.amazon.com/s3/) bucket to which you have the necessary permissions. For example:
+`--bucket-path` instructs MOLT Fetch to write intermediate files to a path within a [Google Cloud Storage](https://cloud.google.com/storage/docs/buckets) or [Amazon S3](https://aws.amazon.com/s3/) bucket to which you have the necessary permissions. For example:

 Google Cloud Storage:

@@ -422,7 +421,7 @@ Cloud storage can be used to move data with either [`IMPORT INTO` or `COPY FROM`

 #### Local file server

-`--local-path` specifies that MOLT Fetch should write intermediate files to a path within a [local file server]({% link {{site.current_cloud_version}}/use-a-local-file-server.md %}). `local-path-listen-addr` specifies the address of the local file server. For example:
+`--local-path` instructs MOLT Fetch to write intermediate files to a path within a [local file server]({% link {{site.current_cloud_version}}/use-a-local-file-server.md %}). `local-path-listen-addr` specifies the address of the local file server. For example:

 {% include_cached copy-clipboard.html %}
 ~~~
@@ -432,7 +431,7 @@ Cloud storage can be used to move data with either [`IMPORT INTO` or `COPY FROM`

 In some cases, CockroachDB will not be able to use the local address specified by `--local-path-listen-addr`. This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect.

-For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, such that the {{ site.data.products.cloud }} cluster is in a different physical location than the machine running `molt fetch`, then CockroachDB cannot reach an address such as `localhost:3000`. In these situations, use `--local-path-crdb-access-addr` to specify an address for the local file server that is reachable by CockroachDB. For example:
+For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, such that the {{ site.data.products.cloud }} cluster is in a different physical location than the machine running `molt fetch`, then CockroachDB cannot reach an address such as `localhost:3000`. In these situations, use `--local-path-crdb-access-addr` to specify an address for the local file server that is **publicly accessible**. For example:

 {% include_cached copy-clipboard.html %}
 ~~~
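For illustration, a sketch of a fetch that serves intermediate files from a local file server and exposes it to a CockroachDB {{ site.data.products.cloud }} cluster through a publicly accessible address; the connection strings, paths, port, and public address are placeholders:

~~~ shell
# The cluster cannot reach localhost:3000, so also pass a publicly accessible
# address for the same file server via --local-path-crdb-access-addr.
molt fetch \
--source $SOURCE \
--target $TARGET \
--table-filter 'employees' \
--local-path data/migration/cockroach \
--local-path-listen-addr 'localhost:3000' \
--local-path-crdb-access-addr '44.55.66.77:3000'
~~~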

src/current/molt/molt-verify.md

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ Flag | Description
 `--concurrency` | Number of threads to process at a time when reading the tables. <br>**Default:** 16 <br>For faster verification, set this flag to a higher value. {% comment %}<br>Note: Table splitting by shard only works for [`INT`]({% link {{site.current_cloud_version}}/int.md %}), [`UUID`]({% link {{site.current_cloud_version}}/uuid.md %}), and [`FLOAT`]({% link {{site.current_cloud_version}}/float.md %}) data types.{% endcomment %}
 `--continuous` | Verify tables in a continuous loop. <br />**Default:** `false`
 `--live` | Retry verification on rows before emitting warnings or errors. This is useful during live data import, when temporary mismatches can occur. <br />**Default:** `false`
+`--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `verify-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`.
 `--metrics-listen-addr` | Address of the metrics endpoint, which has the path `{address}/metrics`.<br><br>**Default:** `'127.0.0.1:3030'` |
 `--row-batch-size` | Number of rows to get from a table at a time. <br>**Default:** 20000
 `--schema-filter` | Verify schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).
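For illustration, a sketch of MOLT Verify using the new `--log-file` flag, assuming the same `--source`/`--target` connection flags as MOLT Fetch; the connection strings are placeholders, and omitting the filename would write to `verify-{datetime}.log` instead:

~~~ shell
# Verify the migrated tables, retrying transient mismatches, and send log
# messages to standard output rather than a log file.
molt verify \
--source $SOURCE \
--target $TARGET \
--live \
--log-file 'stdout'
~~~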
