feat: Add output_path property, and streamline implementation (#50)
## About

Users were confused by the semantics of the `output_path_prefix`
property. This patch renames it to `output_path`, while still accepting
the old name for backwards compatibility.

Further, the `destination_path` property is accepted as an alias, so
this target can be a drop-in replacement for the `hotgluexyz` variant.
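
A minimal sketch of the lookup order this implies, assuming the newest property name wins when several are configured. The helper `resolve_output_path` and the constant `ACCEPTED_KEYS` are hypothetical names used for illustration only; they are not part of the patch.

```python
from typing import Any, Mapping, Optional

# Hypothetical constant: accepted property names, newest first.
ACCEPTED_KEYS = ("output_path", "destination_path", "output_path_prefix")


def resolve_output_path(config: Mapping[str, Any]) -> Optional[str]:
    """Return the value of the first accepted key present in `config`."""
    for key in ACCEPTED_KEYS:
        if key in config:
            return config[key]
    # None of the properties is configured.
    return None


# The new name takes precedence over the deprecated one.
print(resolve_output_path({"output_path": "new", "output_path_prefix": "old"}))
# Existing configurations that only use the old name keep working.
print(resolve_output_path({"output_path_prefix": "old"}))
```

Listing the names in one tuple keeps the precedence visible in a single place, which may be easier to extend than nested lookups.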

The value no longer needs to be configured with a trailing slash.
Instead, the implementation relies on `pathlib.Path` to join
`output_path` and `filename`.
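
To illustrate why the trailing slash stops mattering, here is a small sketch that compares plain string concatenation with `pathlib.Path` joining. The function names `join_by_concatenation` and `join_with_pathlib` are hypothetical and exist only for this comparison.

```python
from pathlib import Path
from typing import Optional


def join_by_concatenation(prefix: str, filename: str) -> Path:
    """Plain string concatenation: only correct if `prefix` ends with a slash."""
    return Path(f"{prefix}{filename}")


def join_with_pathlib(output_path: Optional[str], filename: str) -> Path:
    """Join with the `/` operator: a trailing slash makes no difference."""
    filepath = Path(filename)
    if output_path is not None:
        filepath = Path(output_path) / filepath
    return filepath


# Both spellings of the directory yield the same file path.
assert join_with_pathlib("./.output", "users.csv") == join_with_pathlib(
    "./.output/", "users.csv"
)

# Without the trailing slash, concatenation glues directory and file name together.
assert join_by_concatenation("./.output", "users.csv") != join_with_pathlib(
    "./.output", "users.csv"
)
```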

## References

- GH-9

## Thoughts

I am not sure about fc54324. I added it because mypy tripped as
described in the commit message. Please let me know if that was wrong,
so I can either remove it or amend it accordingly.
amotl authored Dec 18, 2023
1 parent 639d90c commit 2b156b6
Showing 4 changed files with 49 additions and 16 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ Built with the [Meltano SDK](https://sdk.meltano.com) for Singer Taps and Target

 | Setting | Required | Default | Description |
 |:-------------------------|:--------:|:-------:|:------------|
-| output_path_prefix | False | None | Optional path prefix which will be prepended to the file path indicated by `file_naming_schema`. |
+| output_path | False | None | Filesystem path where to store output files. By default, the current working directory will be used. |
 | file_naming_scheme | False | {stream_name}.csv | The scheme with which output files will be named. Naming scheme may leverage any of the following substitutions:<BR/>- `{stream_name}`<BR/>- `{datestamp}`<BR/>- `{timestamp}` |
 | datestamp_format | False | %Y-%m-%d | A python format string to use when outputting the `{datestamp}` string. For reference, see: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes |
 | timestamp_format | False | %Y-%m-%d.T%H%M%S | A python format string to use when outputting the `{timestamp}` string. For reference, see: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes |
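
The naming-scheme substitutions described in the settings table can be sketched roughly as follows. This is an illustration under assumptions, not the target's actual code: the function `render_filename` is a hypothetical name, and the caller is assumed to pass in the point in time to format. The default format strings are the ones listed in the table.

```python
import datetime


def render_filename(
    scheme: str,
    stream_name: str,
    now: datetime.datetime,
    datestamp_format: str = "%Y-%m-%d",
    timestamp_format: str = "%Y-%m-%d.T%H%M%S",
) -> str:
    """Replace each `{placeholder}` in `scheme` with its rendered value."""
    replacements = {
        "stream_name": stream_name,
        "datestamp": now.strftime(datestamp_format),
        "timestamp": now.strftime(timestamp_format),
    }
    filename = scheme
    for key, val in replacements.items():
        # Placeholders that do not occur in the scheme are simply left out.
        filename = filename.replace("{" + key + "}", val)
    return filename


moment = datetime.datetime(2023, 12, 18, 9, 30, 0)
print(render_filename("{stream_name}-{datestamp}.csv", "users", moment))
```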
4 changes: 2 additions & 2 deletions meltano.yml
@@ -27,7 +27,7 @@ plugins:
 - catalog
 - discover
 settings:
-- name: output_path_prefix
+- name: output_path
   kind: string
 config:
-  output_path_prefix: ./.output/
+  output_path: ./.output
40 changes: 30 additions & 10 deletions target_csv/sinks.py
@@ -2,11 +2,12 @@

 import datetime
 import sys
+import warnings
 from pathlib import Path
 from typing import Any, Dict, List, Optional

 import pytz
-from singer_sdk import PluginBase
+from singer_sdk import Target
 from singer_sdk.sinks import BatchSink

 from target_csv.serialization import write_csv
Expand All @@ -19,7 +20,7 @@ class CSVSink(BatchSink):

def __init__( # noqa: D107
self,
target: PluginBase,
target: Target,
stream_name: str,
schema: Dict,
key_properties: Optional[List[str]],
Expand All @@ -45,21 +46,40 @@ def filepath_replacement_map(self) -> Dict[str, str]: # noqa: D102
}

@property
def destination_path(self) -> Path: # noqa: D102
result = self.config["file_naming_scheme"]
def output_file(self) -> Path: # noqa: D102
filename = self.config["file_naming_scheme"]
for key, val in self.filepath_replacement_map.items():
replacement_pattern = "{" f"{key}" "}"
if replacement_pattern in result:
result = result.replace(replacement_pattern, val)
if replacement_pattern in filename:
filename = filename.replace(replacement_pattern, val)

if "output_path_prefix" in self.config:
warnings.warn(
"The property `output_path_prefix` is deprecated, "
"please use `output_path`.",
category=UserWarning,
)

# Accept all possible properties defining the output path.
# - output_path: The new designated property.
# - destination_path: Alias for `output_path` (`hotgluexyz` compat).
# - output_path_prefix: The property used up until now.
output_path = self.config.get(
"output_path",
self.config.get(
"destination_path", self.config.get("output_path_prefix", None)
),
)

if self.config.get("output_path_prefix", None) is not None:
result = f"{self.config['output_path_prefix']}{result}"
filepath = Path(filename)
if output_path is not None:
filepath = Path(output_path) / filepath

return Path(result)
return filepath

def process_batch(self, context: dict) -> None:
"""Write out any prepped records and return once fully written."""
output_file: Path = self.destination_path
output_file: Path = self.output_file
self.logger.info(f"Writing to destination file '{output_file.resolve()}'...")
new_contents: dict # noqa: F842
create_new = (
Expand Down
19 changes: 16 additions & 3 deletions target_csv/target.py
@@ -12,13 +12,26 @@ class TargetCSV(Target):
     name = "target-csv"
     config_jsonschema = th.PropertiesList(
         th.Property(
-            "output_path_prefix",
+            "output_path",
             th.StringType,
             description=(
+                "Filesystem path where to store output files. "
+                "By default, the current working directory will be used."
+            ),
+        ),
+        th.Property(
+            "destination_path",
+            th.StringType,
+            description=(
-                "Optional path prefix which will be prepended to "
-                "the file path indicated by `file_naming_schema`."
+                "Filesystem path where to store output files. Alias for "
+                "`output_path` to be compatible with the `hotgluexyz` variant."
             ),
         ),
+        th.Property(
+            "output_path_prefix",
+            th.StringType,
+            description=("DEPRECATED. Filesystem path where to store output files."),
+        ),
         th.Property(
             "file_naming_scheme",
             th.StringType,
