Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Implement msgspec encoding #2541

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

edgarrmondragon
Copy link
Collaborator

@edgarrmondragon edgarrmondragon commented Jul 17, 2024

  • Docs
  • UX. Relying on mro magic is not great, e.g. why SQLiteTap(MsgSpecWriter, SQLTap) and not SQLiteTap(SQLTap, MsgSpecWriter). A better approach might be to make the IO implementation an attribute of the Singer class.
  • Fix tests

📚 Documentation preview 📚: https://meltano-sdk--2541.org.readthedocs.build/en/2541/

Copy link

codspeed-hq bot commented Jul 17, 2024

CodSpeed Performance Report

Merging #2541 will improve performances by ×12

Comparing edgarrmondragon/refactor/msgspec-impl-naive (9e46c44) with main (4ce312f)

Summary

⚡ 2 improvements
✅ 4 untouched benchmarks

Benchmarks breakdown

Benchmark main edgarrmondragon/refactor/msgspec-impl-naive Change
test_bench_deserialize_json 24.1 ms 5.5 ms ×4.4
test_bench_format_message 53.3 ms 4.4 ms ×12

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.32%. Comparing base (4ce312f) to head (9e46c44).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2541      +/-   ##
==========================================
- Coverage   89.82%   87.32%   -2.51%     
==========================================
  Files          58       59       +1     
  Lines        4886     4923      +37     
  Branches      960      960              
==========================================
- Hits         4389     4299      -90     
- Misses        346      462     +116     
- Partials      151      162      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/refactor/msgspec-impl-naive branch from 97de9f7 to 1d9e947 Compare July 17, 2024 02:00
@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/refactor/msgspec-impl-naive branch from 1d9e947 to 4febeba Compare July 17, 2024 02:59
@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/refactor/msgspec-impl-naive branch from 4febeba to cbe10bd Compare July 17, 2024 03:00
@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/refactor/msgspec-impl-naive branch from 8876b80 to f691f78 Compare July 25, 2024 15:45
@edgarrmondragon edgarrmondragon added this to the 0.41.0 milestone Aug 14, 2024
@BuzzCutNorman
Copy link
Contributor

I found that it helped to add a defualt_output to the for the SingerWriter to use. This allows you to make the write message a little generic.

default_output = sys.stdout.buffer

def write_message(self, message: Message) -> None:
	"""Write a message to stdout.

	Args:
		message: The message to write.
	"""
	self.default_output.write(self.format_message(message))
	self.default_output.flush()

@BuzzCutNorman
Copy link
Contributor

BuzzCutNorman commented Aug 16, 2024

In the json.py file I found to match the msgspec performance suggestions and fit into the framework you put in place I created a function for generating jsonl so I could keep the functionality of seralize_json to return strings. This way the serialize_json can be used in the connector engine creation and also in process_batch_files.

https://jcristharif.com/msgspec/perf-tips.html#line-delimited-json

json.py:

def serialize_json(obj: object, **kwargs: t.Any) -> str:
    """Serialize a dictionary into a line of json.

    Args:
        obj: A Python object usually a dict.
        **kwargs: Optional key word arguments.

    Returns:
        A string of serialized json.
    """
    return encoder.encode(obj).decode()

msg_buffer = bytearray(64)

def serialize_jsonl(obj: object, **kwargs: t.Any) -> bytes:
        """Serialize a dictionary into a line of jsonl.

        Args:
            obj: A Python object usually a dict.
            **kwargs: Optional key word arguments.

        Returns:
            A bytes of serialized json.
        """
        encoder.encode_into(obj, msg_buffer)
        msg_buffer.extend(b"\n")
        return msg_buffer

SingerWriter:

    def serialize_message(self, message: Message) -> str | bytes:
        """Serialize a dictionary into a line of json.

        Args:
            message: A Singer message object.

        Returns:
            A string of serialized json.
        """
        return serialize_jsonl(message.to_dict())

@edgarrmondragon edgarrmondragon added the Release Highlight Call this out in the release notes label Aug 22, 2024
@edgarrmondragon
Copy link
Collaborator Author

edgarrmondragon commented Sep 6, 2024

I found that it helped to add a defualt_output to the for the SingerWriter to use. This allows you to make the write message a little generic.

default_output = sys.stdout.buffer

def write_message(self, message: Message) -> None:
	"""Write a message to stdout.

	Args:
		message: The message to write.
	"""
	self.default_output.write(self.format_message(message))
	self.default_output.flush()

Do you mean in

class MsgSpecWriter(GenericSingerWriter[bytes, Message]):
"""Interface for all plugins writing Singer messages to stdout."""
def serialize_message(self, message: Message) -> bytes: # noqa: PLR6301
"""Serialize a dictionary into a line of json.
Args:
message: A Singer message object.
Returns:
A string of serialized json.
"""
return encoder.encode(message.to_dict())
def write_message(self, message: Message) -> None:
"""Write a message to stdout.
Args:
message: The message to write.
"""
sys.stdout.buffer.write(self.format_message(message) + b"\n")
sys.stdout.flush()

?

@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/refactor/msgspec-impl-naive branch from 29dea7a to d23a8ab Compare September 6, 2024 18:56
@BuzzCutNorman
Copy link
Contributor

BuzzCutNorman commented Sep 6, 2024

Yes, that is exactly what I meant. Could have definitely been stated clearer on my part😅.

 class MsgSpecWriter(GenericSingerWriter[bytes, Message]): 
     """Interface for all plugins writing Singer messages to stdout.""" 
     
     default_output = sys.stdout.buffer
     
     def serialize_message(self, message: Message) -> bytes:  # noqa: PLR6301 
         """Serialize a dictionary into a line of json. 
  
         Args: 
             message: A Singer message object. 
  
         Returns: 
             A string of serialized json. 
         """ 
         return serialize_jsonl(message.to_dict()) 
  
     def write_message(self, message: Message) -> None: 
         """Write a message to stdout. 
  
         Args: 
             message: The message to write. 
         """ 
 	self.default_output.write(self.format_message(message))
	self.default_output.flush()

@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/refactor/msgspec-impl-naive branch from d23a8ab to 3169b58 Compare September 6, 2024 19:15
@edgarrmondragon edgarrmondragon changed the title refactor: Implement (naive) msgspec encoding refactor: Implement msgspec encoding Sep 6, 2024
@edgarrmondragon
Copy link
Collaborator Author

Naive of me to think I could get this across in 1/2 a day of work 😅. I'll come back to this later, there's plenty of time until the planned release date.

@BuzzCutNorman
Copy link
Contributor

Like the pun 😊. Great dad joke material. Kind an inside joke now since you dropped (naive) from the title of the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants