Invalid ParsedTranscriptData range for non-ASCII transcript. #79

Open
akonior opened this issue Nov 29, 2024 · 0 comments

akonior commented Nov 29, 2024

If the server sends a JSON response containing non-ASCII characters (multi-byte in UTF-8), the code that generates ranges for the transcript produces wrong results. The cause is that the byte arrays are decoded to strings before the ranges are computed, so string lengths and indices no longer match byte lengths and offsets.

async transcript(): Promise<{
  sent: string;
  recv: string;
  ranges: { recv: ParsedTranscriptData; sent: ParsedTranscriptData };
}> {
  const transcript = this.#prover.transcript();
  const recv = Buffer.from(transcript.recv).toString();
  const sent = Buffer.from(transcript.sent).toString();
  return {
    recv,
    sent,
    ranges: {
      recv: processTranscript(recv),
      sent: processTranscript(sent),
    },
  };
}

In this code, the value returned by this.#prover.transcript() holds the raw bytes (transcript.recv and transcript.sent are byte arrays), whereas Buffer.from(transcript.recv).toString() is a decoded string. If the transcript contains any non-ASCII (multi-byte UTF-8) characters, then

transcript.recv.length > Buffer.from(transcript.recv).toString().length
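The inequality can be reproduced in isolation. The following is a minimal sketch, not code from the library: the sample body and the variable names are made up for illustration, and TextEncoder/TextDecoder stand in for the Buffer round trip.

```typescript
// Hypothetical response body; "\u00e9" (é) takes 2 bytes in UTF-8
// but only one UTF-16 code unit in a JavaScript string.
const responseBody = '{"name":"caf\u00e9"}';

// Stand-in for transcript.recv: the raw UTF-8 bytes.
const recvBytes: Uint8Array = new TextEncoder().encode(responseBody);

// Stand-in for Buffer.from(transcript.recv).toString().
const recvText: string = new TextDecoder().decode(recvBytes);

const byteLength = recvBytes.length; // number of bytes
const stringLength = recvText.length; // number of UTF-16 code units

console.log({ byteLength, stringLength, differs: byteLength > stringLength });
```

Each multi-byte character widens the gap, so every range boundary located after such a character is shifted relative to the byte array.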

The length mismatch propagates into the processTranscript function:

export function processTranscript(transcript: string): ParsedTranscriptData {
  const returnVal: ParsedTranscriptData = {
    all: {
      start: 0,
      end: transcript.length,
    },

Here, transcript.length is the length of the decoded string (counted in UTF-16 code units), not the number of bytes. When non-ASCII characters are present, this value is smaller than the byte length of the original transcript. This discrepancy causes end, _processEOL, and the subsequent JSON parsing to produce incorrect results, because the range boundaries do not align with the original byte array.
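One possible direction for a fix is to translate string indices into byte offsets before storing them in a range. The sketch below is only an illustration under assumptions: toByteRange and ByteRange are hypothetical names that do not exist in the library, the transcript is assumed to be valid UTF-8, and the indices are assumed not to split a surrogate pair.

```typescript
// Hypothetical range shape, mirroring the { start, end } objects above.
interface ByteRange {
  start: number;
  end: number;
}

// Hypothetical helper: convert a [start, end) range of string indices
// into the matching [start, end) range of offsets in the UTF-8 bytes.
function toByteRange(text: string, start: number, end: number): ByteRange {
  const encoder = new TextEncoder();
  // Bytes occupied by everything before the range.
  const byteStart = encoder.encode(text.slice(0, start)).length;
  // Bytes occupied by the range itself.
  const byteEnd = byteStart + encoder.encode(text.slice(start, end)).length;
  return { start: byteStart, end: byteEnd };
}

// Usage: locate a key in the decoded string, then map it back to bytes.
const sampleText = '{"name":"caf\u00e9","ok":true}';
const keyIndex = sampleText.indexOf('"ok"');
const keyRange = toByteRange(sampleText, keyIndex, keyIndex + 4);
const allRange = toByteRange(sampleText, 0, sampleText.length);
console.log(keyRange, allRange);
```

Re-encoding prefixes is simple but costs linear time per range; working on the byte array directly inside processTranscript would avoid the conversion altogether, at the cost of a larger change.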
