Skip to content

Commit

Permalink
Add Markdown Render Support to GPT completions (#56)
Browse files Browse the repository at this point in the history
* Add Markdown support. Use rehype-raw to retain raw innerHTML

* Rm extra whitespace breaks

* Replace python-jose with pyjwt.

* Replace non-existent get_unverified_claims function

* Change Exception to handle JWT-specific errors

* Convert the public key to PEM format

* Add pem format tests

* Another test, plus autherror fixes

* Remove directives to not return markdown.

* Add table support

* Update snapshots with new system prompt

---------

Co-authored-by: pmorgen <[email protected]>
Co-authored-by: Pamela Fox <[email protected]>
Co-authored-by: Pamela Fox <[email protected]>
  • Loading branch information
4 people authored Aug 21, 2024
1 parent d313c6a commit 781bf21
Show file tree
Hide file tree
Showing 66 changed files with 2,291 additions and 2,514 deletions.
2 changes: 1 addition & 1 deletion app/backend/approaches/chatreadretrieveread.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ def __init__(
def system_message_chat_conversation(self):
return """Assistant helps the company employees with their healthcare plan questions, and questions about the employee handbook. Be brief in your answers.
Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
For tabular information return it as an html table. Do not return markdown format. If the question is not in English, answer in the language used in the question.
If the question is not in English, answer in the language used in the question.
Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, for example [info1.txt]. Don't combine sources, list each source separately, for example [info1.txt][info2.pdf].
{follow_up_questions_prompt}
{injected_prompt}
Expand Down
1 change: 0 additions & 1 deletion app/backend/approaches/chatreadretrievereadvision.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,6 @@ def system_message_chat_conversation(self):
Answer the following question using only the data provided in the sources below.
If asking a clarifying question to the user would help, ask the question.
Be brief in your answers.
For tabular information return it as an html table. Do not return markdown format.
The text and image source can be the same file name, don't use the image title when citing the image source, only use the file name as mentioned
If you cannot answer using the sources below, say you don't know. Return just the answer without any input texts.
{follow_up_questions_prompt}
Expand Down
1 change: 0 additions & 1 deletion app/backend/approaches/retrievethenread.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@ class RetrieveThenReadApproach(Approach):
"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. "
+ "Use 'you' to refer to the individual asking the questions even if they ask with 'I'. "
+ "Answer the following question using only the data provided in the sources below. "
+ "For tabular information return it as an html table. Do not return markdown format. "
+ "Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. "
+ "If you cannot answer using the sources below, say you don't know. Use below example to answer"
)
Expand Down
1 change: 0 additions & 1 deletion app/backend/approaches/retrievethenreadvision.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,6 @@ class RetrieveThenReadVisionApproach(Approach):
+ "Each text source starts in a new line and has the file name followed by colon and the actual information "
+ "Always include the source name from the image or text for each fact you use in the response in the format: [filename] "
+ "Answer the following question using only the data provided in the sources below. "
+ "For tabular information return it as an html table. Do not return markdown format. "
+ "The text and image source can be the same file name, don't use the image title when citing the image source, only use the file name as mentioned "
+ "If you cannot answer using the sources below, say you don't know. Return just the answer without any input texts "
)
Expand Down
4,634 changes: 2,200 additions & 2,434 deletions app/frontend/package-lock.json

Large diffs are not rendered by default.

12 changes: 7 additions & 5 deletions app/frontend/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -12,30 +12,32 @@
"preview": "vite preview"
},
"dependencies": {
"@azure/msal-react": "^2.0.21",
"@azure/msal-browser": "^3.17.0",
"@azure/msal-react": "^2.0.21",
"@fluentui/react": "^8.112.5",
"@fluentui/react-components": "^9.54.13",
"@fluentui/react-icons": "^2.0.249",
"@react-spring/web": "^9.7.3",
"marked": "^13.0.2",
"dompurify": "^3.0.6",
"ndjson-readablestream": "^1.2.0",
"react": "^18.3.1",
"react-dom": "^18.3.1",
"react-markdown": "^9.0.1",
"react-router-dom": "^6.23.1",
"ndjson-readablestream": "^1.2.0",
"react-syntax-highlighter": "^15.5.0",
"rehype-raw": "^7.0.0",
"remark-gfm": "^4.0.0",
"scheduler": "^0.20.2"
},
"devDependencies": {
"@types/dom-speech-recognition": "^0.0.4",
"@types/dompurify": "^3.0.4",
"@types/react": "^18.3.3",
"@types/react-dom": "^18.3.0",
"@types/react-syntax-highlighter": "^15.5.13",
"@vitejs/plugin-react": "^4.3.1",
"prettier": "^3.0.3",
"typescript": "^5.5.3",
"@types/react-syntax-highlighter": "^15.5.13",
"@types/dom-speech-recognition": "^0.0.4",
"vite": "^4.5.3"
}
}
8 changes: 7 additions & 1 deletion app/frontend/src/components/Answer/Answer.module.css
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,13 @@
line-height: 1.375em;
padding-top: 1em;
padding-bottom: 1em;
white-space: pre-line;
}

.answerText h1,
h2,
h2 {
font-size: 1rem;
font-weight: bold;
}

.answerText table {
Expand Down
7 changes: 6 additions & 1 deletion app/frontend/src/components/Answer/Answer.tsx
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
import { useMemo } from "react";
import { Stack, IconButton } from "@fluentui/react";
import DOMPurify from "dompurify";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import rehypeRaw from "rehype-raw";

import styles from "./Answer.module.css";
import { ChatAppResponse, getCitationFilePath } from "../../api";
Expand Down Expand Up @@ -71,7 +74,9 @@ export const Answer = ({
</Stack.Item>

<Stack.Item grow>
<div className={styles.answerText} dangerouslySetInnerHTML={{ __html: sanitizedAnswerHtml }}></div>
<div className={styles.answerText}>
<ReactMarkdown children={sanitizedAnswerHtml} rehypePlugins={[rehypeRaw]} remarkPlugins={[remarkGfm]} />
</div>
</Stack.Item>

{!!parsedAnswer.citations.length && (
Expand Down
25 changes: 13 additions & 12 deletions app/frontend/src/components/MarkdownViewer/MarkdownViewer.tsx
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
import { Spinner, SpinnerSize, MessageBar, MessageBarType, Link, IconButton } from "@fluentui/react";
import React, { useState, useEffect } from "react";
import { marked } from "marked";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";

import styles from "./MarkdownViewer.module.css";
import { Spinner, SpinnerSize, MessageBar, MessageBarType, Link, IconButton } from "@fluentui/react";

interface MarkdownViewerProps {
src: string;
Expand All @@ -13,12 +15,12 @@ export const MarkdownViewer: React.FC<MarkdownViewerProps> = ({ src }) => {
const [error, setError] = useState<Error | null>(null);

/**
* Anchor links are not handled well by 'marked' and result in HTTP 404 errors as the URL they point to does not exist.
* This function removes them from the resulted HTML.
* Anchor links result in HTTP 404 errors as the URL they point to does not exist.
* This function removes them from the markdown.
*/
const removeAnchorLinks = (html: string) => {
const ancorLinksRegex = /<a\s+(?:[^>]*?\s+)?href=['"](#[^"']*?)['"][^>]*?>/g;
return html.replace(ancorLinksRegex, "");
const removeAnchorLinks = (markdown: string) => {
const ancorLinksRegex = /\[.*?\]\(#.*?\)/g;
return markdown.replace(ancorLinksRegex, "");
};

useEffect(() => {
Expand All @@ -30,10 +32,9 @@ export const MarkdownViewer: React.FC<MarkdownViewerProps> = ({ src }) => {
throw new Error("Failed loading markdown file.");
}

const markdownText = await response.text();
const parsedHtml = await marked.parse(markdownText);
const cleanedHtml = removeAnchorLinks(parsedHtml);
setContent(cleanedHtml);
let markdownText = await response.text();
markdownText = removeAnchorLinks(markdownText);
setContent(markdownText);
} catch (error: any) {
setError(error);
} finally {
Expand Down Expand Up @@ -70,7 +71,7 @@ export const MarkdownViewer: React.FC<MarkdownViewerProps> = ({ src }) => {
href={src}
download
/>
<div className={`${styles.markdown} ${styles.markdownViewer}`} dangerouslySetInnerHTML={{ __html: content }} />
<ReactMarkdown children={content} remarkPlugins={[remarkGfm]} className={`${styles.markdown} ${styles.markdownViewer}`} />
</div>
)}
</div>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
},
{
"description": [
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. For tabular information return it as an html table. Do not return markdown format. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'user', 'content': \"\\n'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'\\n\\nSources:\\ninfo1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.\\ninfo2.pdf: Overlake is in-network for the employee plan.\\ninfo3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.\\ninfo4.pdf: In-network institutions include Overlake, Swedish and others in the region\\n\"}",
"{'role': 'assistant', 'content': 'In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf].'}",
"{'role': 'user', 'content': 'What is the capital of France?\\nSources:\\n Benefit_Options-2.pdf: There is a whistleblower policy.'}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
},
{
"description": [
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. For tabular information return it as an html table. Do not return markdown format. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'user', 'content': \"\\n'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'\\n\\nSources:\\ninfo1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.\\ninfo2.pdf: Overlake is in-network for the employee plan.\\ninfo3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.\\ninfo4.pdf: In-network institutions include Overlake, Swedish and others in the region\\n\"}",
"{'role': 'assistant', 'content': 'In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf].'}",
"{'role': 'user', 'content': 'What is the capital of France?\\nSources:\\n Benefit_Options-2.pdf: There is a whistleblower policy.'}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
},
{
"description": [
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. For tabular information return it as an html table. Do not return markdown format. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'user', 'content': \"\\n'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'\\n\\nSources:\\ninfo1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.\\ninfo2.pdf: Overlake is in-network for the employee plan.\\ninfo3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.\\ninfo4.pdf: In-network institutions include Overlake, Swedish and others in the region\\n\"}",
"{'role': 'assistant', 'content': 'In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf].'}",
"{'role': 'user', 'content': 'What is the capital of France?\\nSources:\\n Benefit_Options-2.pdf: There is a whistleblower policy.'}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
},
{
"description": [
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. For tabular information return it as an html table. Do not return markdown format. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'user', 'content': \"\\n'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'\\n\\nSources:\\ninfo1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.\\ninfo2.pdf: Overlake is in-network for the employee plan.\\ninfo3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.\\ninfo4.pdf: In-network institutions include Overlake, Swedish and others in the region\\n\"}",
"{'role': 'assistant', 'content': 'In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf].'}",
"{'role': 'user', 'content': 'What is the capital of France?\\nSources:\\n Benefit_Options-2.pdf: There is a whistleblower policy.'}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
},
{
"description": [
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. For tabular information return it as an html table. Do not return markdown format. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'system', 'content': \"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. Use 'you' to refer to the individual asking the questions even if they ask with 'I'. Answer the following question using only the data provided in the sources below. Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. If you cannot answer using the sources below, say you don't know. Use below example to answer\"}",
"{'role': 'user', 'content': \"\\n'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'\\n\\nSources:\\ninfo1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.\\ninfo2.pdf: Overlake is in-network for the employee plan.\\ninfo3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.\\ninfo4.pdf: In-network institutions include Overlake, Swedish and others in the region\\n\"}",
"{'role': 'assistant', 'content': 'In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf].'}",
"{'role': 'user', 'content': 'What is the capital of France?\\nSources:\\n Benefit_Options-2.pdf: There is a whistleblower policy.'}"
Expand Down
Loading

0 comments on commit 781bf21

Please sign in to comment.