[ZEPPELIN-5527] Remove the dependency of markdown from `zeppelin-jupyter`

### What is this PR for?
Simplifying dependencies.

### What type of PR is it?
[Improvement]

### Todos
* [x] - Remove the dependency of `markdown` and implement a markdown parser directly

### What is the Jira issue?
* Jira https://issues.apache.org/jira/browse/ZEPPELIN-5527

### How should this be tested?
Import Jupyter notebook

### Screenshots (if appropriate)

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Jongyoul Lee <[email protected]>

Closes apache#4228 from jongyoul/ZEPPELIN-5527 and squashes the following commits:

acf087e [Jongyoul Lee] Fix tests
fae3fc1 [Jongyoul Lee] Remove a redundant newline
790f0cc [Jongyoul Lee] [ZEPPELIN-5527] Remove the dependency of `markdown` from `zeppelin-jupyter`
jongyoul committed Sep 21, 2021
1 parent 056f952 commit 6dbf9a9
Showing 11 changed files with 106 additions and 47 deletions.
4 changes: 2 additions & 2 deletions STYLE.md
@@ -7,7 +7,7 @@ app/styles/looknfeel
Overall look and theme of the Zeppelin notebook page can be customized here.

### Code Syntax Highlighting
There are two parts to code highlighting. First, Zeppelin uses the Ace Editor for its note paragraphs. Color style for this can be changed by setting theme on the editor instance. Second, Zeppelin's Markdown interpreter calls pegdown parser to emit HTML, and such content may contain &lt;pre&gt;&lt;code&gt; tags that can be consumed by Highlight.js.
There are two parts to code highlighting. First, Zeppelin uses the Ace Editor for its note paragraphs. Color style for this can be changed by setting theme on the editor instance. Second, Zeppelin's Markdown interpreter calls flexmark parser to emit HTML, and such content may contain &lt;pre&gt;&lt;code&gt; tags that can be consumed by Highlight.js.

#### Theme on Ace Editor
app/scripts/controllers/paragraph.js
@@ -16,7 +16,7 @@ Call setTheme on the editor with the theme path/name.
[List of themes on GitHub](https://github.com/ajaxorg/ace/tree/master/lib/ace/theme)

#### Style for Markdown Code Blocks
Highlight.js parses and converts &lt;pre&gt;&lt;code&gt; blocks from pegdown parser into keywords and language syntax with proper styles. It also attempts to infer the best fitting language if it is not provided. The visual style can be changed by simply including the desired [stylesheet](https://github.com/components/highlightjs/tree/master/styles) into app/index.html. See the next section on build.
Highlight.js parses and converts &lt;pre&gt;&lt;code&gt; blocks from markdown parser into keywords and language syntax with proper styles. It also attempts to infer the best fitting language if it is not provided. The visual style can be changed by simply including the desired [stylesheet](https://github.com/components/highlightjs/tree/master/styles) into app/index.html. See the next section on build.

Note that code block background color is overriden in app/styles/notebook.css (look for .paragraph .tableDisplay .hljs).

13 changes: 4 additions & 9 deletions docs/interpreter/markdown.md
@@ -25,11 +25,11 @@ limitations under the License.

## Overview
[Markdown](http://daringfireball.net/projects/markdown/) is a plain text formatting syntax designed so that it can be converted to HTML.
Apache Zeppelin uses [flexmark](https://github.com/vsch/flexmark-java), [pegdown](https://github.com/sirthias/pegdown) and [markdown4j](https://github.com/jdcasey/markdown4j) as markdown parsers.
Apache Zeppelin uses [flexmark](https://github.com/vsch/flexmark-java) and [markdown4j](https://github.com/jdcasey/markdown4j) as markdown parsers.

In Zeppelin notebook, you can use ` %md ` in the beginning of a paragraph to invoke the Markdown interpreter and generate static html from Markdown plain text.

In Zeppelin, Markdown interpreter is enabled by default and uses the [pegdown](https://github.com/sirthias/pegdown) parser.
In Zeppelin, Markdown interpreter is enabled by default and uses the [flexmark](https://github.com/vsch/flexmark-java) parser.

<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-interpreter-setting.png" width="60%" />

@@ -54,7 +54,7 @@ For more information, please see [Mathematical Expression](../usage/display_syst
<tr>
<td>markdown.parser.type</td>
<td>flexmark</td>
<td>Markdown Parser Type. <br/> Available values: flexmark, pegdown, markdown4j.</td>
<td>Markdown Parser Type. <br/> Available values: flexmark, markdown4j.</td>
</tr>
</table>

@@ -68,13 +68,8 @@ CommonMark/Markdown Java parser with source level AST.

<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-example-flexmark-parser-extensions.png" width="70%" />

### Pegdown Parser

`pegdown` parser provides github flavored markdown. Although still one of the most popular Markdown parsing libraries for the JVM, pegdown has reached its end of life.
The project is essentially unmaintained with tickets piling up and crucial bugs not being fixed.`pegdown`'s parsing performance isn't great. But keep this parser for the backward compatibility.

### Markdown4j Parser

Since `pegdown` parser is more accurate and provides much more markdown syntax `markdown4j` option might be removed later. But keep this parser for the backward compatibility.
Since `flexmark` parser is more accurate and provides much more markdown syntax `markdown4j` option might be removed later. But keep this parser for the backward compatibility.


3 changes: 3 additions & 0 deletions docs/setup/operation/upgrading.md
@@ -35,6 +35,9 @@ So, copying `notebook` and `conf` directory should be enough.

## Migration Guide

### Upgrading from Zeppelin 0.9, 0.10 to 0.11
- From 0.11, The type of `Pegdown` for parsing markdown was deprecated ([ZEPPELIN-5529](https://issues.apache.org/jira/browse/ZEPPELIN-2619)). It will use `Flexmark` instead.

### Upgrading from Zeppelin 0.8 to 0.9

- From 0.9, we changed the notes file name structure ([ZEPPELIN-2619](https://issues.apache.org/jira/browse/ZEPPELIN-2619)). So when you upgrading zeppelin to 0.9, you need to upgrade note files. Here's steps you need to follow:
8 changes: 4 additions & 4 deletions markdown/README.md
@@ -1,6 +1,6 @@
# Overview
Markdown parsers for Apache Zeppelin. Markdown is a plain text formatting syntax designed so that it can be converted to HTML. Apache Zeppelin uses `flexmark`, `pegdown` and `markdown4j`.
Since both `pegdown` and `markdown4j` are deprecated but it support for backward compatibility.
Markdown parsers for Apache Zeppelin. Markdown is a plain text formatting syntax designed so that it can be converted to HTML. Apache Zeppelin uses `flexmark` and `markdown4j`.
Since `markdown4j` are deprecated but it supports for backward compatibility.

# Architecture
Current interpreter implementation creates the instance of parser based on the configuration parameter provided, default is `flexmark` through `Markdown` and render the text into html.
@@ -18,7 +18,7 @@ CommonMark/Markdown Java parser with source level AST.
* maven dependency to add in pom.xml

```
<flexmark.all.version>0.50.40</flexmark.all.version>
<flexmark.all.version>0.62.2</flexmark.all.version>
<dependency>
<groupId>com.vladsch.flexmark</groupId>
@@ -31,4 +31,4 @@ CommonMark/Markdown Java parser with source level AST.
To support, YUML and websequnce diagram, need to build the image URL from the respective block and render it into HTML, So it requires
to implement some custom classes. `UMLExtension` is base class which has factory for other classes like `UMLBlockQuoteParser` and `UMLNodeRenderer`.
`UMLBlockQuoteParser` which parses the UML block and creates block quote node `UMLBlockQuote`.
`UMLNodeRenderer` which builds the URL using this block quote node `UMLBlockQuote` and render it as image into HTML.
`UMLNodeRenderer` which builds the URL using this block quote node `UMLBlockQuote` and render it as image into HTML.
12 changes: 1 addition & 11 deletions markdown/pom.xml
@@ -15,7 +15,7 @@
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
@@ -34,8 +34,6 @@
<properties>
<interpreter.name>md</interpreter.name>
<markdown4j.version>2.2-cj-1.0</markdown4j.version>
<pegdown.version>1.6.0</pegdown.version>
<flexmark.all.version>0.62.2</flexmark.all.version>
</properties>

<dependencies>
@@ -54,14 +52,6 @@
<dependency>
<groupId>com.vladsch.flexmark</groupId>
<artifactId>flexmark-all</artifactId>
<version>${flexmark.all.version}</version>
<exclusions>
<!-- jcl-over-slf4j is provided by zeppelin-interprerter -->
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
25 changes: 20 additions & 5 deletions pom.xml
@@ -124,6 +124,7 @@
<slf4j.version>1.7.30</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<libthrift.version>0.13.0</libthrift.version>
<flexmark.all.version>0.62.2</flexmark.all.version>
<gson.version>2.8.6</gson.version>
<gson-extras.version>0.2.2</gson-extras.version>
<jetty.version>9.4.31.v20200723</jetty.version>
@@ -151,7 +152,7 @@
<hadoop3.1.version>3.1.3</hadoop3.1.version>
<hadoop3.2.version>3.2.0</hadoop3.2.version>
<hadoop.version>${hadoop2.7.version}</hadoop.version>

<hadoop.deps.scope>provided</hadoop.deps.scope>
<quartz.scheduler.version>2.3.2</quartz.scheduler.version>
<jettison.version>1.4.0</jettison.version>
@@ -214,6 +215,20 @@

<dependencyManagement>
<dependencies>
<!-- markdown -->
<dependency>
<groupId>com.vladsch.flexmark</groupId>
<artifactId>flexmark-all</artifactId>
<version>${flexmark.all.version}</version>
<exclusions>
<!-- jcl-over-slf4j is provided -->
<exclusion>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
</exclusions>
</dependency>

<!-- Logging -->
<dependency>
<groupId>org.slf4j</groupId>
@@ -1243,7 +1258,7 @@
<artifactId>scala-maven-plugin</artifactId>
<version>${plugin.scala.alchim31.version}</version>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
@@ -1379,7 +1394,7 @@
<artifactId>maven-eclipse-plugin</artifactId>
<version>${plugin.eclipse.version}</version>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
@@ -1446,7 +1461,7 @@
<artifactId>frontend-maven-plugin</artifactId>
<version>${plugin.frontend.version}</version>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-failsafe-plugin</artifactId>
@@ -1518,7 +1533,7 @@
<artifactId>apache-rat-plugin</artifactId>
<version>${plugin.rat.version}</version>
</plugin>

</plugins>
</pluginManagement>
</build>
1 change: 0 additions & 1 deletion zeppelin-distribution/src/bin_license/LICENSE
@@ -127,7 +127,6 @@ The following components are provided under Apache License.
(Apache 2.0) Servlet API (org.mortbay.jetty:servlet-api:2.5-20081211 - https://en.wikipedia.org/wiki/Jetty_(web_server))
(Apache 2.0) Google HTTP Client Library for Java (com.google.http-client:google-http-client-jackson2:1.21.0 - https://github.com/google/google-http-java-client/tree/dev/google-http-client-jackson2)
(Apache 2.0) validation-api (javax.validation - http://beanvalidation.org/)
(Apache 2.0) pegdown (org.pegdown:pegdown:1.6.0 - https://github.com/sirthias/pegdown)
(Apache 2.0) parboiled-java (org.parboiled:parboiled-java:1.1.7 - https://github.com/sirthias/parboiled)
(Apache 2.0) parboiled-core (org.parboiled:parboiled-core:1.1.7 - https://github.com/sirthias/parboiled)
(Apache 2.0) ZkClient (com.101tec:zkclient:0.7 - https://github.com/sgroschupf/zkclient)
16 changes: 7 additions & 9 deletions zeppelin-jupyter/pom.xml
@@ -53,15 +53,13 @@
</dependency>

<dependency>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-markdown</artifactId>
<version>${project.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-interpreter-shaded</artifactId>
</exclusion>
</exclusions>
<groupId>com.vladsch.flexmark</groupId>
<artifactId>flexmark-all</artifactId>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>

<!-- Test -->
@@ -39,12 +39,11 @@
import org.apache.zeppelin.jupyter.nbformat.Output;
import org.apache.zeppelin.jupyter.nbformat.RawCell;
import org.apache.zeppelin.jupyter.nbformat.Stream;
import org.apache.zeppelin.jupyter.parser.MarkdownParser;
import org.apache.zeppelin.jupyter.zformat.Note;
import org.apache.zeppelin.jupyter.zformat.Paragraph;
import org.apache.zeppelin.jupyter.zformat.Result;
import org.apache.zeppelin.jupyter.zformat.TypeData;
import org.apache.zeppelin.markdown.FlexmarkParser;
import org.apache.zeppelin.markdown.MarkdownParser;

import java.io.BufferedReader;
import java.io.FileReader;
@@ -68,7 +67,7 @@ public class JupyterUtil {
private final RuntimeTypeAdapterFactory<Cell> cellTypeFactory;
private final RuntimeTypeAdapterFactory<Output> outputTypeFactory;

private final MarkdownParser markdownProcessor;
private final MarkdownParser markdownParser;

public JupyterUtil() {
this.cellTypeFactory = RuntimeTypeAdapterFactory.of(Cell.class, "cell_type")
@@ -78,7 +77,7 @@ public JupyterUtil() {
.registerSubtype(ExecuteResult.class, "execute_result")
.registerSubtype(DisplayData.class, "display_data").registerSubtype(Stream.class, "stream")
.registerSubtype(Error.class, "error");
this.markdownProcessor = new FlexmarkParser();
this.markdownParser = new MarkdownParser();
}

public Nbformat getNbformat(Reader in) {
@@ -146,7 +145,7 @@ public Note getNote(Nbformat nbformat, String id, String codeReplaced, String ma
}
} else if (cell instanceof MarkdownCell || cell instanceof HeadingCell) {
interpreterName = markdownReplaced;
String markdownContent = markdownProcessor.render(codeText);
String markdownContent = markdownParser.render(codeText);
typeDataList.add(new TypeData(TypeData.HTML, markdownContent));
paragraph.setUpMarkdownConfig(true);
} else {
@@ -0,0 +1,60 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.zeppelin.jupyter.parser;

import com.vladsch.flexmark.ext.autolink.AutolinkExtension;
import com.vladsch.flexmark.ext.emoji.EmojiExtension;
import com.vladsch.flexmark.ext.gfm.strikethrough.StrikethroughExtension;
import com.vladsch.flexmark.ext.tables.TablesExtension;
import com.vladsch.flexmark.ext.typographic.TypographicExtension;
import com.vladsch.flexmark.ext.wikilink.WikiLinkExtension;
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;
import java.util.Arrays;

import static com.vladsch.flexmark.ext.emoji.EmojiImageType.UNICODE_ONLY;

public class MarkdownParser {
private final Parser parser;
private final HtmlRenderer renderer;

public MarkdownParser() {
MutableDataSet options = new MutableDataSet();
options.set(Parser.EXTENSIONS, Arrays.asList(StrikethroughExtension.create(),
TablesExtension.create(),
AutolinkExtension.create(),
WikiLinkExtension.create(),
TypographicExtension.create(),
EmojiExtension.create()));
options.set(HtmlRenderer.SOFT_BREAK, "<br />\n");
options.set(EmojiExtension.USE_IMAGE_TYPE, UNICODE_ONLY);
parser = Parser.builder(options).build();
renderer = HtmlRenderer.builder(options).build();
}

public String render(String markdownText) {
Node document = parser.parse(markdownText);
String html = renderer.render(document);
return wrapWithMarkdownClassDiv(html);
}

public static String wrapWithMarkdownClassDiv(String html) {
return "<div class=\"markdown-body\">\n" + html + "</div>";
}
}
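
A minimal, standalone sketch of the wrapping step used by `render` above (not part of this commit). The class name `MarkdownWrapperSketch` and the sample HTML string are assumptions made for illustration; only the string concatenation is taken from `wrapWithMarkdownClassDiv`. It shows that the closing `</div>` is appended directly after whatever the renderer emitted, so any line break before it comes from the rendered HTML and not from the wrapper.

```java
// Standalone sketch: mirrors the concatenation in wrapWithMarkdownClassDiv.
// MarkdownWrapperSketch and the sample input are hypothetical, for illustration only.
class MarkdownWrapperSketch {

  // Same concatenation as in the diff: opening div plus newline, the HTML, closing div.
  static String wrapWithMarkdownClassDiv(String html) {
    return "<div class=\"markdown-body\">\n" + html + "</div>";
  }

  public static void main(String[] args) {
    // Hypothetical renderer output, assumed here to end with a single newline.
    String rendered = "<p>Hello</p>\n";
    System.out.println(wrapWithMarkdownClassDiv(rendered));
  }
}
```

With the sample input, the wrapper adds nothing between the rendered HTML and the closing tag, so a single trailing newline in the input yields a single newline before `</div>`.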
@@ -96,7 +96,7 @@ public void getNoteAndVerifyData() throws Exception {
" <div class=\"col-sm-1\"><img src=\"https://knowledgeanyhow.org/static/images/favicon_32x32.png\" style=\"margin-top: -6px\"/></div>\n" +
" <div class=\"col-sm-11\">This notebook was created using <a href=\"https://knowledgeanyhow.org\">IBM Knowledge Anyhow Workbench</a>. To learn more, visit us at <a href=\"https://knowledgeanyhow.org\">https://knowledgeanyhow.org</a>.</div>\n" +
" </div>\n" +
"</div>\n\n" +
"</div>\n" +
"</div>" , results.get(0).getData());
assertEquals("HTML", results.get(0).getType());
}