Commit c42928d

Updated readme to md, more detail
1 parent 040ce71 commit c42928d

2 files changed: +42 -17 lines changed

README

-17
This file was deleted.

README.md

+42
@@ -0,0 +1,42 @@
# jsoup: Java HTML Parser

**jsoup** is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

**jsoup** implements the [WHATWG HTML5](http://whatwg.org/html) specification, and parses HTML to the same DOM as modern browsers do.

* scrape and [parse](https://jsoup.org/cookbook/input/parse-document-from-string) HTML from a URL, file, or string
* find and [extract data](https://jsoup.org/cookbook/extracting-data/selector-syntax), using DOM traversal or CSS selectors
* manipulate the [HTML elements](https://jsoup.org/cookbook/modifying-data/set-html), attributes, and text
* [clean](https://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer) user-submitted content against a safe white-list, to prevent XSS attacks
* output tidy HTML

jsoup is designed to deal with all varieties of HTML found in the wild: from pristine and validating to invalid tag soup, jsoup will create a sensible parse tree.

See [**jsoup.org**](https://jsoup.org/) for downloads and the full [API documentation](https://jsoup.org/apidocs/).
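
As a rough illustration of the capabilities listed above, the sketch below parses a deliberately malformed HTML string, extracts data with a CSS selector, modifies an element, and sanitizes untrusted markup. The class name `JsoupTour`, its method names, and the sample HTML are made up for this example. It assumes jsoup 1.14.2 or later, where the sanitizer class is named `Safelist`; earlier releases call it `Whitelist`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;
import org.jsoup.select.Elements;

// Hypothetical demo class; names and sample HTML are illustrative only.
public class JsoupTour {

    // Tag soup: the <p> elements are never closed and there is no <html> or <body>.
    static final String MESSY_HTML =
        "<title>Demo</title><p class=intro>Hello <b>world<p>Second <a href='/about'>link</a>";

    // Parse a string; jsoup builds a full document tree around the fragment.
    static Document parse() {
        return Jsoup.parse(MESSY_HTML);
    }

    // Extract: count the paragraphs the parser recovered.
    static int paragraphCount(Document doc) {
        Elements paragraphs = doc.select("p");
        return paragraphs.size();
    }

    // Manipulate: set an attribute and replace the text of the first link.
    static Element relabelFirstLink(Document doc) {
        Element link = doc.select("a[href]").first();
        link.attr("rel", "nofollow").text("About us");
        return link;
    }

    // Clean: keep only the tags and attributes permitted by Safelist.basic().
    // Assumes jsoup 1.14.2+; older versions use Whitelist.basic() instead.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(doc.title());
        System.out.println(paragraphCount(doc));
        System.out.println(relabelFirstLink(doc).outerHtml());
        System.out.println(sanitize("<p onclick='steal()'>Hi<script>steal()</script></p>"));
        System.out.println(doc.body().html()); // re-serialized, tidied markup
    }
}
```

Keeping each step in its own small method is only a presentational choice here; in practice the calls are usually chained directly on the `Document`.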

## Example

Fetch the [Wikipedia](http://en.wikipedia.org/wiki/Main_Page) homepage, parse it to a [DOM](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction), and select the headlines from the *In the News* section into a list of [Elements](https://jsoup.org/apidocs/index.html?org/jsoup/select/Elements.html) ([online sample](https://try.jsoup.org/~LGB7rk_atM2roavV0d-czMt3J_g)):

```java
// Fetch and parse the page (a live HTTP request; get() may throw IOException)
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
// Select links inside bold text within the element whose id is "mp-itn"
Elements newsHeadlines = doc.select("#mp-itn b a");
```
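
To experiment with the selector without a network connection, the same query can be run against an inline string. The sketch below is a stand-in built on assumptions: the HTML is a hand-written imitation of the *In the News* markup, not Wikipedia's actual page, and `HeadlineDemo` and `headlineLinks` are hypothetical names. Passing a base URI to `Jsoup.parse` lets `absUrl` resolve relative links.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical demo class; not part of jsoup.
public class HeadlineDemo {

    // Hand-written sample markup, loosely imitating the structure the selector expects.
    static final String SAMPLE_HTML =
        "<div id=\"mp-itn\"><ul>"
        + "<li><b><a href=\"/wiki/First_story\" title=\"First story\">First</a></b> happens.</li>"
        + "<li><a href=\"/wiki/Other\">Not bold</a>, so not selected.</li>"
        + "<li><b><a href=\"/wiki/Second_story\" title=\"Second story\">Second</a></b> follows.</li>"
        + "</ul></div>";

    // Returns one "title -> absolute URL" line per selected headline link.
    static List<String> headlineLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Elements newsHeadlines = doc.select("#mp-itn b a");
        List<String> lines = new ArrayList<>();
        for (Element headline : newsHeadlines) {
            lines.add(headline.attr("title") + " -> " + headline.absUrl("href"));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : headlineLinks(SAMPLE_HTML, "https://en.wikipedia.org/")) {
            System.out.println(line);
        }
    }
}
```

The link that is not wrapped in `<b>` is skipped, because `#mp-itn b a` only matches anchors that descend from a bold element inside `#mp-itn`.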

## Open source

jsoup is an open source project distributed under the liberal [MIT license](https://jsoup.org/license). The source code is available at [GitHub](https://github.com/jhy/jsoup/tree/master/src/main/java/org/jsoup).

## Getting started

1. [Download](https://jsoup.org/download) the latest jsoup jar (or add it to your Maven/Gradle build)
2. Read the [cookbook](https://jsoup.org/cookbook/)
3. Enjoy!

## Development and support

If you have any questions about how to use jsoup, or have ideas for future development, please get in touch via the [mailing list](https://jsoup.org/discussion).

If you find any issues, please file a [bug](https://jsoup.org/bugs) after checking for duplicates.

The [colophon](https://jsoup.org/colophon) talks about the history of jsoup and the tools used to build it.

## Status

jsoup is stable and in general release.
