title | description | created | updated | color |
---|---|---|---|---|
Jsoup (Java Library) | Jsoup is a Java library for extracting and manipulating HTML data using DOM methods, CSS selectors, and jQuery-like syntax. | 2019-08-05 | 2019-08-05 | |
Jsoup is a Java library for extracting and manipulating HTML data using DOM methods, CSS selectors, and jQuery-like syntax.
```xml
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.12.1</version>
</dependency>
```
```groovy
// https://mvnrepository.com/artifact/org.jsoup/jsoup
// "compile" is deprecated and was removed in Gradle 7; use "implementation"
implementation group: 'org.jsoup', name: 'jsoup', version: '1.12.1'
```
```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;

public class JsoupParse {
    public static void main(String[] args) throws IOException {
        // Parse HTML held in a String
        String html = "<html><head><title>One Compiler</title></head>"
                + "<body><p>Welcome to one compiler jsoup cheatsheet.</p></body></html>";
        Document stringDoc = Jsoup.parse(html);

        // Parse HTML from a file; update the path before running.
        // Throws IOException if the file cannot be read.
        Document fileDocument = Jsoup.parse(new File("Path to your html file"), "UTF-8");
    }
}
```
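Once parsed, a Document can be queried directly. A minimal sketch using a sample HTML string like the one in JsoupParse; the class and helper names are illustrative, while title() and body().text() are standard jsoup methods:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupReadParsed {
    // Returns the text of the document's <title> element ("" if there is none).
    static String title(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Returns the whitespace-normalized text of <body>, including all descendants.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>One Compiler</title></head>"
                + "<body><p>Welcome to one compiler jsoup cheatsheet.</p></body></html>";
        System.out.println(title(html));    // One Compiler
        System.out.println(bodyText(html)); // Welcome to one compiler jsoup cheatsheet.
    }
}
```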
- Jsoup.connect(String url) creates a connection, and calling get() on it fetches and parses the page. If anything goes wrong while fetching, get() throws an IOException, which must be caught or declared.
Document document = Jsoup.connect("http://onecompiler.com/posts").get();
- A connection can also be configured before calling get(), for example with a user agent, a cookie, and a timeout in milliseconds:
Document document = Jsoup.connect("http://onecompiler.com/posts").userAgent("Mozilla")
.cookie("auth", "axvtu23d").timeout(3000).get();
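Because get() throws a checked IOException (jsoup reports non-2xx HTTP responses as HttpStatusException, a subclass of IOException), fetch code usually wraps the call. A minimal sketch; fetchTitle and the Optional-based error handling are illustrative choices, not part of jsoup:

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupFetch {
    // Fetches a page and returns its <title>, or Optional.empty() if the request fails.
    // The user agent and timeout are example settings, not required values.
    static Optional<String> fetchTitle(String url) {
        try {
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla")
                    .timeout(3000) // milliseconds
                    .get();        // throws IOException on network or HTTP errors
            return Optional.of(doc.title());
        } catch (IOException e) {
            System.err.println("Could not fetch " + url + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        fetchTitle("http://onecompiler.com/posts").ifPresent(System.out::println);
    }
}
```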
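- Jsoup.parseBodyFragment(String html) parses a snippet of HTML as the content of the body, wrapping it in an otherwise empty document:
String html = "<p>Welcome to one compiler jsoup cheatsheet.</p>";
Document stringDoc = Jsoup.parseBodyFragment(html);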
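A short sketch of where the fragment ends up: jsoup generates the html/head/body shell and places the parsed nodes inside the body. The class and helper names are illustrative:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupFragment {
    // Parses an HTML snippet as body content; jsoup supplies the html/head/body shell.
    static Document parseFragment(String html) {
        return Jsoup.parseBodyFragment(html);
    }

    public static void main(String[] args) {
        Document doc = parseFragment("<p>Welcome to one compiler jsoup cheatsheet.</p>");
        System.out.println(doc.body().child(0).tagName()); // p
        System.out.println(doc.head().childNodeSize());    // 0 (the generated head is empty)
    }
}
```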
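- Elements can also be looked up with DOM-style methods. getElementById returns a single Element (or null if there is no match), while the others return Elements:
Element element = document.getElementById("__next");
Elements elements = document.getElementsByClass("homepage--section");
Elements withSrc = document.getElementsByAttribute("src");
Elements links = document.getElementsByTag("a");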
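The lookups can be tried on a hypothetical sample page (the markup below is invented for illustration and only reuses the id and class names from this cheatsheet):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupDomLookup {
    // Hypothetical sample page, used only for illustration.
    static final String HTML = "<div id='__next'>"
            + "<section class='homepage--section'><a href='/posts'>Posts</a></section>"
            + "<section class='homepage--section'><img src='logo.png' alt='logo image'></section>"
            + "</div>";

    static Document sample() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document document = sample();

        Element root = document.getElementById("__next");                     // single Element, or null
        Elements sections = document.getElementsByClass("homepage--section"); // both <section>s
        Elements withSrc = document.getElementsByAttribute("src");            // the <img>
        Elements links = document.getElementsByTag("a");                      // the <a>

        System.out.println(root != null);       // true
        System.out.println(sections.size());    // 2
        System.out.println(withSrc.size());     // 1
        System.out.println(links.attr("href")); // /posts
    }
}
```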
Find elements using id ex: #__next
Elements elements = document.select("#__next");
Find elements using class ex: .homepage--section
Elements elements = document.select(".homepage--section");
Find elements using tagname ex: a
Elements elements = document.select("a");
Find elements using named attribute ex: [href]
Elements elements = document.select("[href]");
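select() always returns Elements, even for an id selector; first() gives the first match (or null). A sketch on hypothetical markup comparing the basic selectors, and checking that "#__next" finds the same node as getElementById:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSelectBasics {
    // Hypothetical sample page, used only for illustration.
    static Document sample() {
        return Jsoup.parse("<div id='__next'>"
                + "<section class='homepage--section'><a href='/posts'>Posts</a></section>"
                + "<section class='homepage--section'><a>No link</a></section>"
                + "</div>");
    }

    // select("#id").first() yields the same node that getElementById finds.
    static boolean idSelectorFindsSameNode() {
        Document document = sample();
        Element viaSelect = document.select("#__next").first();
        return viaSelect != null && viaSelect == document.getElementById("__next");
    }

    public static void main(String[] args) {
        Document document = sample();
        Elements byClass = document.select(".homepage--section"); // 2 sections
        Elements byTag = document.select("a");                     // both <a> elements
        Elements byAttr = document.select("[href]");               // only the <a> with href
        System.out.println(byClass.size() + " " + byTag.size() + " " + byAttr.size()); // 2 2 1
        System.out.println(idSelectorFindsSameNode());                                 // true
    }
}
```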
Find elements using named attribute and its value [attr=val]
ex: a[class=homepage--section]
Elements elements = document.select("a[class=homepage--section]");
Find elements using named attribute and its value [attr=val]
ex: a[class=homepage--section]
Elements elements = document.select("a[class=homepage--section]");
Find elements by using named attribute and its value with prefix [attr^=prefixvalue]
ex: a[class^=home]
Elements elements = document.select("a[class^=home]");
Find elements by using named attribute and its value with suffix [attr$=suffixvalue]
ex: img[alt$=image]
Elements elements = document.select("img[alt$=image]");
Find elements using named attribute which contains given value [attr*=value]
ex: img[alt*=image]
Elements elements = document.select("img[alt*=image]");
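The value operators can be compared on hypothetical markup. Note that [class=...] compares the whole attribute value, so an element with several classes is only matched by the .class form:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupAttributeValueSelectors {
    // Hypothetical sample markup, used only for illustration.
    static Document sample() {
        return Jsoup.parse(
                "<a class='homepage--section' href='/a'>A</a>"
                + "<a class='homepage--section active' href='/b'>B</a>"
                + "<img src='logo.png' alt='logo image'>"
                + "<img src='banner.png' alt='image of a banner'>");
    }

    static int count(String query) {
        return sample().select(query).size();
    }

    public static void main(String[] args) {
        System.out.println(count("a[class=homepage--section]")); // 1: whole value must be equal
        System.out.println(count("a.homepage--section"));        // 2: class selector checks each class
        System.out.println(count("a[class^=home]"));             // 2: value starts with "home"
        System.out.println(count("img[alt$=image]"));            // 1: "logo image"
        System.out.println(count("img[alt*=image]"));            // 2: both alt values contain "image"
    }
}
```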
Find elements whose named attribute value matches the given regular expression [attr~=regex]
ex: img[src~=(?i)\.(png|jpe?g)]
Elements elements = document.select("img[src~=(?i)\\.(png|jpe?g)]");
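jsoup compiles the text after ~= as a Java regular expression and searches for it anywhere in the attribute value, so an unescaped dot matches any character and matching is case-sensitive unless (?i) is used. A sketch on hypothetical image names comparing a loose pattern with an escaped, anchored one:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupRegexAttribute {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML = "<img src='logo.png'><img src='photo.JPG'>"
            + "<img src='icon.svg'><img src='mypng.gif'>";

    // Returns the src of every element matched by the query, in document order.
    static List<String> srcsMatching(String query) {
        Document document = Jsoup.parse(HTML);
        List<String> srcs = new ArrayList<>();
        for (Element img : document.select(query)) {
            srcs.add(img.attr("src"));
        }
        return srcs;
    }

    public static void main(String[] args) {
        // Unescaped "." matches any character, and there is no (?i) flag.
        System.out.println(srcsMatching("img[src~=.(png|jpeg)]"));         // [logo.png, mypng.gif]
        // Escaped dot, (?i) for case-insensitivity, $ to anchor at the end of the value.
        System.out.println(srcsMatching("img[src~=(?i)\\.(png|jpe?g)$]")); // [logo.png, photo.JPG]
    }
}
```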
Find elements that match any of several comma-separated selectors. ex: .homepage, a[href], img[src~=(?i)\.(png|jpe?g)]
Elements elements = document.select(".homepage, a[href], img[src~=(?i)\\.(png|jpe?g)]");
Find elements whose text (including the text of their descendants) contains the given text; the search is case-insensitive. ex: div:contains(compiler)
Elements elements = document.select("div:contains(compiler)");
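Find elements whose text (including the text of their descendants) matches the given regular expression; prefix the pattern with (?i) for a case-insensitive match. ex: div:matches((?i)compiler)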
Elements elements = document.select("div:matches((?i)compiler)");
Find elements whose own text (excluding descendants) contains the given text; the search is case-insensitive. ex: div:containsOwn(compiler)
Elements elements = document.select("div:containsOwn(compiler)");
Find elements whose own text (excluding descendants) matches the given regular expression. ex: div:matchesOwn((?i)compiler)
Elements elements = document.select("div:matchesOwn((?i)compiler)");
Find elements by that matches given regular expression. ex: div:containsOwn(compiler)
Elements elements = document.select("div:matchesOwn((?i)compiler)");
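The difference between the plain and "Own" text pseudo-selectors shows up with nested elements. A sketch on a hypothetical snippet where the outer div has "compiler" only inside a child div:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupTextSelectors {
    // Hypothetical sample: the outer div has "Compiler" only in a descendant.
    static final String HTML = "<div id='outer'>Intro <div id='inner'>One Compiler</div></div>";

    static int count(String query) {
        Document document = Jsoup.parse(HTML);
        return document.select(query).size();
    }

    public static void main(String[] args) {
        System.out.println(count("div:contains(compiler)"));       // 2: outer and inner
        System.out.println(count("div:containsOwn(compiler)"));    // 1: inner only
        System.out.println(count("div:matches((?i)compiler)"));    // 2
        System.out.println(count("div:matchesOwn((?i)compiler)")); // 1
        System.out.println(count("div:matches(compiler)"));        // 0: regex is case-sensitive
    }
}
```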
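- The text of an Element can be replaced with .text(String text), and .prepend(String html) / .append(String html) add content before or after its existing children. prepend() and append() parse their argument as HTML; use .prependText(String text) / .appendText(String text) to add literal text instead:
Element element = document.getElementById("__next");
element.text("One Compiler");
element.prepend("Welcome to ");
element.append(" Platform.");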
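A sketch of these text updates on a hypothetical #__next element, plus a check of how append() and appendText() treat markup-like input differently:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class JsoupTextUpdate {
    // Rebuilds the text of a hypothetical #__next element and returns its text.
    static String rewriteGreeting() {
        Element element = Jsoup.parse("<div id='__next'><span>old</span></div>").getElementById("__next");
        element.text("One Compiler");   // replaces all children with one text node
        element.prepend("Welcome to ");  // parsed as HTML, inserted before existing children
        element.append(" Platform.");    // parsed as HTML, added after existing children
        return element.text();
    }

    // Counts child elements after adding markup-like content with append() or appendText().
    static int childElementsAfterAdding(boolean asText) {
        Element body = Jsoup.parse("").body();
        if (asText) {
            body.appendText("<b>bold</b>"); // kept as literal text (escaped on output)
        } else {
            body.append("<b>bold</b>");     // parsed into a <b> element
        }
        return body.children().size();
    }

    public static void main(String[] args) {
        System.out.println(rewriteGreeting());               // Welcome to One Compiler Platform.
        System.out.println(childElementsAfterAdding(false)); // 1
        System.out.println(childElementsAfterAdding(true));  // 0
    }
}
```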
- To set an attribute on every Element in a selection, call .attr(String key, String value) on the Elements:
document.select("a[href]").attr("alt", "logo name");
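- To set an attribute on a single Element, take it from the selection first (first() returns null when nothing matches):
Element div = document.select("div.jss15").first();
div.attr("title", "one compiler");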
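A sketch contrasting the two forms on hypothetical markup: Elements.attr(key, value) updates every match, while Element.attr(key, value) updates only the chosen element:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSetAttributes {
    // Hypothetical sample markup, used only for illustration.
    static Document sample() {
        return Jsoup.parse("<a href='/a'>A</a><a href='/b'>B</a>"
                + "<div class='jss15'>x</div><div class='jss15'>y</div>");
    }

    // Returns {links with alt, divs with title} after applying both kinds of update.
    static int[] applyAttributes() {
        Document document = sample();
        // Sets alt on every matched link.
        document.select("a[href]").attr("alt", "logo name");
        // Sets title on the first matching div only; first() is null when nothing matches.
        Element first = document.select("div.jss15").first();
        if (first != null) {
            first.attr("title", "one compiler");
        }
        return new int[] {
            document.select("a[alt]").size(),
            document.select("div[title]").size()
        };
    }

    public static void main(String[] args) {
        int[] counts = applyAttributes();
        System.out.println(counts[0] + " " + counts[1]); // 2 1
    }
}
```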
- Inner HTML can be replaced with .html(String html), and .prepend(String html) / .append(String html) insert parsed HTML before or after the existing children:
Element element = document.select("div.homepage--section").first();
element.html("<p><b>One Compiler</b></p>");
element.prepend("<p><b>Welcome to </b></p>");
element.append("<p><b>Platform.</b></p>");
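A sketch of the resulting structure, run against a hypothetical homepage--section div: after html(), prepend() and append(), the div holds three p children in that order:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupHtmlUpdate {
    // Applies html(), prepend() and append() to a hypothetical section and returns it.
    static Element rebuildSection() {
        Document document = Jsoup.parse("<div class='homepage--section'><span>old</span></div>");
        Element element = document.select("div.homepage--section").first();
        element.html("<p><b>One Compiler</b></p>");   // replaces the inner HTML
        element.prepend("<p><b>Welcome to </b></p>"); // becomes the first child
        element.append("<p><b>Platform.</b></p>");    // becomes the last child
        return element;
    }

    public static void main(String[] args) {
        Element element = rebuildSection();
        System.out.println(element.children().size()); // 3
        System.out.println(element.text());            // Welcome to One Compiler Platform.
        System.out.println(element.outerHtml());       // the rebuilt markup
    }
}
```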