This is the Java implementation of the twitter-text parsing library. The library has methods and classes to parse Tweets and calculate length, validity, parse @mentions, #hashtags, URLs, and more.
The latest artifacts are published to Maven Central.
Bringing twitter-text-java into your project should be as simple as adding the following to your pom.xml:
<dependencies>
<dependency>
<groupId>com.twitter.twittertext</groupId>
<artifactId>twitter-text</artifactId>
<version>3.1.0</version> <!-- or whatever the latest version is -->
</dependency>
</dependencies>
To build, run maven:
% mvn clean install
To run tests:
% mvn test
To generate Javadocs, run this under src/main/java:
% javadoc -d ../../../docs/api/ com.twitter.twittertext
twitter-text 3.0 updates the config file with emojiParsingEnabled config option. When true, twitter-text will parse and discount emoji supported by the twemoji library (see https://github.com/twitter/twemoji). The length of these emoji will be the default weight (200 or two characters) even if they contain multiple code points combined by zero-width joiners. This means that emoji with skin tone and gender modifiers no longer count as more characters than those without such modifiers.
twitter-text 2.0 introduced configuration files that define how Tweets are parsed for length. This allows for backwards compatibility and flexibility going forward. Old-style traditional 140-character parsing is defined by the v1.json configuration file, whereas v2.json is updated for "weighted" Tweets where ranges of Unicode code points can have independent weights aside from the default weight. The sum of all code points, each weighted appropriately, should not exceed the max weighted length.
static TwitterTextParseResults TwitterTextParser.parseTweet(String text)
This method takes a string as input and returns a results object that contains information about the string. TwitterTextParseResults
includes:
-
public final int weightedLength
: the overall length of the Tweet with code points weighted per the ranges defined in the configuration file. -
public final int permillage
: indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum. -
public final boolean isValid
: indicates if input text length corresponds to a valid result. -
public final Range displayTextRange
: A pair of unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet. -
public final Range validDisplayTextRange
: A pair of unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet.
final TwitterTextParseResults result = TwitterTextParser.parseTweet("We're expanding the character limit! We want it to be easier and faster for everyone to express themselves.\n\nMore characters. More expression. More of what's happening.\nhttps://cards.twitter.com/cards/gsby/4ztbu");
// result.weightedLength == 192
// result.permillage == 685
// result.isValid == true
// result.displayTextRange == 0-210
// result.validTextRange == 0-210
final Extractor extractor = new Extractor();
final List<String> urls = extractor.extractURLs("We're expanding the character limit! We want it to be easier and faster for everyone to express themselves.\n\nMore characters. More expression. More of what's happening.\nhttps://cards.twitter.com/cards/gsby/4ztbu")
// urls == ["https://cards.twitter.com/cards/gsby/4ztbu"]
final Autolink autolinker = new Autolink();
final String linkedText = autolinker.autoLink("@twitter #lovetwitter $TWTR");
// linkedText == "@<a class="tweet-url username" href="https://twitter.com/twitter" rel="nofollow">twitter</a> <a href="https://twitter.com/search?q=%23lovetwitter" title="#lovetwitter" class="tweet-url hashtag" rel="nofollow">#lovetwitter</a> <a href="https://twitter.com/search?q=%24TWTR" title="$TWTR" class="tweet-url cashtag" rel="nofollow">$TWTR</a>"
Have a bug? Please create an issue here on GitHub!
https://github.com/twitter/twitter-text/issues
Copyright 2011-2020 Twitter, Inc. and other contributors
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0