Skip to content

GREL Other Functions

Tom Morris edited this page Dec 26, 2012 · 10 revisions

Other functions supported by the OpenRefine Expression Language (GREL)


NOTE: This page was automatically converted from the old page at Google Code and has not been manually reviewed. Please compare it to the original , correct any errors or omissions, then remove this warning block.


See also: [GRELFunctions All GREL functions].

{{{type(o)}}}

Returns the type of o, such as undefined, string, number, etc.

{{{hasField(o, string name)}}}

Returns a boolean indicating whether o has a field called name. For example, cell.hasField("value") always returns true, as every cell has a value field.

{{{jsonize(value)}}}

Quotes a value as a JSON literal value.

{{{parseJson(string s)}}}

Parses s as JSON. get can then be used with parseJson, e.g.,

parseJson(" { 'a' : 1 } ").get("a")

returns 1.

To get all instances from a JSON array called "keywords" having the same object string name of "text", combine with the forEach() function to iterate over the array, e.g.,

JSON:

{
   "status":"OK",
   "url":"",
   "language":"english",
   "keywords":[
      {
         "text":"York en route",
         "relevance":"0.974363"
      },
      {
         "text":"Anthony Eden",
         "relevance":"0.814394"
      },
      {
         "text":"President Eisenhower",
         "relevance":"0.700189"
      }
   ]
}

GREL expression:

forEach(value.parseJson().keywords,v,v.text).join(":::")

will output like this:

York en route:::Anthony Eden:::President Eisenhower

{{{cross(cell c, string projectName, string columnName)}}}

Returns an array of zero or more rows in the project projectName for which the cells in their column columnName have the same content as cell c. (Similar to a lookup) Consider 2 projects with the following data

My Address Book

friend address
john 120 Main St.
mary 50 Broadway Ave.
anne 17 Morning Crescent

Christmas Gifts

gift recipient
lamp mary
clock john

Now in the project "Christmas Gifts", we want to add a column containing the addresses of the recipients, which are in project "My Address Book". So we can invoke the command "Add column based on this column" on the "recipient" column in "Christmas Gifts" and enter this expression

  cell.cross("My Address Book", "friend").cells["address"].value[0]

When that command is applied, the result is

Christmas Gifts

gift recipient address
lamp mary 50 Broadway Ave.
clock john 120 Main St.

{{{facetCount(choiceValue, string facetExpression, string columnName)}}}

Returns the facet count corresponding to the given choice value

gift recipient price
lamp mary 20
clock john 54
watch amit 80
clock claire 62

If we wanted to show how many of each gift we had given, we could use the following expression:

  facetCount(value, "value", "gift")

This would then complete the table as so:

gift recipient price count
lamp mary 20 1
clock john 54 2
watch amit 80 1
clock claire 62 2

Jsoup HTML parsing functions


[StrippingHTML Example usage of these functions is shown here]

{{{parseHtml(string s)}}}

returns a full HTML document and adds any missing closing tags.

You can then extract or .select() which portions of the HTML document you need for further splitting, partitioning, etc.

An example of extracting all table rows <tr> from a <div> using parseHtml().select() together is described more [StrippingHTML#Extract_HTML_attributes,_text,_links_with_integrated_GREL_comman here]

{{{select(Element e, String s)}}}

returns an element from an HTML doc element using selector syntax.

This function can be used with most of the Jsoup selector syntax as documented here: http://jsoup.org/cookbook/extracting-data/selector-syntax

{{{htmlAttr(Element e, String s)}}}

returns a value from an attribute on an HTML Element.

{{{htmlText(Element e)}}}

returns the text from within an element (including all child elements).

{{{innerHtml(Element e)}}}

returns the innerHtml of an HTML element.

{{{outerHtml(Element e)}}}

???

{{{ownText}}}(Element e)`

Gets the text owned by this HTML element only; does not get the combined text of all children.

Clone this wiki locally