Tree navigation basics
zverok edited this page Aug 7, 2015 · 8 revisions
Exploring the data
The content model of the tree tries to be straightforward, shallow and easy to understand:
- on the first level there are paragraphs, headings, lists and tables;
- inside them is inline markup: bold and italic text, links, images, templates and allowed HTML tags;
- each tree node type has its own class with an obvious name: Infoboxer::Tree::Paragraph, Infoboxer::Tree::Heading, Infoboxer::Tree::UnorderedList and so on.
Tree navigation is done like this:
include Infoboxer::Tree
# Node#lookup
page.lookup(Wikilink) # all wikilinks on page
page.lookup(Heading, level: 3) # all headings of level 3 only
page.lookup(Wikilink) { |l| l.text.include?('federation') } # wikilinks whose text includes "federation"
# if you don't want to include Infoboxer::Tree, class-name symbols
# work as well:
page.lookup(:Wikilink) # all wikilinks on page
# Node#lookup_children
page.lookup(:Paragraph).first.lookup_children(:Italic)
# => only italics which are direct children of the paragraph (doesn't return
# italics inside links, for example)
# Node#lookup_parents
page.lookup(:ListItem).first.lookup_parents(:UnorderedList)
# Node#lookup_siblings
page.lookup(:ListItem).first.lookup_siblings(index: 4)
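The difference in scope between these methods can be illustrated with a toy tree. This is a minimal sketch, not Infoboxer's implementation: ToyNode and its `kind` field are hypothetical stand-ins, and details such as whether a node can match itself are assumptions made only for the illustration.

```ruby
# Toy stand-in for a tree node; NOT Infoboxer's real class.
class ToyNode
  attr_reader :kind, :text, :children
  attr_accessor :parent

  def initialize(kind, text = '', children = [])
    @kind = kind
    @text = text
    @children = children
    children.each { |child| child.parent = self }
  end

  # Searches the whole subtree below this node, depth-first.
  def lookup(kind)
    children.flat_map { |c| (c.kind == kind ? [c] : []) + c.lookup(kind) }
  end

  # Searches direct children only.
  def lookup_children(kind)
    children.select { |c| c.kind == kind }
  end

  # Walks up through the ancestors, nearest first.
  def lookup_parents(kind)
    return [] unless parent
    (parent.kind == kind ? [parent] : []) + parent.lookup_parents(kind)
  end
end

# A paragraph holding one italic directly and another inside a link.
inner = ToyNode.new(:Italic, 'inside link')
link  = ToyNode.new(:Wikilink, 'link', [inner])
outer = ToyNode.new(:Italic, 'direct')
para  = ToyNode.new(:Paragraph, '', [outer, link])

all_italics    = para.lookup(:Italic).map(&:text)          # both italics
direct_italics = para.lookup_children(:Italic).map(&:text) # only 'direct'
ancestors      = inner.lookup_parents(:Paragraph)          # the paragraph
```

In this sketch the italic nested inside the link is found by `lookup` but skipped by `lookup_children`, which mirrors the behavior described in the comments above.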
Each lookup returns a Nodes collection, which has the same lookup methods, so you can simply continue your lookup like this:
page.lookup(UnorderedList).lookup_children(text: /Argentinian/)
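One way such chaining can work is for the collection to forward the call to each of its nodes and wrap the combined result in a collection again. The sketch below shows that idea only; ToyItem and ToyNodes are hypothetical names, and nothing here is taken from Infoboxer's source.

```ruby
# Hypothetical single node: a kind, some text and a list of children.
ToyItem = Struct.new(:kind, :text, :children) do
  def lookup_children(kind)
    ToyNodes.new(children.select { |c| c.kind == kind })
  end
end

# Array-like collection offering the same method as a single node,
# so that a lookup result can be looked up again.
class ToyNodes < Array
  def lookup_children(kind)
    ToyNodes.new(flat_map { |node| node.lookup_children(kind) })
  end
end

li = ->(text) { ToyItem.new(:ListItem, text, []) }
list1 = ToyItem.new(:UnorderedList, '', [li.call('Argentinian peso'), li.call('Gaucho')])
list2 = ToyItem.new(:UnorderedList, '', [li.call('Tango')])

lists = ToyNodes.new([list1, list2])
items = lists.lookup_children(:ListItem) # items of both lists, still a ToyNodes
```

Because the result is again a ToyNodes, a further `lookup_children` call can be appended without unwrapping anything.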
The arguments passed to any lookup_* method are a list of selectors, which can contain these values:
- Node class (like ListItem) or class-name symbol (like :ListItem);
- Symbol (like :empty?): the node is checked for having this method and returning a truthy value from it;
- Hash of "symbol => pattern" values, where symbol is any node getter, and pattern is the value to check against (checks are performed with ===, so you can do things like text: /something/);
- block, which receives a node and returns true or false.
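The selector rules above can be sketched as a single matching function. This is an illustration under stated assumptions, not Infoboxer's code: ToyHeading and toy_match? are hypothetical, and telling class-name symbols from method symbols by a leading capital letter is a guess made for this example.

```ruby
# Hypothetical node type for illustration; not Infoboxer's real class.
ToyHeading = Struct.new(:text, :level) do
  def empty?
    text.empty?
  end
end

# Returns true when the node satisfies every selector and the optional block.
def toy_match?(node, *selectors, &block)
  matched = selectors.all? do |sel|
    case sel
    when Class
      node.is_a?(sel)
    when Symbol
      if sel.to_s.match?(/\A[A-Z]/) # class-name symbol such as :ToyHeading
        node.class.name == sel.to_s
      else                          # predicate symbol such as :empty?
        node.respond_to?(sel) && !!node.public_send(sel)
      end
    when Hash
      # Each getter's value is checked with ===, so regexps and ranges work.
      sel.all? { |getter, pattern| pattern === node.public_send(getter) }
    else
      false
    end
  end
  matched && (block.nil? || !!block.call(node))
end

heading = ToyHeading.new('Federation history', 3)

by_class   = toy_match?(heading, ToyHeading, level: 3)
by_symbol  = toy_match?(heading, :ToyHeading)
by_regexp  = toy_match?(heading, text: /federation/i)
by_range   = toy_match?(heading, level: 1..2)
by_method  = toy_match?(heading, :empty?)
by_block   = toy_match?(heading) { |n| n.level > 2 }
```

Relying on === is what lets one hash selector accept an exact value, a regular expression or a range without any special cases.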
It's not an XPath-strength solution, yet it is straightforward and flexible (and it is pure Ruby).
See also API docs.
Surprisingly, that's enough power to get virtually everything Wikipedia can provide. Yet there's more!
Next: Navigation shortcuts