How to pull logo/profile image from Infobox table? #18

Open
tpitt opened this issue Apr 13, 2014 · 6 comments

Comments

@tpitt

tpitt commented Apr 13, 2014

It seems the entire Wikipedia Infobox sidebar table is ignored when accessing page.sanitized_content.

How can I pull only a company logo or a person's profile photo from the Infobox?

The rest of the Infobox content would be nice to have as well.

@aalvrz

aalvrz commented Jul 16, 2017

@tpitt

Did you find a way to obtain the Infobox table data? I just started using this nice gem but can't seem to find a way to retrieve this data.

@pietromenna
Collaborator

Hi,

It is possible to extract this information using the page.raw_data method. It would be a nice contribution if you would like to add functionality to retrieve only the infobox.
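As a starting point for such a contribution, here is a minimal sketch of pulling an infobox and its logo/image field out of a page's wikitext. It assumes the wikitext is available as a plain string (taken from the page data, for instance) and that the article uses a template whose name starts with "Infobox"; some articles use differently named box templates, so the pattern is a parameter. The helper names `extract_infobox` and `infobox_fields` are hypothetical and not part of the gem.

```ruby
# Hypothetical helpers; not part of the wikipedia-client gem.

# Returns the first template matching `pattern`, nested templates included,
# or nil when none is found or the braces are unbalanced.
def extract_infobox(wikitext, pattern: /\{\{\s*Infobox/i)
  start = wikitext.index(pattern)
  return nil unless start

  depth = 0
  i = start
  while i < wikitext.length
    case wikitext[i, 2]
    when '{{'
      depth += 1
      i += 2
    when '}}'
      depth -= 1
      i += 2
      return wikitext[start...i] if depth.zero?
    else
      i += 1
    end
  end
  nil
end

# Splits the template body on top-level "|" and returns a Hash of "key = value" pairs.
def infobox_fields(infobox)
  body = infobox[2...-2] # drop the outer {{ and }}
  parts = []
  current = +''
  depth = 0
  i = 0
  while i < body.length
    pair = body[i, 2]
    if ['{{', '[['].include?(pair)
      depth += 1
      current << pair
      i += 2
    elsif ['}}', ']]'].include?(pair)
      depth -= 1
      current << pair
      i += 2
    elsif body[i] == '|' && depth.zero?
      parts << current
      current = +''
      i += 1
    else
      current << body[i]
      i += 1
    end
  end
  parts << current

  # The first part is the template name; the rest are parameters.
  parts.drop(1).each_with_object({}) do |part, fields|
    key, value = part.split('=', 2)
    fields[key.strip] = value.strip unless value.nil?
  end
end

# Illustrative wikitext, not copied from a real article.
sample = <<~WIKI
  Intro text.
  {{Infobox company
  | name    = Acme
  | logo    = Acme logo.svg
  | founder = {{plainlist|
  * A
  * B}}
  }}
  Body text.
WIKI

fields = infobox_fields(extract_infobox(sample))
puts fields['logo'] || fields['image']
```

The value is a file name as written in the wikitext; turning it into an image URL would take a further lookup.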

@aalvrz

aalvrz commented Jul 17, 2017

@pietromenna Thanks for your reply.

I would love to submit a pull request, but while inspecting the raw_data I couldn't seem to find the infobox's information.

For example, a search on the raw_data of the Great white shark doesn't seem to have Phylum, Kingdom, etc...

I think the request_page method of the client might be incomplete. Probably some options or parameters missing, but I find the Wikipedia API documentation kind of confusing, and a pain to understand...

Would appreciate some insight, I would love to submit the patch then.

@pietromenna
Collaborator

pietromenna commented Jul 17, 2017

Thank you @BigChief45 for your example. In the example, raw_data does not contain that section when queried without additional parameters. This is because the API call to Wikipedia does not return all templates with the default parameters.

To make this query work, you will have to include :tllimit => 500. For more information about tllimit, see here.

Check whether the code below works:
page = Wikipedia.find( 'Great white shark', :tllimit => 500)

Usually, when something is missing from raw_data, the cause is a default parameter of the Wikipedia API. More details here.
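To see what tllimit changes at the API level, here is a rough sketch that builds a query URL for the templates of a page. It is not the request the gem actually sends (which asks for more properties); `templates_query_url` is a hypothetical helper, and the host and the `action`, `prop`, `titles` and `format` values are assumptions based on the public MediaWiki query API.

```ruby
require 'uri'

# Hypothetical helper: builds a MediaWiki query URL for a page's templates.
# tllimit caps how many templates the API returns in one response.
def templates_query_url(title, tllimit: 500)
  params = {
    action: 'query',
    format: 'json',
    prop: 'templates',
    titles: title,
    tllimit: tllimit
  }
  URI::HTTPS.build(
    host: 'en.wikipedia.org',
    path: '/w/api.php',
    query: URI.encode_www_form(params)
  ).to_s
end

puts templates_query_url('Great white shark')
```

Pasting the printed URL into a browser with and without the tllimit parameter is a quick way to compare how many templates come back.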

I hope this helped.

@aalvrz

aalvrz commented Jul 17, 2017

@pietromenna I tried using that option with 500 as the value, but I am still not getting that data... am I missing something else?

@pietromenna
Collaborator

I apologize if I understood incorrectly. :-( I thought you were looking for the list of taxonomies.

I ran this example:

require 'wikipedia'
page = Wikipedia.find( 'Great white shark', :tllimit => 500 )
puts page.templates

And it returned this list:
...
Template:Taxonomy
Template:Taxonomy/Animalia
Template:Taxonomy/Bilateria
Template:Taxonomy/Carcharodon
Template:Taxonomy/Chondrichthyes
Template:Taxonomy/Chordata
Template:Taxonomy/Craniata
Template:Taxonomy/Deuterostomia
Template:Taxonomy/Elasmobranchii
Template:Taxonomy/Eugnathostomata
Template:Taxonomy/Eukaryota
Template:Taxonomy/Eumetazoa
Template:Taxonomy/Euselachii
Template:Taxonomy/Filozoa
Template:Taxonomy/Gnathostomata
Template:Taxonomy/Holozoa
Template:Taxonomy/Lamnidae
Template:Taxonomy/Lamniformes
Template:Taxonomy/Life
Template:Taxonomy/Nephrozoa
Template:Taxonomy/Opisthokonta
Template:Taxonomy/Selachimorpha
Template:Taxonomy/Unikonta
Template:Taxonomy/Vertebrata
...

Without the parameter the API simply does not return all the templates.
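If only the taxonomy entries from a list like the one above are of interest, they can be filtered out of the template names. This is a small sketch that assumes the list is an array of strings such as "Template:Taxonomy/Animalia"; `taxonomy_names` is a hypothetical helper, not a gem method.

```ruby
# Hypothetical helper: keeps only "Template:Taxonomy/<name>" entries
# and returns the <name> part of each.
def taxonomy_names(templates)
  templates.map { |name| name[%r{\ATemplate:Taxonomy/(.+)\z}, 1] }.compact
end

templates = [
  'Template:Taxonomy',
  'Template:Taxonomy/Animalia',
  'Template:Taxonomy/Chordata',
  'Template:Taxonomy/Carcharodon'
]

puts taxonomy_names(templates)
```

Note that this yields names only; which rank each name belongs to (Kingdom, Phylum, and so on) is not part of the template list.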

This gem retrieves information by calling the Wikipedia API, mostly through the query action. If you want to see what the API returns, you can check the raw_data method. It should have the same content as calling the API directly through the Wikipedia sandbox. If you don't pass any parameters, the API returns only part of the contents (probably for security reasons).

One suggestion is to build the query in the sandbox until you get the results you want. You can then pass the same parameters to the gem so it retrieves the information you need programmatically.
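When comparing sandbox output with what the gem received, it can help to walk the response by hand. The sketch below assumes the raw response is a Hash parsed from the query API's JSON, with pages keyed by page id under "query" and "pages". The sample response is illustrative only (the page id is made up), and `template_titles` is a hypothetical helper.

```ruby
require 'json'

# Illustrative response in the shape of action=query with prop=templates.
# The page id is invented for the example.
SAMPLE_RESPONSE = <<~JSON
  {"query": {"pages": {"12345": {
    "pageid": 12345, "ns": 0, "title": "Great white shark",
    "templates": [
      {"ns": 10, "title": "Template:Taxonomy"},
      {"ns": 10, "title": "Template:Taxonomy/Animalia"}
    ]}}}}
JSON

# Hypothetical helper: collects template titles from every page in the response.
def template_titles(raw)
  pages = raw.dig('query', 'pages') || {}
  pages.values.flat_map do |page|
    (page['templates'] || []).map { |template| template['title'] }
  end
end

titles = template_titles(JSON.parse(SAMPLE_RESPONSE))
puts titles
```

If a title that shows up in the sandbox is absent here, the difference is most likely in the parameters that were sent.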

I hope this helps.


3 participants