Fix: Prevents MediawikiApi::ApiError Could not normalize image parameters filename.pdf. (urlparamnormal) error #6187

Open
wants to merge 1 commit into
base: master
from

Conversation

empty-codes
Contributor

What this PR does

Refactors the get_urls method to assign placeholder thumbnails to .pdf and .djvu files and to query MediaWiki only for file formats it accepts. Also adds a save_placeholder_thumbnail method that assigns the placeholder values.

  • Doing this solves the MediawikiApi::ApiError: Could not normalize image parameters filename.pdf. (urlparamnormal)
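A minimal sketch of the approach described above, not the PR's actual diff. It assumes a plain Struct as a stand-in for the CommonsUpload model and a hypothetical helper named split_by_thumbnail_support; the placeholder values are the ones visible in the logged UPDATE statements further down.

```ruby
# Hypothetical stand-in for the CommonsUpload model, so the sketch runs without Rails.
Upload = Struct.new(:file_name, :thumburl, :thumbwidth, :thumbheight)

# Placeholder values as they appear in the UPDATE statements logged in this PR.
PLACEHOLDER_THUMBNAIL = {
  thumburl: 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/200px-No_image_3x4.svg.png',
  thumbwidth: '200',
  thumbheight: '150'
}.freeze

# Suffixes that triggered urlparamnormal. \z anchors at the end of the string,
# which is slightly stricter than the $ used in the diff.
NO_THUMBNAIL_SUFFIX = /\.(pdf|djvu)\z/i

# Assigns the placeholder values to one upload (in memory only in this sketch).
def save_placeholder_thumbnail(upload)
  PLACEHOLDER_THUMBNAIL.each { |attr, value| upload[attr] = value }
  upload
end

# Hypothetical helper: returns [queryable, unsupported] uploads.
def split_by_thumbnail_support(uploads)
  uploads.partition { |upload| !upload.file_name.match?(NO_THUMBNAIL_SUFFIX) }
end

uploads = [
  Upload.new('File:Aerial view of Halte Bendo Kediri.tiff'),
  Upload.new("File:Van Dorp's Officieele Reisgids voor Spoor- en Tramswegen op Java (1900).pdf")
]
queryable, unsupported = split_by_thumbnail_support(uploads)
unsupported.each { |upload| save_placeholder_thumbnail(upload) }
puts queryable.map(&:file_name).inspect
puts unsupported.map(&:thumburl).inspect
```

Only the uploads in the first group would be sent to the thumbnail query; the second group never reaches MediaWiki.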

The logic is derived from @ragesoss's logic in this commit, which solved issue #330: 'Upload importer attempts to get thumbnails over and over for invalid files'. It was later removed, per issue #699: 'Commons spec is failing', in this commit, because MediaWiki handled the case itself. The error has since resurfaced.

Screenshots

Before:

I tested a failing set of pageids:
[Screenshot]

I then cloned the course that experienced the error (https://outreachdashboard.wmflabs.org/courses/Wikimedia_Indonesia/1Lib1Ref_di_Indonesia_Januari_2025/uploads), altered the code so that the upload thumbnails would not be added, and ran a manual update:

[Screenshot: before, no thumbnail]

After refactoring:

[Screenshot: after, thumbnail shown]

Proof that it works (extracted snippets of logs and query output):

File:Van Dorp's Officieele Reisgids voor Spoor- en Tramswegen op Java (1900).pdf
[2025-02-08 17:05:36.119 DEBUG] CommonsUpload Load (5.5ms)  SELECT `commons_uploads`.* FROM `commons_uploads` WHERE `commons_uploads`.`file_name` = 'File:Van Dorp\'s Officieele Reisgids voor Spoor- en Tramswegen op Java (1900).pdf' LIMIT 1
#<CommonsUpload:0x00007f8958cc87e0>
[2025-02-08 17:05:36.209 DEBUG] TRANSACTION (0.3ms)  BEGIN
[2025-02-08 17:05:36.216 DEBUG] CommonsUpload Update (0.5ms)  UPDATE `commons_uploads` SET `commons_uploads`.`updated_at` = '2025-02-08 16:05:36', `commons_uploads`.`thumburl` = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/200px-No_image_3x4.svg.png', `commons_uploads`.`thumbwidth` = '200', `commons_uploads`.`thumbheight` = '150' WHERE `commons_uploads`.`id` = 158163184
[2025-02-08 17:05:36.250 DEBUG] TRANSACTION (2.6ms)  COMMIT
File:Van Dorp's Officieele Reisgids voor Spoor- en Tramswegen op Java (1898).pdf
[2025-02-08 17:05:36.268 DEBUG] CommonsUpload Load (0.9ms)  SELECT `commons_uploads`.* FROM `commons_uploads` WHERE `commons_uploads`.`file_name` = 'File:Van Dorp\'s Officieele Reisgids voor Spoor- en Tramswegen op Java (1898).pdf' LIMIT 1
#<CommonsUpload:0x00007f8958c743e8>
[2025-02-08 17:05:36.286 DEBUG] TRANSACTION (0.2ms)  BEGIN
[2025-02-08 17:05:36.292 DEBUG] CommonsUpload Update (0.4ms)  UPDATE `commons_uploads` SET `commons_uploads`.`updated_at` = '2025-02-08 16:05:36', `commons_uploads`.`thumburl` = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/200px-No_image_3x4.svg.png', `commons_uploads`.`thumbwidth` = '200', `commons_uploads`.`thumbheight` = '150' WHERE `commons_uploads`.`id` = 158163185
[2025-02-08 17:05:36.313 DEBUG] TRANSACTION (1.5ms)  COMMIT
File:Officieele reisgids der spoor- en tramwegen en aansluitende automobieldiensten op Java en Madoera (1935).pdf
[2025-02-08 17:05:36.329 DEBUG] CommonsUpload Load (0.9ms)  SELECT `commons_uploads`.* FROM `commons_uploads` WHERE `commons_uploads`.`file_name` = 'File:Officieele reisgids der spoor- en tramwegen en aansluitende automobieldiensten op Java en Madoera (1935).pdf' LIMIT 1
#<CommonsUpload:0x00007f8958c7d4e8>
[2025-02-08 17:05:36.345 DEBUG] TRANSACTION (0.2ms)  BEGIN
[2025-02-08 17:05:36.354 DEBUG] CommonsUpload Update (0.7ms)  UPDATE `commons_uploads` SET `commons_uploads`.`updated_at` = '2025-02-08 16:05:36', `commons_uploads`.`thumburl` = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/200px-No_image_3x4.svg.png', `commons_uploads`.`thumbwidth` = '200', `commons_uploads`.`thumbheight` = '150' WHERE `commons_uploads`.`id` = 158172982
[2025-02-08 17:05:36.380 DEBUG] TRANSACTION (1.6ms)  COMMIT

=> [{"pageid"=>158196457,
  "ns"=>6,
  "title"=>"File:Aerial view of Halte Bendo Kediri.tiff",
  "imagerepository"=>"local",
  "imageinfo"=>
   [{"thumburl"=>
      "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Aerial_view_of_Halte_Bendo_Kediri.tiff/lossy-page1-480px-Aerial_view_of_Halte_Bendo_Kediri.tiff.jpg",
     "thumbwidth"=>480,
     "thumbheight"=>480,
     "responsiveUrls"=>
      {"1.5"=>
        "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Aerial_view_of_Halte_Bendo_Kediri.tiff/lossy-page1-720px-Aerial_view_of_Halte_Bendo_Kediri.tiff.jpg",
       "2"=>
        "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Aerial_view_of_Halte_Bendo_Kediri.tiff/lossy-page1-960px-Aerial_view_of_Halte_Bendo_Kediri.tiff.jpg"},
     "url"=>"https://upload.wikimedia.org/wikipedia/commons/0/0d/Aerial_view_of_Halte_Bendo_Kediri.tiff",
     "descriptionurl"=>"https://commons.wikimedia.org/wiki/File:Aerial_view_of_Halte_Bendo_Kediri.tiff",
     "descriptionshorturl"=>"https://commons.wikimedia.org/w/index.php?curid=158196457"}]},
 {"pageid"=>158445924,
  "ns"=>6,
  "title"=>"File:Foto uit een album over de suikeronderneming Pesantren - (cropped).png",
  "imagerepository"=>"local",
  "imageinfo"=>
   [{"thumburl"=>
      "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png/854px-Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png",
     "thumbwidth"=>854,
     "thumbheight"=>480,
     "responsiveUrls"=>
      {"1.5"=>
        "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png/1281px-Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png",
       "2"=>"https://upload.wikimedia.org/wikipedia/commons/a/ac/Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png"},
     "url"=>"https://upload.wikimedia.org/wikipedia/commons/a/ac/Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png",
     "descriptionurl"=>"https://commons.wikimedia.org/wiki/File:Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_(cropped).png",
     "descriptionshorturl"=>"https://commons.wikimedia.org/w/index.php?curid=158445924"},
    {"thumburl"=>
      "https://upload.wikimedia.org/wikipedia/commons/archive/a/ac/20250124113940%21Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png",
     "thumbwidth"=>190,
     "thumbheight"=>135,
     "url"=>"https://upload.wikimedia.org/wikipedia/commons/archive/a/ac/20250124113940%21Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_%28cropped%29.png",
     "descriptionurl"=>"https://commons.wikimedia.org/wiki/File:Foto_uit_een_album_over_de_suikeronderneming_Pesantren_-_(cropped).png",
     "descriptionshorturl"=>"https://commons.wikimedia.org/w/index.php?curid=158445924"}]},
 {"pageid"=>158659677,
  "ns"=>6,
  "title"=>"File:LL-Q13324 (min)-Zhilal Darma-nabu.wav",
  "imagerepository"=>"local",
  "imageinfo"=>
   [{"thumburl"=>"https://commons.wikimedia.org/w/resources/assets/file-type-icons/fileicon-ogg.png",
     "thumbwidth"=>400,
     "thumbheight"=>400,
     "url"=>"https://upload.wikimedia.org/wikipedia/commons/5/5d/LL-Q13324_%28min%29-Zhilal_Darma-nabu.wav",
     "descriptionurl"=>"https://commons.wikimedia.org/wiki/File:LL-Q13324_(min)-Zhilal_Darma-nabu.wav",
     "descriptionshorturl"=>"https://commons.wikimedia.org/w/index.php?curid=158659677"}]},

Test

Before:
[Screenshot: before, test failing]
After:
[Screenshot: after, test passing]

… and `.djvu` files and only query mediawiki accepted file formats

- Adds a `save_placeholder_thumbnail` method that assigns placeholder values
- Doing this solves the `MediawikiApi::ApiError Could not normalize image parameters filename.pdf. (urlparamnormal)`
# mediawiki cannot generate thumbnails for pdf,djvu files
if title.match?(/\.(pdf|djvu)$/i)
bad_file = CommonsUpload.find_by(file_name: title)
save_placeholder_thumbnail(bad_file) if bad_file
Member


I don't think this should be done within the get_urls method. The method name (get_something) implies a read-only method that will find and return data without changing the database, so this will be an unexpected side effect.

The UploadImporter#import_urls method would be a more appropriate place to handle setting a placeholder thumbnail.
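One way to picture that split, as a rough sketch rather than the project's real code: the lookup stays read-only and the write happens in an import_urls-style method. The Struct stand-in for CommonsUpload, the thumbnail_from helper, and the method signature are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the CommonsUpload model.
Upload = Struct.new(:id, :thumburl, :thumbwidth, :thumbheight)

# Placeholder values taken from the logs in the PR description.
PLACEHOLDER = {
  thumburl: 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/No_image_3x4.svg/200px-No_image_3x4.svg.png',
  thumbwidth: '200',
  thumbheight: '150'
}.freeze

# Read-only: extracts thumbnail fields from one page of an imageinfo response.
# Returns nil when the first imageinfo entry has no thumburl.
def thumbnail_from(file_url)
  info = file_url.dig('imageinfo', 0)
  return nil unless info && info['thumburl']

  { thumburl: info['thumburl'], thumbwidth: info['thumbwidth'], thumbheight: info['thumbheight'] }
end

# Write side: the only place where records change. Falls back to the
# placeholder when no thumbnail is available for a file.
def import_urls(uploads_by_id, file_urls)
  file_urls.each do |file_url|
    upload = uploads_by_id[file_url['pageid']]
    next unless upload

    attrs = thumbnail_from(file_url) || PLACEHOLDER
    attrs.each { |attr, value| upload[attr] = value }
  end
end

# Hand-written sample data shaped like the response shown earlier.
uploads = { 1 => Upload.new(1), 2 => Upload.new(2) }
responses = [
  { 'pageid' => 1,
    'imageinfo' => [{ 'thumburl' => 'https://example.org/thumb.png', 'thumbwidth' => 480, 'thumbheight' => 480 }] },
  { 'pageid' => 2, 'imageinfo' => [{ 'url' => 'https://example.org/file.pdf' }] }
]
import_urls(uploads, responses)
puts uploads.values.map(&:thumburl).inspect
```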


# mediawiki can generate thumbnails for jpg,png,tiff,wav files
# mediawiki cannot generate thumbnails for pdf,djvu files
if title.match?(/\.(pdf|djvu)$/i)
Member


Is there something in the response value that can be used, rather than simply matching on the file suffix? There may be cases where the error happens even with a different filetype, and the set of non-supported filetypes might change over time. It would be better to look for the error/warning within the response, instead of assuming based on file type. If we just wanted to do this based on file type, we could instead set the placeholder at the time the file gets saved, before even making a query for the thumbnail. But I think relying on mediawiki to determine whether a thumbnail can be made is better.

Contributor Author

@empty-codes empty-codes Feb 11, 2025


The file names are obtained with the build_info_query method, which returns only the page IDs and titles; this is an example response.

I've tried changing the parameters of the image_info_query. If the iiurlheight parameter is removed, for example, the error goes away, but the response contains only the url of each file and not its thumburl, for both image and non-image files. That means this line in import_urls:
file.thumburl = file_url['imageinfo'][0]['thumburl'] would not work, as the imageinfo entries would not have the thumburl property.

It seems that to get the thumburl in the response, iiurlheight / iiurlwidth must be added as a parameter. But apparently the non-image files don't have these properties, hence the error.

Another possible way could be to query for mediatype as an iiprop and, if a file has the OFFICE mediatype (which is what .pdf and .djvu fall under, as seen here), add it to the non-image files. This is a list of mediawiki media types.
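A rough sketch of that mediatype idea, under stated assumptions: build_mediatype_query and split_by_mediatype are hypothetical names, the sample pages are hand-written rather than captured from Commons, and treating OFFICE as the only thumbnail-less type follows this comment, not a verified list.

```ruby
# Media types assumed (per the comment above) to have no generated thumbnail.
NO_THUMBNAIL_MEDIATYPES = %w[OFFICE].freeze

# Hypothetical query parameters: asks imageinfo for the media type only.
# No iiurlheight/iiurlwidth is sent, so no thumbnail scaling is requested.
def build_mediatype_query(pageids)
  { prop: 'imageinfo', iiprop: 'mediatype', pageids: pageids.join('|') }
end

# Splits response pages into [thumbnailable, placeholder_needed] using the
# mediatype of the first imageinfo entry.
def split_by_mediatype(pages)
  pages.partition do |page|
    mediatype = page.dig('imageinfo', 0, 'mediatype')
    !NO_THUMBNAIL_MEDIATYPES.include?(mediatype)
  end
end

# Hand-written sample pages; page ids and titles are illustrative only.
pages = [
  { 'pageid' => 1, 'title' => 'File:Example.tiff', 'imageinfo' => [{ 'mediatype' => 'BITMAP' }] },
  { 'pageid' => 2, 'title' => 'File:Example.pdf', 'imageinfo' => [{ 'mediatype' => 'OFFICE' }] }
]
thumbnailable, placeholder_needed = split_by_mediatype(pages)
puts build_mediatype_query(pages.map { |page| page['pageid'] }).inspect
puts thumbnailable.map { |page| page['title'] }.inspect
puts placeholder_needed.map { |page| page['title'] }.inspect
```

The trade-off is an extra request before the thumbnail query, in exchange for not hard-coding file suffixes.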

To my knowledge, the only way to match 'Could not normalize image parameters for...' is to allow the error to happen and then handle/rescue it. My concern is that the error is raised and sent to sentry in wiki_api at this line, @mediawiki.send(action, query), so if we want to match 'Could not normalize image parameters for...' directly, that means rescuing and handling the error in wiki_api's `mediawiki` method, then in commons's `api_get`, which is what you did here.
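For comparison, the rescue-based alternative might look roughly like the sketch below. StandInApiError is a hypothetical stand-in for MediawikiApi::ApiError (assumed here to expose the API error code), and with_thumbnail_error_handling is an invented wrapper name, not a method in wiki_api or commons.

```ruby
# Hypothetical stand-in for MediawikiApi::ApiError, so the sketch runs
# without the mediawiki_api gem. It is assumed to expose the error code.
class StandInApiError < StandardError
  attr_reader :code

  def initialize(code, info)
    @code = code
    super("#{info} (#{code})")
  end
end

# Error code reported in the PR title.
THUMBNAIL_ERROR_CODE = 'urlparamnormal'

# Runs the given query block. Returns its result, or nil when the API
# reports that it cannot normalize the thumbnail parameters. Any other
# API error is re-raised unchanged.
def with_thumbnail_error_handling
  yield
rescue StandInApiError => e
  raise unless e.code == THUMBNAIL_ERROR_CODE

  nil
end

result = with_thumbnail_error_handling do
  raise StandInApiError.new('urlparamnormal', 'Could not normalize image parameters filename.pdf.')
end
puts result.inspect
```

A nil result would then signal the caller to assign the placeholder. This matches on the error code rather than the message text, which avoids depending on the exact wording.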

So which approach do you think is most suitable?

Member


Okay. I'd really like to avoid this kind of complex error handling, if possible. Do you have a query that triggers it? I'd like to look at the mediawiki response.

2 participants