Supplemental material for the paper "Maximum entropy and quantized metric models for absolute category ratings"
Dietmar Saupe, Krzysztof Rusek, David Hägele, Daniel Weiskopf, Lucjan Janowski
In this supplemental material we provide more details on the models that we evaluated in the main document as well as the source code used to compute the models.
- ACR_Modeling.m Matlab code. Please cite paper Saupe, D., Rusek, K., Hägele, D., Weiskopf, D., & Janowski, L., Maximum entropy and quantized metric models for absolute category ratings, IEEE Signal Processing Letters 31 (2024) p. 2970-2974.
- KonIQ-10k.csv, ACR dataset. Please cite paper Hosu, V., Lin, H., Sziranyi, T., & Saupe, D., KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment, IEEE Transactions on Image Processing 29 (2020) p. 4041-4056.
- VQEG-HDTV.csv, ACR dataset. Please cite Video Quality Experts Group, Report on the validation of video quality models for high definition video content, https://vqeg.org/projects/hdtv/.
- gsd_prob_vectors.csv, dataset of GSD distributions. Please cite Nawała, J., Janowski, L., Ćmiel, B., Rusek, K., & Pérez, P., Generalized score distribution: A two-parameter discrete distribution accurately describing responses from quality of experience subjective experiments, IEEE Transactions on Multimedia 25 (2022) p. 6090-6104.
The code is provided as a single Matlab file, ACR_Modeling.m. The program reads one of the two datasets, KonIQ-10k or VQEG-HDTV, and computes the maximum entropy and quantized metric models for all ACR distributions in that dataset. This reproduces, among other things, the values in the AIC and G-test columns of Tables I and II of the paper. In the code, the user can select the dataset (KonIQ-10k or VQEG-HDTV) and which of the distribution types are taken into account. While running, the program prints a log of its progress; at the end, it produces a figure and an Excel output file. For each stimulus in the dataset, this file contains the stimulus ID, the ACR ratings, the model name, the model parameter values, the model probabilities for the five quality categories, the negative log-likelihood, the G-test value, and the p-value corresponding to the G-test, based on the chi-square distribution.
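To make the per-stimulus quantities in the output file more concrete, the following is a minimal Python sketch, not the authors' Matlab implementation, of how a negative log-likelihood, AIC, G-test statistic, and chi-square p-value can be computed from the rating counts of one stimulus and a model probability vector. The function names, the omission of the multinomial coefficient in the likelihood, and the restriction of the p-value to an even number of degrees of freedom are assumptions made for illustration; the number of degrees of freedom used in ACR_Modeling.m is not stated here.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Chi-square survival function P(X > x), closed form for even dof only.

    For dof = 2k: sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch supports positive even dof only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def acr_fit_stats(counts, probs, n_params, dof):
    """Return (nll, aic, g, p_value) for observed counts under model probs.

    counts   : number of ratings per category, e.g. 5 entries for a 5-point ACR scale
    probs    : model probabilities per category (positive, summing to 1)
    n_params : number of fitted model parameters (enters the AIC)
    dof      : degrees of freedom for the chi-square reference (assumption)
    """
    n = sum(counts)
    # Negative log-likelihood; the multinomial coefficient is omitted (assumption).
    nll = -sum(o * math.log(p) for o, p in zip(counts, probs) if o > 0)
    aic = 2 * n_params + 2 * nll
    # G = 2 * sum O_i * ln(O_i / E_i) with E_i = n * p_i; empty categories add 0.
    g = 2 * sum(o * math.log(o / (n * p)) for o, p in zip(counts, probs) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    return nll, aic, g, chi2_sf_even_dof(g, dof)


# Hypothetical ratings of one stimulus and a hypothetical model output.
counts = [2, 5, 10, 6, 1]
probs = [0.10, 0.20, 0.40, 0.25, 0.05]
nll, aic, g, p_value = acr_fit_stats(counts, probs, n_params=2, dof=2)
print(f"NLL={nll:.3f}  AIC={aic:.3f}  G={g:.3f}  p={p_value:.3f}")
```

A small G-test value (and hence a large p-value) indicates that the model probabilities are close to the observed rating frequencies.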
The main document evaluates the different models by their goodness of fit and their prediction accuracy on the VQEG-HDTV and KonIQ-10k datasets. Here, we provide a visual comparison of the models and give more details on their individual performance.
To get a more detailed impression of the models' performances, we have a look at the plane of
Figure 1 - Scatterplots for the different models showing |
In Figure 1, all models exhibit similar patterns. The points at the sides of the plots, i.e., those corresponding to stimuli with a low or high mean opinion score and a smaller possible variance, have small G-test values (blue color) and are fitted better by the models. Stimuli located more toward the center tend to be fitted worse. For this dataset (VQEG-HDTV), all models look equally valid.
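The remark about a "smaller possible variance" at low or high mean opinion scores can be illustrated with a short Python sketch. It computes the mean opinion score and the variance of one stimulus from its rating counts and compares the variance with the largest value attainable for that mean on a 1-to-5 scale, which is (MOS - 1)(5 - MOS). The function names and the use of the population variance (division by n rather than n - 1) are assumptions for illustration and are not taken from ACR_Modeling.m.

```python
def mos_and_variance(counts):
    """Return (MOS, population variance) for counts over categories 1..len(counts)."""
    n = sum(counts)
    categories = range(1, len(counts) + 1)
    mos = sum(k * c for k, c in zip(categories, counts)) / n
    var = sum(c * (k - mos) ** 2 for k, c in zip(categories, counts)) / n
    return mos, var


def max_variance(mos, lo=1, hi=5):
    """Largest possible variance of ratings in [lo, hi] with the given mean.

    The bound is reached when all ratings fall on the two extreme categories.
    """
    return (mos - lo) * (hi - mos)


# Hypothetical stimuli: one rated near the bottom of the scale, one in the middle.
for counts in ([18, 5, 1, 0, 0], [3, 6, 8, 5, 2]):
    mos, var = mos_and_variance(counts)
    print(f"MOS={mos:.2f}  variance={var:.2f}  max possible={max_variance(mos):.2f}")
```

The bound vanishes at both ends of the scale and is largest at the scale midpoint, so stimuli with extreme mean opinion scores are confined to a narrow range of variances.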
Figure 2 - Scatterplots for the different models showing |
In Figure 2, the same kind of visualization is used, but with the KonIQ-10k dataset, which contains a considerably larger number of stimuli.
Here, the models show clearly different patterns.
For example, the GSD model has high G-test values for stimuli close to
While the above plots showed the models' performance on two different datasets, we also want to give some details on the general similarity of the model outputs.
Therefore, we have a look at the ACR probability vectors generated for different inputs of
In Figure 3 the L1-distance metric was used.
The largest distances occur in pairs that include GSD, and the other models disagree with GSD most in the center area.
The most similar output vectors (w.r.t. the L1-distance) are generated by the normal and maxentropy models, as indicated by the large dark areas.
The logit-logistic model and the maxentropy model show low differences in the middle area around
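For reference, the L1-distance between two ACR probability vectors is the sum of the absolute differences of their components. The following Python sketch shows the computation; the function name and the two example vectors are hypothetical and do not correspond to actual model outputs from the paper.

```python
def l1_distance(p, q):
    """Sum of absolute component-wise differences between two probability vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of categories")
    return sum(abs(a - b) for a, b in zip(p, q))


# Two hypothetical probability vectors over the five ACR categories.
model_a = [0.10, 0.20, 0.40, 0.20, 0.10]
model_b = [0.20, 0.20, 0.20, 0.20, 0.20]
print(l1_distance(model_a, model_b))
```

For probability vectors, this distance ranges from 0 (identical vectors) to 2 (vectors that place their mass on disjoint categories).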
Since the ACR probability vectors are compositions, i.e., their values are non-negative and sum up to