
Survey IQA

https://github.com/chaofengc/Awesome-Image-Quality-Assessment

https://github.com/ziqihuangg/Awesome-Evaluation-of-Visual-Generation

https://github.com/bcmi/Awesome-Aesthetic-Evaluation-and-Cropping

https://github.com/chaofengc/IQA-PyTorch


  • Objective: summarize image/video quality assessment metrics.

Metrics

SRCC and PLCC

Spearman rank-order correlation coefficient (SRCC), Pearson linear correlation coefficient (PLCC), and the standard deviation (std) are reported.

  • SRCC ∈ [-1, 1]; the closer to 1, the better the IQA method
  • KRCC: the higher, the better
  • PLCC ∈ [-1, 1]; the closer to 1, the better the IQA method
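
These correlations are computed between predicted quality scores and ground-truth mean opinion scores (MOS). A minimal sketch with scipy.stats; the pred/mos arrays below are made-up placeholder data:

```python
import numpy as np
from scipy import stats

# hypothetical predicted scores and ground-truth MOS for 5 images
pred = np.array([0.81, 0.35, 0.62, 0.90, 0.15])
mos = np.array([4.2, 2.1, 3.3, 4.8, 1.0])

srcc, _ = stats.spearmanr(pred, mos)   # rank correlation
plcc, _ = stats.pearsonr(pred, mos)    # linear correlation
krcc, _ = stats.kendalltau(pred, mos)  # Kendall rank correlation
print(f"SRCC={srcc:.3f} PLCC={plcc:.3f} KRCC={krcc:.3f}")
```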

Full-reference

  • "FloLPIPS: A Bespoke Video Quality Metric for Frame Interpoation" Arxiv, 2022 Jul 17 paper

"We present a bespoke full-reference video quality metric for VFI, FloLPIPS, that builds on the popular perceptual image quality metric LPIPS, which captures perceptual degradation in the extracted image feature space."

PSNR

Peak signal-to-noise ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation.

https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$

The smaller the MSE, the larger the PSNR; so a larger PSNR means better image quality.

PSNR above 40 dB indicates excellent quality (very close to the original); 30-40 dB usually indicates good quality (distortion perceptible but acceptable); 20-30 dB indicates poor quality; below 20 dB the image is unacceptable. For example, on 8-bit images (data_range = 255), an MSE of 100 gives PSNR = 10·log10(255²/100) ≈ 28.1 dB.

See the corresponding skimage implementation:

```python
from skimage.metrics import structural_similarity, peak_signal_noise_ratio  # SSIM, PSNR
```

  • MSE

```python
# from skimage.metrics; check_shape_equality and _as_floats are skimage-internal helpers
def mean_squared_error(image0, image1):
    """
    Compute the mean-squared error between two images.

    Parameters
    ----------
    image0, image1 : ndarray
        Images.  Any dimensionality, must have same shape.

    Returns
    -------
    mse : float
        The mean-squared error (MSE) metric.

    Notes
    -----
    .. versionchanged:: 0.16
        This function was renamed from ``skimage.measure.compare_mse`` to
        ``skimage.metrics.mean_squared_error``.

    """
    check_shape_equality(image0, image1)
    image0, image1 = _as_floats(image0, image1)
    return np.mean((image0 - image1) ** 2, dtype=np.float64)
```
  • Plug into the formula `10 * np.log10((data_range ** 2) / err)`, where `err` is the MSE and `data_range` is the maximum intensity value.

```python
# from skimage.metrics; warn, dtype_range, check_shape_equality, _as_floats are skimage internals
def peak_signal_noise_ratio(image_true, image_test, *, data_range=None):
    """
    Compute the peak signal to noise ratio (PSNR) for an image.

    Parameters
    ----------
    image_true : ndarray
        Ground-truth image, same shape as im_test.
    image_test : ndarray
        Test image.
    data_range : int, optional
        The data range of the input image (distance between minimum and
        maximum possible values).  By default, this is estimated from the image
        data-type.

    Returns
    -------
    psnr : float
        The PSNR metric.

    Notes
    -----
    .. versionchanged:: 0.16
        This function was renamed from ``skimage.measure.compare_psnr`` to
        ``skimage.metrics.peak_signal_noise_ratio``.

    References
    ----------
    .. [1] https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

    """
    check_shape_equality(image_true, image_test)

    if data_range is None:
        if image_true.dtype != image_test.dtype:
            warn("Inputs have mismatched dtype.  Setting data_range based on "
                 "image_true.")
        dmin, dmax = dtype_range[image_true.dtype.type]
        true_min, true_max = np.min(image_true), np.max(image_true)
        if true_max > dmax or true_min < dmin:
            raise ValueError(
                "image_true has intensity values outside the range expected "
                "for its data type. Please manually specify the data_range.")
        if true_min >= 0:
            # most common case (255 for uint8, 1 for float)
            data_range = dmax
        else:
            data_range = dmax - dmin

    image_true, image_test = _as_floats(image_true, image_test)

    err = mean_squared_error(image_true, image_test)
    return 10 * np.log10((data_range ** 2) / err)
```
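
A minimal usage sketch; the synthetic arrays below stand in for real images:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio

ref = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # reference image
noisy = np.clip(ref + np.random.normal(0, 10, ref.shape), 0, 255).astype(np.uint8)

# sigma=10 noise gives MSE ~ 100, so roughly 28 dB on 8-bit images
print(peak_signal_noise_ratio(ref, noisy, data_range=255))
```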

SSIM

  • "Image quality assessment: From error visibility to structural similarity" TIP, 2004 Apr 30, SSIM paper code pdf note Authors: Zhou Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli

A weighted product of comparison terms for luminance $\mu$, contrast $\sigma$, and structure $(x-\mu)/\sigma$, with all exponents set to 1:

$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma} \quad \text{(Eq. 12)}$
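
A minimal usage sketch with the skimage implementation imported above; the synthetic arrays stand in for real images:

```python
import numpy as np
from skimage.metrics import structural_similarity

img0 = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # reference
img1 = np.clip(img0 + np.random.randint(-10, 10, img0.shape), 0, 255).astype(np.uint8)

# SSIM in [-1, 1]; 1 means identical structure
score = structural_similarity(img0, img1, data_range=255)
print(score)
```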

DISTS

A metric that focuses on texture!

  • "Image Quality Assessment: Unifying Structure and Texture Similarity" TPAMI, 2020 Apr 16, DISTS paper code pdf note Authors: Keyan Ding, Kede Ma, Shiqi Wang, Eero P. Simoncelli

To human eyes, two images with the same overall texture (e.g., both grass) but mismatched fine details look similar. Traditional metrics such as PSNR, SSIM, and MSE compare two aligned images pixel by pixel, so on images with the same kind of texture but unaligned details they diverge from human subjective perception!

  1. SSIM is very sensitive to global translation of objects! Comparing object translation against Gaussian degradation, the Gaussian-degraded image can actually score a higher SSIM!!
  2. DISTS discriminates mostly in the 0-0.2 range and is insensitive to object translation: a translated image (no real quality loss) gets a better DISTS score than one with added Gaussian noise.

(See Fig. 11 of the DISTS paper.)

LPIPS

Learned Perceptual Image Patch Similarity (LPIPS) metric

LPIPS Loss producing negative values

  • "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric" CVPR, 2018 Jan 11, LPIPS paper code pdf note Authors: Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang

"The Unreasonable Effectiveness of Deep Features as a Perceptual Metric",更符合人类的感知情况。LPIPS的值越低表示两张图像越相似,反之,则差异越大。 ['alex','vgg','squeeze'] are the base/trunk networks available 预训练提取初步特征,之后额外的几层 MLP(dropout + Conv) 输出后相加

将图像归一化到 [-1, 1],取指定层的特征,计算差值的平方;
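
A minimal usage sketch with the official lpips package; the random tensors below stand in for real images:

```python
import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')  # or 'vgg', 'squeeze'

# inputs are RGB tensors in [-1, 1], shape (N, 3, H, W)
img0 = torch.rand(1, 3, 64, 64) * 2 - 1
img1 = torch.rand(1, 3, 64, 64) * 2 - 1
d = loss_fn(img0, img1)  # lower = more similar
print(d.item())
```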

  • Perceptual Loss

From memory (to be re-verified): VGG19 features, taken from the first layer of each of the 5 conv blocks?

No-Reference

  • "Making a ‘Completely Blind’ Image Quality Analyzer" SPL-2012, NIQE

  • "A Feature-Enriched Completely Blind Image Quality Evaluator" TIP 2015, ILNIQE

    paper

Avoid NIQE for now; it performs very poorly on real-world degradations.

  • "No-Reference Image Quality Assessment in the Spatial Domain" TIP, 2012 Dec, BRISQUE paper

  • "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric" CVPR, 2018 Jan 11, LPIPS paper code pdf note Authors: Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang

  • "MUSIQ: Multi-scale Image Quality Transformer" ICCV, 2021 Aug 12 paper code pdf note Authors: Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, Feng Yang

There are two variants: variant 1 evaluates three resolutions together (native resolution, 224, 384); variant 2 uses only the native resolution.

  • "MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment" NTIRE2022-1st, 2022 Apr 19 code

  • "Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild" CVPR, 2023 Apr 2 paper code pdf note Authors: Avinab Saha, Sandeep Mishra, Alan C. Bovik

(See Table 2 of the Re-IQA paper.)

  • "ARNIQA: Learning Distortion Manifold for Image Quality Assessment" WACV-oral 2024, 2023 Oct 20 paper code pdf note Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

(See Table 1 of the ARNIQA paper.)

Compared with ARNIQA's numbers, Re-IQA performs similarly on synthetic data and is not much worse on real data; BRISQUE and NIQE are completely inaccurate there...

  • "Exploring clip for assessing the look and feel of images" AAAI, 2023, CLIP-IQA

  • "Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment" Arxiv, 2024 Mar 17, QualiCLIP paper code pdf note Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini

(See Table 1 and Fig. 4 of the QualiCLIP paper.)

Inference yields a similarity to the positive-prompt feature, in the range [0, 1]; the closer to 1, the better.
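
A minimal sketch of scoring an image with CLIP-IQA via the IQA-PyTorch (pyiqa) package linked above; the metric name 'clipiqa' follows that repo's model zoo, and the random tensor stands in for a real image:

```python
import torch
import pyiqa

metric = pyiqa.create_metric('clipiqa', device='cpu')

# input: RGB tensor in [0, 1], shape (N, 3, H, W)
img = torch.rand(1, 3, 224, 224)
score = metric(img)  # in [0, 1]; closer to 1 is better
print(score.item())
```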

FID

FID score for PyTorch

A lower score means the two image sets are more similar, i.e., their statistics are closer; FID is 0.0 in the best case, meaning the two sets are identical.

Use the pretrained InceptionV3 from torchvision (with a few layers modified) and extract the output of a chosen block, producing a `dims`-dimensional embedding per image; then compute the mean and covariance of those embeddings:

```python
def calculate_activation_statistics(files, model, batch_size=50, dims=2048,
                                    device='cpu', num_workers=1):
    """Calculation of the statistics used by the FID"""
    # act: (N, dims) InceptionV3 activations, one row per image
    act = get_activations(files, model, batch_size, dims, device, num_workers)
    mu = np.mean(act, axis=0)          # mean of the activations
    sigma = np.cov(act, rowvar=False)  # covariance of the activations
    return mu, sigma
```

$d^2 = \lVert \mu_1 - \mu_2 \rVert^2 + \mathrm{Tr}\left(C_1 + C_2 - 2\sqrt{C_1 C_2}\right)$
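
A numpy/scipy sketch of this distance, mirroring pytorch-fid's calculate_frechet_distance but without its extra numerical-stability epsilon handling:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """d^2 = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*sqrt(C1*C2))."""
    diff = mu1 - mu2
    # matrix square root of the covariance product
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```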

BRISQUE

  • "No-Reference Image Quality Assessment in the Spatial Domain" TIP, 2012 Dec, BRISQUE paper
```python
import torch
import piq

def calc_brisque(out, data_range=1., reduction='none') -> torch.Tensor:
    """BRISQUE metric.

    Args:
        out: tensor in range [-1, 1]
        data_range (float): max value. Defaults to 1.
        reduction (str, optional): Defaults to 'none'.

    Returns:
        torch.Tensor: BRISQUE score(s); lower is better.
    """
    # map from [-1, 1] to [0, 1] before scoring
    out = torch.clamp((out + 1) / 2., min=0., max=1.)
    return piq.brisque(out, data_range=data_range, reduction=reduction)
```

NIQE

  • "Making a ‘Completely Blind’ Image Quality Analyzer" SPL-2012, NIQE

  • "A Feature-Enriched Completely Blind Image Quality Evaluator" TIP 2015, ILNIQE

    paper

```python
import numpy as np
import torch
import torchvision
import skvideo.measure

def calc_niqe(img: torch.Tensor):
    """NIQE on the grayscale version of the image (img.convert("L")); lower is better."""
    img = torchvision.transforms.ToPILImage()(img)
    img = np.array(img.convert("L")).astype(np.float32)
    return skvideo.measure.niqe(img)[0]
```

UHD

  • "UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment" paper

  • "Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency" ECCVWorkshop, 2024 Sep 1 paper code pdf note Authors: Wei Sun, Weixia Zhang, Yuqin Cao, Linhan Cao, Jun Jia, Zijian Chen, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

Video

https://github.com/ckkelvinchan/BasicVSR_PlusPlus/tree/master/tests

https://github.com/open-mmlab/mmagic/blob/main/mmagic/evaluation/__init__.py

  • "Learning Blind Video Temporal Consistency" ECCV, 2018 Aug 1, warping-error paper code pdf note Authors: Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, Ming-Hsuan Yang

Warping error has code; this paper generates optical flow with FlowNet, which you would need to swap for a different flow model yourself ⚠️ (see the sketch below).
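
A minimal, hedged sketch of the general idea rather than the paper's exact implementation: backward-warp frame t into frame t+1's coordinates with optical flow, then measure MSE; the `flow_t1_to_t` and `mask` inputs here come from whatever flow/occlusion estimator you substitute:

```python
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp `frame` (B,C,H,W) with backward flow `flow` (B,2,H,W), in pixels."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(frame.device)  # (2,H,W), (x,y) order
    coords = base.unsqueeze(0) + flow                      # sampling locations
    # normalize to [-1, 1] for grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                   # (B,H,W,2)
    return F.grid_sample(frame, grid, align_corners=True)

def warping_error(frame_t, frame_t1, flow_t1_to_t, mask=None):
    """MSE between frame t+1 and frame t warped into its coordinates."""
    warped = backward_warp(frame_t, flow_t1_to_t)
    err = (warped - frame_t1) ** 2
    if mask is not None:  # occlusion mask: 1 = valid pixel
        return (err * mask).sum() / mask.sum().clamp(min=1)
    return err.mean()
```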

  • "FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling" ECCV2022+TPAMI2023, 2022 Jul 6 paper code pdf note Authors: Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Per the authors: their follow-up work DOVER builds on FAST-VQA with even better performance.

  • "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives" ICCV, 2022 Nov 9, DOVERpaper code pdf note Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Note: testing uses 224x224, T=32 video clips, which needs roughly 6 GB of VRAM.

  • "Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis" Arxiv, 2023 Jun 15, HPSv2 paper code pdf note Authors: Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li
  • "Not All Noises Are Created Equally:Diffusion Noise Selection and Optimization" Arxiv, 2024 Jul 19 paper code pdf note Authors: Zipeng Qi, Lichen Bai, Haoyi Xiong, and Zeke Xie

HPS v2, PickScore, and ImageReward are all emerging human reward models that approximate human preference for text-to-image generation.

HPS v2 is the state-of-the-art human reward model so far and offers a metric closer to human preference.

Human preference is regarded as the ground-truth and ultimate evaluation method for text-to-image generation; thus, human preference and HPS v2 are regarded as the two most important metrics.

  • "VBench: Comprehensive Benchmark Suite for Video Generative Models" CVPR-highlight, 2023 Nov 29 ⭐ paper code pdf note Authors: Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

  • "Perceptual Video Quality Assessment: A Survey" Arxiv, 2024 Feb 5 paper code pdf note Authors: Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, Guangtao Zhai

  • "PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild" CVPR2024, 2024 May 5 paper

  • "A Survey of AI-Generated Video Evaluation" Arxiv, 2024 Oct 24 paper code pdf note Authors: Xiao Liu, Xinhao Xiang, Zizhong Li, Yongheng Wang, Zhuoheng Li, Zhuosheng Liu, Weidi Zhang, Weiqi Ye, Jiawei Zhang

  • "VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models" Arxiv, 2024 Nov 20 paper code pdf note Authors: Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu