Accuracy of Phi-3.5 Vision Models in Web-LLM is way off #586

Open
@atlury

I have been comparing the accuracy of the Phi-3.5 Vision model in Web-LLM (https://chat.webllm.ai/#/chat) against the same model hosted on NVIDIA NIM (https://build.nvidia.com/microsoft/phi-3_5-vision-instruct).

  1. The NVIDIA NIM results
    (a) Image - square root image
    Question - describe the image mathematically and solve it
    Answer - The image shows a hand-drawn representation of a square root symbol with the number 25 underneath it. Mathematically, this can be represented as √25. The square root of 25 is 5, so the solution to the mathematical expression is 5.

(b) Image - x+y=4 image
Question - describe the image mathematically and solve it
Answer - The image shows a simple algebraic equation and two separate equations. The main equation is 'x + y = 4', which is a linear equation with two variables, x and y. The first sub-equation is 'x = 2', which gives the value of x. The second sub-equation is 'y = ?', which is asking for the value of y. To solve for y, we substitute the value of x into the main equation: 2 + y = 4. By subtracting 2 from both sides, we find that y = 2. Therefore, the solution to the system of equations is x = 2 and y = 2.

(c) Image - x^2+y^2=100 image
Question - describe the image mathematically and solve it
Answer - The image contains a handwritten equation and a question. The equation is:
x^2 + y^2 = 100
The question asks for the value of y when x is given as 5. To solve for y, we substitute x with 5 in the equation:
(5)^2 + y^2 = 100
25 + y^2 = 100
Now, we solve for y^2:
y^2 = 100 - 25
y^2 = 75
Taking the square root of both sides to solve for y:
y = ±√75
y = ±√(25 * 3)
y = ±5√3
Therefore, the value of y is ±5√3.
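
For reference, the three NIM answers above can be checked independently of either model. The snippet below is only a sanity-check sketch: the function names are made up for illustration, and the inputs (x = 2 for the linear equation, x = 5 for the circle equation) are taken from the NIM answers themselves, since the images are not reproduced as text here.

```python
import math


def solve_sqrt_25() -> float:
    """Image (a): evaluate the square root of 25."""
    return math.sqrt(25)


def solve_linear(x: float, total: float = 4) -> float:
    """Image (b): x + y = total, so y = total - x."""
    return total - x


def solve_circle(x: float, r_squared: float = 100) -> tuple[float, float]:
    """Image (c): x^2 + y^2 = r_squared, so y = +/- sqrt(r_squared - x^2)."""
    y = math.sqrt(r_squared - x ** 2)
    return (y, -y)


if __name__ == "__main__":
    print("(a) sqrt(25) =", solve_sqrt_25())
    print("(b) y =", solve_linear(2))
    y_pos, y_neg = solve_circle(5)
    # 5 * sqrt(3) is the simplified form given in the NIM answer.
    print("(c) y =", y_pos, "or", y_neg)
    print("    matches 5*sqrt(3):", math.isclose(y_pos, 5 * math.sqrt(3)))
```

All three NIM answers agree with this check, so they can serve as the expected output when comparing against Web-LLM.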

  2. The Web-LLM (fp32) results from https://chat.webllm.ai/#/chat
    (a) Image - square root image
    Question - describe the image mathematically and solve it
    Answer - The image provided does not contain a clear mathematical express to interpret or solve it......

(b) Image - x+y=4 image
Question - describe the image mathematically and solve it
Answer - The image provided seems to contain a hand-drawn image of a mathematical expression......

(c) Image - x^2+y^2=100 image
Question - describe the image mathematically and solve it
Answer -

The Web-LLM fp32 Phi-3.5 Vision model doesn't give the same results as the NIM version and doesn't solve the problems. Any ideas?

Attached images
sq
x y
xplusysq
