Fetch reviews of an app #27

Open
BaseMax opened this issue Apr 1, 2023 · 13 comments
Labels: enhancement, good first issue, help wanted

@BaseMax
Owner

BaseMax commented Apr 1, 2023

Hi there,

Gift

I'm here to say I will give a gift to anyone who can add this feature to the project and library.

Description

The feature is to fetch the list of reviews of an application. For example, we would fetch the reviews one by one and handle pagination until all reviews have been fetched.

Sample page: https://play.google.com/store/apps/details?id=com.king.crash&hl=en&gl=US

Best,
M.

BaseMax added the enhancement, help wanted, and good first issue labels on Apr 1, 2023
@IzzySoft
Contributor

IzzySoft commented Apr 1, 2023

Well, as you pinged me on that and even assigned it to me… I had this on my list already indeed. Just need some time to implement. Where to get the data from is already marked in a note I have here, back from May 2022 (never got to that as I myself didn't need it).

Are you in a hurry with that, so that I should name the hints here? Or is it "whenever time permits", and it can wait at least until, say, May 2023?

@BaseMax
Owner Author

BaseMax commented Apr 1, 2023

Just to save time: https://github.com/Ne-Lexa/google-play-scraper may be useful

@IzzySoft
Contributor

IzzySoft commented Apr 1, 2023

Thanks, but why? As pointed out, I've already marked it here and just needed to implement it (PR on its way). Thanks though, maybe that link can fill some gaps.

@BaseMax
Owner Author

BaseMax commented Apr 1, 2023

> Well, as you pinged me on that and even assigned it to me… I had this on my list already indeed. Just need some time to implement. Where to get the data from is already marked in a note I have here, back from May 2022 (never got to that as I myself didn't need it).
>
> Are you in a hurry with that, so that I should name the hints here? Or is it "whenever time permits", and it can wait at least until, say, May 2023?

Great. Do it when you can; that's okay with me, unless someone else does it faster :)

@IzzySoft
Contributor

IzzySoft commented Apr 1, 2023

> unless someone else does it faster :)

Too late for that now 🤣 Your turn, please test!

@IzzySoft
Contributor

IzzySoft commented Apr 1, 2023

OK, checked your reference. Looks like my implementation has more details already 😉 Theirs:

        return new Review(
            $reviewId,
//            $reviewUrl,
            $userName,
            $text,
            $avatar,
            $date,
            $score,
            $likeCount,
            $reply,
            $appVersion
        );

Mine puts the reviewer's data into its own array (including user ID, name, avatar, background image – ID and background image are missing "over there"). If you want, the fields could still be renamed in my implementation: thumbs=>likeCount for example, or naming the id field in the user record userId, user_id, reviewer_id or whatever you like. Currently, for easy comparison:

[
   "review_id",
   "reviewed_version",
   "review_date",
   "text",
   "stars",
   "thumbs",
   "reviewer" => [
          "id",
          "name",
          "avatar",
          "bg_image"
   ]
]

So maybe text=>review_text as well…
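
The renaming mentioned above could be done after the fact with a small key-mapping helper. This is only a sketch: `renameKeys()` is a hypothetical function, not part of either library, and the sample values are made up.

```php
<?php
declare(strict_types=1);

/**
 * Return a copy of $record with keys renamed according to $map.
 * Keys not listed in $map are kept unchanged.
 * (Hypothetical helper for illustration only.)
 */
function renameKeys(array $record, array $map): array
{
    $out = [];
    foreach ($record as $key => $value) {
        $out[$map[$key] ?? $key] = $value;
    }
    return $out;
}

// Made-up sample record using the field names listed above.
$review = [
    'review_id' => 'abc123',
    'text'      => 'Nice game',
    'stars'     => 5,
    'thumbs'    => 12,
];

// Rename two fields; everything else stays as it is.
$renamed = renameKeys($review, ['thumbs' => 'likeCount', 'text' => 'review_text']);
print_r($renamed);
```

Keeping the mapping outside the scraper means the field names can still change later without touching the parsing code.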

Hm, $reply? That one I don't have. Might need to check what they put in there…

@BaseMax
Owner Author

BaseMax commented Apr 1, 2023

Wow, nice.

I wonder if we can implement a function that gets all reviews, either one by one or via pagination.

@BaseMax
Owner Author

BaseMax commented Apr 1, 2023

I have not forgotten what I said.
As I said, I want to pay for a gift. Please show me the possible ways to do that.

@BaseMax
Owner Author

BaseMax commented Apr 1, 2023

As it seems you did it, you win it.

I am going on a trip to several countries soon. It seems you are in Munich. I was invited to give a talk at a university in Belgium. If Munich is near, I would like to meet you and give you a special gift from my country.
If that doesn't work, I will do it online.

Best,
M.

@IzzySoft
Contributor

IzzySoft commented Apr 1, 2023

> I wonder if we can implement a function that gets all reviews, either one by one or via pagination.

For that I'd first need to figure out where that XHR goes to (e.g. by watching network traffic in the browser console while triggering such an XHR), and next how to parametrize it (how much can it pull? how often do I need to loop? what other request parameters (e.g. referrer) would be needed?). Just tried that, but I only see requests to load images (JPG, PNG) – later followed by some POST requests where Google wants to log stuff (haha, failing as I blocked XHR to gstatic, LOL – and yikes, seems like those were not even encrypted – or the lock icon is missing because the connection was not established).

As it's loaded dynamically into the very same page I expect the returned structures to be similar to what's already there, so the logic implemented could be reused. So I'd move the parsing part to a separate (protected?) method to be called in both places, leaving the "initial set" with the general app data while having the "full list" retrieved by a separate method.
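
The split described above could look roughly like this. It is a sketch under stated assumptions: the class name, the method names, and above all the index layout of a raw entry are invented for illustration and do not reflect the real Play Store structures or the library's actual code.

```php
<?php
declare(strict_types=1);

class ReviewParserSketch
{
    /** Reviews embedded in the initially loaded page data. */
    public function initialReviews(array $rawEntries): array
    {
        return $this->parseReviews($rawEntries);
    }

    /** Reviews from a later, separately fetched batch. */
    public function moreReviews(array $rawEntries): array
    {
        return $this->parseReviews($rawEntries);
    }

    /**
     * Shared parsing step used by both public methods.
     * ASSUMPTION: each raw entry is [id, text, stars]; the real layout differs.
     */
    protected function parseReviews(array $rawEntries): array
    {
        $reviews = [];
        foreach ($rawEntries as $entry) {
            $reviews[] = [
                'review_id' => $entry[0] ?? null,
                'text'      => $entry[1] ?? null,
                'stars'     => $entry[2] ?? null,
            ];
        }
        return $reviews;
    }
}

// Made-up sample data for both entry points.
$parser = new ReviewParserSketch();
$first = $parser->initialReviews([['r1', 'Great game', 5]]);
$more  = $parser->moreReviews([['r2', 'Too many ads', 2]]);
```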

That said and combined: there seem to be more than the 40 reviews initially collected, but I do not yet have an idea on how to fetch those. Maybe that's the reason the other scraper has no such reference either (or I missed it). But wait:

$nextToken = $json[1][1] ?? null;
return [$reviews, $nextToken];

That looks like it would return a pointer on where to get more. Let's see what $json[1][1] looks like:

Array
(
    [0] => 
    [1] => CmgKZgpkMCwxMDAxMDAwLjQ2MTM5NzI2MDQsMTg4Nzg1MTU3NzEyLCJodHRwOi8vbWFya2V0LmFuZHJvaWQuY29tL2RldGFpbHM_aWQ9djI6Y29tLmtpbmcuY3Jhc2g6MSIsMSxmYWxzZQ
)

So there is something.
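
Assuming such a token can be passed back to get the next batch, a loop over all pages might be sketched as follows. `fetchAllReviews()` and the `$fetchPage` callable are hypothetical; the callable stands in for whatever request actually returns `[$reviews, $nextToken]`, and the stub data below is made up.

```php
<?php
declare(strict_types=1);

/**
 * Collect reviews page by page until no next token is returned.
 *
 * $fetchPage receives the current token (null for the first page) and
 * must return [array $reviews, ?string $nextToken].
 * $maxPages is a safety cap so a misbehaving token cannot loop forever.
 */
function fetchAllReviews(callable $fetchPage, int $maxPages = 50): array
{
    $all = [];
    $token = null;
    for ($page = 0; $page < $maxPages; $page++) {
        [$reviews, $token] = $fetchPage($token);
        $all = array_merge($all, $reviews);
        if ($token === null || $reviews === []) {
            break; // no further pages
        }
    }
    return $all;
}

// Stub fetcher with two made-up pages, keyed by the token that requests them.
$pages = [
    ''     => [[['review_id' => 'a'], ['review_id' => 'b']], 'tok1'],
    'tok1' => [[['review_id' => 'c']], null],
];
$stub = static fn (?string $token): array => $pages[$token ?? ''];

$all = fetchAllReviews($stub);
```

Passing the fetcher in as a callable keeps the loop testable without any network access.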

        $reviews = $this->gplay->getReviews(
            $appId,
            $limit = 555,
            SortEnum::NEWEST()
        );

Do you think the same when seeing $limit there? So here's the entry-point – and here's where we can find the needed query parameters and the call. Would still be some work to cobble that together… This part looks familiar to me, though:

        $formParams = [
            'f.req' => '[[["' . self::RPC_ID_REVIEWS . '","[null,null,[2,' . $sort->value(
                ) . ',[' . $limit . ',null,' . ($token === null ? 'null' : '\\"' . $token . '\\"')
                . ']],[\\"' . $requestApp->getId() . '\\",7]]",null,"generic"]]]',
        ];

I remember having used that f.req in some other context as well…
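
For readability, the same payload shape could probably be built with nested `json_encode()` calls instead of string concatenation. This is a sketch, not the other project's code: `buildReviewsRequest()` is a hypothetical name, `'RPC_ID'` is a placeholder for the real RPC constant, and the sort value `2` is only an example.

```php
<?php
declare(strict_types=1);

/**
 * Build the f.req value: an outer JSON array whose second element is
 * itself a JSON-encoded string (hence the escaped quotes in the original).
 */
function buildReviewsRequest(
    string $rpcId,
    string $appId,
    int $sort,
    int $limit,
    ?string $token = null
): string {
    $flags = JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR;

    // Inner structure: [null, null, [2, sort, [limit, null, token]], [appId, 7]]
    $inner = json_encode(
        [null, null, [2, $sort, [$limit, null, $token]], [$appId, 7]],
        $flags
    );

    // Outer envelope wraps the inner JSON as a string.
    return json_encode([[[$rpcId, $inner, null, 'generic']]], $flags);
}

// 'RPC_ID' and the sort value are placeholders, not real constants.
$payload = buildReviewsRequest('RPC_ID', 'com.king.crash', 2, 40);
echo $payload, PHP_EOL;
```

Letting `json_encode()` do the escaping avoids hand-counting backslashes when a token is added.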

> As it seems you did it, you win it.

Uh? Wow, thanks!

> I am going on a trip to several countries soon. It seems you are in Munich. I was invited to give a talk at a university in Belgium. If Munich is near, I would like to meet you and give you a special gift from my country.
> If that doesn't work, I will do it online.

Munich is correct – but "near" depends on your point of view. It's about 700 km. I'd be happy to meet you, though, if you'll be in Munich!

@IzzySoft
Contributor

If you're still looking for a full list, I've just stumbled upon this: https://gist.github.com/kamoo1/af655f05700eb76bb29aec876493ed90 (which is Python, but might fit your use case).

@BaseMax
Owner Author

BaseMax commented Jun 8, 2023

Yeaaaaaaaah! Nice. It would be good to have this here in PHP, so that we can iterate over the pages and get all reviews.

@IzzySoft
Contributor

IzzySoft commented Jun 8, 2023

I unfortunately lack the time to implement this (at least currently™). I can hardly keep up with my existing queue. But the current code already contains the "token" needed to fetch more results, so our code could probably use that. With a separate method, though, it might be more performant (and avoid unneeded traffic) to start at a dedicated place, without obtaining the "base data" for the app.
