Passing Token #76

Open
davidwozabal opened this issue Jun 20, 2020 · 7 comments
@davidwozabal

I am trying to use the library on the captchas that Google Scholar shows when I try to get the citing papers for a source. A typical URL looks like

https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en

which, if fetched with Python, sometimes produces a captcha. The HTML of the captcha page contains the following tags, which seem relevant for using the anticaptcha library:

<script> 
	function gs_captcha_cb(){grecaptcha.render("gs_captcha_c",{"sitekey":"6LfFDwUTAAAAAIyC8IeC3aGLqVpvrB6ZpkfmAibj","callback":function(){document.getElementById("gs_captcha_f").submit()}});};
</script>
<form method="get" id="gs_captcha_f">
	<h1>Please show you&#39;re not a robot</h1>
	<div id="gs_captcha_c"></div>
	<script src="//www.google.com/recaptcha/api.js?onload=gs_captcha_cb&render=explicit&hl=en" async defer></script>
	<input type=hidden name="hl" value="en">
	<input type=hidden name="as_sdt" value="0,5">
	<input type=hidden name="sciodt" value="0,5">
	<input type=hidden name="cites" value="12685256029779217548">
	<input type=hidden name="scipsc" value="">
</form>
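
For reference, the sitekey that an anticaptcha task needs is embedded in the grecaptcha.render call above. A minimal sketch of pulling it out of the fetched HTML with a regular expression follows; the helper name extract_sitekey is hypothetical and not part of python-anticaptcha, and the pattern assumes the key appears as a double-quoted "sitekey" pair as in this page.

```python
import re

# Matches the "sitekey":"..." pair inside the grecaptcha.render(...) call.
SITEKEY_RE = re.compile(r'"sitekey"\s*:\s*"([^"]+)"')


def extract_sitekey(html):
    """Return the first reCAPTCHA sitekey found in html, or None if absent."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None


# Fragment copied from the captcha page quoted above.
sample = (
    'function gs_captcha_cb(){grecaptcha.render("gs_captcha_c",'
    '{"sitekey":"6LfFDwUTAAAAAIyC8IeC3aGLqVpvrB6ZpkfmAibj",'
    '"callback":function(){}});};'
)
sitekey = extract_sitekey(sample)
```

With Selenium, the same helper could be applied to driver.page_source before creating the task.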

I had a look at recaptcha_selenium.py. However, the HTML above does not contain the function onSuccess(), and my attempts to construct another function call such as

driver.execute_script("document.getElementById('gs_captcha_f').submit({})';".format(token))

did not yield anything.

Is there a way to deal with the situation above using the anticaptcha library?
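
One possible angle, sketched under assumptions: the form above uses method="get" and has no action attribute, so submitting it amounts to requesting the current page URL with the form fields as query parameters. A rendered reCAPTCHA v2 widget normally contributes a field named g-recaptcha-response to the enclosing form. The sketch below builds such a URL from the hidden fields and a solved token. The function name build_captcha_submit_url is hypothetical, and whether Google Scholar accepts a token passed this way has not been verified here.

```python
from urllib.parse import urlencode


def build_captcha_submit_url(base_url, hidden_fields, token):
    """Build the GET URL the captcha form would request after a solved challenge.

    Assumption: the token travels as the g-recaptcha-response parameter,
    alongside the form's hidden inputs.
    """
    params = dict(hidden_fields)
    params["g-recaptcha-response"] = token
    return base_url + "?" + urlencode(params)


# Hidden inputs copied from the form quoted above.
fields = {
    "hl": "en",
    "as_sdt": "0,5",
    "sciodt": "0,5",
    "cites": "12685256029779217548",
    "scipsc": "",
}
url = build_captcha_submit_url("https://scholar.google.com/scholar", fields, "TOKEN")
```

In a Selenium session the resulting URL could be passed to driver.get(url), which keeps the browser's cookies in play.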

@ad-m
Owner

ad-m commented Jun 21, 2020

I cannot reproduce the captcha challenge. Could you check the result when you adapt the callback sniffer (see https://github.com/ad-m/python-anticaptcha/blob/master/examples/recaptcha_selenium_callback.py )?

@davidwozabal
Author

> I cannot reproduce the captcha challenge.

The problem is that the captcha only appears after several requests of the above type. Hence, it is hard to reproduce.

> Could you check the result when you adapt the callback sniffer (see https://github.com/ad-m/python-anticaptcha/blob/master/examples/recaptcha_selenium_callback.py )?

I am not sure how to adapt the example. If I interpret the code correctly, you are passing the token twice. The first time by setting the content of g-recaptcha-response in

driver.execute_script("document.getElementById('g-recaptcha-response').innerHTML='{}';".format(token))

and the second time by calling

driver.execute_script("grecaptcha.recaptchaCallback[0]('{}')".format(token))

The problem is that the page I am getting has no element g-recaptcha-response, and when I execute the second line I get the error

selenium.common.exceptions.JavascriptException: Message: javascript error: Cannot read property '0' of undefined

I guess the grecaptcha object has a different name in my case?

If I just execute the first command (setting the response) and then submit the form by calling

driver.execute_script("document.getElementById('gs_captcha_f').submit()';")

I get the error

selenium.common.exceptions.JavascriptException: Message: javascript error: Invalid or unexpected token
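
The "Invalid or unexpected token" message is consistent with the stray ' before the ; in the script string above, which leaves an unterminated string literal in the JavaScript. A minimal sketch of building the script without hand-written quoting is shown below: json.dumps produces valid JavaScript string literals for the form id and the token. Because this page has no g-recaptcha-response element, the script creates a hidden field with that name if none exists. The helper name build_submit_script is hypothetical, and whether the server accepts a token submitted this way is an untested assumption.

```python
import json


def build_submit_script(token, form_id="gs_captcha_f"):
    """Return JavaScript that stores the token in the form and submits it."""
    return (
        "var form = document.getElementById({form_id});"
        "var field = form.querySelector('[name=\"g-recaptcha-response\"]');"
        "if (!field) {{"
        "  field = document.createElement('textarea');"
        "  field.name = 'g-recaptcha-response';"
        "  field.style.display = 'none';"
        "  form.appendChild(field);"
        "}}"
        "field.value = {token};"
        "form.submit();"
    ).format(form_id=json.dumps(form_id), token=json.dumps(token))


# A token containing a quote no longer breaks the generated script.
script = build_submit_script("abc'def")
# Usage with an existing Selenium driver (not run here):
# driver.execute_script(build_submit_script(token))
```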

@davidwozabal
Author

I tried to get a reproducible captcha and came up with the following request

https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en&num=20

The argument num=20 produces a captcha on every call, embedded in a page with slightly different code than the captchas I was facing before. However, if I could solve this one, it might be a start.

I tried adapting the code from recaptcha_selenium_callback.py and ended up with the following code

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from python_anticaptcha import AnticaptchaClient, NoCaptchaTaskProxylessTask

# Open the page that shows the captcha
request = 'https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en&num=20'
options = Options()
driver = Chrome(options=options)
driver.get(request)

# Have the anti-captcha service solve the reCAPTCHA for this page and sitekey
api_key = '...'
site_key = '6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b'
client = AnticaptchaClient(api_key)
task = NoCaptchaTaskProxylessTask(request, site_key)
job = client.createTask(task)
job.join()
token = job.get_solution_response()

# Write the token into the response field, then call the page's callback
driver.execute_script(
    "document.getElementById('g-recaptcha-response').innerHTML='{}';".format(token)
)
driver.execute_script("submitCallback('{}')".format(token))
result = driver.page_source

The code runs without any errors. However, the browser window does not change, and the variable result still contains the captcha page.

Where did I go wrong?

@ad-m
Owner

ad-m commented Jun 21, 2020

@davidwozabal, could you provide code that reproduces the captcha challenge? I do not receive a captcha challenge at the address provided. With such code, I will be able to analyze the problem more effectively.

@davidwozabal
Author

The link

https://scholar.google.com/scholar?cites=12685256029779217548&as_sdt=2005&sciodt=0,5&hl=en&num=20

above produces a captcha challenge for me (even if I open it from a normal browser from different computers).

@fashan7

fashan7 commented Aug 22, 2021

@fashan7

fashan7 commented Aug 23, 2021

Found the solution to the problem; see #92.
