Is the tracking parameter t=hj in the duckduckgo URL intentional? #465

Closed
Alexhans opened this issue Aug 14, 2022 · 7 comments

@Alexhans

Hi,

I was looking at the different query parameters in the search URLs when I found that DuckDuckGo's t is a tracking code (I don't know what the value hj means):
https://help.duckduckgo.com/privacy/t/

Through partnerships with developers and companies, DuckDuckGo has been integrated into many applications. In these partnerships, a portion of DuckDuckGo's advertising revenue is sometimes shared back. To assign advertising revenue and collect anonymous aggregate usage information, developers add a unique "&t=" parameter to searches made through their applications.

'duckduckgo': SCHEME + 'duckduckgo.com/html?q=site:{0}%20{1}&t=hj&ia=web'
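
For illustration only, here is a minimal sketch of how the t parameter could be dropped from a URL built from that template, using only the standard library. The helper name strip_query_param is hypothetical and not part of the project, and SCHEME is assumed to be 'https://'. Whether DuckDuckGo behaves differently without the parameter is not something this sketch checks.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def strip_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    # parse_qsl decodes the query into (key, value) pairs, preserving order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != name]
    # Re-encode with %20 for spaces and keep ':' literal, as in the template.
    query = urlencode(kept, safe=':', quote_via=quote)
    return urlunsplit(parts._replace(query=query))


SCHEME = 'https://'  # assumption: the value of SCHEME is not shown in this thread
template = SCHEME + 'duckduckgo.com/html?q=site:{0}%20{1}&t=hj&ia=web'
url = template.format('stackoverflow.com', 'python')
clean = strip_query_param(url, 't')
print(clean)
```

The simpler alternative would be to delete &t=hj from the template string itself. The helper is only useful if the same cleanup is wanted for several engines.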

@gleitz
Owner

gleitz commented Aug 22, 2022

I'm not sure if it is necessary or not, but currently duckduckgo is not working as an engine due to #404

@gleitz gleitz closed this as completed Aug 22, 2022
@gleitz
Owner

gleitz commented Aug 22, 2022

If you'd like to figure out that issue, I could use some help with it!

@Alexhans
Author

I can try. I actually came across this while quickly playing around to see if I could add Brave support.

Support works, but I wanted to understand the decisions around URLs and usage. I did get temporarily blocked when I added the unit tests for Brave, which is somewhat expected (I still remember the "Google has been DDoSing SourceHut for over a year" story).

The only thing I can think of is to ask DuckDuckGo and Brave whether they have specific ways to interact programmatically with their websites.

I do think the answer will not be satisfactory for DuckDuckGo since, on their Instant Answers API page, they state:

This API does not include all of our links, however. That is, it is not a full search results API or a way to get DuckDuckGo results into your applications beyond our instant answers. Because of the way we generate our search results, we unfortunately do not have the rights to fully syndicate our results, free or paid. For the same reason, we cannot allow framing our results without our branding. Please see our partnerships page for more info on guidelines and getting in touch with us.

So crawling ethically (without trying to circumvent blocks through proxies or similar) will invariably get blocked. For DDG, it might be a case of choosing whether to remove it entirely or just support instant answers through their API (for any API-based access, users could get their own tokens, as in OpenBBTerminal).

@gleitz
Owner

gleitz commented Aug 25, 2022

Yes, I am not optimistic that API access will be given, so we're left with crawling.

I also get rate limited during development, which is why I have the caching mechanism in place when running tests.
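
A rough sketch of that kind of test-time cache, not the project's actual implementation: pages are stored on disk under a hash of the URL, so repeated test runs do not hit the search engine again. The names cached_fetch and CACHE_DIR, and the fetch callable passed in, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cache location; a real test suite would likely use a fixed
# directory inside the repository or a per-run temporary directory.
CACHE_DIR = Path(tempfile.gettempdir()) / 'search_page_cache_example'


def cached_fetch(url, fetch, cache_dir=CACHE_DIR):
    """Return the page for url, calling fetch(url) only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name is short and filesystem-safe.
    key = hashlib.sha256(url.encode('utf-8')).hexdigest()
    path = cache_dir / (key + '.html')
    if path.exists():
        return path.read_text(encoding='utf-8')
    page = fetch(url)
    path.write_text(page, encoding='utf-8')
    return page
```

In a test, fetch would be whatever function performs the real HTTP request. After the first run the cached copy is returned, so development does not trigger rate limiting as often.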

@gleitz
Owner

gleitz commented Aug 25, 2022

I didn't know Brave had a search engine. I would accept that PR if you want to open it.

@Alexhans
Author

Alexhans commented Sep 8, 2022

I'll take a look at what we discussed over the weekend and create the pull request. It's been a busy period.

@gleitz
Owner

gleitz commented Sep 13, 2022

No worries - take your time and thanks again for any support you can give to the project.
