
DuckDuckGo returns 418 when requesting with Python

I'm writing a script that opens Firefox with the first DuckDuckGo result it finds for a given term.
I know. It's very useful.

But when I copy a URL from my browser and request it with Python:

import requests as r  # assumed alias; the original snippet did not show its import

url = "https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software"
req = r.get(url)

DuckDuckGo returns a 418.

What is happening?
Does DuckDuckGo recognize that I'm making automated requests and decide to turn into a teapot?
And if so, how can I avoid it?

Also, I know there's a DuckDuckGo API for Python, but I'm doing this project to get started with requests and BeautifulSoup.

iaquobe asked Sep 05 '25 16:09


2 Answers

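You need to add a 'user-agent' header. Without one, requests identifies itself as python-requests/<version>, which DuckDuckGo appears to reject. Even a header as simple as this one is enough: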

req = r.get(url, headers={'user-agent': 'my-app/0.0.1'})

Update: complete code with reasonably named variables

import requests

url = "https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software"
response = requests.get(url, headers={'user-agent': 'my-app/0.0.1'})
response.raise_for_status()  # raise an exception on a 4xx or 5xx status code
# or test response.status_code if you do not want to raise an exception
data = response.text # this is the HTML assuming that is what the URL returns
print(data)

Prints:

<!DOCTYPE html><html lang="en_US" class="no-js has-zcm  no-theme "><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><title>python request duckduckgo at DuckDuckGo</title><link rel="stylesheet" href="/s1909.css" type="text/css"><link rel="stylesheet" href="/r1909.css" type="text/css"><meta name="robots" content="noindex,nofollow"><meta name="referrer" content="origin"><meta name="apple-mobile-web-app-title" content="python request duckduckgo"><link rel="preconnect" href="https://links.duckduckgo.com"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link id="icon60" rel="apple-touch-icon" href="/assets/icons/meta/DDG-iOS-icon_60x60.png?v=2"/><link id="icon76" rel="apple-touch-icon" sizes="76x76" href="/assets/icons/meta/DDG-iOS-icon_76x76.png?v=2"/><link id="icon120" rel="apple-touch-icon" sizes="120x120" href="/assets/icons/meta/DDG-iOS-icon_120x120.png?v=2"/><link id="icon152" rel="apple-touch-icon" sizes="152x152" href="/assets/icons/meta/DDG-iOS-icon_152x152.png?v=2"/><link rel="image_src" href="/assets/icons/meta/DDG-icon_256x256.png"/><script type="text/javascript">var ct,fd,fq,it,iqa,iqm,iqs,iqp,iqq,qw,dl,ra,rv,rad,r1hc,r1c,r2c,r3c,rfq,rq,rds,rs,rt,rl,y,y1,ti,tig,iqd,locale,settings_js_version='s2475.js',is_twitter='',rpl=1;fq=0;fd=1;it=0;iqa=0;iqbi=0;iqm=0;iqs=0;iqp=0;iqq=0;qw=3;dl='en';ct='US';iqd=0;r1hc=0;r1c=0;r3c=0;rq='python%20request%20duckduckgo';rqd="python request duckduckgo";rfq=0;rt='';ra='ffab';rv='';rad='';rds=30;rs=0;spice_version='2000';spice_paths='{}';locale='en_US';settings_url_params={};rl='us-en';rlo=0;df='';ds='';sfq='';iar='';vqd='3-149609696422854606330346289888770817762-151254838983446808561626137548835915940';safe_ddg=0;show_covid=0;</script><meta name="viewport" content="width=device-width, initial-scale=1" /><meta name="HandheldFriendly" content="true" /><meta name="apple-mobile-web-app-capable" content="no" /></head><body class="body--serp"><input id="state_hidden" name="state_hidden" 
type="text" size="1"><span class="hide">Ignore this box please.</span><div id="spacing_hidden_wrapper"><div id="spacing_hidden"></div></div><script type="text/javascript" src="/lib/l118.js"></script><script type="text/javascript" src="/locale/en_US/duckduckgo14.js"></script><script type="text/javascript" src="/util/u469.js"></script><script type="text/javascript" src="/d2827.js"></script><div class="site-wrapper  js-site-wrapper"><div class="welcome-wrap js-welcome-wrap"></div><div id="header_wrapper" class="header-wrap js-header-wrap"><div id="header" class="header  cw"><div class="header__search-wrap"><a tabindex="-1" href="/?t=ffab" class="header__logo-wrap js-header-logo"><span class="header__logo js-logo-ddg">DuckDuckGo</span></a><div class="header__content  header__search"><form id="search_form" class="search--adv  search--header  js-search-form" name="x" action="/"><input type="text" name="q" tabindex="1" autocomplete="off" id="search_form_input" class="search__input search__input--adv js-search-input" value="python request duckduckgo"><input id="search_form_input_clear" class="search__clear  js-search-clear" type="button" tabindex="3" value="X"/><input id="search_button" class="search__button  js-search-button" type="submit" tabindex="2" value="S" /><a id="search_dropdown" class="search__dropdown" href="javascript:;" tabindex="4"></a><div id="search_elements_hidden" class="search__hidden  js-search-hidden"></div></form></div></div><div id="duckbar" class="zcm-wrap  zcm-wrap--header  is-noscript-hidden"></div></div><div class="header--aside js-header-aside"></div></div><div id="zero_click_wrapper" class="zci-wrap"></div><div id="vertical_wrapper" class="verticals"></div><div id="web_content_wrapper" class="content-wrap "><div class="serp__top-right  js-serp-top-right"></div><div class="serp__bottom-right  js-serp-bottom-right"><div class="js-feedback-btn-wrap"></div></div><div class="cw"><div id="links_wrapper" class="serp__results js-serp-results"><div 
class="results--main"><div class="search-filters-wrap"><div class="js-search-filters search-filters"></div></div><noscript><meta http-equiv="refresh" content="0;URL=/html?q=python%20request%20duckduckgo"><link href="/css/noscript.css" rel="stylesheet" type="text/css"><div class="msg msg--noscript"><p class="msg-title--noscript">You are being redirected to the non-JavaScript site.</p>Click <a href="/html/?q=python%20request%20duckduckgo">here</a> if it doesn't happen automatically.</div></noscript><div id="message" class="results--message"></div><div class="ia-modules js-ia-modules"></div><div id="ads" class="results--ads results--ads--main is-invisible js-results-ads"></div><div id="links" class="results is-invisible js-results"></div></div><div class="results--sidebar js-results-sidebar"><div class="sidebar-modules js-sidebar-modules"></div><div class="is-invisible js-sidebar-ads"></div></div></div></div></div><div id="bottom_spacing2"> </div></div><script type="text/javascript"></script><script type="text/JavaScript">function nrji() {nrj('/t.js?q=python%20request%20duckduckgo&l=us-en&s=0&dl=en&ct=US&ss_mkt=us&p_ent=&ex=-1');nrj('/d.js?q=python%20request%20duckduckgo&l=us-en&s=0&a=ffab&dl=en&ct=US&ss_mkt=us&vqd=3-149609696422854606330346289888770817762-151254838983446808561626137548835915940&p_ent=&ex=-1&sp=1');;};DDG.ready(nrji, 1);</script><script src="/g2379.js"></script><script type="text/javascript">DDG.page = new DDG.Pages.SERP({ showSafeSearch: 0, instantAnswerAds: false });</script><div id="z2"> </div><div id="z"></div></body></html>

Keep in mind that the HTML may contain JavaScript that runs after the page loads and modifies its content, so what you see in a browser may not match the HTML that requests receives. In the output above, the results container (id="links") is empty, and a noscript block redirects to the non-JavaScript site under /html. If you need the rendered page, you probably need a different tool, such as Selenium, to drive an actual web browser.
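The printed HTML above contains a noscript block whose meta refresh tag points at the non-JavaScript version of the results page. As a rough sketch of how a script could find that redirect target, the following uses only the standard library. The class name MetaRefreshFinder and the helper find_noscript_redirect are made up for this illustration, and the sample string is a shortened copy of the tag from the output above. BeautifulSoup could do the same job.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag.

    Hypothetical helper written for this example; not part of requests.
    """

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0;URL=/html?q=...".
        content = attributes.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip()


def find_noscript_redirect(html, base_url):
    """Return the absolute meta-refresh URL found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.target is None:
        return None
    return urljoin(base_url, finder.target)


# Shortened copy of the tag that appears in the output above.
sample = ('<noscript><meta http-equiv="refresh" '
          'content="0;URL=/html?q=python%20request%20duckduckgo"></noscript>')
redirect_url = find_noscript_redirect(sample, "https://duckduckgo.com/")
print(redirect_url)
```

In a real script, response.text would take the place of sample. Whether the redirected page is fine to fetch automatically is a separate question, which the next answer discusses.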

Booboo answered Sep 07 '25 16:09


While this may not be the exact cause, DuckDuckGo does not want bots scraping its search results. Take a look at its robots.txt file. A website uses this file to tell crawlers, as opposed to regular users, which pages they may crawl and which they may not.

From the looks of it, everything you're trying to crawl is disallowed. There's a chance you're getting the teapot response because that is how the site answers crawlers that lack permission.
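One way to check robots.txt rules from Python is the standard library's urllib.robotparser module. The sketch below parses a made-up rule set for example.com. These rules are illustrative only and are not DuckDuckGo's actual file. The helper name is_allowed is likewise just for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; NOT copied from any real site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_allowed = is_allowed(
    SAMPLE_ROBOTS_TXT, "my-app/0.0.1", "https://example.com/search?q=python"
)
about_allowed = is_allowed(
    SAMPLE_ROBOTS_TXT, "my-app/0.0.1", "https://example.com/about"
)
print(search_allowed, about_allowed)
```

To check a live site instead of a string, RobotFileParser also provides set_url() and read(), which download the file before can_fetch() is called.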

If you're trying to learn about requests, it might be better to avoid search engines. Most of the common ones that I know of disallow outside crawlers.

M Z answered Sep 07 '25 18:09