Thursday, 5 December 2013

Significance of the User-Agent Field: Detection of Automated/Programmatic HTTP Requests

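Suppose you request a web page from a program, say a Python script. Can the web server tell that request apart from one made by a person using a web browser such as Firefox? Yes, it can, and there are various methods for doing so. One such method is to inspect the value of the 'User-Agent' field in the HTTP request headers sent by the client. HTTP libraries typically send their own default value in this field, while browsers send a browser-specific string. Because the client chooses the value, a script can also set it to imitate a browser, so the check is only a heuristic.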
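To make the server-side idea concrete, here is a minimal sketch of such a check. It is written for Python 3, and the function name looks_automated and the marker list are hypothetical, chosen only for illustration; a real server would keep its own list and combine this with other signals.

```python
# Hypothetical list of substrings found in the default User-Agent values of
# some common HTTP clients (for illustration only, not exhaustive).
AUTOMATED_MARKERS = ("python-urllib", "python-requests", "curl", "wget")


def looks_automated(user_agent):
    """Return True if the User-Agent is missing or matches a known client default."""
    if not user_agent:
        # No User-Agent at all is unusual for a browser.
        return True
    lowered = user_agent.lower()
    return any(marker in lowered for marker in AUTOMATED_MARKERS)


print(looks_automated("Python-urllib/2.7"))
print(looks_automated(
    "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"))
```

The first call reports the urllib default as automated and the second treats the Firefox string as a browser. A substring match like this is easy to defeat, since any client can send a browser-like value.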

You can see the difference between the two kinds of request in the output of the following Python script:

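# Python 2 script: urllib2 and the print statement do not exist in Python 3.
import lxml.html as lh
import urllib2

# Page that displays the request headers it received.
url = "http://www.murl.mobi/headers.php"
# XPath of the page element that holds the displayed headers.
xpath = "/html/body/table/tr[2]/td/table/tr/td[2]/div[2]/text()"

# First request: urllib2 sends its own default User-Agent.
doc = lh.parse(urllib2.urlopen(url))
headers = doc.xpath(xpath)
print 'Output of the first request without user-agent header supplied'
print '\n'.join(headers)

# Second request: override User-Agent with a Firefox 25 value.
# The trailing space keeps the two concatenated parts separated.
http_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:25.0) " +
                "Gecko/20100101 Firefox/25.0"}
request_object = urllib2.Request(url, None, http_headers)
doc = lh.parse(urllib2.urlopen(request_object))
headers = doc.xpath(xpath)
print 'Output of the second request with user-agent header supplied'
print '\n'.join(headers)

The output of the above script is: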

Output of the first request without user-agent header supplied
Accept-Encoding: identity
Host: www.murl.mobi
Connection: close
User-Agent: Python-urllib/2.7
Output of the second request with user-agent header supplied
Accept-Encoding: identity
Host: www.murl.mobi
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0
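
The same experiment can be repeated without relying on an external page. The following is a rough sketch for Python 3, where urllib.request takes the place of urllib2. The local server, the EchoUserAgent handler and the fetch_user_agent helper are assumptions made for this example and are not part of the original script.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgent(BaseHTTPRequestHandler):
    """Hypothetical handler that replies with the User-Agent it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


# Opener with proxies disabled so the request always reaches the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_user_agent(url, headers=None):
    """Request url and return the User-Agent string the server echoes back."""
    request = urllib.request.Request(url, None, headers or {})
    with opener.open(request) as response:
        return response.read().decode("utf-8")


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

firefox = "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
default_ua = fetch_user_agent(url)
firefox_ua = fetch_user_agent(url, {"User-Agent": firefox})
print("Default:  " + default_ua)
print("Supplied: " + firefox_ua)

server.shutdown()
server.server_close()
```

The first line printed should start with Python-urllib/ followed by the interpreter's version, and the second should be the Firefox string exactly as supplied.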
