Monday, 27 October 2014

Bing Search Using Python

from bs4 import BeautifulSoup
import urllib2

class Bing:
    def __init__(self):
        # q is the search term; first is the index of the first result on the page
        self.__bing_url = "http://www.bing.com/search?q=%s&first=%s"

    def search(self, _s_search):
        i_page = 1
        urls = []
        while True:
            try:
                i_len_urls = len(urls)
                response = urllib2.urlopen(self.__bing_url % (_s_search, i_page))
                parsed_response = BeautifulSoup(response, "html.parser")
                # result links are anchors inside <h2> elements
                for h in parsed_response.findAll("h2"):
                    if h.a is not None:
                        s_url = h.a['href']
                        if s_url in urls:
                            continue
                        elif s_url.startswith("http://") or s_url.startswith("https://") or s_url.startswith("ftp://"):
                            urls.append(s_url)
                if len(urls) <= i_len_urls:  # no new results on this page, so stop
                    break
                i_page = i_page + 10  # the 'first' parameter advances by 10 results per page
            except:
                break  # stop instead of retrying forever on a network/parse error
        return urls

b = Bing()
urls = b.search("bing")
print len(urls)
print '\n'.join(urls)
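Note that the search term is substituted directly into the query URL; for search terms containing spaces or other special characters you would probably want to URL-encode the term first, for example with urllib.quote_plus. A small illustrative addition (not part of the original script):

import urllib
s_query = urllib.quote_plus("bing maps api")  # sample query, encodes to "bing+maps+api"
urls = Bing().search(s_query)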


Partial output of the above Python script is shown below:

194
http://www.bing.com/
http://en.wikipedia.org/wiki/Bing
http://www.facebook.com/Bing
https://twitter.com/Bing
http://www.bingtoolbar.com/en-US
http://www.youtube.com/user/bing
http://dictionary.reference.com/browse/bing
https://www.bingplaces.com/
http://www.thefreedictionary.com/Bing
http://43007555.r.msn.com/?ld=Dv_gnZ2ILAtTZR03P_3NWe4zVUCUyjVAbTHjJXdVp5pSJIfJyHoemQp9Uv_Wg6SZdsdEiXBfrZKs_SwYbqptHU6UyPdLC3XaylUED9ff_6c2EI5qNivG3i7FrVBH9_t3zJWhM63Utu8WncH9sAtJdZqlrMdY0&u=WindowFixed.com%2fone.php%3fremove%3dBing+Toolbar
http://3228102.r.msn.com/?ld=DvyB-NuRzghUJI5dayAAZZAzVUCUwcmn0K1FdQtdfmLCMQPUz3gc2BgUvJCHT8SlQgGQSrDGzPO89LJdfVC7Mg0Lnx22SO5JX7z0QT-CucKVea58oXWY5s-qrEpTLwMJKuUHNDA_udfLkpaKdohpE6EwelEPY&u=http%3a%2f%2fwww.ask.com%2fslp%3f%26q%3dwhat%2bis%2bbing%26sid%3d0c053e1c-66a9-4f01-ae7d-401fd0f4370f-0-us_msb%26kwid%3dbing%26cid%3d5787895127
http://advertise.bingads.microsoft.com/en-us/home
http://blogs.bing.com/webmaster/?p=8413
http://blogs.bing.com/
http://hk.bing.com/
http://advertise.bingads.microsoft.com/en-us/sign-up
http://en.wikipedia.org/wiki/Bing_(company)
http://www.microsoft.com/privacystatement/en-gb/bing/default.aspx
http://msdn.microsoft.com/en-us/library/dd877956.aspx
http://43007555.r.msn.com/?ld=d3odkhGgduicmifMUqm19BRTVUCUxkRtc4JCtSzSlX8koKvNTxhK6ZCc0xg7F2lL1VzUHlg0d091QCJzID_AFXhoaYV_qheV-DwL010iIeyGhkidZmY4BbDgkbsV4S7Y02EUygVci2nzFRJXxoML4rBHKB5GM&u=WindowFixed.com%2fone.php%3fremove%3dBing+Toolbar
https://addons.mozilla.org/en-US/firefox/addon/bing/
http://www.microsoft.com/maps/
https://www.bingmapsportal.com/
http://www.bingiton.com/
https://itunes.apple.com/us/app/bing/id345323231
http://www.merriam-webster.com/dictionary/bing
http://bing.en.softonic.com/

Sunday, 5 October 2014

Calculation of Beta of Stocks Using Python Libraries (Stock Risk Analysis)

As an example, let us consider Coca Cola (NYSE:KO). Historical Coca Cola stock data can be downloaded from Google Finance:
Historical NYSE:KO Data

Suppose we consider NYSE:SPY as the market indicator/index in calculating beta. Historical stock data of NYSE:SPY can be downloaded from Google Finance:
Historical NYSE:SPY Data

The following Python script computes the beta value; the historical data file names for the market index and the stock are passed as command-line parameters (market index file first, stock file second):

import numpy as np
from sklearn import linear_model
import sys

# read the market index historical data (e.g. SPY); rows are newest first, close price in column 5
fh = open(sys.argv[1], 'r')
lines = fh.readlines()
fh.close()
market_x = []
for i in range(len(lines) - 1):
    if i == 0:  # skip the header row
        continue
    line_i = lines[i].strip().split(',')[4]
    line_i_1 = lines[i + 1].strip().split(',')[4]
    rate = (float(line_i) - float(line_i_1)) / float(line_i_1)  # daily return
    market_x.append([rate])

# read the stock historical data (e.g. KO) in the same format
fh = open(sys.argv[2], 'r')
lines = fh.readlines()
fh.close()
stock_y = []
for i in range(len(lines) - 1):
    if i == 0:  # skip the header row
        continue
    line_i = lines[i].strip().split(',')[4]
    line_i_1 = lines[i + 1].strip().split(',')[4]
    rate = (float(line_i) - float(line_i_1)) / float(line_i_1)  # daily return
    stock_y.append([rate])

# beta is the slope of the regression of stock returns on market returns
regr = linear_model.LinearRegression()
regr.fit(market_x, stock_y)
print 'Beta: %s' % regr.coef_[0][0]
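As a rough sanity check, beta can also be computed directly from the two return series, since the regression slope equals Cov(stock returns, market returns) / Var(market returns). A minimal sketch, assuming the lines below are appended to the end of the script above (so that numpy, market_x and stock_y are already available):

m = np.array(market_x).flatten()  # market returns as a 1-D array
s = np.array(stock_y).flatten()   # stock returns as a 1-D array
beta = np.cov(s, m)[0][1] / np.var(m, ddof=1)  # Cov(stock, market) / Var(market)
print 'Beta (covariance method): %s' % beta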

Saturday, 30 August 2014

Stock Price/Volume Analysis Using Python and PyCluster

In this blog post we will be looking at how k-means (http://en.wikipedia.org/wiki/K-means_clustering) cluster analysis can be used to create clusters of (price, volume) data of stocks. 

The following Python script can be used to create the clusters. The input is the trading date, close price, and volume read from a comma-separated file, and the number of clusters is passed as a command-line argument at execution time. In this specific example, we cluster the data into 2, 3, 4, and 5 clusters. Also note that if a cluster contains fewer than a specified percentage of the points, we treat those points as possible outliers resulting from some extraordinary event related to that particular stock; this percentage largely depends on the number of data points and the number of clusters.

import numpy as np
import sys
import Pycluster
import matplotlib.pyplot as plt
from scipy import stats

fh = open(sys.argv[1], 'r')
lines = fh.readlines()
fh.close()
clusters = int(sys.argv[2])
max_points_pc = 5  # clusters holding at most this percentage of points are annotated as possible outliers

points_r = []
dates = []
volumes = []
close_prices = []
for i in range(len(lines)):
    if i <= 1:
        continue
    line_c = lines[i - 1].strip().split(',')  # skips the header row (and the last line) of the CSV
    close_price = float(line_c[4])
    volume = float(line_c[5])
    points_r.append((close_price, volume))
    volumes.append(volume)
    close_prices.append(close_price)
    dates.append(line_c[0])

# normalise price and volume to z-scores so both dimensions carry equal weight
volume_z = stats.zscore(np.array(volumes))
close_price_z = stats.zscore(np.array(close_prices))
points = zip(close_price_z, volume_z)

# k-means clustering on the normalised (price, volume) points
labels, error, nfound = Pycluster.kcluster(points, clusters)

# group the raw points and their dates by assigned cluster label
x = []
y = []
d = []
for i in range(clusters):
    x.append([])
    y.append([])
    d.append([])
for i in range(len(points_r)):
    index = labels[i]
    x[index].append(points_r[i][0])
    y[index].append(points_r[i][1])
    d[index].append(dates[i])

for i in range(clusters):
    plt.plot(x[i], y[i], 'o')
    # annotate only small clusters, which may correspond to extraordinary events
    if len(x[i]) <= max_points_pc * len(points) / 100:
        for j in range(len(x[i])):
            plt.annotate(d[i][j], (x[i][j], y[i][j]))
plt.xlabel('Close Price')
plt.ylabel('Volume')
plt.grid()
plt.show()
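Assuming the input CSV follows the Google Finance column layout (Date, Open, High, Low, Close, Volume), the script can be invoked along the following lines; the script and data file names here are only illustrative:

python price_volume_clusters.py ko.csv 3  # file names are illustrative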


The following figure shows the output of the above Python script with 2 clusters:

The following figure shows the output of the above Python script with 3 clusters:

The following figure shows the output of the above Python script with 4 clusters:

The following figure shows the output of the above Python script with 5 clusters:

Wednesday, 18 June 2014

Print HTTP Response Header: Python

When you make an HTTP request to retrieve a web page, you get an HTTP response back. This response includes the response body that we see in the browser and an HTTP header, which is usually not shown in the browser. This header can give useful insight about the web page, the web server, cookies, and so on.


The following simple Python script prints the HTTP response header for the URL passed as a command-line parameter:

import urllib2
import sys

def print_http_response_header(url):
    try:
        response = urllib2.urlopen(url)
        # response.info() holds the HTTP response header fields
        for key, value in response.info().items():
            print key + ' => ' + value
    except Exception as e:
        print 'Error retrieving %s: %s' % (url, e)

def main():
    print_http_response_header(sys.argv[1])

if __name__ == '__main__':
    main()


Output of the above script, with http://python.org as the command-line parameter, is shown below:

content-length => 45495
via => 1.1 varnish
x-cache => HIT
accept-ranges => bytes
strict-transport-security => max-age=63072000; includeSubDomains
vary => Cookie
server => nginx
connection => close
x-served-by => cache-sv95-SJC3
x-cache-hits => 19
date => Wed, 18 Jun 2014 16:56:04 GMT
x-frame-options => SAMEORIGIN
content-type => text/html; charset=utf-8
age => 863
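If only the header is of interest, the response body does not have to be downloaded at all: a HEAD request returns the header fields alone. A minimal sketch using Python 2's standard httplib module (the host and path below are illustrative):

import httplib
conn = httplib.HTTPConnection('www.python.org')  # illustrative host
conn.request('HEAD', '/')                        # HEAD: headers only, no body
response = conn.getresponse()
for key, value in response.getheaders():
    print key + ' => ' + value
conn.close()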

Monday, 5 May 2014

Compare HTML Pages: HTML Tags Counter

There are many instances in which we would want to compare HTML templates programmatically. One of the simplest factors that can be used to rule out the similarity of two HTML pages is to count the HTML tags in each page and compare the counts.

The following Python script helps to get the HTML tag count of an HTML webpage given a URL:

import lxml.html
import urllib2

def proc_root(root, tag_count):
    # count the current element's tag, then recurse into its children
    try:
        tag_count[root.tag] += 1
    except KeyError:
        tag_count[root.tag] = 1
    for child in root:
        proc_root(child, tag_count)
    return tag_count

def get_tag_count(url):
    tag_count = {}
    res = urllib2.urlopen(url).read()
    root = lxml.html.fromstring(res)
    proc_root(root, tag_count)
    return tag_count

def main():
    url = 'http://www.google.com'
    tag_count = get_tag_count(url)
    for tag, count in tag_count.items():
        print '%s\t%s' % (tag, count)

if __name__ == "__main__":
    main()





A sample output of the above script is given below:

meta 1
table 1
font 1
style 2
span 8
script 5
tr 1
html 1
input 7
td 3
body 1
head 1
form 1
nobr 2
br 6
a 20
b 1
center 1
textarea 1
title 1
p 1
u 1
div 17
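To compare two pages as described above, the two tag-count dictionaries can then be diffed tag by tag. A minimal sketch that builds on get_tag_count from the script above (the second URL is just an illustrative choice):

def compare_tag_counts(tag_count_1, tag_count_2):
    # sum of absolute per-tag count differences; 0 means identical tag profiles
    all_tags = set(tag_count_1.keys()) | set(tag_count_2.keys())
    return sum(abs(tag_count_1.get(t, 0) - tag_count_2.get(t, 0)) for t in all_tags)

tc_a = get_tag_count('http://www.google.com')
tc_b = get_tag_count('http://www.bing.com')  # illustrative second page
print 'Total tag count difference: %s' % compare_tag_counts(tc_a, tc_b)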

Sunday, 4 May 2014

An Extremely Simple and Effective Web Crawler in Python

Web crawlers, also known as web spiders, are used to retrieve web links/pages by following links starting from a seed/initial web page. Crawlers are widely used in building search engines, and the retrieved links/pages have numerous other applications.

Primary functions/operations of web link crawlers are:

1. Retrieve seed web page
2. Extract all valid URLs/links
3. Visit every link extracted in step 2. 
4. Stop when the maximum crawl depth has been reached.

The following Python script is an extremely simple and effective web crawler! It can be configured to use different seed URLs and different maximum depths.

import re, urllib
from urlparse import urlparse

depth_max = 1      # determines the maximum depth the crawler goes from the seed website
urls = []          # stores the unique urls retrieved while crawling
url_visited = []   # stores the urls of web pages visited

# clean url; add protocol and host if absent; remove query section
def getCleanURL(_cURL, _baseHost):
    try:
        oURL = urlparse(_cURL)
    except:
        return None
    if oURL.scheme == '':
        scheme = 'http'
    else:
        scheme = oURL.scheme
    if oURL.netloc == '':
        host = _baseHost
    else:
        host = oURL.netloc
    return scheme + '://' + host + oURL.path

def crawl(_baseURL, fh, _depth=0):
    baseHost = urlparse(_baseURL).netloc
    if _depth > depth_max:        # if depth exceeds the maximum depth, stop crawling further
        return
    elif _baseURL in url_visited: # web page already visited
        return
    try:
        res = urllib.urlopen(_baseURL).read()
        url_visited.append(_baseURL)
    except:                       # error visiting url/web page
        return
    res = res.replace('\n', '')
    # extract site-relative links (href="/...") from the page
    for url in re.findall('''href=["'](/[^"']+)["']''', res, re.I):
        url = getCleanURL(url, baseHost)
        if url is not None and url not in urls:  # check url validity and uniqueness
            urls.append(url)
            fh.write(url + '\n')
    for url in urls:
        crawl(url, fh, _depth + 1)

def main():
    seed_url = 'http://www.cnn.com'
    fh_urls = open('urls.txt', 'w')
    crawl(seed_url, fh_urls)
    fh_urls.close()

if __name__ == "__main__":
    main()





A partial output of the above script with http://www.cnn.com as the seed URL is given below:

http://www.cnn.com/tools/search/cnncom.xml
http://www.cnn.com/tools/search/cnncomvideo.xml
http://www.cnn.com/CNN/Programs
http://www.cnn.com/cnn/programs/
http://www.cnn.com/cnni/
http://www.cnn.com/video/
http://www.cnn.com/trends/
http://www.cnn.com/US/
http://www.cnn.com/WORLD/
http://www.cnn.com/POLITICS/
http://www.cnn.com/JUSTICE/
http://www.cnn.com/SHOWBIZ/
http://www.cnn.com/TECH/
http://www.cnn.com/HEALTH/
http://www.cnn.com/LIVING/
http://www.cnn.com/TRAVEL/
http://www.cnn.com/OPINION/
http://www.cnn.com/2014/05/04/us/circus-accident-rhode-island/index.html
http://www.cnn.com/2014/05/03/politics/washington-correspondents-dinner/index.html
http://www.cnn.com/2014/05/04/world/europe/ukraine-crisis/index.html
http://www.cnn.com/2014/05/04/us/clippers-shelly-sterling/index.html
http://www.cnn.com/2014/05/04/us/rocky-top-tennessee/index.html
http://www.cnn.com/2014/05/04/us/condoleeza-rice-rutgers-protest

Saturday, 19 April 2014

Typo Squatting: Beware of Typos in Domain Names/URLs

Typo Squatting (typosquatting) is a form of cybersquatting that is based on the errors users make when they enter a web address in a browser. It generally targets very popular websites that are accessed by a large number of users.

Typosquatters are the people who register such typo domains, generally with malicious intent. These typo websites may lead to parked domains, phishing websites, or malicious websites.

Typos are very common, and hence typosquatters register a large number of domain names that are typos of popular websites. When users make an error while trying to access a website, they may be led to a different website than the one they intended to visit, which can have unpredictable consequences. Hence, typosquatting is an important topic in web/cyber security research and industry. Also, from the internet user's point of view, it is vital to take maximum care in avoiding typos.

There are many reasons/actions that result in typos when entering a web address in the browser, or when entering any form of text using a regular keyboard. Among those, two common forms of typos are:

1. Character Omission Typo: This occurs when a user misses a character while entering a URL. For example, if the user enters http://www.bloger.com while intending to visit http://www.blogger.com, this results in a character omission typo.
2. Character Swap Typo: This occurs when a user accidentally swaps two adjacent characters in a web address. Considering a similar example, suppose the user enters http://www.bolgger.com while intending to visit http://www.blogger.com, this results in a character swap typo.




You may use the following Python script to analyze the type and number of typos that may result when trying to access a website. Note that the script takes the domain name as input; that is, if the user is trying to access http://www.blogger.com, the domain name is blogger.com.

from sets import Set

# generate typo domains where a single character is omitted
def add_missing_char_typo(domain_name):
    typo_domains = Set()
    domain_chars = list(domain_name)
    for i in range(len(domain_chars)):
        domain_chars_t = list(domain_chars)
        if domain_chars_t[i] == '.':  # stop once the TLD separator is reached
            break
        domain_chars_t[i] = ''
        typo_domain = ''.join(domain_chars_t)
        typo_domains.add(typo_domain)
    return typo_domains

# generate typo domains where two adjacent characters are swapped
def add_swap_char_typo(domain_name):
    typo_domains = Set()
    for i in range(len(domain_name) - 1):
        domain_chars_t = list(domain_name)
        c_i = domain_chars_t[i]
        c_i_1 = domain_chars_t[i + 1]
        if c_i == '.' or c_i_1 == '.':  # stop once the TLD separator is reached
            break
        elif c_i == c_i_1:  # swapping identical characters produces no typo
            continue
        domain_chars_t[i] = c_i_1
        domain_chars_t[i + 1] = c_i
        typo_domain = ''.join(domain_chars_t)
        typo_domains.add(typo_domain)
    return typo_domains

td1 = add_missing_char_typo('blogger.com')
td2 = add_swap_char_typo('blogger.com')
print 'Missed Character Typo: ' + str(td1)
print 'Swapped Character Typo: ' + str(td2)


Output of the above script is given below:

Missed Character Typo: Set(['bloggr.com', 'bloger.com', 'blogge.com', 'bogger.com', 'blgger.com', 'logger.com'])
Swapped Character Typo: Set(['blgoger.com', 'blogegr.com', 'bolgger.com', 'lbogger.com', 'bloggre.com'])
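
Since typosquatters tend to register exactly these variants, one rough follow-up is to check which of the generated typo domains currently resolve in DNS. A minimal sketch using the standard socket module, assuming td1 and td2 from the script above are available (results will of course vary over time):

import socket
for typo_domain in td1 | td2:  # union of the omission and swap typo sets
    try:
        print '%s resolves to %s' % (typo_domain, socket.gethostbyname(typo_domain))
    except socket.error:
        print '%s does not resolve' % typo_domain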