Saturday, 30 August 2014

Stock Price/Volume Analysis Using Python and PyCluster

In this blog post we will look at how k-means cluster analysis (http://en.wikipedia.org/wiki/K-means_clustering) can be used to group the (price, volume) data of a stock into clusters.

The following Python script can be used to create the clusters. The input is the trading date, close price and volume, read from a comma-separated file. The number of clusters can be set when the script is executed; in this example we cluster the data into 2, 3, 4 and 5 clusters. Note also that if a cluster contains fewer than a specified percentage of the points, we believe those points may be the result of some extraordinary event related to that particular stock (outliers). The right percentage depends largely on the number of data points and the number of clusters.
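To make the idea concrete, here is a minimal sketch of the clustering and outlier-flagging steps described above. It is not the script that produced the figures in this post and it does not use PyCluster: it is a plain-Python stand-in for the standard k-means (Lloyd's) iteration. The sample data is made up, the column order (date, close, volume) and header row are assumed, the min-max scaling of both columns is our own assumption (so that volume does not dominate the distance), and the helper names and the 20% threshold are hypothetical.

```python
import csv
import io
import random

# Made-up sample data in the layout described above: date, close, volume.
SAMPLE_CSV = """date,close,volume
2014-08-01,10.0,1000
2014-08-04,10.1,1100
2014-08-05,10.2,1050
2014-08-06,10.0,950
2014-08-07,10.3,1000
2014-08-08,10.1,1020
2014-08-11,10.2,980
2014-08-12,10.0,1010
2014-08-13,10.1,1030
2014-08-14,15.0,9000
"""

def load_rows(csv_file):
    """Read (date, close, volume) rows; assumes a header row and this column order."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the assumed header row
    return [(row[0], float(row[1]), float(row[2])) for row in reader if row]

def min_max_scale(values):
    """Rescale values to [0, 1] (an assumption, not taken from the original script)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids

def flag_small_clusters(labels, k, min_share):
    """Return ids of non-empty clusters holding fewer than min_share of all points."""
    n = len(labels)
    return [c for c in range(k) if 0 < labels.count(c) < min_share * n]

rows = load_rows(io.StringIO(SAMPLE_CSV))
points = list(zip(min_max_scale([r[1] for r in rows]),
                  min_max_scale([r[2] for r in rows])))
labels, centroids = kmeans(points, k=2)
small = flag_small_clusters(labels, k=2, min_share=0.2)
flagged_dates = [r[0] for r, lab in zip(rows, labels) if lab in small]
print(flagged_dates)
```

To try it on real data, replace io.StringIO(SAMPLE_CSV) with an opened CSV file and change k and min_share to taste. As noted above, a sensible threshold depends on how many points and clusters you have.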



The following figure shows the output of the above Python script with 2 clusters:



The following figure shows the output of the above Python script with 3 clusters:





The following figure shows the output of the above Python script with 4 clusters:

The following figure shows the output of the above Python script with 5 clusters: