2012/07/27

python tips: usage of scipy.spatial.KDTree

A kd-tree is a well-known data structure for searching spatially distributed points. The code below shows how to use the kd-tree implemented in scipy.spatial.
#!/usr/bin/env python
#coding:utf-8

import numpy as np
import scipy.spatial as ss

def main():
    x, y = np.mgrid[0:100, 0:100]
    points = zip(x.ravel(), y.ravel())
    tree = ss.KDTree(points)
    # get the index of every point whose distance from (x=0, y=0) is at most 1
    a = tree.query_ball_point([0, 0], 1)
    print [points[i] for i in a]

if __name__ == "__main__":
    main()
The output of the above code looks like this:
~$ python above_code.py 
[(0, 0), (0, 1), (1, 0)]
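
The script above relies on Python 2 behaviour (the print statement, and zip() returning a list that can be indexed). As a minimal sketch of the same query for Python 3, assuming a reasonably recent numpy and scipy, the points can be built as a single array instead. The helper name points_within and its parameters are made up for this illustration.

```python
import numpy as np
from scipy.spatial import KDTree


def points_within(radius, center=(0, 0), size=100):
    """Return the grid points whose distance from `center` is at most `radius`."""
    x, y = np.mgrid[0:size, 0:size]
    # column_stack builds a (size*size, 2) array, so zip() is not needed.
    points = np.column_stack((x.ravel(), y.ravel()))
    tree = KDTree(points)
    idx = tree.query_ball_point(center, radius)
    # Sort the result so the output does not depend on the order of the indices.
    return sorted((int(px), int(py)) for px, py in points[idx])


near = points_within(1)
print(near)
```

Points at distance exactly 1, such as (0, 1) and (1, 0), are included, which matches the output shown above.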

2012/07/24

python tips: usage of nkf.python2

NKF.python2 lets us convert the encoding of Japanese text without specifying the encoding of the input. This article explains how to install and use NKF.python2. First, install NKF.python2 on your Linux system:
# download nkf using git
$ git clone git://git.sourceforge.jp/gitroot/nkf/nkf.git
Cloning into 'nkf'...
remote: Counting objects: 1378, done.
remote: Compressing objects: 100% (432/432), done.
remote: Total 1378 (delta 945), reused 1378 (delta 945)
Receiving objects: 100% (1378/1378), 503.08 KiB, done.
Resolving deltas: 100% (945/945), done.

$ cd nkf
$ sudo make install #if you want
$ cd NKF.python2
$ python setup.py build
$ python setup.py install 
Second, check that NKF.python2 is installed correctly:
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nkf
>>> 
Finally, test NKF.python2 (path_to_file points to a Japanese-encoded text file):
import nkf

with open(path_to_file, 'r') as F:
    # the module is named nkf, and the conversion function is nkf.nkf()
    data = [nkf.nkf("-w -Lu -d", row) for row in F]
The above code converts each row to UTF-8 *without* specifying the encoding of the input file. This is very useful when you do not know which encoding an input file uses, or when you need to read several input files that each use a different encoding.
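
To give a rough idea of what "not specifying the input encoding" involves, here is a small sketch that does not use nkf at all. It is a stand-in technique using only the Python 3 standard library: try a fixed list of candidate codecs in order and keep the first one that decodes without error. The helper name to_utf8 and the candidate list are assumptions made for this illustration, and the approach is only a heuristic, because some byte sequences are valid in more than one encoding.

```python
# Hypothetical helper, not part of nkf: guess by trying candidate codecs in order.
CANDIDATES = ("iso2022_jp", "utf-8", "euc_jp", "cp932")


def to_utf8(raw, candidates=CANDIDATES):
    """Decode raw bytes with the first codec that accepts them; return UTF-8 bytes."""
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
        return text.encode("utf-8")
    raise ValueError("none of the candidate encodings matched")


# The same three-character Japanese word (written with escapes), encoded three ways.
sample = "\u65e5\u672c\u8a9e"
converted = [to_utf8(sample.encode(name)) for name in ("utf-8", "euc_jp", "cp932")]
print(converted[0] == converted[1] == converted[2])
```

The order of the candidates matters: the first codec that accepts the bytes wins, so a short or ambiguous input can be misdetected.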

2012/07/23

python tips: calculate distance between N points and M points using numpy array

This is a memory-consuming way to calculate the distances between N points and M points, because it builds full M x N arrays. Let l_lat and l_lon be the coordinate vectors of the N points, and v_lat and v_lon those of the M points. Using numpy arrays, the distance calculation can be written as follows.
#!/usr/bin/env python
#coding:utf-8

import numpy

M = 3
N = 5

# points1: N points, each vector repeated as M rows -> shape (M, N)
l_lat = numpy.arange(N)
l_lon = numpy.arange(N)
m_l_lat = numpy.tile(l_lat, (M, 1))
m_l_lon = numpy.tile(l_lon, (M, 1))

# points2: M points, each vector repeated as N columns -> shape (M, N)
v_lat = numpy.arange(M)
v_lon = numpy.arange(M)
m_v_lat_t = numpy.tile(v_lat, (N, 1)).T
m_v_lon_t = numpy.tile(v_lon, (N, 1)).T

# distance[i, j] is the distance between point i of points2 and point j of points1
distance = numpy.sqrt((m_v_lat_t - m_l_lat) ** 2 + (m_v_lon_t - m_l_lon) ** 2)
print distance
Although the code generates lat and lon with numpy.arange(), in a real case you should fill each vector with actual lat and lon values. Below is the output of the previous code.
~$ python test_calc_distance.py 
[[ 0.          1.41421356  2.82842712  4.24264069  5.65685425]
 [ 1.41421356  0.          1.41421356  2.82842712  4.24264069]
 [ 2.82842712  1.41421356  0.          1.41421356  2.82842712]]
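
The explicit numpy.tile() calls can also be left to numpy broadcasting. The following is a minimal sketch for Python 3, assuming the same plain Euclidean distance as above. The function name pairwise_distance is made up for this illustration.

```python
import numpy as np


def pairwise_distance(v_lat, v_lon, l_lat, l_lon):
    """Return an (M, N) array of distances between M points (v) and N points (l)."""
    v_lat = np.asarray(v_lat, dtype=float)
    v_lon = np.asarray(v_lon, dtype=float)
    l_lat = np.asarray(l_lat, dtype=float)
    l_lon = np.asarray(l_lon, dtype=float)
    # A column of shape (M, 1) minus a row of shape (1, N) broadcasts to (M, N).
    d_lat = v_lat[:, np.newaxis] - l_lat[np.newaxis, :]
    d_lon = v_lon[:, np.newaxis] - l_lon[np.newaxis, :]
    return np.hypot(d_lat, d_lon)


M, N = 3, 5
distance = pairwise_distance(np.arange(M), np.arange(M), np.arange(N), np.arange(N))
print(distance)
```

The result still has M x N entries, so the memory cost is of the same order. Broadcasting only avoids creating the tiled input arrays.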
