
how to paste LaTeX expressions into Blogger

Step1. install the Firefox add-on named Greasemonkey
Step2. restart Firefox
Step3. install the script from userscripts.org (http://userscripts.org/scripts/show/41481)


The equation above is converted as shown below :-)

postgresDB operation with python

The code below shows how to check whether a table exists in a database using the pg package. dirty code ... :-(

import pg
import sys

def is_table_exist1(dbobj, table):
    """ return True if table exists in database """
    print("is_table_exist1 called")
    try:
        # compare against every table name known to the connection
        for i in dbobj.get_tables():
            if i == table:
                return True
    except IndexError as errno:
        print("%s" % (errno))
        return False
    return False

def is_table_exist2(dbobj, table):
    """ return True if table exists in database """
    print("is_table_exist2 called")
    try:
        # looking up the columns fails when the table is unknown
        print(dbobj.get_attnames(table))
        return True
    except pg.ProgrammingError as errno:
        print("%s" % (errno))
        return False

def main():
    # get default hostname
    defhost = pg.get_defhost()

    # get default port number
    defport = pg.get_defport()

    # get default database
    defbase = pg.get_defbase()

    # initialize pg.DB object
    mydb = pg.DB(user='postgres', passwd='postgres')

    # get names of all databases
    dblist = mydb.get_databases()

    # check if table exists in database
    # (the body of this branch is missing in the post; exiting is an assumption)
    if is_table_exist1(mydb, 'public.foo') == False:
        sys.exit(1)

    # check if table exists in database (same assumption as above)
    if is_table_exist2(mydb, 'public.foo') == False:
        sys.exit(1)

    for r in mydb.query(
        "SELECT v1, v2, v3, v4 FROM %s" % ('public.foo')).dictresult():
        print('%(v1)s %(v2)s %(v3)s %(v4)s' % r)

    # close dbobj
    mydb.close()

if __name__ == "__main__":
    main()

hello, pig

This is my first time touching Pig, which is one of the Hadoop-related projects.

yaboo@maniac:~$ pig -x local

2011-06-22 23:45:32,781 [main] INFO org.apache.pig.Main - Logging error messages to: /home/yaboo/pig_1308753932779.log
2011-06-22 23:45:32,849 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///

grunt> A = load '/etc/passwd' using PigStorage(':'); -- read /etc/passwd, using ':' as the separator

grunt> B = foreach A generate $0 as id; -- get the first column

grunt> dump B; -- output the result to stdout

2011-06-22 23:46:40,493 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-06-22 23:46:40,493 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-22 23:46:47,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-06-22 23:46:47,277 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

grunt> store B into 'id.out'; -- store B into the id.out directory
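
For comparison, here is a minimal Python sketch of what the load / foreach / dump steps above compute. It is not Pig code, and the function name first_column and the SAMPLE data are made up for illustration; it only mimics splitting each line on ':' and keeping the first field.

```python
import io

def first_column(lines, separator=":"):
    """Yield the first field of each non-empty line, like `foreach A generate $0`."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split(separator)[0]

# Hypothetical sample standing in for /etc/passwd.
SAMPLE = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "yaboo:x:1000:1000::/home/yaboo:/bin/bash\n"
)

ids = list(first_column(io.StringIO(SAMPLE)))
print(ids)
```

Passing an open file object instead of io.StringIO would process a real file the same way.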

how to make jar file for custom Hive UDF

I tried to write my own Hive User Defined Function (Hive UDF), but compiling the Java file correctly was a bit difficult for me because I was a Java beginner and did not know much about the Java compiler options. To compile the Java file and build a JAR file for the Hive UDF, I wrote a shell script which is almost totally copied from here 8^)

script for compiling JAVA file and making JAR file

This script is written for sh. If you want the bash version, you can get the source code from here


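#!/bin/sh

if [ $# -ne 1 ]; then
    echo "Usage: $0 <java file>"
    exit 1
fi

# JARDIR and JARNAME were not defined in the post; these values are
# inferred from the usage output (Lower.java -> Lower/ and Lower.jar)
CNAME=${1%.java}
JARDIR=$CNAME
JARNAME=$CNAME.jar
CLASSPATH=$(ls $HIVE_HOME/lib/hive-serde-*.jar):$(ls $HIVE_HOME/lib/hive-exec-*.jar):$(ls $HADOOP_HOME/hadoop-core-*.jar)
echo "javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1"

tell() {
    echo "$1 successfully compiled."
}

mkdir -p $JARDIR
javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1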

custom UDF

Here I show a custom UDF named Lower. This code is copied from the official Hive Wiki

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}

Usage of script

> sh compile.sh Lower.java
Lower.java successfully compiled.
> ls Lower*
Lower.jar Lower.java Lower

calculate cosine with python numpy


Calculate the "cosine" determined by a pair of vectors using Python and its numpy package. First I show the definition of cosine in a linear space, and then I share sample Python code for calculating it.

definition of cosine in linear space
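
For two non-zero vectors v_1 and v_2 in a space with an inner product, the standard definition is the inner product divided by the product of the norms, which is also what get_cosine in the code computes. The worked line uses the same vectors as main() in that code.

```latex
\cos\theta = \frac{\langle v_1, v_2 \rangle}{\lVert v_1 \rVert \, \lVert v_2 \rVert}

% worked example: v_1 = (1, 0), v_2 = (1, \sqrt{3})
\langle v_1, v_2 \rangle = 1, \quad
\lVert v_1 \rVert = 1, \quad
\lVert v_2 \rVert = \sqrt{1 + 3} = 2, \quad
\cos\theta = \frac{1}{1 \cdot 2} = \frac{1}{2}, \quad
\theta = 60^\circ
```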

python code for calculating cosine

import numpy

def get_cosine(v1, v2):
    """ calculate cosine and returns cosine """
    n1 = get_norm_of_vector(v1)
    n2 = get_norm_of_vector(v2)
    ip = get_inner_product(v1, v2)
    return ip / (n1 * n2)

def get_inner_product(v1, v2):
    """ calculate inner product """
    return numpy.dot(v1, v2)

def get_norm_of_vector(v):
    """ calculate norm of vector """
    return numpy.linalg.norm(v)

def get_radian_from_cosine(cos):
    """ convert cosine to angle in radians """
    return numpy.arccos(cos)

def get_degrees_from_radian(radian):
    """ convert angle in radians to degrees """
    return numpy.degrees(radian)

def main():
    v1 = numpy.array([1, 0])
    v2 = numpy.array([1, numpy.sqrt(3)])
    cosine = get_cosine(v1, v2)
    radian = get_radian_from_cosine(cosine)
    print(get_degrees_from_radian(radian))

if __name__ == "__main__":
    main()


hadoop-streaming: inner join

hadoop-streaming HOWTO
step1. implement mapper and reducer
step2. chmod a+x <mapper/reducer>
step3. execute like below:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py -input inputdir/* -output outputdir

#!/usr/bin/env python                                                          

import sys
import itertools

def read_input(file, separator=','):
    for line in file:
        yield line.strip().split(separator)

def main():
    rows = read_input(sys.stdin)

    for row in rows:
        # pick the converter by file id (first column); skip unknown ids
        convert = get_convert_func(row[0])
        if convert is not None:
            convert(row)

def get_convert_func(fid):
    """ return function for convert each data """

    if fid == '001':
        return convert001
    elif fid == '002':
        return convert002
    else:
        return None

def convert001(row, separator='\t'):
    if len(row) != 6:
        return None

    fid = row[0]
    k1 = row[1]
    k2 = row[2]
    k3 = row[3]
    k4 = row[4]
    v1 = row[5]

    K = get_key_or_value_string(k1, k2, k3, k4)
    V = get_key_or_value_string(fid, v1)
    print "%s%s%s" % (K, separator, V)

def convert002(row, separator='\t'):
    if len(row) != 7:
        return None

    fid = row[0]
    k1 = row[1]
    k2 = row[2]
    k3 = row[3]
    k4 = row[4]
    v1 = row[5]
    v2 = row[6]

    K = get_key_or_value_string(k1, k2, k3, k4)
    V = get_key_or_value_string(fid, v1, v2)
    print "%s%s%s" % (K, separator, V)

def get_key_or_value_string(*args):
    return (',').join(args)

if __name__ == "__main__":
    main()

#!/usr/bin/env python

import sys
from itertools import groupby
from operator import itemgetter

def read_mapper_output(file, separator='\t'):
    for row in file:
        yield row.strip().split(separator, 1)

def main(separator='\t'):
    data = read_mapper_output(sys.stdin, separator=separator)

    for K, group in groupby(data, itemgetter(0)):
        L = ['-1', '-1', '-1', '-1', '-1', '-1', '-1']
        L[0], L[1], L[2], L[3] = get_columns_from_key(K, 4, ',')

        for K, V in group:
            columns = V.split(',')
            # the mapper emits (fid, v1) for '001' and (fid, v1, v2) for '002'
            if columns[0] == '001':
                L[4] = columns[1]
            elif columns[0] == '002':
                L[5] = columns[1]
                L[6] = columns[2]

        if is_healthy_output(L):
            print get_string_from_list(L)

def is_healthy_output(row):
    """ return True if output row does not have '-1' column """
    return '-1' not in row

def get_string_from_list(L):
    return (',').join(L)

def get_columns_from_key(key, length, separator=','):
    L = key.split(separator)
    if len(L) != length:
        return None
    else:
        return L

if __name__ == "__main__":
    main()
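
The inner join can also be tried without Hadoop. The sketch below is a simplified local stand-in, not the scripts above: map_row, reduce_group, local_join and the SAMPLE rows are hypothetical names and data, and a plain sort plays the role of Hadoop's shuffle. It assumes the same layout as the mapper, where '001' rows carry one value and '002' rows carry two.

```python
from itertools import groupby
from operator import itemgetter

def map_row(row):
    """Return (key, value) for a known file id, else None."""
    fid = row[0]
    if fid == "001" and len(row) == 6:
        return ",".join(row[1:5]), ",".join([fid, row[5]])
    if fid == "002" and len(row) == 7:
        return ",".join(row[1:5]), ",".join([fid, row[5], row[6]])
    return None

def reduce_group(key, values):
    """Join the values of one key; return None unless both sides are present."""
    out = key.split(",") + ["-1", "-1", "-1"]
    for value in values:
        columns = value.split(",")
        if columns[0] == "001":
            out[4] = columns[1]
        elif columns[0] == "002":
            out[5], out[6] = columns[1], columns[2]
    return None if "-1" in out else ",".join(out)

def local_join(lines):
    """Run map, sort and reduce in one process and return the joined rows."""
    mapped = (map_row(line.strip().split(",")) for line in lines)
    pairs = sorted((kv for kv in mapped if kv is not None), key=itemgetter(0))
    joined = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        row = reduce_group(key, [value for _, value in group])
        if row is not None:
            joined.append(row)
    return joined

# Hypothetical sample input: only the key a,b,c,d appears in both files.
SAMPLE = [
    "001,a,b,c,d,10",
    "002,a,b,c,d,20,30",
    "001,w,x,y,z,40",
]

result = local_join(SAMPLE)
print(result)
```

Keys that appear in only one of the two inputs keep a '-1' placeholder and are dropped, which is what makes this an inner join.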

Installing emacs-w3m in emacs 23.1.1

step.1 install w3m-el in my emacs23.1

> cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot login
> cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m
> cd emacs-w3m
> autoconf
> ./configure
> make
> sudo make install

step.2 configure ~/.emacs.d/init.el
add the line below at the end of your init.el or .emacs
(require 'w3m-load)


postgres tips: import/export CSV file

skip the header line and import csvfile into table_name
COPY table_name FROM '/abspath/to/csvfile' WITH CSV HEADER;

export a query result to csvfile with a header line
COPY (SELECT foo,bar FROM whatever) TO '/abspath/to/csvfile' WITH CSV HEADER;


Automatic detection of japanese character encoding with python

> sudo aptitude install python-dev
> sudo easy_install pykf

import pykf

def get_file_encode(input_path):
    """ get japanese encoding information from file using pykf """
    encode = None

    enc_ja = [pykf.EUC, pykf.SJIS, pykf.UTF8, pykf.JIS]
    edic = {pykf.UNKNOWN:None, pykf.ASCII:'ASCII', pykf.SJIS:'SHIFT-JIS',
           pykf.EUC:'EUC-JP', pykf.JIS:'ISO-2022-JP', pykf.UTF8:'UTF-8',
           pykf.UTF16:'utf-16', pykf.UTF16_BE:'utf-16_be',pykf.ERROR:None}

    input_file = open(input_path)
    try:
        for line in input_file:
            c = pykf.guess(line)
            # remember the encoding name when the guess is a japanese encoding
            if c in enc_ja:
                encode = edic[c]
    finally:
        input_file.close()

    return encode
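
If pykf is not available, a rough alternative is to try decoding the bytes with Python's built-in codecs. This is a different technique from pykf.guess, not a reimplementation of it: the name guess_japanese_encoding and the order of the candidates are my own assumptions, and trial decoding can misjudge short or ambiguous input.

```python
# Candidate codecs, tried in this order after the ASCII / ISO-2022-JP checks.
CANDIDATES = [("UTF-8", "utf_8"), ("EUC-JP", "euc_jp"), ("SHIFT-JIS", "shift_jis")]

def guess_japanese_encoding(data):
    """Return a label for the first codec that decodes `data` (bytes), else None."""
    if b"\x1b" in data:
        # ISO-2022-JP is 7-bit and switches character sets with ESC sequences.
        try:
            data.decode("iso2022_jp")
            return "ISO-2022-JP"
        except UnicodeDecodeError:
            pass
    else:
        try:
            data.decode("ascii")
            return "ASCII"
        except UnicodeDecodeError:
            pass
    for label, codec in CANDIDATES:
        try:
            data.decode(codec)
            return label
        except UnicodeDecodeError:
            continue
    return None

text = "日本語"
guesses = [guess_japanese_encoding(text.encode(codec))
           for codec in ("utf_8", "euc_jp", "shift_jis", "iso2022_jp")]
print(guesses)
```

To check a file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the function.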


Setup japanese LaTeX environment on Ubuntu Lucid

I installed TeX Live on Ubuntu 10.04 LTS following this site ;-)

Step1. install dvipsk-ja (the default package is broken as of 2011.06.11 JST)
> sudo add-apt-repository ppa:cosmos-door/dvipsk-ja
> sudo aptitude update
> sudo aptitude upgrade
> sudo aptitude install dvipsk-ja

Step2. install packages
> sudo aptitude install texlive texlive-math-extra texlive-latex-extra texlive-latex-extra-doc texlive-fonts-extra texlive-fonts-extra-doc texlive-fonts-recommended texlive-fonts-recommended-doc texlive-formats-extra texlive-latex-recommended texlive-latex-recommended texlive-extra-utils texlive-font-utils texlive-doc-ja ptex-bin jbibtex-bin mendexk okumura-clsfiles latex-cjk-japanese cmap-adobe-japan1 cmap-adobe-japan2 cmap-adobe-cns1 cmap-adobe-gb1 gs-cjk-resource ghostscript xdvik-ja dvi2ps dvi2ps-fontdesc-morisawa5 jmpost latexmk latex-mk pybliographer yatex

Step3. update latex environment
> updmap
> sudo mktexlsr
> sudo updmap-sys
> sudo dpkg-reconfigure ptex-jisfonts
> sudo jisftconfig add

Test. Can I build a Japanese document with TeX Live?

First, I prepared a sample LaTeX document encoded in EUC-JP.



> platex ex1.tex

This is pTeXk, Version 3.141592-p3.1.11 (euc) (Web2C 7.5.4)
 %&-line parsing enabled.
pLaTeX2e <2006/11/10>+0 (based on LaTeX2e <2009/09/24> patch level 0)
Document Class: jarticle 2006/06/27 v1.6 Standard pLaTeX class
(/usr/share/texmf/ptex/platex/base/jsize10.clo)) (./ex1.aux) [1] (./ex1.aux) )
(see the transcript file for additional information)
Output written on ex1.dvi (1 page, 268 bytes).
Transcript written on ex1.log.

> xdvi ex1.dvi

[Japanese characters are displayed correctly :-)]

> dvipdfmx ex1.dvi

ex1.dvi -> ex1.pdf

** WARNING ** Failed to load AGL file "pdfglyphlist.txt"...
** WARNING ** Failed to load AGL file "glyphlist.txt"...
** ERROR ** Could not find encoding file "H".

Output file removed.

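dvipdfmx could not find the CMap file "H", so I edited CMAPFONTS in texmf.cnf so that it includes the ghostscript CMap directory:

> sudo vi /usr/share/texmf/web2c/texmf.cnf
% CMap files.
CMAPFONTS = .;$TEXMF/fonts/cmap//;/usr/share/ghostscript/CMap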

> dvipdfmx ex1.dvi
ex1.dvi -> ex1.pdf

** WARNING ** Failed to load AGL file "pdfglyphlist.txt"...
** WARNING ** Failed to load AGL file "glyphlist.txt"...
2248 bytes written

> acroread ex1.pdf

[Japanese characters are displayed correctly :-)]

Private method HOWTO in python

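# -*- coding: utf-8 -*-
# main.py

import os
import mycls

def main():
    henley = mycls.ITWorkers('PHP', 'Henley', 'web creator', 'henley@livegate.com', 32, 700)

    henley.OS = 'Mac'
    # the calls below were missing in the post; they are reconstructed
    # from the program output shown after the class definitions
    henley.work(8)

    # '_' marks a method that is not published externally (PEP8 convention);
    # it can still be called from outside
    henley._printme(henley.name)
    # '__' cannot be called from outside under its own name;
    # the name is mangled to _ITWorkers__printme
    henley._ITWorkers__printme(henley.name)
    henley._printme('test')
    henley.__printme('test')  # raises AttributeError

if __name__ == "__main__":
    main()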

# -*- coding: utf-8 -*-
# mycls.py

import os

class Workers:
    """ This is a class of workers working in the company """
    def __init__(self, name, position, email, age, salary):
        self.name = name
        self.position = position
        self.email = email
        self.age = age
        self.salary = salary

class ITWorkers(Workers):
    """ This is a class of IT engineers. """
    OS = 'WinNT'

    def __init__(self, language, *av):
        Workers.__init__(self, *av)
        self.language = language

    def _printme(self, name):
        print 'my name is %s.' % (name)

    def __printme(self, name):
        print 'my name is %s.' % name

    def work(self, n):
        """ IT engineers should work. """

        if self.position == 'web creator':
            w = 'makes web site'
        elif self.position == 'server administrator':
            w = 'checks the traffic'
        elif self.position == 'programmer':
            w = 'write program'

        print '%s, %s for %d, hours using %s on %s' % (self.name, w, n, self.language, self.OS)

> python main.py
Henley, makes web site for 8, hours using PHP on Mac
my name is Henley.
my name is Henley.
my name is test.
Traceback (most recent call last):
  File "main.py", line 29, in <module>
  File "main.py", line 25, in main
AttributeError: ITWorkers instance has no attribute '__printme'
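
The same behaviour can be reproduced in a few lines. This is a minimal sketch with a made-up class named Worker, not the classes above: a single underscore is only a convention, while a double underscore makes Python store the method under a mangled name (_Worker__greet here), so calling it by its plain name from outside the class fails.

```python
class Worker:
    def __init__(self, name):
        self.name = name

    def _greet(self):
        # single underscore: non-public by convention only
        return "my name is %s." % self.name

    def __greet(self):
        # double underscore: stored on the class as _Worker__greet
        return "my name is %s." % self.name

w = Worker("Henley")
by_convention = w._greet()
by_mangled_name = w._Worker__greet()

try:
    w.__greet()  # not mangled outside the class, so the lookup fails
    plain_call_failed = False
except AttributeError:
    plain_call_failed = True

print(by_convention, by_mangled_name, plain_call_failed)
```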


Join multiple dictionaries and flatten the joined list in Python2

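import random
import csv
from functools import reduce

def flatten(L):
    """ flatten nested lists into one flat list """
    if isinstance(L, list):
        return reduce(lambda a,b: a + flatten(b), L, [])
    return [L]

def main():
    dict1 = {}
    for i in range(0,100,2):
        dict1[i] = [random.random(),random.random()]
    dict2 = {}
    for i in range(0,100,3):
        dict2[i] = [random.random(),random.random()]
    dict3 = {}
    for i in range(0,100,5):
        dict3[i] = [random.random(),random.random()]

    f = open("./test.csv", "a")
    writer = csv.writer(f)

    # inner join: keep only the keys present in all three dictionaries
    for k1,v1 in dict1.items():
        if k1 not in dict2:
            continue
        if k1 not in dict3:
            continue
        v2 = dict2[k1]
        v3 = dict3[k1]

        # the rest of the loop was missing in the post; writing the
        # flattened row to the CSV file is an assumption
        l = [k1, v1, v2, v3]
        writer.writerow(flatten(l))

    f.close()

if __name__ == "__main__":
    main()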
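
The join step can also be written with a set intersection over the keys. This is only a sketch of the same idea for any number of dictionaries; the helper name inner_join and the sample dictionaries d1, d2 and d3 are hypothetical.

```python
from functools import reduce

def flatten(item):
    """Flatten nested lists into one flat list."""
    if isinstance(item, list):
        return reduce(lambda acc, x: acc + flatten(x), item, [])
    return [item]

def inner_join(*dicts):
    """Return flattened rows [key, values...] for keys present in every dict."""
    common = set(dicts[0]).intersection(*dicts[1:])
    return [flatten([key] + [d[key] for d in dicts]) for key in sorted(common)]

# Hypothetical sample data: only the keys 0 and 30 are shared by all three.
d1 = {0: [1.0, 2.0], 2: [3.0, 4.0], 30: [5.0, 6.0]}
d2 = {0: [7.0], 3: [8.0], 30: [9.0]}
d3 = {0: [10.0], 5: [11.0], 30: [12.0]}

rows = inner_join(d1, d2, d3)
print(rows)
```

Each row can then be passed to csv.writer(...).writerow as in the code above.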


Hello, 8GB memory

I bought 8GB of memory made by SanMax and installed it in my workstation.
Now, my workstation has 16GB physical memory!!


import CSV file into sqlitedb from python

Here I show how to work with an SQLite database from Python.

import csv
import sqlite3

def line_generator(input_path):
    _file = open(input_path, "r")
    for _line in _file:
        yield _line.strip().split(',')
    _file.close()

def main():
    con = connect_db()
    # the calls below were missing in the post; this order is an assumption
    create_table(con)
    import_table(con)
    select_table(con)
    export_table(con)
    drop_table(con)
    disconnect_db(con)

def connect_db():
    con = sqlite3.connect(":memory:")
    return con

def disconnect_db(con):
    con.close()

def create_table(con):
    con.execute("create table test(id int, name text);")

def drop_table(con):
    con.execute("drop table test;")

def import_table(con):
    iterator = line_generator("/home/yaboo/test1.csv")
    for i in iterator:
        con.execute("insert into test values (?, ?)", i)
    con.commit()

def export_table(con):
    f = open("/home/yaboo/output_test1.csv", "w")
    writer = csv.writer(f)
    for row in con.execute("SELECT * FROM test;"):
        writer.writerow([col.encode('utf-8') if isinstance(col, unicode) else col for col in row])
    f.close()

def select_table(con):
    cur = con.cursor()
    cur.execute("select * from test;")
    for line in cur:
        print(line)

if __name__ == "__main__":
    main()
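
A shorter variant of the import step, written for Python 3 as a sketch: it parses the CSV with the csv module and inserts all rows with executemany. The helper name import_csv and the SAMPLE_CSV data are hypothetical; the table layout (id int, name text) is the one used above.

```python
import csv
import io
import sqlite3

def import_csv(con, csv_file):
    """Insert (id, name) rows from a CSV file object; return the row count."""
    rows = [(int(r[0]), r[1]) for r in csv.reader(csv_file) if r]
    con.executemany("insert into test values (?, ?)", rows)
    con.commit()
    return len(rows)

SAMPLE_CSV = "1,alice\n2,bob\n"  # hypothetical data standing in for test1.csv

con = sqlite3.connect(":memory:")
con.execute("create table test(id int, name text)")
inserted = import_csv(con, io.StringIO(SAMPLE_CSV))
stored = con.execute("select id, name from test order by id").fetchall()
con.close()

print(inserted, stored)
```

Using csv.reader instead of split(',') also handles quoted fields that contain commas.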


Ubuntu boot time on my workstation

Surprisingly, the boot time of Ubuntu 10.04 is shorter than 10 seconds!
The timeline of the boot processes is shown below.


How to measure boot time
  1. # aptitude install bootchart
  2. # aptitude install pybootchartgui
  3. # reboot
  4. display /var/log/bootchart/[machine name]-[ubuntu version]-[date]-[index].png 
  5. # aptitude remove bootchart pybootchartgui
very fast :-)

Hello, ess (1st contact)

I tried to use emacs + ess (Emacs Speaks Statistics) to write R code quickly.

  1. aptitude install ess
  2. echo "(require 'ess-site)" >> ~/.emacs.d/init.el
  3. start emacs
  4. C-x 2 => split the emacs window
  5. M-x R => start the R console
  • C-c C-j: execute current line
  • C-c M-j: execute current line and move to end of R-console
  • C-c C-b: execute current buffer
  • C-c M-b: execute current buffer and move to end of R-console
  • C-c C-r: execute current region
  • C-c M-r: execute current region and move to end of R-console

ESS COMMAND REFERENCE (more ess commands are introduced in RjpWiki.)
Commands of ess (by RjpWiki)


my suit case is middle-bottom, embossed black

Hello, keyboard (2011.06.05)

I bought HHKB pro2 for my workstation.

Happy Hacking Keyboard Professional 2

Needless to say,
this keyboard is fucking fabulous!!