yablog: 2011/06

2011/06/23

how to paste LaTeX expressions into Blogger

Step1. Install the Firefox add-on Greasemonkey.
Step2. Restart Firefox.
Step3. Install the script from userscripts.org (http://userscripts.org/scripts/show/41481).

$$x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$$

With the script installed, the expression above is rendered as a formatted equation :-)

PostgreSQL operations with Python

Labels: postgres, python, tips

The code below shows two ways to check whether a table exists in a database using the pg package (PyGreSQL). Dirty code ... :-(


import pg
import sys

def is_table_exist1(dbobj, table):
    """ return True if table exists in the database (scans the table list) """
    print "is_table_exist1 called"
    try:
        for i in dbobj.get_tables():
            if i == table:
                return True
    except IndexError, (errno):
        print "%s" %(errno)
        return False
    return False

def is_table_exist2(dbobj, table):
    """ return True if table exists in the database (asks for its columns) """
    print "is_table_exist2 called"
    try:
        print dbobj.get_attnames(table)
        return True
    except pg.ProgrammingError, (errno):
        print "%s" %(errno)
        return False

def main():
    # set default hostname                                                      
    pg.set_defhost('localhost')
    defhost = pg.get_defhost()

    # set default port number                                                   
    pg.set_defport(5432)
    defport = pg.get_defport()

    # set default database                                                      
    pg.set_defbase('testdb')
    defbase = pg.get_defbase()

    # initialize pg.DB object                                                   
    mydb = pg.DB(user='postgres', passwd='postgres')

    # print names of all databases                                              
    dblist = mydb.get_databases()

    # check whether the table exists (method 1)
    if not is_table_exist1(mydb, 'public.foo'):
        mydb.close()
        sys.exit(1)

    # check whether the table exists (method 2)
    if not is_table_exist2(mydb, 'public.foo'):
        mydb.close()
        sys.exit(1)

    for r in mydb.query(
        "SELECT v1, v2, v3, v4 FROM %s" % ('public.foo')
        ).dictresult():
        print '%(v1)s %(v2)s %(v3)s %(v4)s' % r

    #close dbobj                                                                
    mydb.close()

if __name__ == "__main__":
    main()

memo: set().intersection

check this later.
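
As a starting point for that memo, here is a minimal sketch of `set.intersection`, written for Python 3. The sample sets are made up for illustration. The method accepts any number of iterables and returns a new set holding the elements common to all of them.

```python
# Hypothetical sample data, not from the original post.
evens = set(range(0, 30, 2))
threes = set(range(0, 30, 3))
fives = set(range(0, 30, 5))

# intersection() takes one or more iterables and returns a new set.
common = evens.intersection(threes, fives)
print(sorted(common))

# The & operator is equivalent when every operand is a set.
assert common == evens & threes & fives
```

Unlike `&`, the method form also accepts non-set iterables such as lists or dictionaries (whose keys are used).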

2011/06/22

hello, pig

Labels: hadoop, tips

This is my first time trying Pig, one of the Hadoop-related projects.

yaboo@maniac:~$ pig -x local

2011-06-22 23:45:32,781 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/yaboo/pig_1308753932779.log
2011-06-22 23:45:32,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///

grunt> A = load '/etc/passwd' using PigStorage(':'); -- read /etc/passwd with ':' as the separator

grunt> B = foreach A generate $0 as id; -- take the first column

grunt> dump B; -- print the result to stdout
 
2011-06-22 23:46:40,493 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-06-22 23:46:40,493 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
...
2011-06-22 23:46:47,276 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-06-22 23:46:47,277 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(root)
(daemon)
(bin)
(sys)
(sync)
(games)
(man)
(lp)
(mail)
(news)
(uucp)
(proxy)
(www-data)
(backup)
(list)
(irc)
(gnats)
(nobody)
(libuuid)
(syslog)
(messagebus)
(avahi-autoipd)
(avahi)
(couchdb)
(speech-dispatcher)
(usbmux)
(haldaemon)
(kernoops)
(pulse)
(rtkit)
(saned)
(hplip)
(gdm)
(yaboo)
(sshd)
(hadoop)

grunt> store B into 'id.out'; -- store B in the id.out directory
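
For comparison, here is a rough single-process sketch in Python 3 of what the load / foreach / dump statements compute. Pig itself runs the script as a Hadoop job, so this only mirrors the result, not the execution model. The function name `first_fields` and the sample lines are hypothetical.

```python
def first_fields(lines, separator=":"):
    """Yield the first separator-delimited field of each non-empty line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split(separator)[0]

# Hypothetical passwd-style lines, invented for illustration.
sample = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n",
]
for user in first_fields(sample):
    print("(%s)" % user)  # same tuple-like layout as the dump output
```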

bumblebee issue

Labels: 8^)

Nice commit!!

https://github.com/MrMEEE/bumblebee/commit/a047be85247755cdbe0acce6

2011/06/19

how to make jar file for custom Hive UDF

Labels: hadoop, java, tips

I tried to write my own Hive User Defined Function (Hive UDF), but compiling the Java file correctly was a bit difficult for me because I was a Java beginner and did not know much about the Java compiler options. To compile the Java file and build the JAR file for the Hive UDF, I wrote a shell script that is almost entirely copied from here 8^)

script for compiling the Java file and making the JAR file

This script is written for sh. If you want a bash version, you can get the source code from here


#!/bin/sh

# exactly one argument is required: the Java source file to compile
if [ $# -ne 1 ]; then
    echo "Usage: $0 <java source file>"
    exit 1
fi

CNAME=${1%.java}
JARNAME=$CNAME.jar
JARDIR=/home/yaboo/$CNAME
CLASSPATH=$(ls $HIVE_HOME/lib/hive-serde-*.jar):$(ls $HIVE_HOME/lib/hive-exec-*.jar):$(ls $HADOOP_HOME/hadoop-core-*.jar)
echo "javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1"

tell() {
    echo
    echo "$1 successfully compiled."
    echo
}

mkdir -p $JARDIR
javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1

custom UDF

Here is a custom UDF named Lower. This code is copied from the official Hive wiki.


package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text (s.toString().toLowerCase());
    }
}

Usage of script

> sh compile.sh Lower.java
Lower.java successfully compiled.
> ls Lower*
Lower.jar Lower.java Lower

calculate cosine with python numpy

Labels: math, python, tips

purpose

Calculate the cosine of the angle between a pair of vectors using Python and the numpy package. First I show the definition of cosine in a linear space, and then I share sample Python code for calculating it.

definition of cosine in linear space
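
For two non-zero vectors $v_1$ and $v_2$ in a real inner product space, with inner product $\langle\cdot,\cdot\rangle$ and norm $\|v\|=\sqrt{\langle v,v\rangle}$, the cosine of the angle $\theta$ between them is

$$\cos\theta=\frac{\langle v_1,v_2\rangle}{\|v_1\|\,\|v_2\|}$$

As a worked example, take $v_1=(1,0)$ and $v_2=(1,\sqrt{3})$: the inner product is $1$, the norms are $1$ and $2$, so $\cos\theta=1/2$ and $\theta=60^\circ$.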

python code for calculating cosine

import numpy

def get_cosine(v1, v2):
    """ calculate cosine and returns cosine """
    n1 = get_norm_of_vector(v1)
    n2 = get_norm_of_vector(v2)
    ip = get_inner_product(v1, v2)
    return ip / (n1 * n2)

def get_inner_product(v1, v2):
    """ calculate inner product """
    return numpy.dot(v1, v2)

def get_norm_of_vector(v):
    """ calculate norm of vector """
    return numpy.linalg.norm(v)

def get_radian_from_cosine(cos):
    """ convert cosine to angle in radians """
    return numpy.arccos(cos)

def get_degrees_from_radian(radian):
    """ convert angle in radians to degrees """
    return numpy.degrees(radian)

def main():
    v1 = numpy.array([1, 0])
    v2 = numpy.array([1, numpy.sqrt(3)])
    cosine = get_cosine(v1, v2)
    radian = get_radian_from_cosine(cosine)
    print get_degrees_from_radian(radian)

if __name__ == "__main__":
    main()
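
As a cross-check that does not need numpy, here is a minimal Python 3 sketch using only the `math` module. The function names are hypothetical, and the clamping step is my own addition: it keeps floating-point rounding from pushing the cosine slightly outside [-1, 1], where `acos` would raise an error.

```python
import math

def cosine(v1, v2):
    """Cosine of the angle between two non-zero vectors given as sequences."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

def angle_in_degrees(v1, v2):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine(v1, v2)))
    return math.degrees(math.acos(c))

# Same vectors as in main() above; the result should be very close to 60.
print(angle_in_degrees([1, 0], [1, math.sqrt(3)]))
```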

2011/06/17

hadoop-streaming: inner join

Labels: hadoop, python, tips

hadoop-streaming HOWTO
step1. implement mapper and reducer
step2. chmod a+x <mapper/reducer>
step3. execute like below:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py -input inputdir/* -output outputdir

mapper.py

#!/usr/bin/env python                                                          

import sys
import itertools

def read_input(file, separator=','):
    for line in file:
        yield line.strip().split(separator)

def main():
    rows = read_input(sys.stdin)

    for row in rows:
        # skip rows whose first column is not a known file id
        mapper_output = get_convert_func(row[0])
        if mapper_output is not None:
            mapper_output(row)

def get_convert_func(fid):
    """ return the function that converts rows with the given file id """

    if fid == '001':
        return convert001
    elif fid == '002':
        return convert002
    else:
        return None

def convert001(row, separator='\t'):
    if len(row) != 6:
        return None

    fid = row[0]
    k1 = row[1]
    k2 = row[2]
    k3 = row[3]
    k4 = row[4]
    v1 = row[5]

    K = get_key_or_value_string(k1, k2, k3, k4)
    V = get_key_or_value_string(fid, v1)
    print "%s%s%s" % (K, separator, V)

def convert002(row, separator='\t'):
    if len(row) != 7:
        return None

    fid = row[0]
    k1 = row[1]
    k2 = row[2]
    k3 = row[3]
    k4 = row[4]
    v1 = row[5]
    v2 = row[6]

    K = get_key_or_value_string(k1, k2, k3, k4)
    V = get_key_or_value_string(fid, v1, v2)
    print "%s%s%s" % (K, separator, V)


def get_key_or_value_string(*args):
    return (',').join(args)


if __name__ == "__main__":
    main()

reducer.py

#!/usr/bin/env python

import sys
from itertools import groupby
from operator import itemgetter

def read_mapper_output(file, separator='\t'):
    for row in file:
        yield row.strip().split(separator, 1)

def main(separator='\t'):
    data = read_mapper_output(sys.stdin, separator=separator)

    for K, group in groupby(data, itemgetter(0)):
        L = ['-1', '-1', '-1', '-1', '-1', '-1', '-1']
        key_columns = get_columns_from_key(K, 4, ',')
        if key_columns is None:
            continue  # malformed key
        L[0:4] = key_columns

        for _, V in group:
            columns = V.split(',')
            # the mapper emits "001,v1" and "002,v1,v2"
            if columns[0] == '001' and len(columns) == 2:
                L[4] = columns[1]
            elif columns[0] == '002' and len(columns) == 3:
                L[5] = columns[1]
                L[6] = columns[2]

        if is_healthy_output(L):
            print get_string_from_list(L)

def is_healthy_output(row):
    """ return True if the output row has no '-1' column """
    return '-1' not in row

def get_string_from_list(L):
    return (',').join(L)

def get_columns_from_key(key, length, separator=','):
    L = key.split(separator)
    if len(L) != length:
        return None
    else:
        return L

if __name__ == "__main__":
    main()
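
To try the join logic without a Hadoop cluster, the following Python 3 sketch runs the map, sort, and reduce steps in one process. It assumes the record layout the mapper emits ('001' rows carry one value, '002' rows carry two), and the helper names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def map_row(row):
    """Turn one CSV row into a (key, value) pair; None for unknown layouts."""
    fid = row[0]
    if fid == "001" and len(row) == 6:
        return ",".join(row[1:5]), ",".join([fid, row[5]])
    if fid == "002" and len(row) == 7:
        return ",".join(row[1:5]), ",".join([fid, row[5], row[6]])
    return None

def inner_join(rows):
    """Join '001' and '002' records that share the same four-column key."""
    pairs = [p for p in (map_row(r) for r in rows) if p is not None]
    pairs.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle/sort phase
    joined = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        left, right = None, None
        for _, value in group:
            columns = value.split(",")
            if columns[0] == "001":
                left = columns[1:]
            elif columns[0] == "002":
                right = columns[1:]
        if left is not None and right is not None:  # inner join needs both sides
            joined.append(key.split(",") + left + right)
    return joined

# Hypothetical input rows, invented for illustration.
rows = [
    ["001", "a", "b", "c", "d", "x1"],
    ["002", "a", "b", "c", "d", "y1", "y2"],
    ["001", "e", "f", "g", "h", "x2"],  # no matching '002' row, so it is dropped
]
for record in inner_join(rows):
    print(",".join(record))
```

Tracking the two sides with `None` instead of the '-1' placeholder avoids dropping rows whose real value happens to be '-1'.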

Installing emacs-w3m in emacs 23.1.1

Labels: emacs

step.1 install w3m-el in my emacs23.1

> cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot login
> cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m
> cd emacs-w3m
> autoconf
> ./configure
> make
> sudo make install

step.2 configure ~/.emacs.d/init.el
Add the line below to the end of your init.el or .emacs:
(require 'w3m-load)

2011/06/15

postgres tips: import/export CSV file

Labels: postgres, tips

skip the header line and import csvfile into table_name

COPY table_name FROM '/abspath/to/csvfile' WITH CSV HEADER

export query result to csvfile with header

COPY (SELECT foo,bar FROM whatever) TO '/abspath/to/csvfile' WITH CSV HEADER

2011/06/13

Automatic detection of japanese character encoding with python

Labels: python, tips

> sudo aptitude install python-dev
> sudo easy_install pykf

import pykf

def get_file_encode(input_path):
    """ get Japanese encoding information from a file using pykf """
    encode = None

    enc_ja = [pykf.EUC, pykf.SJIS, pykf.UTF8, pykf.JIS]
    edic = {pykf.UNKNOWN:None, pykf.ASCII:'ASCII', pykf.SJIS:'SHIFT-JIS',
           pykf.EUC:'EUC-JP', pykf.JIS:'ISO-2022-JP', pykf.UTF8:'UTF-8',
           pykf.UTF16:'utf-16', pykf.UTF16_BE:'utf-16_be',pykf.ERROR:None}

    input_file = open(input_path)
    for line in input_file:
        c = pykf.guess(line)
        # stop at the first line recognized as a Japanese encoding
        if c in enc_ja:
            encode = edic[c]
            break

    input_file.close()
    return encode
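
If pykf is not available, a rougher alternative is trial decoding with Python's built-in codecs. The Python 3 sketch below is not what pykf does internally; it simply returns the first candidate codec that decodes the bytes without error. The function name and the candidate order are assumptions, and because some byte sequences are valid in more than one encoding, the result is only a heuristic.

```python
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")

def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate codec that decodes `data`, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

text = "ちょっとチェック"
print(guess_encoding(text.encode("euc_jp")))
```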

2011/06/11

Setup japanese LaTeX environment on Ubuntu Lucid

Labels: latex, tips, ubuntu

I installed texlive in Ubuntu 10.04LTS according to this site ;-)

Step1. install dvipsk-ja (the default package is broken as of 2011.06.11 JST)
> sudo add-apt-repository ppa:cosmos-door/dvipsk-ja
> sudo aptitude update
> sudo aptitude upgrade
> sudo aptitude install dvipsk-ja

Step2. install packages
> sudo aptitude install texlive texlive-math-extra texlive-latex-extra texlive-latex-extra-doc texlive-fonts-extra texlive-fonts-extra-doc texlive-fonts-recommended texlive-fonts-recommended-doc texlive-formats-extra texlive-latex-recommended texlive-latex-recommended texlive-extra-utils texlive-font-utils texlive-doc-ja ptex-bin jbibtex-bin mendexk okumura-clsfiles latex-cjk-japanese cmap-adobe-japan1 cmap-adobe-japan2 cmap-adobe-cns1 cmap-adobe-gb1 gs-cjk-resource ghostscript xdvik-ja dvi2ps dvi2ps-fontdesc-morisawa5 jmpost latexmk latex-mk pybliographer yatex

Step3. update latex environment
> updmap
> sudo mktexlsr
> sudo updmap-sys
> sudo dpkg-reconfigure ptex-jisfonts
> sudo jisftconfig add

Test. Can I make a Japanese document with texlive?

First, I prepare a sample LaTeX document encoded in EUC-JP.

----------------------------------------------

[ex1.tex]

\documentclass{jarticle}

\begin{document}

ちょっとチェック

\end{document}

----------------------------------------------

> platex ex1.tex

This is pTeXk, Version 3.141592-p3.1.11 (euc) (Web2C 7.5.4)

%&-line parsing enabled.

(./ex1.tex

pLaTeX2e <2006/11/10>+0 (based on LaTeX2e <2009/09/24> patch level 0)

(/usr/share/texmf/ptex/platex/base/jarticle.cls

Document Class: jarticle 2006/06/27 v1.6 Standard pLaTeX class

(/usr/share/texmf/ptex/platex/base/jsize10.clo)) (./ex1.aux) [1] (./ex1.aux) )

(see the transcript file for additional information)

Output written on ex1.dvi (1 page, 268 bytes).

Transcript written on ex1.log.

> xdvi ex1.dvi

[Japanese characters are displayed correctly :-)]

> dvipdfmx ex1.dvi

ex1.dvi -> ex1.pdf

** WARNING ** Failed to load AGL file "pdfglyphlist.txt"...

** WARNING ** Failed to load AGL file "glyphlist.txt"...

** ERROR ** Could not find encoding file "H".

Output file removed.

> sudo vi /usr/share/texmf/web2c/texmf.cnf

% CMap files.

CMAPFONTS = .;$TEXMF/fonts/cmap//;/usr/share/ghostscript/CMap

> dvipdfmx ex1.dvi

ex1.dvi -> ex1.pdf

** WARNING ** Failed to load AGL file "pdfglyphlist.txt"...

** WARNING ** Failed to load AGL file "glyphlist.txt"...

[1]

2248 bytes written

> acroread ex1.pdf

[Japanese characters are displayed correctly :-)]

Private method HOWTO in python

Labels: python, tips

main.py

# -*- coding: utf-8 -*-                                                       

import os
import mycls


def main():
    henley = mycls.ITWorkers('PHP', 'Henley', 'web creator', 'henley@livegate.com', 32, 700)

    henley.OS = 'Mac'
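    henley.work(8)
    """
    PEP8 recommends a leading '_' as the naming convention for
    methods that are not part of the public interface. Such a
    method can still be called from outside, though.
    """
    henley._printme('test')
    """
    A method with a leading '__' cannot be called from outside
    under that name. However, this usage does not follow PEP8.
    """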
    henley.__printme('test')


if __name__ == "__main__":
    main()

mycls.py

# -*- coding: utf-8 -*-                                                       

import os


class Workers:
    """ This is a class of workers working in the company """
    def __init__(self, name, position, email, age, salary):
        self.name = name
        self.position = position
        self.email = email
        self.age = age
        self.salary = salary


class ITWorkers(Workers):
    """ This is a class of IT engineers. """
    OS = 'WinNT'

    def __init__(self, language, *av):
        Workers.__init__(self, *av)
        self.language = language

    def _printme(self, name):
        print 'my name is %s.' % (name)

    def __printme(self, name):
        print 'my name is %s.' % name

    def work(self, n):
        """ IT engineers should work. """

        if self.position == 'web creator':
            w = 'makes web site'
        elif self.position == 'server administrator':
            w = 'checks the traffic'
        elif self.position == 'programmer':
            w = 'write program'

        print '%s, %s for %d, hours using %s on %s' % (self.name, w, n, self.language, self.OS)
        self._printme(self.name)
        self.__printme(self.name)

Test....
> python main.py
Henley, makes web site for 8, hours using PHP on Mac
my name is Henley.
my name is Henley.
my name is test.
Traceback (most recent call last):
File "main.py", line 29, in <module>
main()
File "main.py", line 25, in main
henley.__printme('test')
AttributeError: ITWorkers instance has no attribute '__printme'
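
The AttributeError comes from Python's name mangling: inside a class body, an identifier such as `__printme` is rewritten to `_ClassName__printme`, so the unmangled name never exists on the instance. The Python 3 sketch below shows this with a hypothetical `Demo` class (not part of the original code).

```python
class Demo:
    def _single(self):
        return "single underscore"

    def __double(self):
        return "double underscore"

    def call_double(self):
        # Inside the class body, self.__double is rewritten to self._Demo__double.
        return self.__double()

d = Demo()
print(d._single())             # a convention only; callable from anywhere
print(d.call_double())         # works because the call is made inside the class
print(d._Demo__double())       # the mangled name is still reachable from outside
print(hasattr(d, "__double"))  # the unmangled name does not exist on the instance
```

So a double underscore makes accidental access harder, but it is not real access control.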

2011/06/10

Check this site ...

http://jutememo.blogspot.com/2008/09/python-map-filter-reduce.html

Join multiple dictionaries and flatten the joined list in Python2

Labels: python, tips

import random
import csv

def flatten(L):
    if isinstance(L, list):
        return reduce(lambda a,b: a + flatten(b), L, [])
    else:
        return [L]

def main():
    dict1 = {}
    for i in range(0,100,2):
        dict1[i] = [random.random(),random.random()]
    dict2 = {}
    for i in range(0,100,3):
        dict2[i] = [random.random(),random.random()]
    dict3 = {}
    for i in range(0,100,5):
        dict3[i] = [random.random(),random.random()]

    f = open("./test.csv", "a")
    writer = csv.writer(f)

    # keep only the keys that appear in all three dictionaries
    for k1,v1 in dict1.items():
        if k1 not in dict2:
            continue
        if k1 not in dict3:
            continue
        v2 = dict2[k1]
        v3 = dict3[k1]

        l = []
        l.append(k1)
        l.append(v1)
        l.append(v2)
        l.append(v3)
        writer.writerow(flatten(l))

    f.close()

if __name__ == "__main__":
    main()
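
The same idea can be sketched in Python 3, where `reduce` has moved to `functools`. This version finds the shared keys with `set.intersection` instead of the chained `if` checks. The helper `join_rows` and the small input dictionaries are hypothetical.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

def flatten(value):
    """Recursively flatten nested lists into a single flat list."""
    if isinstance(value, list):
        return reduce(lambda acc, item: acc + flatten(item), value, [])
    return [value]

def join_rows(*dicts):
    """Build one flat row per key that appears in every dictionary."""
    common = set(dicts[0]).intersection(*dicts[1:])
    return [flatten([key] + [d[key] for d in dicts]) for key in sorted(common)]

# Hypothetical small inputs, invented for illustration.
d1 = {0: [1.0, 1.1], 2: [2.0, 2.1]}
d2 = {0: [3.0, 3.1], 3: [4.0, 4.1]}
d3 = {0: [5.0, 5.1], 5: [6.0, 6.1]}
print(join_rows(d1, d2, d3))
```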

2011/06/09

Hello, 8GB memory

Labels: equipment

I purchased 8GB of memory made by SanMax and installed it in my workstation.
Now my workstation has 16GB of physical memory!!

2011/06/08

import CSV file into sqlitedb from python

Labels: python, sqlitedb, tips

Here I show you how to operate an SQLite database from Python.

import csv
import sqlite3

def line_generator(input_path):
    _file = open(input_path, "r")
    for _line in _file:
        yield _line.strip().split(',')

def main():
    con = connect_db()
    create_table(con)
    import_table(con)
    export_table(con)
    select_table(con)
    disconnect_db(con)

def connect_db():
    con = sqlite3.connect(":memory:")
    return con

def disconnect_db(con):
    con.close()

def create_table(con):
    con.execute("create table test(id int, name text);")

def drop_table(con):
    con.execute("drop table test;")

def import_table(con):
    iterator = line_generator("/home/yaboo/test1.csv")
    for i in iterator:
        con.execute("insert into test values (?, ?)", i)

def export_table(con):
    f = open("/home/yaboo/output_test1.csv", "w")
    writer = csv.writer(f)
    for row in con.execute("SELECT * FROM test;"):
        writer.writerow([col.encode('utf-8') if isinstance(col, unicode) else col for col in row])

def select_table(con):
    cur = con.cursor()
    cur.execute("select * from test;")
    for line in cur: print line

if __name__ == "__main__":
    main()
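
For reference, here is a compact Python 3 sketch of the same import/export round trip. It reads from an in-memory string instead of /home/yaboo/test1.csv so that it runs anywhere; the sample CSV content is hypothetical. It also uses `csv.reader` and `executemany` rather than a manual split and a row-by-row insert.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for /home/yaboo/test1.csv.
csv_text = "1,alice\n2,bob\n"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id INT, name TEXT)")

# csv.reader handles quoted fields that a plain split(',') would break.
rows = list(csv.reader(io.StringIO(csv_text)))
with con:  # commits on success, rolls back on error
    con.executemany("INSERT INTO test VALUES (?, ?)", rows)

# Export the table back to CSV text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(con.execute("SELECT * FROM test ORDER BY id"))
exported = out.getvalue()
print(exported)
con.close()
```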

2011/06/05

Ubuntu boot time on my workstation

Labels: equipment, tips

Surprisingly, the boot time of Ubuntu 10.04 is shorter than 10 seconds!
The chart below shows the timeline of the boot processes.

/var/log/bootchart/maniac-lucid-20110605-1.png

How to calculate boot time

# aptitude install bootchart
# aptitude install pybootchartgui
# reboot
display /var/log/bootchart/[machine name]-[ubuntu version]-[date]-[index].png
# aptitude remove bootchart pybootchartgui

very fast :-)

Hello, ess (1st contact)

Labels: emacs, R, tips

I tried to use emacs + ess (Emacs Speaks Statistics) to program R code quickly.

INSTALL

aptitude install ess
echo "(require 'ess-site)" >> ~/.emacs.d/init.el

USAGE

start emacs
C-x 2 => split the emacs window
M-x R => start the R console

SAMPLE ESS COMMANDS

C-c C-j: execute current line
C-c M-j: execute current line and move to end of R-console
C-c C-b: execute current buffer
C-c M-b: execute current buffer and move to end of R-console
C-c C-r: execute current region
C-c M-r: execute current region and move to end of R-console

ESS COMMAND REFERENCE (more ESS commands are introduced in RjpWiki.)

Command of ess (by RjpWiki)

2011/06/04

Hello, suitcase (2011.06.04)

Labels: equipment

I plan to go to Hawaii at the end of June 2011.
For this trip, I bought a big suitcase (100L).

my suitcase is the embossed black one at the bottom middle

Hello, keyboard (2011.06.05)

Labels: equipment

I bought an HHKB Pro 2 for my workstation.

Happy Hacking Keyboard Professional 2

Needless to say,
this keyboard is fucking fabulous!!