Friday, January 11, 2013

Quick recipe: how to launch a simple daemonized web service using Python

Writing a Linux daemon in C or C++ is easy; it is the native way to create daemons on Unix. But what can we do if we need to run a Python script as a daemon? There are several ways to daemonize Python code. In this post I just want to mention a simple and convenient pattern that I usually use. It is based on the Linux daemon template written by Sander Marechal (available for Python 2.x and 3.x). It is pretty good code: I have integrated it into several prototypes and they work perfectly. It contains a simple-to-use Daemon class that a Python daemon developer should inherit from.
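To make the idea concrete, here is a minimal sketch of the double-fork approach that such templates are built around. It is not Sander Marechal's class, just an independent illustration using only the standard library. The names HelloHandler, make_server, daemonize and run_daemon, the pidfile path and the port are all made up for this example.

```python
import os
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"hello from daemon\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging: a daemon has no terminal to write to.
        pass


def make_server(host="127.0.0.1", port=8080):
    """Create the HTTP server (host and port are assumptions of this sketch)."""
    return HTTPServer((host, port), HelloHandler)


def daemonize(pidfile):
    """Detach the current process from the terminal using a double fork."""
    sys.stdout.flush()
    sys.stderr.flush()
    if os.fork() > 0:
        os._exit(0)  # first parent exits, the child keeps running
    os.setsid()      # become session leader, drop the controlling terminal
    if os.fork() > 0:
        os._exit(0)  # second parent exits, so the daemon is not a session leader
    os.chdir("/")
    os.umask(0)
    # Point stdin, stdout and stderr at /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    # Record the daemon's pid so it can be stopped later.
    with open(pidfile, "w") as f:
        f.write("%d\n" % os.getpid())


def run_daemon(pidfile="/tmp/hello-daemon.pid", port=8080):
    """Bind the port first (so errors are still visible), then detach and serve."""
    server = make_server(port=port)
    daemonize(pidfile)
    server.serve_forever()
```

Calling run_daemon() from a script returns control to the shell immediately, while the web service keeps answering on the chosen port. To stop it, send SIGTERM to the pid stored in the pidfile. A full template such as the one mentioned above also wraps start, stop and restart into a class, which this sketch leaves out.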

Tuesday, January 8, 2013

Supervised Learning Model for lemmatization and thoughts about compression.

The lemmatizer depends on the quality of the training data. It can generally be used for any natural language, except for the Porter stemmer part (the current naive implementation supports Russian only).

Code and functionality description:

The model is represented by the LemmatizerModel Java class.
The entry point and the consuming code are in Main.java.
TrainDataParser's purpose is to open and parse the training data.

MODEL DESIGN

To make searching through the trained data fast and to keep the model in a compact data structure, the model is represented as a prefix tree. Word forms are represented by paths from the root. Each node contains a value (a Character) and two kinds of child nodes: the next characters, and one special child that holds the list of lemmas for the current tree layer.
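The structure described above can be sketched roughly as follows. This is an illustration in Python for brevity, not the actual LemmatizerModel Java class, which may differ in details. The names Node, LemmaTrie, add and lookup are made up for this example, and the special lemma child is simplified to a plain list on the node.

```python
class Node:
    """One node of the prefix tree."""

    def __init__(self, value=None):
        self.value = value   # the character stored in this node (None for the root)
        self.children = {}   # next characters: char -> Node
        self.lemmas = []     # stand-in for the special child with the lemma list


class LemmaTrie:
    """Prefix tree mapping word forms to their lemmas (hypothetical sketch)."""

    def __init__(self):
        self.root = Node()

    def add(self, word_form, lemma):
        """Walk or create the path for word_form and attach lemma at its end."""
        node = self.root
        for ch in word_form:
            if ch not in node.children:
                node.children[ch] = Node(ch)
            node = node.children[ch]
        if lemma not in node.lemmas:
            node.lemmas.append(lemma)

    def lookup(self, word_form):
        """Return the lemmas stored for word_form, or an empty list if unknown."""
        node = self.root
        for ch in word_form:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.lemmas)
```

For example, after add("cats", "cat") and add("cat", "cat") both word forms share the path c-a-t, and only the final "s" needs a node of its own. This sharing of common prefixes is what keeps the structure compact.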


In this picture you can see an example of the prefix tree with lemmas stored