Greek stemmer class

After heavy googling I found a Greek stemmer, the product of the Master's thesis of Georgios Ntais at the Royal Institute of Technology [KTH] (Stockholm, Sweden), supervised by assoc. professor Hercules Dalianis. The stemmer is implemented in JavaScript and you can find it online here. You can download the PHP port, published under the GNU license. You can check out the first demo page.

It is a PHP4-compatible class. The usage is really simple: just create an instance of the class and call the stem_word method. The input for stem_word must be in upper case.

require("GreekStemmer.class.php");
$st = new GreekStemmer();
echo $st->stem_word('????????');

I hope the companion class is stable. The Greek_text class contains some helpful static methods for handling Greek text:
More detailed documentation is coming soon...
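Since stem_word expects upper-case input, a caller has to convert each word first. The sketch below shows one way to do that. It assumes a UTF-8 aware build of the class and the mbstring extension (the original PHP4 class may expect a single-byte Greek encoding instead), and toUpperGreek is a hypothetical helper name, not a method of either published class. Whether accents also need to be stripped is not stated in the post, so the example uses an unaccented word.

```php
<?php
// Hypothetical helper, not part of GreekStemmer or Greek_text:
// upper-cases UTF-8 Greek text so it can be passed to stem_word.
function toUpperGreek(string $word): string
{
    return mb_strtoupper($word, 'UTF-8');
}

$word = toUpperGreek('σπιτια');
echo $word, "\n";

// With the class from the post loaded, the call would then be:
// $st = new GreekStemmer();
// echo $st->stem_word($word);
```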
So I believe the system
So I believe the system takes a word as input and removes its inflectional suffix according to a rule-based algorithm. The algorithm follows the well-known Porter algorithm for the English language and is developed according to the grammatical rules of the Modern Greek language. Am I understanding this correctly?
A. D Singleton - adrikl[@]gmail.com
Panos Kyriakakis
Owner of Salix.gr
Larissa, Greece
Nicely done, while it has
EDIT: changed to greeklish because greek is not supported (turns to ??????)
Nicely done. While it has some flaws, for the most part it works well.
What would be the correct stem for these though? Shouldn't they both be the same?
cheking diminutives TEMAXIO -> step 6-2 TEMAXI
cheking diminutives TEMAXIA -> step 3 TEMAX
Also, there is a bug in step 4. The regex is:
$re = '/'.$v.'$/';
but $v doesn't exist. Maybe it should be:
$re = '/'.$this->v.'$/';
Otherwise the pattern is just '/$/' and the conditional will always match. However, with that change the following doesn't stem properly:
APSENIKOS -> ARSENIK
APSENIKA -> ARSEN
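To see why an undefined $v makes the step 4 conditional always match, here is a minimal sketch. The vowel pattern is a Latin stand-in chosen for illustration, not the class's actual $this->v, and the sample words are simply the Greeklish examples quoted earlier in this comment.

```php
<?php
// Illustration only: a Latin stand-in for the class property $this->v.
$vowelPattern = '(A|E|I|O|U)';

// An undefined local variable evaluates to null, which concatenates as ''.
$undefined = null;
$brokenRe = '/' . $undefined . '$/';     // becomes '/$/'
$fixedRe  = '/' . $vowelPattern . '$/';  // becomes '/(A|E|I|O|U)$/'

foreach (['TEMAX', 'TEMAXI'] as $word) {
    // '/$/' only asserts end-of-string, so it matches every word.
    printf(
        "%-7s broken=%d fixed=%d\n",
        $word,
        preg_match($brokenRe, $word),
        preg_match($fixedRe, $word)
    );
}
```

The broken pattern reports a match for both words, while the fixed one only matches the word that really ends in a vowel.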
Finally, it would be better to convert this to Snowball (http://snowball.tartarus.org/) rather than use regexes all over the place. A good suite of tests is also quite necessary.
splitWords function a
splitWords function a replacement for php's str_word_count function that works with any locale setting (that one holds me
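The body of splitWords is not shown here, so the following is only a guess at how a locale-independent word splitter could look. It relies on PCRE's Unicode mode (the /u flag) and the \p{L} and \p{N} character classes. The function name comes from the comment above, but the implementation is an assumption.

```php
<?php
// Hypothetical re-implementation: the original splitWords code is not shown.
// Splits UTF-8 text into words without depending on the current locale.
function splitWords(string $text): array
{
    // Split on runs of anything that is not a Unicode letter or digit.
    $parts = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on invalid UTF-8; treat that as "no words".
    return $parts === false ? [] : $parts;
}

print_r(splitWords('Καλημέρα κόσμε, hello world!'));
```

Unlike str_word_count, which defines a word in terms of the current locale's alphabetic characters, this treats any Unicode letter as part of a word.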
Greek stemmer and Lucene
@basos I would be very interested in any adaptation of a Greek stemmer for use with Lucene. Could you publish your code?
I am happy!
I am so happy, this is the first time this has happened to me :D
Someone took code from here, improved it and shared it back here with us.
Baso, could you give us some information about where you are located?
This class is the result of the work of Greeks, not just in Greece ;)
Best Regards
Panos Kyriakakis
Owner of Salix.gr
Larissa, Greece
New version
Hello,
I want to say that you did nice work. It is very important to have a port of the Greek stemmer for PHP.
I modified it a little to use the PHP5 OO model, since PHP4 is already considered ancient,
and to use UTF-8 encoding. It is very important, especially for Greek users, to use and promote Unicode (UTF-8) at all levels of application development. You change the code once and it will run on all machines.
I also corrected some minor errors and optimized it a little by removing unneeded double calls to preg_match.
I sent the code through the contact form to avoid overfilling this post.
My purpose for this code is to use it in a search engine. This piece of code is easily integrated into the Zend port of the famous Lucene search engine (initially implemented in Java). It is a good PHP search engine, implemented in a very modular way, that can be integrated into various projects. Adopting the Greek stemmer there was fairly straightforward.
I have some small classes that make this work; if you are interested, I can send them too.
basos
Thanks for the Tip
I appreciate your sharing this piece of information. I'm from Ikaria. It's unfortunate that we have to resort to third parties to provide Greek full-text search (FTS). Aren't there something like 50 languages already supported? Is Greek really that far down the line?