Greek stemmer class
After heavy googling I found a Greek stemmer that is the product of the Master's thesis of Georgios Ntais at the Royal Institute of Technology [KTH] (Stockholm, Sweden), supervised by Assoc. Professor Hercules Dalianis. The stemmer is implemented in JavaScript and you can find it online here. You can download the PHP port, published under the GNU license. You can check out the first demo page.
It is a PHP4-compatible class. Usage is really simple: just create an instance of the class and call the stem_word method. The input for stem_word must be in upper case.
require("GreekStemmer.class.php");
$st = new GreekStemmer();
echo $st->stem_word('????????');
The companion class, Greek_text, is (I hope) stable. It contains some helpful static methods for handling Greek text.
More detailed documentation is coming soon...
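Since stem_word expects upper-case input, here is a minimal sketch of how lower-case text might be prepared before stemming. The greek_upper helper is hypothetical and not part of the GreekStemmer or Greek_text classes. It assumes PHP 7+ with the mbstring extension, a source file saved as UTF-8, and a stemmer build that accepts UTF-8 (such as the modified version described in the comments). Stripping the accent marks is also an assumption about what the stemmer expects; drop the strtr step if your copy handles accented capitals.

```php
<?php
// Hypothetical helper, not part of the GreekStemmer class: upper-cases
// UTF-8 Greek text and removes the accent marks that mb_strtoupper() keeps.
function greek_upper(string $text): string
{
    $upper = mb_strtoupper($text, 'UTF-8');

    // Assumption: the stemmer wants plain capitals, so map accented
    // capitals (tonos, dialytika) to their unaccented forms.
    return strtr($upper, [
        'Ά' => 'Α', 'Έ' => 'Ε', 'Ή' => 'Η', 'Ί' => 'Ι',
        'Ό' => 'Ο', 'Ύ' => 'Υ', 'Ώ' => 'Ω',
        'Ϊ' => 'Ι', 'Ϋ' => 'Υ',
    ]);
}

$word = greek_upper('δρόμος');
echo $word, "\n"; // prints ΔΡΟΜΟΣ

// Only call the stemmer if the class from the post has been loaded.
if (class_exists('GreekStemmer')) {
    $st = new GreekStemmer();
    echo $st->stem_word($word), "\n";
}
```

The class_exists guard is there only so the sketch runs on its own; in a real script you would require GreekStemmer.class.php first, as shown in the post.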
Greek stemmer and Lucene
@basos I would be very interested in any adaptation of a Greek stemmer for use with Lucene. Could you publish your code?
It is very important
It is very important especially for greek users to use and promote unicode (utf8) in all levels of application development.
I am happy!
I am so happy, this is the first time that has happened to me :D
Someone took code from here, improved it and shared it back here with us.
Baso, could you tell us where you are located?
This class is the result of the work of Greeks, not just in Greece ;)
Best Regards
Panos Kyriakakis
Owner of Salix.gr
Larissa, Greece
Much appreciation for this
Much appreciation for this article. Very riveting and accurately composed blog post. I will return in the near future.
New version
Hello,
I want to say that you did nice work; it is very important to have a port of the Greek stemmer for PHP.
I modified it a little to use the PHP5 OO model, since PHP4 is already considered ancient,
and to use UTF-8 encoding. It is very important, especially for Greek users, to use and promote Unicode (UTF-8) at all levels of application development. You change the code once and it will work on all machines.
I also corrected some minor errors and optimized it a little by removing unneeded double calls to preg_match.
I sent the code through the contact form to avoid overfilling this post.
My purpose for this code is to use it in a search engine. This piece of code is easily integrated into the Zend port of the famous Lucene search engine (initially implemented for Java). It is a good PHP search engine, implemented in a very modular way, that can be integrated into various projects. Adopting the Greek stemmer there was fairly straightforward.
I have some small classes that make this work; if you are interested, I can send them too.
basos
Thanks for the Tip
I appreciate your sharing this piece of information. I'm from Ikaria. It's unfortunate that we have to resort to third parties to provide Greek FTS. Aren't there something like 50 languages already supported? Is Greek really that far down the line?