Open Source & stuff 

Salix.gr

Greek stemmer class

After some heavy googling I found a Greek stemmer, the product of Georgios Ntais's Master's thesis at the Royal Institute of Technology (KTH) in Stockholm, Sweden, supervised by Associate Professor Hercules Dalianis. The stemmer is implemented in JavaScript and you can find it online here.

You can download the PHP port, published under the GNU license, and try it out on the first demo page. It is a PHP4-compatible class. Usage is really simple: create an instance of the class and call its stem_word method. The input to stem_word must be in upper case.

require("GreekStemmer.class.php");
$st = new GreekStemmer();
// stem_word() expects its input word in upper case
echo $st->stem_word('????????');

The companion Greek_text class is (I hope) stable. It contains some helpful static methods for handling Greek text:

  • a stopword filter for Greek, using the stopwords found in lecture slides by Marios Dikaiakos and Georgios Pallis
  • to_upper, an upper-case conversion that works with any locale setting
  • to_greeklish, which transliterates Greek into Latin characters ("greeklish"); used by titlize
  • titlize, which makes Greek titles readable in Latin characters, for nice URLs
  • splitWords, a replacement for PHP's str_word_count function that works with any locale setting (this one still troubles me and needs some improvements)
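For readers who want the idea behind these helpers, here is a minimal sketch in modern PHP (7.1+, UTF-8 input). It is not the downloadable Greek_text class: the class name GreekTextSketch, its method names, the short stopword subset and the greeklish letter mapping are all hypothetical. It avoids strtoupper() and str_word_count(), which work byte by byte and depend on the current locale, and instead uses strtr() lookup tables plus PCRE with the /u (UTF-8) modifier. It also assumes that accentless upper case is the form you want, since all-caps Greek is conventionally written without accents.

```php
<?php
// Hypothetical sketch of locale-independent Greek text helpers, loosely
// modelled on the Greek_text methods listed above (NOT the real class).
// Assumes UTF-8 input and PHP 7.1+.

final class GreekTextSketch
{
    // Illustrative stopword subset (upper case, no accents); NOT the
    // Dikaiakos/Pallis list used by the original class.
    private const STOPWORDS = ['ΚΑΙ', 'ΤΟ', 'ΤΗΣ', 'ΤΟΥ', 'ΤΑ', 'Ο', 'Η', 'ΝΑ', 'ΣΕ', 'ΜΕ', 'ΓΙΑ'];

    // One simple greeklish scheme among many (assumption).
    private const GREEKLISH = [
        'Α' => 'A', 'Β' => 'V', 'Γ' => 'G', 'Δ' => 'D', 'Ε' => 'E', 'Ζ' => 'Z',
        'Η' => 'I', 'Θ' => 'TH', 'Ι' => 'I', 'Κ' => 'K', 'Λ' => 'L', 'Μ' => 'M',
        'Ν' => 'N', 'Ξ' => 'KS', 'Ο' => 'O', 'Π' => 'P', 'Ρ' => 'R', 'Σ' => 'S',
        'Τ' => 'T', 'Υ' => 'Y', 'Φ' => 'F', 'Χ' => 'X', 'Ψ' => 'PS', 'Ω' => 'O',
    ];

    /** Upper-case ASCII and Greek letters without strtoupper() or setlocale(). */
    public static function toUpper(string $s): string
    {
        static $map = null;
        if ($map === null) {
            // Plain letters (σ and final ς both become Σ), then accented
            // letters, which lose their accent in all-caps Greek.
            $from = 'αβγδεζηθικλμνξοπρσςτυφχψω' . 'άέήίόύώϊϋΐΰΆΈΉΊΌΎΏ';
            $to   = 'ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΣΤΥΦΧΨΩ' . 'ΑΕΗΙΟΥΩΙΥΙΥΑΕΗΙΟΥΩ';
            $map = array_combine(self::chars($from), self::chars($to));
        }
        // Byte-wise strtr is safe for ASCII: UTF-8 multibyte sequences
        // never contain bytes in the ASCII range.
        $s = strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
        return strtr($s, $map);
    }

    /** Split on runs of anything that is not a Unicode letter or digit. */
    public static function splitWords(string $text): array
    {
        return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    }

    /** Remove stopwords from a list of upper-case words. */
    public static function removeStopwords(array $words): array
    {
        return array_values(array_diff($words, self::STOPWORDS));
    }

    /** Transliterate Greek into upper-case Latin characters. */
    public static function toGreeklish(string $s): string
    {
        return strtr(self::toUpper($s), self::GREEKLISH);
    }

    /** Turn a (possibly Greek) title into a lower-case, dash-separated URL slug. */
    public static function titlize(string $title): string
    {
        $latin = strtr(self::toGreeklish($title), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
        return trim(preg_replace('/[^a-z0-9]+/', '-', $latin), '-');
    }

    /** Split a UTF-8 string into single characters. */
    private static function chars(string $s): array
    {
        return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    }
}

$words = GreekTextSketch::splitWords(GreekTextSketch::toUpper('Ο άνθρωπος και το σπίτι, 2 φορές!'));
echo implode(' ', GreekTextSketch::removeStopwords($words)), PHP_EOL; // ΑΝΘΡΩΠΟΣ ΣΠΙΤΙ 2 ΦΟΡΕΣ
echo GreekTextSketch::titlize('Greek stemmer για την PHP'), PHP_EOL;   // greek-stemmer-gia-tin-php
```

Because toUpper() strips accents and normalizes final sigma, its output can be fed straight into an upper-case-only stemmer; splitWords() relies on Unicode character properties rather than the locale, so it behaves the same on every server.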

More detailed documentation is coming soon...

Links
Greek stemmer class demo page
Greek text class demo page
Download GreekStemmer class version 1.0
Download Greek Text class version 1.0
New! Basos's improved version plus Lucene add-on: Download!

Greek stemmer and Lucene

@basos I would be very interested in any adaptation of a Greek stemmer for use with Lucene. Could you publish your code?

I am happy!

I am so happy, this is the first time this has happened to me :D
Someone took code from here, improved it and shared it back here with us.
Baso, could you tell us where you are located?
This class is the result of the work of Greeks, and not just in Greece ;)
Best regards
Panos Kyriakakis
Owner of Salix.gr
Larissa, Greece

New version

Hello,
I want to say that you did nice work; it is very important to have a port of the Greek stemmer for PHP.
I modified it a little to use the PHP5 object model, since PHP4 is already considered ancient,
and to use UTF-8 encoding. It is very important, especially for Greek users, to use and promote Unicode (UTF-8) at all levels of application development: you change the code once and it will run on all machines.
I also corrected some minor errors and optimized it a little by removing unneeded double calls to preg_match.

I sent the code through the contact form to avoid cluttering this post.

I intend to use this code in a search engine, so it is easily integrated into the Zend port of the famous Lucene search engine (originally implemented in Java). Zend's port is a good PHP search engine, implemented in a very modular way, that can be integrated into various projects. Adapting the Greek stemmer to it was fairly straightforward.
I have some small classes that make this work; if you are interested, I can send them too.

basos
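The two changes basos describes, dropping double calls to preg_match and switching to UTF-8, can be illustrated with a minimal sketch. The suffix rule SUFFIX_RULE and the functions strip_suffix_twice/strip_suffix_once are hypothetical and are not taken from the stemmer's code. The point is that preg_match() both tests the subject and fills its $matches argument, so calling it a second time just to capture the groups repeats the work, and that the /u modifier makes the pattern operate on UTF-8 characters rather than on single bytes.

```php
<?php
// Hypothetical suffix rule for illustration only; not one of the real
// stemmer's rules. Input words are upper-case UTF-8 strings, and the /u
// modifier makes the pattern work on UTF-8 characters instead of bytes.
const SUFFIX_RULE = '/^(.+?)(ΟΥΣ|ΟΣ|ΟΥ|Α)$/u';

// Before: the same pattern is evaluated twice for every matching word.
function strip_suffix_twice(string $word): string
{
    if (preg_match(SUFFIX_RULE, $word)) {
        preg_match(SUFFIX_RULE, $word, $m); // redundant second evaluation
        return $m[1];
    }
    return $word;
}

// After: a single call both tests the word and captures the stem.
function strip_suffix_once(string $word): string
{
    return preg_match(SUFFIX_RULE, $word, $m) ? $m[1] : $word;
}

foreach (['ΑΝΘΡΩΠΟΣ', 'ΑΝΘΡΩΠΟΥΣ', 'ΧΩΡΑ', 'ΚΑΦΕ'] as $w) {
    echo $w, ' -> ', strip_suffix_once($w), PHP_EOL;
}
```

Both functions return the same stems (ΑΝΘΡΩΠΟΣ and ΑΝΘΡΩΠΟΥΣ both reduce to ΑΝΘΡΩΠ, while ΚΑΦΕ is left unchanged); the second simply runs the regular expression once per word instead of twice, which adds up in a stemmer that tries many rules on every indexed term.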

Thanks for the Tip

I appreciate you sharing this information. I'm from Ikaria. It's unfortunate that we have to resort to third parties to get Greek full-text search (FTS). Aren't there something like 50 languages already supported? Is Greek really that far down the list?


All Rights Reserved 2006-8 Salix.gr | Hosting by e-emporio