How do I use custom stopwords and stemmer file in WEKA (Java)?


So far I have:

    NGramTokenizer tokenizer = new NGramTokenizer();
    tokenizer.setNGramMinSize(2);
    tokenizer.setNGramMaxSize(2);
    tokenizer.setDelimiters("[\\w+\\d+]");

    StringToWordVector filter = new StringToWordVector();
    // customize filter here
    Instances data = Filter.useFilter(input, filter);

The API has these two methods for StringToWordVector:

    setStemmer(Stemmer value);
    setStopwordsHandler(StopwordsHandler value);

I have a text file containing stopwords, and a class that stems words. How do I use a custom stemmer and stopwords filter? Note that I'm taking phrases of size 2, so I can't preprocess the text and remove the stopwords beforehand.

Update: this worked for me (using Weka developer version 3.7.12).

To use a custom stopwords handler:

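    import java.util.HashSet;
    import weka.core.stopwords.StopwordsHandler;

    public class MyStopwordsHandler implements StopwordsHandler {

        // Initialized up front so isStopword() never hits a null set
        private HashSet<String> myStopwords = new HashSet<String>();

        public MyStopwordsHandler() {
            // Load in your own stopwords, etc.
        }

        // Must implement this method from the StopwordsHandler interface
        @Override
        public boolean isStopword(String word) {
            return myStopwords.contains(word);
        }
    }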
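The constructor above leaves the loading step open. As a rough sketch of that step, here is one way the lookup set could be built from the lines of a stopwords file. `StopwordList` is a hypothetical helper, not a Weka class, and it is kept free of Weka imports so it runs on its own. The one-word-per-line layout, the lowercasing, and the skipping of `#` lines are all assumptions about the file, not something Weka requires.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Weka): builds a stopword set from text lines.
class StopwordList {

    private final Set<String> words = new HashSet<>();

    // Assumes one stopword per line; blank lines and lines starting
    // with '#' are skipped (an assumed convention for comments).
    StopwordList(Collection<String> lines) {
        for (String line : lines) {
            String w = line.trim().toLowerCase(Locale.ROOT);
            if (!w.isEmpty() && !w.startsWith("#")) {
                words.add(w);
            }
        }
    }

    // Case-insensitive lookup; null is never a stopword.
    boolean isStopword(String word) {
        return word != null && words.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

With a real file, something like `new StopwordList(Files.readAllLines(Paths.get("stopwords.txt")))` should work, where the file name is a placeholder. `MyStopwordsHandler.isStopword` could then delegate to this helper, or the same loop could go straight into its constructor.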

To use a custom stemmer, create a class that implements the Stemmer interface and write implementations for these methods:

    public String stem(String word) { ... }
    public String getRevision() { ... }
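For illustration only, here is what those two methods might look like with a toy suffix-stripping rule. `SuffixStemmer`, its suffix list, the minimum stem length of 3, and the revision string are all made up for this sketch, and it is not a real stemming algorithm such as Porter's. It is written without Weka imports so it can be run on its own; in a Weka project the class would also declare `implements Stemmer`.

```java
import java.util.Locale;

// Toy example (hypothetical, not a Weka class): strips a few common
// English suffixes. Not linguistically accurate.
class SuffixStemmer {

    // Checked in order, so longer suffixes come first.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    // Returns the lowercased word with the first matching suffix removed,
    // as long as at least 3 characters remain.
    public String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= 3) {
                return w.substring(0, stemLength);
            }
        }
        return w;
    }

    // Placeholder revision string for this sketch.
    public String getRevision() {
        return "1.0";
    }
}
```

Under these rules `stem("jumped")` gives `jump` and `stem("boxes")` gives `box`, while short words such as `is` are left unchanged.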

Then, to use your custom stopwords handler and stemmer:

    StringToWordVector filter = new StringToWordVector();
    filter.setStemmer(new MyStemmer());
    filter.setStopwordsHandler(new MyStopwordsHandler());

Note: the answer below by Thusitha works for the stable 3.6 version and is simpler than the one described above. It does not work for version 3.7.12.

In the latest Weka library you can use:

    StringToWordVector filter = new StringToWordVector();
    filter.setStopwords(new File("filename"));

I'm using the following dependency:

    <dependency>
        <groupId>nz.ac.waikato.cms.weka</groupId>
        <artifactId>weka-stable</artifactId>
        <version>3.6.12</version>
    </dependency>

From the API docs:

    public void setStopwords(java.io.File value)

Sets the file containing the stopwords; null or a directory unsets the stopwords. If the file exists, it automatically turns on the flag to use the stoplist.

Parameters: value - the file containing the stopwords

