python - Using the predict_proba() function of RandomForestClassifier in the safe and right way


I'm using scikit-learn to apply machine learning algorithms to my datasets. I need the probabilities of the labels/classes instead of the labels/classes themselves. Instead of having spam/not spam as the labels of emails, I want to have, for example, a 0.78 probability that a given email is spam.

For this purpose, I'm using predict_proba() with RandomForestClassifier as follows:

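```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in very old releases

# x, y: training features and labels; xtest: the emails to score
# (min_samples_split must be >= 2 in current scikit-learn; 1 raises an error)
clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)
scores = cross_val_score(clf, x, y)
print(scores.mean())

classifier = clf.fit(x, y)
predictions = classifier.predict_proba(xtest)  # one column per class
print(predictions)
```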
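To know which column of predict_proba() is "spam", it helps to look at the fitted classifier's classes_ attribute: the columns follow that order, which is the sorted set of labels. The sketch below assumes synthetic data from make_classification and hypothetical string labels "spam"/"not spam", since the question's x and y are not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the email features (assumption: the real x / y are not shown)
X, y_int = make_classification(n_samples=200, n_features=10, random_state=0)
# Hypothetical string labels mimicking "spam" / "not spam"
y = np.where(y_int == 1, "spam", "not spam")

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# predict_proba columns follow clf.classes_ (sorted labels)
print(clf.classes_)  # ['not spam' 'spam']
spam_col = list(clf.classes_).index("spam")

proba = clf.predict_proba(X[:5])
print(proba[:, spam_col])  # P(spam) for the first 5 rows

# Each row is a distribution over the classes, so it sums to 1
assert np.allclose(proba.sum(axis=1), 1.0)
```

Looking the column up by label, rather than hard-coding index 1, keeps the code correct if the label encoding changes.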

and I got these results:

```
 [ 0.4  0.6]
 [ 0.1  0.9]
 [ 0.2  0.8]
 [ 0.7  0.3]
 [ 0.3  0.7]
 [ 0.3  0.7]
 [ 0.7  0.3]
 [ 0.4  0.6]
```

where the second column is for the class spam. However, I have two main issues with these results that make me doubt them. The first issue: do the results represent the probabilities of the labels without being affected by the size of my data? The second issue: the results show only one digit, which is not precise enough; in some cases a 0.701 probability is quite different from 0.708. Is there a way to get, for example, the next 5 digits?
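The one-decimal output above is consistent with how a random forest averages its trees. With max_depth=None, each tree is typically grown until its leaves are pure on its bootstrap sample, so a single tree gives each email a spam probability of exactly 0 or 1; predict_proba() of the forest averages these over n_estimators=10 trees, so every value is a multiple of 0.1. The sketch below assumes synthetic data from make_classification, and spam_proba is a hypothetical helper. It compares 10 trees with 500 and shows that printing more digits is only a display setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: stands in for the question's x / y / xtest)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def spam_proba(n_estimators):
    """Hypothetical helper: fit fully grown trees, return P(class 1) on the test set."""
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=None,
                                 random_state=0)
    clf.fit(X_train, y_train)
    return clf.predict_proba(X_test)[:, 1]

p10 = spam_proba(10)
p500 = spam_proba(500)

# 10 pure-leaf trees each vote 0 or 1, so values are multiples of 1/10
print(np.unique(p10))

# More trees give finer steps (multiples of 1/500 here)
print(len(np.unique(p10)), len(np.unique(p500)))

# Showing more digits is only a formatting choice
with np.printoptions(precision=5):
    print(p500[:5])
print(f"{p500[0]:.5f}")
```

Extra printed digits add no information when the values are multiples of 0.1; more trees, or shallower trees (e.g. a larger min_samples_leaf), are what make the probabilities finer-grained.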

Many thanks in advance for taking the time to read these two issues and questions.

  1. I get more than one digit in my results. Are you sure it is not due to your dataset? (For example, a very small dataset would yield simple decision trees and hence "simple" probabilities.) Otherwise, it may only be the display that shows one digit; try printing predictions[0,0].

  2. I am not sure I understand what you mean by "the probabilities aren't affected by the size of my data". If your concern is that you don't want to predict, e.g., too many spams, what is usually done is to use a threshold t such that you predict 1 if proba(label==1) > t. This way you can use the threshold to balance your predictions, for example to limit the overall proportion of emails predicted as spam. And if you want to analyse your model globally, you usually compute the area under the curve (AUC) of the receiver operating characteristic (ROC) curve (see the Wikipedia article on ROC curves). The ROC curve describes your predictions as the threshold t varies.
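The thresholding and ROC AUC ideas above can be sketched as follows. This assumes synthetic, imbalanced data from make_classification with spam labelled 1, and predict_with_threshold is a hypothetical helper; roc_auc_score and roc_curve are the scikit-learn metrics for the ROC summary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (assumption: ~20% "spam", labelled 1)
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
p_spam = clf.predict_proba(X_test)[:, 1]  # column 1 <-> clf.classes_[1] == 1

def predict_with_threshold(proba, t):
    """Hypothetical helper: predict spam (1) only when P(spam) > t."""
    return (proba > t).astype(int)

# A higher t is more conservative: fewer emails get flagged as spam
for t in (0.3, 0.5, 0.7):
    flagged = predict_with_threshold(p_spam, t).mean()
    print(f"t={t}: fraction predicted spam = {flagged:.3f}")

# Threshold-independent summary: area under the ROC curve
auc = roc_auc_score(y_test, p_spam)
fpr, tpr, thresholds = roc_curve(y_test, p_spam)  # one point per candidate threshold
print(f"ROC AUC = {auc:.3f}")
```

roc_curve returns the false- and true-positive rates obtained at each candidate threshold, which is the "description of predictions depending on t" that the AUC condenses into one number.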

Hope it helps!
