python - Dealing with dimensions in scikit-learn tree.DecisionTreeClassifier

I am trying to build a decision tree using scikit-learn with 3-dimensional training data and 2-dimensional target data. As a simple example, imagine an RGB image. Let's say the target data is 1's and 0's, where 1's represent the presence of a human face and 0's represent its absence. Take for example:

red         green        blue        face presence
1000        0001         0011        0000
0110        0110         0001        0110
0110        0110         0000        0110

An array of RGB data would represent the training data, and a 2D array would represent the target classes (face, no face).

In Python these arrays may look like:

rgb = np.array([[[1,0,0,0],[0,1,1,0],[0,1,1,0]],
                [[0,0,0,1],[0,1,1,0],[0,1,1,0]],
                [[0,0,1,1],[0,0,0,1],[0,0,0,0]]])

face = np.array([[0,0,0,0],[0,1,1,0],[0,1,1,0]])

Unfortunately, this doesn't work:

import numpy as np
from sklearn import tree

dt_clf = tree.DecisionTreeClassifier()
dt_clf = dt_clf.fit(rgb, face)

This throws the error:

Found array with dim 3. Expected <= 2

I have tried reshaping and flattening the data several ways, and I get this error:

Number of labels=xxx does not match number of samples

Does anyone know how I can use tree.DecisionTreeClassifier to accomplish this? Thanks.
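Both errors come down to shapes. scikit-learn estimators take X with shape (n_samples, n_features) and y with one label per sample. The first error is the 3-D X being rejected. The second most likely comes from a reshape that leaves the channel axis first, so the estimator sees 3 samples instead of 12. The sketch below only inspects shapes, and assumes each pixel is one sample with its three channel values as features; the names wrong_X, n_samples_seen and n_labels are illustrative.

```python
import numpy as np

rgb = np.array([[[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
                [[0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0]],
                [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]])
face = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]])

print(rgb.shape)   # (3, 3, 4): channels x rows x columns
print(face.shape)  # (3, 4): rows x columns

# Flattening only the trailing axes keeps the channel axis first, so an
# estimator would treat this as 3 samples with 12 features each.
wrong_X = rgb.reshape(rgb.shape[0], -1)
y = face.ravel()
print(wrong_X.shape, y.shape)  # (3, 12) (12,)

# 3 samples against 12 labels is the kind of mismatch the second error reports.
n_samples_seen = wrong_X.shape[0]
n_labels = y.shape[0]
```

What the classifier needs instead is an X of shape (12, 3), one row per pixel, paired with the 12 flattened labels.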

I think I have figured it out. It's not pretty. Maybe someone can offer help cleaning up the code. Basically, I needed to organize the RGB data into an array of 12 3-element arrays, or shape=(12,3). For example...

np.hsplit(np.dstack(rgb).flatten(), len(face.flatten()))

I also flatten the face data, so the final fit call becomes...

dt_clf = dt_clf.fit(np.hsplit(np.dstack(rgb).flatten(), len(face.flatten())),
                    face.flatten())
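One possible tidier way to get the same (12, 3) layout is to move the channel axis to the end and reshape, which gives a single 2-D array rather than a list of 12 small arrays. This is a sketch that assumes the rgb array is laid out as (channels, rows, columns), as in the question; X_original is only there to check it against the hsplit/dstack version.

```python
import numpy as np
from sklearn import tree

rgb = np.array([[[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
                [[0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0]],
                [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]])
face = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]])

# (channels, rows, cols) -> (rows, cols, channels), then collapse rows and
# cols into a single axis of pixels: one row per pixel, one column per channel.
X = np.moveaxis(rgb, 0, -1).reshape(-1, rgb.shape[0])  # shape (12, 3)
y = face.ravel()                                        # shape (12,)

# The hsplit/dstack expression from the answer, stacked into one array
# so the two layouts can be compared element by element.
X_original = np.array(np.hsplit(np.dstack(rgb).flatten(), len(face.flatten())))
print(np.array_equal(X, X_original))

dt_clf = tree.DecisionTreeClassifier().fit(X, y)
```

Both routes feed the classifier the same values in the same order, so the fitted tree should be interchangeable with the one above.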

Now I can test a new dataset and see if it works. My target image indicated face presence when both red and green pixels were shown, so a test might be...

red         green        blue
1100        1100         0011
1100        1100         0001
0000        0000         0000

or...

predict = np.array([[[1,1,0,0],[1,1,0,0],[0,0,0,0]],
                    [[1,1,0,0],[1,1,0,0],[0,0,0,0]],
                    [[0,0,1,1],[0,0,0,1],[0,0,0,0]]])

so...

predicted = dt_clf.predict(np.hsplit(np.dstack(predict).flatten(),
                                     len(face.flatten())))

and in the proper dimensions...

predicted = np.array(np.hsplit(predicted, face.shape[0]))

which yields

array([[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0]])

Wonderful! I'll see if this works on something bigger. Please feel free to offer suggestions to make this cleaner.
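As one possible cleanup, the reshaping can be wrapped in two small helpers so that prediction no longer depends on len(face.flatten()), which ties the test image to the training image's size. The helper names to_pixels and predict_mask are hypothetical, and the "bigger" image here is just the test image tiled 2x2, so this is a sketch of the idea rather than a test on real data.

```python
import numpy as np
from sklearn import tree


def to_pixels(img):
    """Hypothetical helper: (channels, rows, cols) -> (rows*cols, channels)."""
    return np.moveaxis(img, 0, -1).reshape(-1, img.shape[0])


def predict_mask(clf, img):
    """Hypothetical helper: classify each pixel, then restore the image grid."""
    return clf.predict(to_pixels(img)).reshape(img.shape[1:])


rgb = np.array([[[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
                [[0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0]],
                [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]])
face = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]])
predict = np.array([[[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],
                    [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],
                    [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]])

# Train on one row per pixel, with one label per pixel.
dt_clf = tree.DecisionTreeClassifier().fit(to_pixels(rgb), face.ravel())

# Same 3x4 test image as above.
small_mask = predict_mask(dt_clf, predict)
print(small_mask)

# A larger image made by tiling the test image 2x2: shape (3, 6, 8).
bigger = np.tile(predict, (1, 2, 2))
big_mask = predict_mask(dt_clf, bigger)
print(big_mask.shape)  # (6, 8)
```

Because each pixel is classified on its own, the image size at prediction time does not need to match the training image; only the number of channels has to stay the same.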
