as.h2o() in R to upload files to the h2o environment takes a long time
I am using h2o to carry out some modelling, and having tuned the model, I would now like to use it to carry out a lot of predictions - approximately 6 billion predictions/rows, where each prediction row needs 80 columns of data.

I have broken the input dataset down into 500 chunks of 12 million rows, each with the relevant 80 columns of data.

However, uploading a data.table of 12 million rows by 80 columns to h2o takes quite a long time, and doing this 500 times is taking a prohibitively long time... I think it is because the object is parsed first before being uploaded.

The prediction part is relatively quick in comparison...

Are there any suggestions to speed this part up? Would changing the number of cores help?

Below is a reproducible example of the issue...
# load libraries
library(h2o)
library(data.table)

# start h2o using all cores...
localH2O = h2o.init(nthreads = -1, max_mem_size = "16g")

# create a test input dataset
temp <- CJ(v1 = seq(20), v2 = seq(7), v3 = seq(24), v4 = seq(60), v5 = seq(60))
temp <- do.call(cbind, lapply(seq(16), function(y) { temp }))
colnames(temp) <- paste0('v', seq(80))

# this part takes a long time!!
system.time(tmp.obj <- as.h2o(localH2O, temp, key = 'test_input'))
#|======================================================================| 100%
#   user  system elapsed
#357.355   6.751 391.048
Since you are running h2o locally, you will want to save the data to a file and use:

h2o.importFile(localH2O, file_path, key = 'test_input')
This will have each thread read parts of the file in parallel. If you run h2o on a separate server, then you would need to copy the data to a location that the server can read from (most people don't set up their servers to read files from their laptops).
as.h2o() serially uploads the file to h2o. With h2o.importFile(), the h2o server finds the file and reads it in parallel.
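Putting that together, the write-then-import workflow might look like the sketch below. This assumes the h2o v2-style API used in the question; the file path is a placeholder, and data.table::fwrite is assumed to be available as a fast CSV writer (base write.csv works too, just more slowly).

```r
# Sketch: write the chunk to disk, then let h2o parse it in parallel.
# 'file_path' is a placeholder and must be readable by the h2o server process.
library(h2o)
library(data.table)

file_path <- "/tmp/test_input.csv"
fwrite(temp, file_path)  # fast multi-threaded CSV write (data.table >= 1.9.8)

# h2o reads and parses the file using all of its threads
tmp.obj <- h2o.importFile(localH2O, file_path, key = 'test_input')
```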
It looks like you are using version 2 of h2o. The same commands work in h2o v3, but some of the parameter names have changed a little. The new parameter names are here: http://cran.r-project.org/web/packages/h2o/h2o.pdf
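For reference, under h2o v3 the client handle argument is dropped and key becomes destination_frame, so the import above would be written roughly as follows (the path is a placeholder):

```r
# h2o v3 equivalent: no client handle, and 'key' renamed to 'destination_frame'
library(h2o)
h2o.init(nthreads = -1, max_mem_size = "16g")
test_input <- h2o.importFile(path = "/tmp/test_input.csv",
                             destination_frame = "test_input")
```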