Java: Why does parallelStream not use the entire available parallelism?
I have a custom ForkJoinPool created with a parallelism of 25.

    customForkJoinPool = new ForkJoinPool(25);
I have a list of 700 file names, and I used code like this to download the files from S3 in parallel and cast them to Java objects:
    customForkJoinPool.submit(() -> {
        return fileNames
            .parallelStream()
            .map((fileName) -> {
                Logger log = Logger.getLogger("ForkJoinTest");
                long startTime = System.currentTimeMillis();
                log.info("Starting job at thread: " + Thread.currentThread().getName());
                MyObject obj = readObjectFromS3(fileName);
                long endTime = System.currentTimeMillis();
                log.info("Completed job, latency: " + (endTime - startTime));
                return obj;
            })
            .collect(Collectors.toList());
    });
When I look at the logs, I see only 5 threads being used. With a parallelism of 25, I expected this to use 25 threads. The average latency to download a file and convert it to an object is around 200 ms. What am I missing?
Maybe a better question is: how does parallelStream figure out how much to split the original list before creating threads for it? In this case, it looks like it decided to split it 5 times and then stop.
Why are you doing this with ForkJoinPool? It's meant for CPU-bound tasks whose subtasks are too fast to warrant individual scheduling. Your workload is IO-bound, and with 200 ms of latency per file the individual scheduling overhead is negligible.
Use an Executor:
    import static java.util.stream.Collectors.toList;
    import static java.util.concurrent.CompletableFuture.supplyAsync;

    ExecutorService threads = Executors.newFixedThreadPool(25);

    List<MyObject> result = fileNames.stream()
        .map(fn -> supplyAsync(() -> readObjectFromS3(fn), threads))
        .collect(toList()).stream()
        .map(CompletableFuture::join)
        .collect(toList());
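To check that the executor approach really keeps all 25 threads busy, here is a minimal, self-contained sketch. It makes several assumptions: `fakeReadObjectFromS3` is a hypothetical stand-in that sleeps for about 200 ms instead of calling S3, the class name `ExecutorDownloadSketch` and the file names are made up, and each result carries the name of the worker thread that produced it, so the number of distinct threads can be counted afterwards.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ExecutorDownloadSketch {

    // Hypothetical stand-in for readObjectFromS3: sleeps ~200 ms to mimic IO latency
    // and tags the result with the name of the thread that ran it.
    static String fakeReadObjectFromS3(String fileName) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return fileName + "@" + Thread.currentThread().getName();
    }

    static List<String> downloadAll(List<String> fileNames, int poolSize) {
        ExecutorService threads = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit every task first, so all of them are queued before any join() blocks.
            List<CompletableFuture<String>> futures = fileNames.stream()
                .map(fn -> CompletableFuture.supplyAsync(() -> fakeReadObjectFromS3(fn), threads))
                .collect(Collectors.toList());
            // Then wait for the results; the output keeps the input order.
            return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        } finally {
            // Without shutdown() the non-daemon pool threads keep the JVM alive.
            threads.shutdown();
        }
    }

    // Counts the distinct thread names embedded after the '@' in each result.
    static long distinctThreads(List<String> results) {
        return results.stream()
            .map(r -> r.substring(r.indexOf('@') + 1))
            .distinct()
            .count();
    }

    public static void main(String[] args) {
        List<String> fileNames = IntStream.range(0, 100)
            .mapToObj(i -> "file-" + i)
            .collect(Collectors.toList());
        long start = System.currentTimeMillis();
        List<String> results = downloadAll(fileNames, 25);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(results.size() + " objects, " + distinctThreads(results)
            + " worker threads, " + elapsed + " ms");
    }
}
```

The intermediate list of futures matters: mapping straight from `supplyAsync` to `join` in one lazy stream would submit one task, wait for it, and only then submit the next, which serializes the downloads. In real code the pool would normally be created once and reused rather than built per call.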