scala - How to flatMap a function on GroupedDataSet in Apache Flink -
i want apply function via flatmap each group produced dataset.groupby. trying call flatmap compiler error:
error: value flatmap not member of org.apache.flink.api.scala.groupeddataset my code:
var mapped = env.fromcollection(array[(int, int)]()) var groups = mapped.groupby("mygroupfield") groups.flatmap( myfunction: (int, array[int]) => array[(int, array[(int, int)])] ) // error: groupeddataset has no member flatmap indeed, in documentation of flink-scala 0.9-snapshot no map or similar listed. there similar method work with? how achieve desired distributed mapping on each group individually on node?
you can use reducegroup(groupreducefunction f) process elements group. groupreducefunction gives iterable on elements of group , collector emit arbitrary number of elements.
flink's groupby() function not group multiple elements single element, i.e., not convert group of (int, int) elements (that share same _1 tuple field) 1 (int, array[int]). instead, dataset[(int, int)] logically grouped such elements have same key can processed together. when apply groupreducefunction on groupeddataset, function called once each group. in each call elements of group handed function. function can process elements of group , convert group of (int, int) elements single (int, array[int]) element.
Comments
Post a Comment