mapreduce - Algorithm/Programming for processing -
i using spark streaming (coding in java), , want understand how can make algorithm following problem. relatively new @ map-reduce , need assistance in designing algorithm.
here problem in details.
problem details
input:
- initial input of text is:
(pattern),(timestamp, message)
(3)(12/5/2014 01:00:01, message)
where 3 pattern type
- i have converted dstream key=p1,p2, p1 , p2 pattern classes input line, , value = pattern, input, timestamp. hence each tuple of dstream follows:
template: (p1,p2),(pattern_id, timestamp, string)
example: (3,4),(3, 12/5/2014 01:00:01, message)
here 3 , 4 pattern types paired together.
- i have model each pair has time difference associated it. example, model in hashmap key-values :
template: (p1,p2)(time difference)
example: (3,4)(2:20)
where time difference in model 2:20, , if have 2 messages of pattern of type 3 , 4 respectively in stream, program should output anomaly if difference between 2 messages greater 2:20.
what best way model in spark streaming?
what have tried far
- i created dstream shown in step 2 step 1
- created broadcast variable sending map learnt in model (step 3 above) workers
- i stuck @ trying figure out algorithm on how generate anomaly stream in spark streaming. cannot figure out how make associative function stream operation
here current code: https://gist.github.com/nipunarora/22c8e336063a2a1cc4a9
Comments
Post a Comment