mapreduce - Algorithm/Programming for processing -


i using spark streaming (coding in java), , want understand how can make algorithm following problem. relatively new @ map-reduce , need assistance in designing algorithm.

here problem in details.

problem details

input:

  1. initial input of text is:

(pattern),(timestamp, message)

(3)(12/5/2014 01:00:01, message)

where 3 pattern type

  1. i have converted dstream key=p1,p2, p1 , p2 pattern classes input line, , value = pattern, input, timestamp. hence each tuple of dstream follows:

template: (p1,p2),(pattern_id, timestamp, string)

example: (3,4),(3, 12/5/2014 01:00:01, message)

here 3 , 4 pattern types paired together.

  1. i have model each pair has time difference associated it. example, model in hashmap key-values :

template: (p1,p2)(time difference)

example: (3,4)(2:20)

where time difference in model 2:20, , if have 2 messages of pattern of type 3 , 4 respectively in stream, program should output anomaly if difference between 2 messages greater 2:20.

what best way model in spark streaming?

what have tried far

  1. i created dstream shown in step 2 step 1
  2. created broadcast variable sending map learnt in model (step 3 above) workers
  3. i stuck @ trying figure out algorithm on how generate anomaly stream in spark streaming. cannot figure out how make associative function stream operation

here current code: https://gist.github.com/nipunarora/22c8e336063a2a1cc4a9


Comments

Popular posts from this blog

facebook - android ACTION_SEND to share with specific application only -

python - Creating a new virtualenv gives a permissions error -

javascript - cocos2d-js draw circle not instantly -