Haskell - Lempel-Ziv 78 Compression - very slow, why? -


so i've finished lempel-ziv compression/decompression, compared c++ unbelievably slow, i've tried on http://norvig.com/big.txt file, program can't process it, while in c++ took 1 second. of haskell gurus @ code , tell me if there obvious flaws?

after adding prepending instead of appending, managed reduce time 16 seconds 0.4!

haskell's laziness deceiving, printing 'compression finished' immediately, in fact compression made program run slow

import system.io   import control.monad import qualified data.map map import debug.trace main =     contents <- readfile "plik.txt"     let compressed = reverse $ compress contents map.empty 1 "" []     let decompressed = reverse $ decompress compressed map.empty 1 ""   --print $ contents   print $ length compressed   print $ length decompressed   --print $ contents == decompressed compress :: string -> map.map string int -> int -> string -> [(int,char)]-> [(int,char)] compress (s:x) m c out =     if map.member (c++[s]) m == false                       if c == ""                                     let newmap = map.insert [s] m                     compress x newmap (i+1) c ((0,s):out)                 else                     let newmap = map.insert (c++[s]) m                     compress x newmap (i+1) "" ((newmap map.! c, s):out)         else compress x m (c++[s]) out  compress s m c out = compress2 out  compress2 ::  [(int,char)]-> [(int,char)] compress2 out = trace("compression finished") out    decompress :: [(int,char)] -> map.map int string -> int -> string -> string decompress (s:x) m out =     if fst s ==  0                     let newmap = map.insert [snd s] m             decompress x newmap (i+1) ((snd s):out)         else             let newmap =  map.insert ((m map.! fst s)++[snd s]) m             decompress x newmap (i+1) ((snd s):(reverse (newmap map.! fst s))++out)  decompress s m out = out  decompress2 ::  string -> string decompress2 out =  trace("decompression finished") out 

as others have said in comments:

  • avoid using string; uses lot of ram , it's quite slow process. use bytestring (if you're expecting process raw binary data) or text (if you're expecting unicode text).

  • don't append lists. (string list — don't use string in first place.) prepend if it's easy. if not, use data.sequence, data.set or data.map appropriate. or maybe text, bytestring or vector.

  • if see haskell program that's slow and uses buckets of ram though input , output files tiny, have somewhere being lazy.

as example, had program produce byte histogram. took 20 minutes , consumed 8 gb of ram. changed data constructor strict, adding single ! it. program takes fraction of second! laziness can make absurdly large difference if wrong.

looking @ code you've posted, covers all. (i'm not sure changes you've tried , you're now.)


Comments

Popular posts from this blog

facebook - android ACTION_SEND to share with specific application only -

python - Creating a new virtualenv gives a permissions error -

javascript - cocos2d-js draw circle not instantly -