Haskell - Lempel-Ziv 78 Compression - very slow, why? -
so i've finished lempel-ziv compression/decompression, compared c++ unbelievably slow, i've tried on http://norvig.com/big.txt file, program can't process it, while in c++ took 1 second. of haskell gurus @ code , tell me if there obvious flaws?
after adding prepending instead of appending, managed reduce time 16 seconds 0.4!
haskell's laziness deceiving, printing 'compression finished' immediately, in fact compression made program run slow
import system.io import control.monad import qualified data.map map import debug.trace main = contents <- readfile "plik.txt" let compressed = reverse $ compress contents map.empty 1 "" [] let decompressed = reverse $ decompress compressed map.empty 1 "" --print $ contents print $ length compressed print $ length decompressed --print $ contents == decompressed compress :: string -> map.map string int -> int -> string -> [(int,char)]-> [(int,char)] compress (s:x) m c out = if map.member (c++[s]) m == false if c == "" let newmap = map.insert [s] m compress x newmap (i+1) c ((0,s):out) else let newmap = map.insert (c++[s]) m compress x newmap (i+1) "" ((newmap map.! c, s):out) else compress x m (c++[s]) out compress s m c out = compress2 out compress2 :: [(int,char)]-> [(int,char)] compress2 out = trace("compression finished") out decompress :: [(int,char)] -> map.map int string -> int -> string -> string decompress (s:x) m out = if fst s == 0 let newmap = map.insert [snd s] m decompress x newmap (i+1) ((snd s):out) else let newmap = map.insert ((m map.! fst s)++[snd s]) m decompress x newmap (i+1) ((snd s):(reverse (newmap map.! fst s))++out) decompress s m out = out decompress2 :: string -> string decompress2 out = trace("decompression finished") out
as others have said in comments:
avoid using
string
; uses lot of ram , it's quite slow process. usebytestring
(if you're expecting process raw binary data) ortext
(if you're expecting unicode text).don't append lists. (
string
list — don't usestring
in first place.) prepend if it's easy. if not, usedata.sequence
,data.set
ordata.map
appropriate. or maybetext
,bytestring
orvector
.if see haskell program that's slow and uses buckets of ram though input , output files tiny, have somewhere being lazy.
as example, had program produce byte histogram. took 20 minutes , consumed 8 gb of ram. changed data constructor strict, adding single !
it. program takes fraction of second! laziness can make absurdly large difference if wrong.
looking @ code you've posted, covers all. (i'm not sure changes you've tried , you're now.)
Comments
Post a Comment