java - Problems with unicode variables in subprocess.check_output in Python with Django
Recently at work we got a server and I configured it. To give some context: I'm working with code in Java (CoGrOO, a grammar checker for Portuguese) and I also have code in Python. To make the two work together, I'm calling the jar file from within the Python code. On my local machine it works fine, but when I put it on the server I have problems.
    >>> a = u"ele anda à cavalo"
    >>> print(type(a))
    <type 'unicode'>
    >>> a
    u'ele anda \xe0 cavalo'
    >>> print(a)
    ele anda à cavalo
On my local machine, and in the terminal on the server, this works fine. But if I do the same thing inside a Python script on the server, I get the error "'ascii' codec can't encode character u'\xe0'" when printing; in the script I can't print a unicode string. I also have trouble when I try to call:

    output = subprocess.check_output(cd.encode("utf-8"), shell=True)
where cd holds the Java command with the path to the jar:

    cd = 'java -jar path/file.jar grammarchecker -country br -lang pt -text "' + auxtextpure + '"'

and auxtextpure is a unicode string.
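A minimal sketch of the same call, written for Python 3 (under Python 2 the roles of str and bytes are played by unicode and str). It passes the command as a list with shell=False, so the text does not have to be spliced into a quoted shell string (which breaks if auxtextpure itself contains a double quote), encodes every argument as UTF-8 explicitly, and decodes the output as UTF-8. Two assumptions that are not from the post: the server is POSIX, and the child program (here the JVM) picks its output encoding from the locale, so the sketch asks for a UTF-8 locale; "C.UTF-8" may need to be replaced by a UTF-8 locale the server actually has. The names run_utf8 and build_checker_command are hypothetical.

```python
import os
import subprocess


def build_checker_command(text, jar_path="path/file.jar"):
    """Return the question's command line as a list of arguments.

    The flags are copied from the question; jar_path is a placeholder.
    """
    return ["java", "-jar", jar_path, "grammarchecker",
            "-country", "br", "-lang", "pt", "-text", text]


def run_utf8(args, locale_name="C.UTF-8"):
    """Run a command (POSIX only) and return its output decoded as UTF-8.

    locale_name is an assumption: use a UTF-8 locale the server has.
    """
    # Encode text arguments explicitly so characters such as "à" reach
    # the child as UTF-8 bytes, whatever the parent's locale is.
    byte_args = [a.encode("utf-8") if isinstance(a, str) else a for a in args]
    # Copy the environment and ask the child for a UTF-8 locale.
    env = dict(os.environ, LC_ALL=locale_name, LANG=locale_name)
    # A list with shell=False (the default) needs no shell quoting.
    raw = subprocess.check_output(byte_args, env=env)
    # check_output returns bytes; decode them once, as UTF-8.
    return raw.decode("utf-8")
```

With those pieces, the call from the question would become something like output = run_utf8(build_checker_command(u"ele anda à cavalo")), where output is already decoded text.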
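There are two problems. The first: when auxtextpure is a string without special characters, such as "a menino", the output is

    os determinantes concordam em n?mero (singular ou plural) e em g?nero (masculino ou feminino) com o substantivo que se referem.

("determiners agree in number (singular or plural) and in gender (masculine or feminine) with the noun they refer to"), but I need the output with its accented characters. The second: when the string has accents, such as "ele anda à cavalo", the output is "verifique repeti??o de palavras." ("check for word repetition"), while the correct output, with accents, is

    o sinal indicativo de crase indica que temos "a" + "a" expressos em um só "à". somente ocorre crase quando há encontro de preposição "a" com artigo ou pronome demonstrativo "a"/"as". portanto, não ocorre crase antes de palavras masculinas.

("the crase mark indicates that we have "a" + "a" expressed as a single "à". Crase only occurs when the preposition "a" meets the article or the demonstrative pronoun "a"/"as". Therefore, crase does not occur before masculine words.")

I know the problem is that Python or Django on the server, specifically in the script, can't handle unicode (UTF-8), either when printing it on screen or when storing it in a variable. I tried cd.encode("utf-8"), auxtextpure.encode("utf-8"), auxtextpure.decode("utf-8") and other things, such as import codecs and using codecs, and I searched the internet for this problem, but I couldn't find a fix anywhere. Can you help me, please? Thanks a lot, and sorry for my bad English.

Leandro Costa Valadão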
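The "?" in the outputs above ("n?mero", "g?nero", "repeti??o") is the pattern you get when text is encoded to ASCII and every character ASCII cannot represent is replaced with "?". One plausible cause, and it is only an assumption, is that processes on the server run with a non-UTF-8 (for example POSIX/ASCII) locale, which would also fit print working in the terminal but not in the script. This Python 3 sketch reproduces the pattern and reports the encodings the current process would use; lossy_ascii and report_encodings are hypothetical helper names.

```python
import locale
import sys


def lossy_ascii(text):
    """Encode text to ASCII, replacing unencodable characters with "?",
    then decode back, mimicking output written through an ASCII stream."""
    return text.encode("ascii", errors="replace").decode("ascii")


def report_encodings():
    """Return the encodings this process would use for output."""
    return {
        # Encoding used by print(); may differ between a terminal and a
        # script started by a server process (None if stdout has none).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the current locale settings (LANG, LC_ALL...).
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(report_encodings())
    print(lossy_ascii("número"))     # n?mero
    print(lossy_ascii("repetição"))  # repeti??o
```

Running report_encodings() once from the terminal and once from the server script shows whether the two environments differ; if they do, setting PYTHONIOENCODING=utf-8 or a UTF-8 LANG for the script's process is a common remedy.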
A string should first be decoded from UTF-8 and then encoded, or vice versa, and you never know whether a string is originally encoded or decoded.
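Since the type tells you which side of the conversion a value is on, one way to avoid guessing is to check it before converting. A minimal Python 3 sketch, assuming UTF-8 throughout: in Python 3, str is decoded text and bytes is encoded data, while in Python 2 the corresponding types are unicode and str. The helper names to_text and to_bytes are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text (str), decoding it only if it is bytes."""
    if isinstance(value, bytes):
        # Encoded data: decode it exactly once.
        return value.decode(encoding)
    if isinstance(value, str):
        # Already decoded text: return it unchanged.
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it only if it is text."""
    if isinstance(value, str):
        # Decoded text: encode it exactly once.
        return value.encode(encoding)
    if isinstance(value, bytes):
        # Already encoded data: return it unchanged.
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)
```

For example, to_text(b"ele anda \xc3\xa0 cavalo") and to_text("ele anda à cavalo") both return the same text, so callers no longer need to know which form they were given.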
1.

    a = u"ele anda à cavalo"
    print a.encode('utf-8')

Output:

    ele anda à cavalo
2.

    a = u"ele anda à cavalo"
    print a.decode('utf-8')

Output:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 9: ordinal not in range(128)
3.

    a = u"ele anda à cavalo"
    print a.encode('utf-8').decode('utf-8')

Output:

    ele anda à cavalo
4.

    a = u"ele anda à cavalo"
    print a.encode('utf-8').decode('utf-8').encode('utf-8')

Output:

    ele anda à cavalo
Funny, isn't it?
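Case 2 has a concrete cause: in Python 2, calling .decode() on a unicode object first encodes it implicitly with the default encoding (ASCII), and that hidden step, not the decode itself, is what raises UnicodeEncodeError. Python 3 removed str.decode(), so this sketch, using a hypothetical helper python2_style_decode, spells that step out to reproduce the same error.

```python
def python2_style_decode(text, encoding="utf-8"):
    """Emulate Python 2's unicode.decode(): the text is first encoded
    implicitly with the default codec (ASCII), then decoded."""
    # This hidden ASCII step is what fails on "à" (U+00E0).
    implicit_bytes = text.encode("ascii")
    return implicit_bytes.decode(encoding)


a = "ele anda à cavalo"
try:
    python2_style_decode(a)
except UnicodeEncodeError as exc:
    # Same message as case 2, minus Python 2's u prefix:
    # 'ascii' codec can't encode character '\xe0' in position 9: ...
    print(exc)
```

Pure ASCII input passes through unchanged, which is why code like case 2 can seem to work until the first accented character shows up.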