Comments (15)
Hey @pbeck
Similar to @habi, I was able to make it work by
- converting the umlauts äöü to ae, oe and ue
- replacing « » with " "
- removing the remaining Unicode characters (Cyrillic and Chinese) entirely
However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that.
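The transliteration workaround in the list above can be sketched with Python 3's `str.translate`. This is a minimal illustration, not code from wa2latex: the helper name is hypothetical, the uppercase umlauts are an assumed addition, and it only sidesteps the encoding error. It does nothing for the section-marker problem.

```python
# Hypothetical helper (not part of wa2latex): transliterate the characters that
# triggered the encoding errors. The uppercase umlauts are an assumed addition.
TRANSLITERATION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "«": '"', "»": '"',
})


def transliterate(text):
    """Replace umlauts with spelled-out forms and guillemets with plain quotes."""
    return text.translate(TRANSLITERATION)


print(transliterate("«Schön» für Zürich"))  # prints: "Schoen" fuer Zuerich
```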
If you're rewriting it anyway, could you also add a function to optionally include the timestamp as well?
Thank you very much, I'm really looking forward to it. Let me know if I can help further, e.g. by testing it! :)
Cheers!
laserjay
from whatsbook.
PS: If I change all the umlauts to their spelled-out forms (ä > ae, ö > oe, etc.), the script works :)
But then the text isn't really nicely readable in German...
Hey David,
Thanks for reporting, I’ll look into it! I’ve used lots of umlauts as well (chats in Swedish), but I don’t remember having issues with them.
Any chance you could try running wa2latex with Python 3? And could you upload a sample snippet that causes the error?
Running the command below with Python 3.5.2 (thanks to Anaconda)
python wa2latex.py _chat_with_umlauts.txt > whatsbook-folio.tex
gives me:
Traceback (most recent call last):
File "wa2latex.py", line 148, in <module>
line = emojis.replace_emoji(line)
File "wa2latex.py", line 85, in replace_emoji
text = text.replace(emoji, "\\emoji{" + emoji.encode('unicode-escape').encode('utf-8') + "}")
AttributeError: 'bytes' object has no attribute 'encode'
That's why I tried with Python 2 :)
Could there be a problem with the encoding of the exported chat TXT file?
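The AttributeError comes from line 85: under Python 3, `str.encode('unicode-escape')` already returns `bytes`, and `bytes` has no `.encode()` method. A minimal sketch of a Python 3 variant decodes the ASCII-only escape sequence back to `str` before concatenating. The standalone function and its `emojis` parameter are hypothetical; in wa2latex the method is called as `emojis.replace_emoji(line)`.

```python
def replace_emoji(text, emojis):
    """Wrap each emoji in \\emoji{...} using its Unicode escape sequence.

    Hypothetical standalone version: `emojis` is an iterable of emoji strings.
    """
    for emoji in emojis:
        # str.encode() returns bytes in Python 3; decode the ASCII-only escape
        # sequence (e.g. b"\\U0001f604") back to str before concatenating.
        escaped = emoji.encode("unicode-escape").decode("ascii")
        text = text.replace(emoji, "\\emoji{" + escaped + "}")
    return text


print(replace_emoji("Hallo \U0001F604", ["\U0001F604"]))  # Hallo \emoji{\U0001f604}
```

This produces the same `\emoji{...}` text that the Python 2 code generated, so the LaTeX side should not need to change.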
What’s the encoding of your txt file?
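One way to answer this from Python itself is to check whether the raw bytes decode as UTF-8. A minimal sketch; the helper name is hypothetical, and note that a pure-ASCII file also counts as valid UTF-8.

```python
import codecs


def describe_encoding(data):
    """Report whether raw bytes are valid UTF-8 and whether they start with a BOM."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "not valid UTF-8 (first bad byte at offset %d)" % exc.start
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    return "UTF-8"


# Usage, reading the export in binary mode so nothing is decoded implicitly:
# with open("_chat.txt", "rb") as f:
#     print(describe_encoding(f.read()))
```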
I’ve tried under Linux (at work), where I don’t have access to the file right now.
At home (on OS X 10.11.6), the chat.txt file is UTF-8 encoded, and I get this error with Python 2.7.11:
anomalocaris:whatsbook habi$ python wa2latex.py _chat.txt > whatsbook-folio.tex
Traceback (most recent call last):
File "wa2latex.py", line 26, in <module>
import pandas as pd
File "/usr/local/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
from pandas.core.api import *
File "/usr/local/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
from pandas.core.groupby import Grouper
File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 17, in <module>
from pandas.core.frame import DataFrame
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
from pandas.core.series import Series
File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 2909, in <module>
import pandas.tools.plotting as _gfx
File "/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py", line 28, in <module>
import pandas.tseries.converter as conv
File "/usr/local/lib/python2.7/site-packages/pandas/tseries/converter.py", line 7, in <module>
import matplotlib.units as units
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1131, in <module>
rcParams = rc_params()
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 975, in rc_params
return rc_params_from_file(fname, fail_on_error)
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1100, in rc_params_from_file
config_from_file = _rc_params_in_file(fname, fail_on_error)
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1018, in _rc_params_in_file
with _open_file_or_url(fname) as fd:
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1000, in _open_file_or_url
encoding = locale.getdefaultlocale()[1]
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 543, in getdefaultlocale
return _parse_localename(localename)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 475, in _parse_localename
raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8
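This ValueError is raised inside `locale.getdefaultlocale()`, which takes the first non-empty value among `LC_ALL`, `LC_CTYPE`, `LANG` and `LANGUAGE`; Python 2.7's parser rejects a bare `UTF-8` that has no language part (macOS Terminal can set `LC_CTYPE=UTF-8`). A small sketch that mirrors that lookup order to spot the problem before pandas/matplotlib are imported; both helper names are hypothetical.

```python
import os

# Same order that locale.getdefaultlocale() consults by default.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")


def effective_locale(env=None):
    """Return (variable, value) for the first non-empty locale variable, or None."""
    env = os.environ if env is None else env
    for name in LOCALE_VARS:
        value = env.get(name)
        if value:
            return name, value
    return None


def has_bare_encoding(env=None):
    """True if the effective locale is only an encoding such as "UTF-8"."""
    found = effective_locale(env)
    # A usable name looks like "de_CH.UTF-8"; a bare "UTF-8" has no
    # language/territory part, which Python 2.7's locale module cannot parse.
    return found is not None and found[1].upper() in ("UTF-8", "UTF8")
```

Because `LC_ALL` is checked first, setting it to a full locale name such as `en_US.UTF-8` overrides a bare `LC_CTYPE=UTF-8`.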
What happens if you do
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
or perhaps even better
export LC_ALL=de_CH.UTF-8
export LANG=de_CH.UTF-8
(if you’re in German speaking Switzerland)
and then run wa2latex.py with Python 2?
If I export the Swiss German variables on OS X, I get the same error as on Linux:
Traceback (most recent call last):
File "wa2latex.py", line 168, in <module>
print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)
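This UnicodeEncodeError is typical of Python 2 when output is redirected to a file with `>`: `sys.stdout.encoding` is then `None`, so printing a `unicode` string falls back to ASCII regardless of the locale. A minimal sketch, assuming the script keeps writing its LaTeX to stdout, encodes explicitly to UTF-8 instead; the helper names are hypothetical and the code is meant to run under both Python 2 and 3.

```python
import io
import sys


def section_marker(date):
    # Doubled backslash: "\s" only works by accident because it is not a
    # recognised escape (newer Python 3 versions warn about it).
    return u"\\section*{%s}\n" % date


def write_utf8(text, stream=None):
    """Write unicode text as UTF-8 bytes, whatever the locale or redirection."""
    if stream is None:
        # Python 3 text streams expose their binary layer as .buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode("utf-8"))


# In place of print(u"\section*{%s}" % date) the script would call:
# write_utf8(section_marker(date))
buf = io.BytesIO()
write_utf8(section_marker(u"Z\u00fcrich"), buf)  # buf holds b"\\section*{Z\xc3\xbcrich}\n"
```

Under Python 3, call `sys.stdout.flush()` before switching between `print()` and this helper, since the text layer buffers separately. Alternatively, setting the environment variable `PYTHONIOENCODING=utf-8` before running the script makes `print` itself encode as UTF-8 without code changes.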
Running wa2latex.py on a file containing åäöÅÄÖ works with Python 2.7.10 on macOS 10.11.5.
My locales are set to en_US.UTF8.
Any chance you could send me a sample of your chatlog? It’s hard for me to debug without proper (non-working 😄) data.
I just sent the file to the email address in your GitHub profile.
I tried running wa2latex with your chat log, and it worked without any issues on macOS with Python 2.7. I’ll be traveling abroad next week, but I can hopefully figure something out when I’m back.
Guys, I have the same problem on Ubuntu 16.04 with Python 2.7.12, with LANG and LC_ALL both set to en_US.UTF-8 and the chat history in UTF-8 (the text contains Swiss German characters as well).
I get the following error:
Traceback (most recent call last):
File "wa2latex.py", line 168, in <module>
print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
Also, the section markers are sometimes correct:
\section*{01.01.2000}
but at other times they use the first word of a new line when no date is present:
\section*{word}
I checked the file with a hex editor and found that when a new line/word is preceded by 0x200A, it gets a section marker containing a word; the other new lines are preceded by 0x0D0A, and for those the date is parsed correctly.
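The hex-editor finding points at the cause. Assuming "0x200A" means the byte pair 0x20 0x0A (a space followed by a bare line feed), lines inside a multi-line message end in LF while new messages begin after CRLF (0x0D 0x0A), so every continuation line gets treated as a new dated message. A minimal sketch of one common fix merges any line that does not start with a date into the previous message, so only real message lines can produce a `\section*` heading. The date regex is a hypothetical pattern for the `DD.MM.YYYY` dates shown above, and the sample lines are made up.

```python
import re

# Hypothetical pattern: a new message starts with a DD.MM.YYYY date,
# optionally preceded by "[" as in some exports. Adjust to your chat log.
MESSAGE_START = re.compile(r"^\[?\d{1,2}\.\d{1,2}\.\d{2,4}")


def group_messages(lines):
    """Merge continuation lines (no leading date) into the preceding message."""
    messages = []
    for raw in lines:
        # Strip CRLF or bare LF line endings alike.
        line = raw.rstrip("\r\n")
        if MESSAGE_START.match(line) or not messages:
            messages.append(line)
        else:
            # Continuation of a multi-line message: keep its line break.
            messages[-1] += "\n" + line
    return messages


sample = [
    "01.01.2000, 10:00 - A: first line \n",
    "second line\r\n",
    "02.01.2000, 11:00 - B: reply\r\n",
]
for message in group_messages(sample):
    print(repr(message))
```

Splitting the file only on `\r\n` would be an alternative, but the date check keeps working even after Python 3's universal-newline mode has normalized CRLF to LF on reading.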
I haven’t been able to reproduce @habi’s issues, but I’m sure they’re valid, all the more so since you’re seeing them too, @laserjay. I aim to rewrite parts of wa2latex for Python 3 as soon as possible; I hope that will, if not solve your issues, at least make them easier to debug.
Hi, I am facing a similar issue to laserjay's: "However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that." Any leads on how to solve that?
@bakshi-varun Maybe laserjay's latest comment helps?
I haven’t had time to update the script yet, and unfortunately there's no ETA for when that will happen.