Comments (15)

laserjay commented on June 6, 2024

Hey @pbeck

Similar to @habi, I was able to make it work by

  • converting the umlauts äöü to ae, oe and ue
  • replacing « » with " "
  • removing all other Unicode characters (Cyrillic and Chinese) entirely

However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that.
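
A minimal Python 3 sketch of that manual workaround, for anyone who wants to apply it to the exported chat before running the script. `asciify` and the `REPLACEMENTS` table are hypothetical names, not part of wa2latex.py, and the upper-case umlaut entries are an assumption beyond the three lower-case ones listed above.

```python
# Hypothetical helper, not part of wa2latex.py: reproduces the manual workaround.
REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",  # assumption: upper-case forms handled too
    "«": '"', "»": '"',
}


def asciify(text):
    """Transliterate umlauts and guillemets, then drop all other non-ASCII."""
    for char, replacement in REPLACEMENTS.items():
        text = text.replace(char, replacement)
    # Anything still outside ASCII (Cyrillic, Chinese, emoji, ...) is removed.
    return text.encode("ascii", "ignore").decode("ascii")
```

Note that this is lossy: every character without an entry in the table simply disappears from the output.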

If you're rewriting it anyway, could you also add a function to optionally include the timestamp as well?

Thank you very much, really looking forward to it, and let me know if I can provide further help, e.g. by testing it! :)

Cheers!
laserjay

from whatsbook.

habi commented on June 6, 2024

PS: If I change all the umlauts to their written-out forms (ä > ae, ö > oe, etc.), then the script works :)
But the text is then not very readable in German...

pbeck commented on June 6, 2024

Hey David,

Thanks for reporting, I’ll look into it! I’ve used lots of umlauts as well (chats in Swedish), but I don’t remember having issues with them.

Any chance you could try running wa2latex with Python 3? And could you upload a sample snippet that causes the error?

habi commented on June 6, 2024

Running the command below with Python 3.5.2 (Thanks to Anaconda)

python wa2latex.py _chat_with_umlauts.txt > whatsbook-folio.tex

gives me

Traceback (most recent call last):
  File "wa2latex.py", line 148, in <module>
    line = emojis.replace_emoji(line)
  File "wa2latex.py", line 85, in replace_emoji
    text = text.replace(emoji, "\\emoji{" + emoji.encode('unicode-escape').encode('utf-8') + "}")
AttributeError: 'bytes' object has no attribute 'encode'

That's why I tried with Python 2 :)
Might there be a problem with the encoding of the exported chat TXT file?
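
The AttributeError in the traceback above comes from calling `.encode()` twice: on Python 3 the first call already returns `bytes`, which has no `.encode()` method. A minimal sketch of a Python 3 variant follows, assuming the goal is the same escaped text inside `\emoji{...}`. `emoji_macro` is a hypothetical name, not a function in wa2latex.py.

```python
def emoji_macro(emoji):
    """Build the \\emoji{...} replacement string for one emoji character."""
    # encode() yields bytes such as b'\\U0001f604'; decode them back to str
    # instead of calling encode() a second time.
    escaped = emoji.encode("unicode-escape").decode("ascii")
    return "\\emoji{" + escaped + "}"
```

Line 85 of the traceback would then read roughly `text = text.replace(emoji, emoji_macro(emoji))`, untested against the real script.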

pbeck commented on June 6, 2024

What’s the encoding of your txt file?
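
One quick way to check, sketched in Python 3: try to decode the raw bytes as UTF-8. `looks_like_utf8` is a hypothetical helper, not part of wa2latex.py. A successful decode only shows that the bytes are valid UTF-8, not that UTF-8 is what the exporter intended.

```python
def looks_like_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For a file on disk this could be called as `looks_like_utf8(open("_chat.txt", "rb").read())`.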

habi commented on June 6, 2024

I’ve tried under Linux (at work), where I don’t have access to the file now.
At home (on OS X 10.11.6), the chat.txt file is UTF-8 encoded, and I get this error with Python 2.7.11

anomalocaris:whatsbook habi$ python wa2latex.py _chat.txt > whatsbook-folio.tex
Traceback (most recent call last):
  File "wa2latex.py", line 26, in <module>
    import pandas as pd
  File "/usr/local/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 17, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 2909, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py", line 28, in <module>
    import pandas.tseries.converter as conv
  File "/usr/local/lib/python2.7/site-packages/pandas/tseries/converter.py", line 7, in <module>
    import matplotlib.units as units
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1131, in <module>
    rcParams = rc_params()
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 975, in rc_params
    return rc_params_from_file(fname, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1100, in rc_params_from_file
    config_from_file = _rc_params_in_file(fname, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1018, in _rc_params_in_file
    with _open_file_or_url(fname) as fd:
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py", line 1000, in _open_file_or_url
    encoding = locale.getdefaultlocale()[1]
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 543, in getdefaultlocale
    return _parse_localename(localename)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 475, in _parse_localename
    raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8

pbeck commented on June 6, 2024

What happens if you do

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

or perhaps even better

export LC_ALL=de_CH.UTF-8
export LANG=de_CH.UTF-8

(if you’re in German speaking Switzerland)

and then run wa2latex.py with python2?

habi commented on June 6, 2024

If I export the Swiss German variables on OS X, then I get the same error as on Linux

Traceback (most recent call last):
  File "wa2latex.py", line 168, in <module>
    print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 13: ordinal not in range(128)
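
A likely explanation: when stdout is redirected to a file (`> whatsbook-folio.tex`), Python 2 usually has no encoding for it and falls back to ASCII when printing unicode strings. The sketch below, written for Python 3, avoids depending on the locale by encoding explicitly as UTF-8. `utf8_stream` and `write_section` are hypothetical names, not functions in wa2latex.py.

```python
import io


def utf8_stream(binary_stream):
    """Wrap a binary stream so text written to it is always encoded as UTF-8."""
    # newline="\n" keeps line endings unchanged on every platform.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def write_section(out, date):
    """Write one LaTeX section heading for the given date string."""
    out.write(u"\\section*{%s}\n" % date)
```

Opening the output file with an explicit `encoding="utf-8"` and writing through a helper like `write_section` sidesteps the redirect entirely. Without touching the script, setting the environment variable `PYTHONIOENCODING=utf-8` before running it may have the same effect.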

pbeck commented on June 6, 2024

Running wa2latex.py on a file containing åäöÅÄÖ works with Python 2.7.10 on macOS 10.11.5.
My locales are set to en_US.UTF8.

Any chance you could send me a sample of your chatlog? It’s hard for me to debug without proper (non-working 😄) data.

habi commented on June 6, 2024

I just sent the file to the email address in your GitHub profile.

pbeck commented on June 6, 2024

I tried running wa2latex with your chat log, and it worked without any issues on macOS with Python 2.7. I’ll be traveling abroad next week, but I can hopefully figure something out when I’m back.

laserjay commented on June 6, 2024

Guys, I have the same problem on Ubuntu 16.04 with Python 2.7.12, LANG and LC_ALL both set to en_US.UTF-8 and the chat history being in UTF-8 (text contains Swiss German characters as well).
I get the following error:

Traceback (most recent call last):
  File "wa2latex.py", line 168, in <module>
    print(u"\section*{%s}" % date)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)

Also, the section markers are sometimes correct:
\section*{01.01.2000}
but other times they use the first word on a new line if there's no date present:
\section*{word}

I checked the file with a hex editor and found that if there is a 0x200A before the new line/word, that line gets a section marker made of words; before the other new lines there is a 0x0D0A, and the date is parsed correctly.
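
One way to avoid the arbitrary section markers, sketched in Python 3 under stated assumptions: treat any line that does not start with a date as a continuation of the previous message, regardless of which line ending precedes it. The `DD.MM.YYYY` pattern is taken from the `\section*{01.01.2000}` example above. The rest of the line layout, and the names `DATE_AT_START` and `merge_continuations`, are hypothetical and not taken from wa2latex.py.

```python
import re

# Assumption: a new message starts with a date such as "01.01.2000".
DATE_AT_START = re.compile(r"^\d{2}\.\d{2}\.\d{4}")


def merge_continuations(lines):
    """Join lines that do not start with a date onto the preceding message."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # handles both LF and CR LF endings
        if DATE_AT_START.match(line) or not messages:
            messages.append(line)
        else:
            # No date at the start: part of the previous message's text.
            messages[-1] += "\n" + line
    return messages
```

Section markers would then be generated only from the merged messages, so a stray word at the start of a continuation line never reaches `\section*{...}`.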

pbeck commented on June 6, 2024

I haven’t been able to reproduce @habi’s issues, but I’m sure they’re valid – even more so if you also have issues, @laserjay. My ambition is to rewrite parts of wa2latex for Python 3 as soon as possible. I’m hoping this will, if not solve your issues, at least make them easier to debug.

bakshi-varun commented on June 6, 2024

Hi,

I am facing a similar issue to laserjay: "However, the issue with new lines not starting with dates, and therefore creating arbitrary section markers remained even after that." Any leads on how to solve that?

pbeck commented on June 6, 2024

@bakshi-varun Maybe laserjay’s latest comment might help?
I haven’t had the time to update the script yet, and unfortunately there is no ETA for when that will happen.
