arty-name / livejournal-export

Export your LiveJournal (posts + comments) to XML, JSON and optionally convert them to HTML and Markdown

Python 100.00%

livejournal-export's Issues

Script killed with "Out of memory"

Reported by Aleksandr Nessler:

I’m trying to download a blog with 2000 posts and over 250000 comments, and the running script gets ‘killed’ while trying to create the JSON version with all the comments.

I tried to limit the amount of processed posts and comments by setting the start and end YEAR, but as far as I can see this parameter does not affect comments. It affects only posts.

There should probably be a limit on the amount of data loaded into memory to build the structure for json_export.
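One way to bound memory use, sketched here under assumptions, is to write comments to disk batch by batch instead of collecting all of them before serializing. The function names, the batch generator, and the JSON Lines output format below are hypothetical and are not taken from the script; the actual exporter builds a single JSON structure.

```python
import json
from typing import Iterable, Iterator


def iter_comment_batches(total: int, batch_size: int) -> Iterator[list]:
    """Hypothetical stand-in for fetching comments from the server in batches."""
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        yield [{"id": i, "body": "comment %d" % i} for i in range(start, end)]


def write_comments_jsonl(batches: Iterable[list], path: str) -> int:
    """Write one JSON object per line, so only one batch is in memory at a time."""
    written = 0
    with open(path, "w", encoding="utf-8") as out:
        for batch in batches:
            for comment in batch:
                out.write(json.dumps(comment, ensure_ascii=False) + "\n")
                written += 1
    return written
```

With this layout, a later step can read the file line by line and group comments by post without loading all 250000 at once.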

XML parsing error in download_posts.py

Hi,

Thanks for making this; unfortunately I receive the following error when I run python3 export.py in a virtualenv that's specifically for this package. I installed requests, beautifulsoup4, markdown, html2text using pip within my virtualenv.

I also filled out auth.py and ran that first before running export.py.

I should also mention that I did not have a cookie named ljmastersession but I did have a cookie named ljsession and used its value for ljmastersession.

Traceback (most recent call last):
  File "export.py", line 194, in <module>
    all_posts = download_posts()
  File "/home/skors/python/livejournal-export/download_posts.py", line 64, in download_posts
    xml_posts.extend(list(ET.fromstring(xml).iter('entry')))
  File "/usr/lib/python3.5/xml/etree/ElementTree.py", line 1333, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
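A "syntax error: line 1, column 0" means the parser rejected the response at its very first character, so the body was probably not XML at all. One possible cause, not confirmed here, is that the server returned an error page or message because the session cookie was not accepted. The sketch below shows a way to surface the start of the response instead of a bare ParseError; parse_entries is a hypothetical helper, not a function from the script.

```python
import xml.etree.ElementTree as ET


def parse_entries(xml_text: str) -> list:
    """Return the <entry> elements, or raise a readable error if the body is not XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # Show the beginning of the response so the real problem is visible.
        preview = xml_text[:200]
        raise ValueError(
            "Server response is not valid XML ({0}); it starts with: {1!r}".format(
                error, preview
            )
        ) from error
    return list(root.iter('entry'))
```

Printing the preview usually makes it clear whether the server sent an HTML page, an error message, or an empty body.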

Does this script work for communities?

See the title.

Thanks

Crash when handling a comment of deleted user

Reported by Aleksandr Nessler:

I’m using your LiveJournal exporter and I keep hitting a strange issue which prevents further export:

raidho@Ubuntu-1404-trusty-64-minimal:~/lj_cosh/lj_export$ ./export.py
Traceback (most recent call last):
  File "./export.py", line 195, in <module>
    all_comments = download_comments()
  File "/home/raidho/lj_cosh/lj_export/download_comments.py", line 91, in download_comments
    start_id, comments = get_more_comments(start_id + 1, users)
  File "/home/raidho/lj_cosh/lj_export/download_comments.py", line 68, in get_more_comments
    comment['author'] = users[str(comment['posterid'])]
KeyError: '48713497'

… I checked usermap.json and found that there is no such user.
I suppose the issue is related to users that have been deleted from LiveJournal,

like this one: http://evlie.livejournal.com

I found the user corresponding to the id ‘48713497’, on which your exporter stops.
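The KeyError comes from indexing the user map directly with an id that is not in it. A minimal sketch of a tolerant lookup follows; resolve_author and the placeholder name format are hypothetical, chosen only for illustration.

```python
def resolve_author(users: dict, poster_id) -> str:
    """Look up a commenter's name, falling back to a placeholder for unknown ids."""
    # dict.get avoids the KeyError raised by users[str(poster_id)].
    return users.get(str(poster_id), "deleted-user-{0}".format(poster_id))
```

In get_more_comments, the failing line could then read comment['author'] = resolve_author(users, comment['posterid']).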

LJ language option and parser warning.

Generated HTML files use "Комментарии" (Russian) instead of "Comments" (English), even though I am an English speaker. There should be a way to set the language somewhere.

In addition, it produces this warning:

drive\location\lj\export.py:54: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 54 of the file drive\location\lj\export.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

slug = BeautifulSoup('{0}'.format(slug)).text

The code probably needs to name the parser explicitly.

Script gets stuck if there are no comments to download

First off, thanks for this script.

The following code in download_comments.py gets stuck in an infinite loop if there are no comments at all:

    start_id = -1
    max_id = int(metadata.find('maxid').text)

    print("Start getting comments", start_id, "/", max_id)
    while start_id < max_id:
        ...

start_id is always -1, max_id is always 0.
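A sketch of a loop that cannot spin forever: it stops as soon as a request returns no comments or fails to advance start_id. The download_all wrapper and the one-argument fetcher signature are assumptions made for illustration; the script's get_more_comments also takes the user map.

```python
def download_all(max_id, get_more_comments):
    """Collect comments until max_id is reached or the server stops making progress."""
    start_id = -1
    all_comments = []
    while start_id < max_id:
        new_start_id, comments = get_more_comments(start_id + 1)
        if not comments or new_start_id <= start_id:
            break  # nothing new came back, so another request would loop forever
        all_comments.extend(comments)
        start_id = new_start_id
    return all_comments
```

With start_id at -1 and max_id at 0, the first request returns no comments and the loop exits immediately.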

Question and thanks

Thanks for making this! I finally stopped procrastinating on exporting my old 2002-12 journal, and this was the only tool for it that still seems to work.

I do have a question though: Is it possible to include the Current Mood and Current Music info in the markdown versions? If so, how?

Thank you!

Thanks!

Just wanted to say thanks, this was helpful for me! I had to add encoding='utf8' in a bunch of places, but otherwise it worked perfectly.
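The encoding change mentioned above might look like the sketch below, which passes an explicit encoding to open() so that non-ASCII text is read and written the same way regardless of the platform default. save_json and load_json are hypothetical helper names, not functions from the script.

```python
import json


def save_json(data, path):
    """Write JSON as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read JSON that was written as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```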

Thanks!

Just wanted to say thanks :)
The LJ download tool is awfully uncomfortable -- this repo works wonders.
❤️
