odt2md's People

Contributors

pgmmpk, typiconman

odt2md's Issues

Convert English service files

I am trying to convert some English Menaion files using odt2md, and I get this error message:

Traceback (most recent call last):
  File "/home/sasha/.local/bin/odt2md", line 11, in <module>
    sys.exit(main())
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/odt2md.py", line 52, in main
    odt2md(args.input, args.output, profile=args.profile, max_line_width=args.max_line_width)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/odt2md.py", line 19, in odt2md
    markdown_text = styler.format_md(blocks)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/styler.py", line 137, in format_md
    for b in blocks:
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/block.py", line 211, in parse_odt
    yield from parse_blocks(para)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/block.py", line 46, in parse_blocks
    yield from extract_spans(ev.scan(para), para_style)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/block.py", line 98, in extract_spans
    assert False, e['tag']
AssertionError: {urn:oasis:names:tc:opendocument:xmlns:text:1.0}toc-mark

0701.odt.zip

Line breaks

The odt2md conversion inserts additional line breaks (see the attached files: after мольба́ми and after ꙗ҆́кѡ да).
Is this a bug or a feature?

testconv.zip
testconv.odt.zip

Examples of various kinds of footnotes needed

@typiconman So far odt2md handles only classic footnotes:

A footnote carries a unique symbol that identifies it. The footnote body may span several paragraphs. The reference is rendered as a superscript. Such references
are supported in Common Markdown and are represented explicitly in ODT by the note element.

We need to understand how Church Slavonic (ЦСЯ) footnotes, that is, footnotes marked with kavyka signs, are encoded in ODT. We also need to design and try out a Markdown syntax for them.

For now, here is a rough definition of a Church Slavonic footnote:

  1. the body can never contain more than one paragraph
  2. the footnote is uniquely identified by the number of kavyka marks
  3. footnote "numbering" is unique only within a page; each new page starts the numbering again

Point 3 is quite problematic for HTML and MD, since there are no pages there...

It would be good to collect more examples of Church Slavonic footnotes, to confirm points 1-3 and
to work out an elegant way to encode them...
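
One possible way around point 3 is to stop relying on pages and give every footnote a document-wide label built from its kavyka count plus a running sequence number. The sketch below is only an illustration of that idea: the `KavykaNote` class, the `k<count>-<seq>` label scheme, and the `[^label]: body` definition syntax (a footnote extension offered by some Markdown dialects) are assumptions, not anything odt2md implements.

```python
from dataclasses import dataclass


@dataclass
class KavykaNote:
    """Hypothetical model of a Church Slavonic footnote."""
    kavyka_count: int  # number of kavyka marks on the reference (point 2)
    body: str          # always a single paragraph (point 1)


def assign_labels(notes):
    """Pair each note with a label that is unique across the whole document.

    The kavyka count alone repeats from page to page (point 3), so a running
    sequence number in document order is appended to disambiguate.
    """
    return [
        (f"k{note.kavyka_count}-{seq}", note)
        for seq, note in enumerate(notes, start=1)
    ]


def render_definitions(notes):
    """Render footnote definitions in an assumed '[^label]: body' syntax."""
    return "\n".join(
        f"[^{label}]: {note.body}" for label, note in assign_labels(notes)
    )
```

The kavyka count stays visible in the label, so a renderer could still draw the right number of marks, while the sequence number keeps labels distinct when the count restarts.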

File conversion error

I get a strange error when I try to convert the attached file:

WARNING:root:Undefined style name: 'C1'
WARNING:root:Undefined style name: 'lesson'
WARNING:root:Undefined style name: 'Standard'
Traceback (most recent call last):
  File "/home/sasha/.local/bin/odt2md", line 11, in <module>
    sys.exit(main())
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/odt2md.py", line 52, in main
    odt2md(args.input, args.output, profile=args.profile, max_line_width=args.max_line_width)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/odt2md.py", line 19, in odt2md
    markdown_text = styler.format_md(blocks)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/styler.py", line 137, in format_md
    for b in blocks:
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/block.py", line 224, in parse_odt
    yield from parse_blocks(para)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/block.py", line 46, in parse_blocks
    yield from extract_spans(ev.scan(para), para_style)
  File "/home/sasha/.local/lib/python3.6/site-packages/odt2md/block.py", line 98, in extract_spans
    assert False, e['tag']
AssertionError: {urn:oasis:names:tc:opendocument:xmlns:text:1.0}bookmark-start

Something having to do with bookmarks?
out.odt.zip
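
The assertion message is the qualified name of an ODT element that the span extractor did not expect. To see which elements of the ODF text namespace a document actually contains, something like the following may help. It is a minimal diagnostic sketch that assumes only that an ODT file is a ZIP archive with a `content.xml` member; the function names are hypothetical and are not part of odt2md.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace that appears in the AssertionError message.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def count_text_elements(content_xml):
    """Count elements in the ODF text namespace, keyed by local name."""
    root = ET.fromstring(content_xml)
    prefix = "{" + TEXT_NS + "}"
    counts = Counter()
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith(prefix):
            counts[element.tag[len(prefix):]] += 1
    return counts


def count_text_elements_in_odt(path):
    """Read content.xml from an ODT file (path or file-like) and count tags."""
    with zipfile.ZipFile(path) as archive:
        return count_text_elements(archive.read("content.xml"))
```

Calling `count_text_elements_in_odt("out.odt")` on the unzipped attachment and looking for names such as `bookmark-start` in the result would show whether bookmarks are present in the file.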

odt2md ignores sections

Any text or images placed inside a section are ignored.
Workaround: remove the sections from the ODT document (in LibreOffice: Format -> Sections -> Remove).
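
In the ODT XML, a section is a `text:section` element wrapping its paragraphs, so a converter could in principle flatten sections instead of dropping their content. The following is a rough sketch of that idea on a parsed `content.xml` tree. The `unwrap_sections` helper is hypothetical, it is not how odt2md works today, and it does not consider special cases such as linked or protected sections.

```python
import xml.etree.ElementTree as ET

SECTION = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}section"


def unwrap_sections(root):
    """Replace every text:section with its children, in place.

    Returns the number of sections removed. The tree is rescanned after each
    change so that nested sections are flattened as well.
    """
    removed = 0
    changed = True
    while changed:
        changed = False
        for parent in root.iter():
            for index, child in enumerate(list(parent)):
                if child.tag == SECTION:
                    # Hoist the section's children into its former position.
                    parent.remove(child)
                    for offset, grandchild in enumerate(list(child)):
                        parent.insert(index + offset, grandchild)
                    removed += 1
                    changed = True
                    break
            if changed:
                break
    return removed
```

The paragraphs keep their document order, which is what matters for the Markdown output.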
