
Comments (4)

chungkwong avatar chungkwong commented on June 25, 2024

Hi, I'd like to ask which techniques can be used for math formula recognition.

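This project is clearly out of date. For recent progress, see my blog post https://chungkwong.cc/crohme.html, as well as the reports of past CROHME competitions and the papers that cite them. In short, the mainstream approach today models the task as an image-to-text problem (image to LaTeX token sequence) and solves it with an encoder-decoder neural network. The encoder usually uses a backbone network based on a CNN (especially DenseNet variants) or ViT, while the decoder is typically an autoregressive RNN (with some attention mechanism) or a Transformer decoder. The key to improving accuracy is data augmentation.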
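To make the image-to-sequence formulation a bit more concrete, here is a minimal sketch of an autoregressive decoding loop in plain Python. Everything named here is an assumption for illustration: `toy_step` is a hypothetical stand-in for a trained encoder-decoder network (it ignores the image and simply follows a fixed script), and `greedy_decode` and the `<sos>`/`<eos>` token names are not taken from any particular system.

```python
from typing import Callable, Dict, List

SOS, EOS = "<sos>", "<eos>"  # hypothetical start/end markers


def toy_step(image_features: object, prefix: List[str]) -> Dict[str, float]:
    """Hypothetical model step: return a distribution over the next token.

    A real system would run the decoder over encoder features here. This toy
    ignores the image and puts most of the probability on a scripted token.
    """
    script = ["x", "^", "{", "2", "}", EOS]
    position = len(prefix) - 1  # the prefix always starts with SOS
    target = script[min(position, len(script) - 1)]
    vocab = set(script)
    rest = 0.1 / (len(vocab) - 1)  # spread the remaining mass evenly
    return {tok: (0.9 if tok == target else rest) for tok in vocab}


def greedy_decode(
    step: Callable[[object, List[str]], Dict[str, float]],
    image_features: object,
    max_len: int = 50,
) -> List[str]:
    """Emit one LaTeX token at a time, feeding the output back as input."""
    prefix = [SOS]
    for _ in range(max_len):
        probs = step(image_features, prefix)
        next_token = max(probs, key=probs.get)  # greedy: take the argmax
        if next_token == EOS:
            break
        prefix.append(next_token)
    return prefix[1:]  # drop the start marker


latex = " ".join(greedy_decode(toy_step, image_features=None))
print(latex)
```

In practice the greedy argmax is usually replaced by beam search, and `image_features` would come from the CNN or ViT encoder.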

from mathocr.

yaphet266 avatar yaphet266 commented on June 25, 2024

Thanks for the reply. I've read that post. Where can I find the related papers, and which technical approach does iFLYTEK use?


chungkwong avatar chungkwong commented on June 25, 2024

Thanks for the reply. I've read that post. Where can I find the related papers, and which technical approach does iFLYTEK use?

According to the paper for another formula recognition competition, the ICDAR 2023 Competition on Recognition of Multi-line Handwritten Mathematical Expressions, iFLYTEK's winning system is described as follows: “iFLYTEK-OCR team uses an encoder-decoder architecture that formulates HMER as an image-to-sequence translation problem. Specifically, the Conv2Former is employed as the image encoder, and a bi-directional trained Transformer decoder with Attention Refinement Module is utilized as the latex sequence decoder. A Beam Search Ensemble is proposed to ensemble the models trained with different sizes of characters. Specifically, at each decoding step, probability distributions produced by all member models are averaged by certain weights, and the top-k candidate characters to be output are decided by the averaged probability distribution. As for the data augmentation, blur, random, color jitter, scale, and TIA Transform are applied to improve the generalization ability of the model.” iFLYTEK has published quite a few papers on math formula recognition over the past few years, and they are easy to find by searching. Papers on math formula recognition generally cite the CROHME 2016 paper, so following those citations will lead you to the important papers in this field.
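The per-step averaging described in that quote can be sketched roughly as follows. This is only an illustration under stated assumptions: the quote does not say how the weights are chosen or how candidates are tracked across beams, so `ensemble_step`, the example weights, and the toy distributions below are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ensemble_step(
    distributions: Sequence[Dict[str, float]],
    weights: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Weighted-average the member models' next-token distributions
    and return the k most probable tokens under the averaged distribution."""
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per member model")
    total = sum(weights)
    averaged: Dict[str, float] = {}
    for dist, weight in zip(distributions, weights):
        for token, prob in dist.items():
            # Normalise the weights so the result is still a distribution.
            averaged[token] = averaged.get(token, 0.0) + (weight / total) * prob
    # Sort by descending probability; break ties by token for determinism.
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Two hypothetical member models disagree on the next token.
model_a = {"x": 0.6, "\\times": 0.3, "X": 0.1}
model_b = {"x": 0.2, "\\times": 0.7, "X": 0.1}
top2 = ensemble_step([model_a, model_b], weights=[1.0, 1.0], k=2)
print(top2)
```

With equal weights, the averaged distribution favours `\times` over `x` even though one member model prefers `x`. A full beam search would call a step like this once per hypothesis at every decoding position.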


yaphet266 avatar yaphet266 commented on June 25, 2024

Got it, thank you so much!

