
Comments (4)

jerrylususu commented on May 16, 2024

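Thanks for pointing me in the right direction! Other text files load correctly. After some digging, the problem appears to be caused by the HTML links at the top of the ChatGLM README. Deleting the following text fixes it: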

<p align="center">
   🌐 <a href="https://chatglm.cn/blog" target="_blank">Blog</a> • 🤗 <a href="https://huggingface.co/THUDM/chatglm-6b" target="_blank">HF Repo</a> • 🐦 <a href="https://twitter.com/thukeg" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/abs/2103.10360" target="_blank">[GLM@ACL 22]</a> <a href="https://github.com/THUDM/GLM" target="_blank">[GitHub]</a> • 📃 <a href="https://arxiv.org/abs/2210.02414" target="_blank">[GLM-130B@ICLR 23]</a> <a href="https://github.com/THUDM/GLM-130B" target="_blank">[GitHub]</a> <br>
</p>
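Deleting the block by hand works for one file. If several READMEs carry similar centered link banners, a minimal sketch like the one below could strip them before the text reaches a loader. It assumes the banner is a <p align="center"> ... </p> element as in the ChatGLM README. The helper name strip_center_banners is hypothetical, and a regular expression is only a rough stand-in for a real HTML parser.

```python
import re

# Matches a <p align="center"> ... </p> element, including newlines inside it,
# plus any whitespace that follows. Assumption: banners are not nested <p> elements.
_CENTER_P = re.compile(r'<p\s+align="center"\s*>.*?</p>\s*', re.DOTALL | re.IGNORECASE)


def strip_center_banners(markdown_text: str) -> str:
    """Remove centered HTML paragraph banners from Markdown text (hypothetical helper)."""
    return _CENTER_P.sub("", markdown_text)


if __name__ == "__main__":
    sample = (
        '<p align="center">\n'
        '   <a href="https://chatglm.cn/blog" target="_blank">Blog</a> <br>\n'
        "</p>\n"
        "\n"
        "# ChatGLM-6B\n"
    )
    print(strip_center_banners(sample))
```

The cleaned string could then be written back to disk or wrapped in a text stream before loading.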

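Also, opening the file directly in rb (binary) mode still causes problems (the text is still garbled). However, if I first open the file with an explicit encoding to get a file IO object and then pass that to the loader, it seems to work fine: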

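from langchain.document_loaders import UnstructuredFileIOLoader

filepath = "README.md"  # hypothetical path to the locally downloaded README

# Open in text mode with an explicit encoding, then pass the file object to the loader
with open(filepath, "r", encoding="utf8") as f:
    loader = UnstructuredFileIOLoader(file=f, mode="elements")
    docs = loader.load()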
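The thread does not establish why the rb path produces garbled text. One plausible mechanism, stated here purely as an assumption, is that raw bytes get decoded with the wrong codec somewhere downstream. The standard-library sketch below shows what UTF-8 bytes turn into when decoded as Latin-1 instead of UTF-8, which is the kind of garbling that opening the file with encoding="utf8" avoids.

```python
# Chinese text of the kind found in the README
original = "感谢指路"

# Reading a file in "rb" mode yields raw UTF-8 bytes, not text
raw = original.encode("utf-8")

# Decoding those bytes with the wrong codec silently produces mojibake
# (Latin-1 maps every byte to some character, so no error is raised)
wrong = raw.decode("latin-1")

# Decoding with the codec the file was actually written in restores the text
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```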


imClumsyPanda commented on May 16, 2024

Did you download the README locally and then load it with UnstructuredFileLoader?


jerrylususu commented on May 16, 2024

Did you download the README locally and then load it with UnstructuredFileLoader?

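Yes. My local GPU doesn't have enough VRAM, so I'm working in a cloud VM on the AutoDL platform. I've confirmed that the downloaded file is UTF-8 encoded. Relevant library versions: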

langchain 0.0.128
transformers 4.26.1
unstructured 0.5.8
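The checks mentioned above (the file is valid UTF-8, and which library versions are installed) could be reproduced with a small standard-library sketch like this one. The README path is a hypothetical placeholder, and the distribution names are assumed to match the pip package names listed above.

```python
from importlib import metadata
from pathlib import Path


def is_valid_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")  # strict mode raises on any invalid byte sequence
    except UnicodeDecodeError:
        return False
    return True


def installed_version(dist: str) -> str:
    """Return the installed version of a distribution, or a marker if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    readme = "README.md"  # hypothetical path to the downloaded ChatGLM README
    if Path(readme).exists():
        print(readme, "is valid UTF-8:", is_valid_utf8(readme))
    for name in ("langchain", "transformers", "unstructured"):
        print(name, installed_version(name))
```

Note that passing a strict UTF-8 decode only shows the bytes are well-formed; it does not rule out a loader choosing a different codec on its own.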


imClumsyPanda commented on May 16, 2024
