
Comments (4)

pseudotensor commented on June 11, 2024

Interesting, thanks. I'll see if I can understand.

from h2ogpt.

pseudotensor commented on June 11, 2024

Looks like unstructured barfs:

Current thread 0x000015cc (most recent call first):
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\pdf.py", line 57 in _get_elements
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 87 in load

Can you pip install an older version of unstructured or see if any other changes help?

I also see:

Current thread 0x000015cc (most recent call first):
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\ctypes\__init__.py", line 374 in __init__
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\magic\loader.py", line 44 in load_lib
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\magic\__init__.py", line 209 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\unstructured\file_utils\filetype.py", line 25 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\unstructured\partition\pdf.py", line 57 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\pdf.py", line 57 in _get_elements
  File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 87 in load
  File "C:\Users\Administrator\h2ogpt\src\gpt_langchain.py", line 3212 in file_to_doc

Maybe the crash is due to multiple threads trying to do the same imports or access the same libraries at once. There are known Python bugs like that. Moving the imports earlier might avoid such races.


pseudotensor commented on June 11, 2024

E.g. you can add these to the top of gpt_langchain.py:

import magic
from unstructured.partition.pdf import partition_pdf

Let me know if that helps, and I can move some imports outside local scopes.
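A minimal sketch of the same idea in a more defensive form, assuming the goal is simply to import the heavy modules once in the main thread before any worker threads start. The helper name `preload_modules` is hypothetical and not part of h2ogpt. Note that a native crash such as an access violation cannot be caught this way; the sketch only reports ordinary import failures.

```python
import importlib


def preload_modules(names):
    """Import each module eagerly in the calling thread.

    Returns a dict mapping module name to None on success,
    or to a string describing the ImportError on failure.
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Only Python-level import failures end up here;
            # a crash inside a native library would still kill the process.
            results[name] = repr(exc)
    return results


if __name__ == "__main__":
    # Module names taken from the traceback in this thread.
    for name, err in preload_modules(["magic", "unstructured.partition.pdf"]).items():
        print(name, "ok" if err is None else err)
```

Calling this once at startup, before any threads are created, should have the same effect as the two top-level imports above while also showing which import fails.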


isaac-aburto commented on June 11, 2024

I tried an older version of unstructured, but it didn't work. Same result when I tried changing some imports in the code.

Thanks to your response I decided to review certain libraries, especially magic.

The error, Windows fatal exception: access violation, apparently happens in this file: C:\ProgramData\miniconda3\envs\h2ogpt\Lib\site-packages\magic\loader.py

def _lib_candidates():

  yield find_library('magic')
  #print("sys.platform: ", sys.platform)

  if sys.platform == "darwin":

    paths = [
      '/opt/local/lib',
      '/usr/local/lib',
      '/opt/homebrew/lib',
    ] + glob.glob('/usr/local/Cellar/libmagic/*/lib')

    for i in paths:
      yield os.path.join(i, 'libmagic.dylib')

  elif sys.platform in ("win32", "cygwin"):

    #prefixes = ['msys-magic-1', 'libmagic', 'magic1', 'cygmagic-1', 'libmagic-1']
    prefixes = ['libmagic']

    for i in prefixes:
      # find_library searches in %PATH% but not the current directory,
      # so look for both
      yield './%s.dll' % (i,)
      yield find_library(i)

The code was trying to load these DLL files, but they did not exist in the folder. What I did was move the file located at C:\ProgramData\miniconda3\envs\h2ogpt\Lib\site-packages\magic\libmagic\libmagic.dll to C:\ProgramData\miniconda3\envs\h2ogpt\Library\usr\bin and comment out the candidates in the list that could not be found.

I don't know if it's the best solution, but it's the only one that has helped me.
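To check which libmagic candidates actually resolve before editing loader.py, something like the following may help. It is only a diagnostic sketch: the function name `locate_libmagic` and its parameters are hypothetical, and the directories passed in are whatever you want to probe (for example the two folders mentioned above).

```python
import os
from ctypes.util import find_library


def locate_libmagic(extra_dirs, names=("libmagic",)):
    """Return (source, path) pairs for every candidate that resolves.

    find_library() searches the system library path (%PATH% on Windows);
    extra_dirs are probed directly for '<name>.dll'.
    """
    found = []
    for name in names:
        resolved = find_library(name)
        if resolved:
            found.append(("find_library", resolved))
        for directory in extra_dirs:
            candidate = os.path.join(directory, name + ".dll")
            if os.path.isfile(candidate):
                found.append(("extra_dir", candidate))
    return found


if __name__ == "__main__":
    # Paths copied from this thread; adjust for your environment.
    dirs = [
        r"C:\ProgramData\miniconda3\envs\h2ogpt\Lib\site-packages\magic\libmagic",
        r"C:\ProgramData\miniconda3\envs\h2ogpt\Library\usr\bin",
    ]
    for source, path in locate_libmagic(dirs):
        print(source, path)
```

If the DLL only shows up under an extra directory, adding that directory to PATH before importing magic might be an alternative to moving the file, since find_library searches PATH on Windows. I have not verified that this avoids the access violation.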

