Comments (4)
Interesting, thanks. I'll see if I can understand.
from h2ogpt.
Looks like unstructured barfs:
Current thread 0x000015cc (most recent call first):
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\pdf.py", line 57 in _get_elements
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 87 in load
Can you pip install an older version of unstructured or see if any other changes help?
I also see:
Current thread 0x000015cc (most recent call first):
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\ctypes\__init__.py", line 374 in __init__
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\magic\loader.py", line 44 in load_lib
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\magic\__init__.py", line 209 in <module>
File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\unstructured\file_utils\filetype.py", line 25 in <module>
File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\unstructured\partition\pdf.py", line 57 in <module>
File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883 in exec_module
File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\pdf.py", line 57 in _get_elements
File "C:\ProgramData\miniconda3\envs\h2ogpt\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 87 in load
File "C:\Users\Administrator\h2ogpt\src\gpt_langchain.py", line 3212 in file_to_doc
Maybe the crash is due to multiple threads trying to do some imports or access some libraries at the same time; there are known Python bugs around this. Moving the imports earlier might avoid such races.
from h2ogpt.
E.g. you can add these to the top of gpt_langchain.py:
import magic
from unstructured.partition.pdf import partition_pdf
Let me know if that helps, and I can move some imports outside local scopes.
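A minimal sketch of the same idea, assuming the goal is only to force the imports to happen once, in the main thread, before any worker threads start. The helper name `preload` is hypothetical and not part of h2ogpt. Note that it only catches `ImportError`; a native access violation like the one in the traces above would still kill the process.

```python
import importlib


def preload(module_names):
    """Import each module once, in the calling thread, and report success.

    Hypothetical helper: returns a dict mapping module name to True/False.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # A missing or broken optional dependency is recorded, not raised.
            results[name] = False
    return results
```

Calling `preload(["magic", "unstructured.partition.pdf"])` early in startup would be roughly equivalent to the two top-level imports above, while also showing which of them failed to import.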
from h2ogpt.
I tried an older version of unstructured, but it doesn't work. Same result when changing some imports in the code.
Thanks to your response I decided to review certain libraries, especially magic.
The error, Windows fatal exception: access violation, apparently happens in the file: C:\ProgramData\miniconda3\envs\h2ogpt\Lib\site-packages\magic\loader.py
def _lib_candidates():
    yield find_library('magic')
    #print("sys.platform: ", sys.platform)
    if sys.platform == "darwin":
        paths = [
            '/opt/local/lib',
            '/usr/local/lib',
            '/opt/homebrew/lib',
        ] + glob.glob('/usr/local/Cellar/libmagic/*/lib')
        for i in paths:
            yield os.path.join(i, 'libmagic.dylib')
    elif sys.platform in ("win32", "cygwin"):
        #prefixes = ['msys-magic-1', 'libmagic', 'magic1', 'cygmagic-1', 'libmagic-1']
        prefixes = ['libmagic']
        for i in prefixes:
            # find_library searches in %PATH% but not the current directory,
            # so look for both
            yield './%s.dll' % (i,)
            yield find_library(i)
The code was trying to load these DLL files, but they did not exist in the folder. What I did was move the file located at C:\ProgramData\miniconda3\envs\h2ogpt\Lib\site-packages\magic\libmagic\libmagic.dll
to C:\ProgramData\miniconda3\envs\h2ogpt\Library\usr\bin
and comment out the prefixes in the list that could not be found.
I don't know if it's the best solution, but it's the only one that has helped me.
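For anyone who wants to check which candidate DLLs can actually be located before editing loader.py, here is a rough diagnostic sketch. The function name `find_dll_candidates` and its parameters are hypothetical; it only mirrors the lookup idea in `_lib_candidates` (ask `find_library`, then look in explicit folders) and does not load anything.

```python
import os
from ctypes.util import find_library


def find_dll_candidates(prefixes, extra_dirs):
    """Return (prefix, path) pairs for candidate DLLs that can be located.

    Hypothetical helper for diagnosis only; it never loads a library.
    """
    found = []
    for prefix in prefixes:
        # Ask the system search path first; find_library may return None.
        located = find_library(prefix)
        if located:
            found.append((prefix, located))
        # Then check explicit directories, e.g. the package's own folder.
        for directory in extra_dirs:
            path = os.path.join(directory, prefix + ".dll")
            if os.path.isfile(path):
                found.append((prefix, path))
    return found
```

Passing the full prefix list from loader.py together with the site-packages\magic\libmagic folder should show which names resolve and from where. A prefix that resolves to an unexpected DLL elsewhere on %PATH% may be worth a look, though that is a guess and not something confirmed in this thread.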
from h2ogpt.