Comments (8)

D1firehail commented on May 13, 2024

Hey, I was asked to help someone using your project who was getting the same error. Below is the reply I gave them, which includes the likely cause.

https://github.com/StanGirard/quiver/blob/adbb41eb40f20fc264dbd68df2079649518e381d/loaders/common.py#L14
https://github.com/StanGirard/quiver/blob/adbb41eb40f20fc264dbd68df2079649518e381d/loaders/common.py#L20
https://github.com/StanGirard/quiver/blob/adbb41eb40f20fc264dbd68df2079649518e381d/utils.py#L4

It looks like the code creates a temp file with tempfile.NamedTemporaryFile and then, while that file is still open, passes its file name to functions that try to open it again.

https://docs.python.org/3.9/library/tempfile.html#tempfile.NamedTemporaryFile

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows)

(and I knew what to look for thanks to https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file)
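A minimal standard-library sketch of the behavior the docs describe. The helper name `can_reopen_named_tempfile` is hypothetical, and the placeholder bytes stand in for an uploaded PDF. On Unix the second open should succeed; on Windows, with the default `delete=True`, it is expected to fail with the same PermissionError as in the traceback.

```python
import os
import tempfile


def can_reopen_named_tempfile() -> bool:
    """Return True if a NamedTemporaryFile can be reopened by name while still open."""
    # delete=True is the default; on Windows the file is opened delete-on-close.
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(b"%PDF-1.4 placeholder")
        tmp.flush()
        try:
            # Mirrors what the loader does: open the same file a second time by name.
            with open(tmp.name, "rb") as second_handle:
                second_handle.read()
            return True
        except PermissionError:
            return False


if __name__ == "__main__":
    print(f"os.name={os.name!r}, reopen works: {can_reopen_named_tempfile()}")
```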

from quivr.

seaberry0620 commented on May 13, 2024

This worked for me:

import os
import tempfile
import time

import streamlit as st
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

from utils import compute_sha1_from_file


def process_file(vector_store, file, loader_class, file_suffix):
    file_name = file.name
    file_size = file.size
    dateshort = time.strftime("%Y%m%d")

    # mkstemp returns an OS-level file descriptor plus a path. Once that
    # descriptor is closed, nothing holds the file open, so the loader can
    # reopen it by name on Windows too.
    fd, tmp_file_name = tempfile.mkstemp(suffix=file_suffix)
    try:
        with os.fdopen(fd, 'wb') as tmp_file:
            tmp_file.write(file.getvalue())

        loader = loader_class(tmp_file_name)
        documents = loader.load()
        file_sha1 = compute_sha1_from_file(tmp_file_name)
    finally:
        # Remove the temporary file even if loading or hashing fails.
        os.remove(tmp_file_name)

    chunk_size = st.session_state['chunk_size']
    chunk_overlap = st.session_state['chunk_overlap']

    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    documents = text_splitter.split_documents(documents)

    # Add the file's SHA-1 and upload details as metadata to each chunk.
    docs_with_metadata = [
        Document(
            page_content=doc.page_content,
            metadata={
                "file_sha1": file_sha1,
                "file_size": file_size,
                "file_name": file_name,
                "chunk_size": chunk_size,
                "chunk_overlap": chunk_overlap,
                "date": dateshort,
            },
        )
        for doc in documents
    ]

    vector_store.add_documents(docs_with_metadata)

This version of common.py should avoid the permission issue you were encountering on Windows.
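The same fix can be factored into a small reusable helper. This is a sketch, not code from the quivr repository: `temp_file_path` is a hypothetical name, and it assumes the loader only needs the path to a closed file.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def temp_file_path(data: bytes, suffix: str = "") -> Iterator[str]:
    """Write `data` to a closed temporary file and yield its path.

    The descriptor from mkstemp is closed before the path is yielded, so other
    code (e.g. a document loader) can reopen the file by name, including on Windows.
    The file is removed on exit, even if the body of the with block raises.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        yield path
    finally:
        os.remove(path)


if __name__ == "__main__":
    with temp_file_path(b"%PDF-1.4 placeholder", suffix=".pdf") as path:
        # Reopening by name works because the writing handle is already closed.
        with open(path, "rb") as reopened:
            print(path, len(reopened.read()))
    print("still exists:", os.path.exists(path))
```

Inside process_file, the loading step would then read roughly `with temp_file_path(file.getvalue(), file_suffix) as path: documents = loader_class(path).load()`.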

StanGirard commented on May 13, 2024

Ouch, probably something about Windows 😬

Where did you install quiver, and do you have access to the D: folder mentioned?

Klaudioz commented on May 13, 2024

I think I followed all the instructions, but once Streamlit is running, I drag in a PDF, and when I click on Add to Database, this error is shown. Any idea?

THANK YOU!!!

I can see three drive letters in your answer. That's probably the issue.
When you upload a file, the app first writes it to a folder, and after it has been stored as embeddings, it is deleted. I don't know why this "duplication" is needed.
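One likely reason for the round trip is that the loaders here are constructed with a file path (loader_class(tmp_file_name)). The SHA-1, however, does not need the file on disk: hashing the uploaded bytes directly gives the same digest as hashing the written file. This is a sketch with hypothetical helpers (`sha1_of_bytes`, `sha1_of_file`); quivr's own `compute_sha1_from_file` in utils.py may be implemented differently.

```python
import hashlib
import os
import tempfile


def sha1_of_bytes(data: bytes) -> str:
    """SHA-1 hex digest of an in-memory buffer (e.g. an upload's getvalue())."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """SHA-1 hex digest of a file on disk, read in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    payload = b"%PDF-1.4 placeholder"
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        # Both routes produce the same digest.
        print(sha1_of_bytes(payload) == sha1_of_file(path))
    finally:
        os.remove(path)
```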

pepeto commented on May 13, 2024

This is what is shown in the console:

2023-05-13 18:19:16.063 Uncaught app exception
Traceback (most recent call last):
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "N:\- GoogleDrive USAL\Working\PYTHON\quiver-main\main.py", line 57, in <module>
    file_uploader(supabase, openai_api_key, vector_store)
  File "n:\- GoogleDrive USAL\Working\PYTHON\quiver-main\files.py", line 37, in file_uploader
    file_processors[file_extension](vector_store, file)
  File "n:\- GoogleDrive USAL\Working\PYTHON\quiver-main\loaders\pdf.py", line 6, in process_pdf
    return process_file(vector_store, file, PyPDFLoader, ".pdf")
  File "n:\- GoogleDrive USAL\Working\PYTHON\quiver-main\loaders\common.py", line 19, in process_file
    documents = loader.load()
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\langchain\document_loaders\pdf.py", line 113, in load
    return list(self.lazy_load())
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\langchain\document_loaders\pdf.py", line 120, in lazy_load
    yield from self.parser.parse(blob)
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\langchain\document_loaders\base.py", line 87, in parse
    return list(self.lazy_parse(blob))
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\langchain\document_loaders\parsers\pdf.py", line 16, in lazy_parse
    with blob.as_bytes_io() as pdf_file_obj:
  File "C:\Program Files\Python310\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "M:\Working- ENVS\Python3.10B\lib\site-packages\langchain\document_loaders\blob_loaders\schema.py", line 86, in as_bytes_io
    with open(str(self.path), "rb") as f:
PermissionError: [Errno 13] Permission denied: 'D:\\TEMP\\tmpim3u4796.pdf'

D:\TEMP has no permission problems: it's the system's temporary directory, and all programs and users have access to it.
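That fits the diagnosis in D1firehail's comment: the error is not about the folder's permissions but about reopening a file that another handle still holds open. A quick standard-library check (the helper `temp_dir_is_writable` is hypothetical) can confirm that the temp directory itself is fine when each handle is closed before the next open:

```python
import os
import tempfile


def temp_dir_is_writable() -> bool:
    """Create, write, reopen and delete a file in the system temp directory."""
    fd, path = tempfile.mkstemp(dir=tempfile.gettempdir())
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"ok")
        # Reopen by name only after the first handle has been closed.
        with open(path, "rb") as f:
            return f.read() == b"ok"
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(tempfile.gettempdir(), temp_dir_is_writable())
```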

pepeto commented on May 13, 2024

That looks exactly like the problem I have. Any idea how to avoid the error?

adamengberg commented on May 13, 2024

I have the same problem, on Windows as well.

adamengberg commented on May 13, 2024

I encountered a PermissionError when trying to open a temporary file on Windows. The issue originates from this block of code in loaders/common.py:

with tempfile.NamedTemporaryFile(delete=True, suffix=file_suffix) as tmp_file:
    tmp_file.write(file.getvalue())
    tmp_file.flush()

    loader = loader_class(tmp_file.name)
    documents = loader.load()
    file_sha1 = compute_sha1_from_file(tmp_file.name)

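The PermissionError arises because, on Windows, tempfile.NamedTemporaryFile() with delete=True opens the file with the O_TEMPORARY (delete-on-close) flag, so the file cannot be opened a second time by name while that first handle is still open. Windows reports a sharing violation, which Python raises as PermissionError [Errno 13]. Unix-based systems allow reopening an open file by name, which is why the same code works there.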
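On Python 3.12 and later, NamedTemporaryFile also accepts delete_on_close=False: the file can then be closed, reopened by name (including on Windows), and is only deleted when the with block exits. This sketch assumes Python 3.12+ (the traceback above comes from Python 3.10, where the mkstemp() approach is the way to go), and the helper name `roundtrip_with_delete_on_close` is hypothetical.

```python
import os
import sys
import tempfile
from typing import Tuple


def roundtrip_with_delete_on_close(data: bytes, suffix: str = ".pdf") -> Tuple[str, bytes]:
    """Write `data` to a NamedTemporaryFile, close it, reopen it by name and read it back.

    Requires Python 3.12+ for the delete_on_close parameter.
    Returns the temp path (already deleted on return) and the bytes read back.
    """
    if sys.version_info < (3, 12):
        raise RuntimeError("delete_on_close needs Python 3.12 or later")
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=True, delete_on_close=False) as tmp:
        tmp.write(data)
        tmp.close()  # the file stays on disk because delete_on_close=False
        with open(tmp.name, "rb") as reopened:
            content = reopened.read()
        path = tmp.name
    # Leaving the outer with block deletes the file.
    return path, content


if __name__ == "__main__":
    if sys.version_info >= (3, 12):
        p, c = roundtrip_with_delete_on_close(b"%PDF-1.4 placeholder")
        print(c, os.path.exists(p))
```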

To resolve this, I switched to tempfile.mkstemp(), which only creates the file and returns an OS-level file descriptor plus its path, without any delete-on-close behavior. The descriptor is closed (by the with os.fdopen(...) block) before the path is passed to the loader, so nothing else holds the file open and it can be reopened by name on any platform. The finally block then removes the file even if loading fails.

Here's the modified block of code:

# Create a temporary file using `tempfile.mkstemp`.
tmp_fd, tmp_file_name = tempfile.mkstemp(suffix=file_suffix)

try:
    # Write to the temporary file.
    with os.fdopen(tmp_fd, 'wb') as tmp_file:
        tmp_file.write(file.getvalue())
        tmp_file.flush()

    # Now you can pass the temporary file's name to `loader_class` and `compute_sha1_from_file`.
    loader = loader_class(tmp_file_name)
    documents = loader.load()
    file_sha1 = compute_sha1_from_file(tmp_file_name)
    
finally:
    # Clean up the temporary file.
    if os.path.exists(tmp_file_name):
        os.remove(tmp_file_name)
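The finally clause matters because a loader can fail partway through (for example on a malformed PDF). The following self-contained sketch runs the same pattern with a hypothetical `FailingLoader` standing in for loader_class and a hypothetical `load_via_tempfile` wrapper, and shows that the temporary file is still removed when loading raises.

```python
import os
import tempfile


class FailingLoader:
    """Hypothetical stand-in for a document loader whose load() raises."""

    last_path = None  # records the path it was given, for inspection afterwards

    def __init__(self, path: str):
        self.path = path
        FailingLoader.last_path = path

    def load(self):
        # Open the file by name first, as real loaders do, then fail.
        with open(self.path, "rb") as f:
            f.read()
        raise ValueError("could not parse file")


def load_via_tempfile(data: bytes, loader_class, suffix: str = ".pdf"):
    tmp_fd, tmp_file_name = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(tmp_fd, "wb") as tmp_file:
            tmp_file.write(data)
        return loader_class(tmp_file_name).load()
    finally:
        # Runs on success and on failure alike.
        if os.path.exists(tmp_file_name):
            os.remove(tmp_file_name)


if __name__ == "__main__":
    try:
        load_via_tempfile(b"not really a PDF", FailingLoader)
    except ValueError as exc:
        print("load failed:", exc)
    print("temp file left behind:", os.path.exists(FailingLoader.last_path))
```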
