
recipes's Introduction

Welcome to Weaviate Recipes 💚


This repository contains end-to-end examples of Weaviate's various features and integrations.

  • Integrations: Notebooks showing you how to use Weaviate together with another technology
  • Weaviate Features: Notebooks covering vector, hybrid, and generative search, reranking, multi-tenancy, and more

Integrations 🌐

  • Cloud Hyperscalers: Google, AWS, NVIDIA
  • Compute Infrastructure: Modal, Replicate
  • Data Platforms: Confluent, Spark, Unstructured, Firecrawl
  • LLM Frameworks: DSPy, LangChain, LlamaIndex, Semantic Kernel, Ollama
  • Observability and Evaluation: Arize, Langtrace, LangWatch, Nomic, Ragas, Weights & Biases

Weaviate Features 🔧

  • Similarity Search: Use Weaviate's nearText operator to run semantic search queries (broken out by model provider)
  • Hybrid Search: Use Weaviate's hybrid operator to run hybrid search queries (broken out by model provider)
  • Generative Search: Build a simple RAG workflow using Weaviate's .generate (broken out by model provider)
  • Filters: Narrow down your search results by adding filters to your queries
  • Reranking: Add reranking to your pipeline to improve search results (broken out by model provider)
  • Media Search: Use Weaviate's nearImage and nearVideo operators to search using images and videos
  • Classification: Learn how to use KNN and zero-shot classification
  • Multi-Tenancy: Store tenants on separate shards for complete data isolation
  • Product Quantization: Compress vector embeddings and reduce the memory footprint using Weaviate's PQ feature
  • Evaluation: Evaluate your search system
  • CRUD APIs: Learn how to use Weaviate's Create, Read, Update, and Delete APIs
  • Generative Feedback Loops: Write back to your database by storing language model outputs
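
To make the feature list concrete, here is a minimal similarity-search sketch using the v4 Python client. It is illustrative rather than lifted from a specific notebook, and it assumes a vectorizer-enabled collection named JeopardyQuestion already exists:

import weaviate

client = weaviate.connect_to_local()
questions = client.collections.get("JeopardyQuestion")  # assumed collection
# nearText vectorizes the query text and returns the closest objects
response = questions.query.near_text(query="animals in space", limit=2)
for obj in response.objects:
    print(obj.properties)
client.close()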

Feedback ❓

Please note this is an ongoing project, and updates will be made frequently. If you have a feature you would like to see, please create a GitHub issue or feel free to contribute one yourself!


recipes's Issues

WeaviateClient has no attribute 'query'

Hi, I got here from Connor Shorten's video about running Llama 3 locally.
When running integrations/dspy/llms/Llama3.ipynb I get this error:
AttributeError: 'WeaviateClient' object has no attribute 'query'
I have these installed:
dspy-ai = "^2.4.5"
langfuse = "^2.27.0"
weaviate-client = "^4.5.5"
I've also tried version 3 of weaviate-client, but got nowhere. Help please :D
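
A hedged note on the likely cause: client.query is the v3 GraphQL builder and no longer exists on the v4 WeaviateClient, so something in the stack is still issuing v3-style calls against a v4 client. With a v4 client, the equivalent hybrid query goes through a collection handle, roughly like this (collection name taken from the notebook):

import weaviate

# v4 client: queries go through a collection object, not client.query
client = weaviate.connect_to_local()
blogs = client.collections.get("WeaviateBlogChunk")
results = blogs.query.hybrid(query="What do cross encoders do?", limit=5)
for obj in results.objects:
    print(obj.properties)
client.close()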

Schema for Ollama.

I am trying to run this notebook (as a script) with Ollama.

After solving the timeout problem, I solved the tracemalloc problem, but I'm left with:

/home/vitor/anaconda3/envs/master/lib/python3.12/site-packages/weaviate/warnings.py:121: DeprecationWarning: Dep005: You are using weaviate-client version 3.26.2. The latest version is 4.4.4.
            Please consider upgrading to the latest version. See https://weaviate.io/developers/weaviate/client-libraries/python for details.
  warnings.warn(
[' In intricate webs of thought they weave,\nNeural networks dance, learning to grieve,\nPatterns unraveled, wisdom they achieve.']
[' In intricate webs of thought they weave,\nNeural networks dance, learning to grieve,\nPatterns unraveled, wisdom they achieve.']
Test Question: What do cross encoders do?
Predicted Answer: predicted_answer
/home/vitor/anaconda3/envs/master/lib/python3.12/site-packages/executing/executing.py:713: DeprecationWarning: ast.Str is deprecated and will be removed in Python 3.14; use ast.Constant instead
  right=ast.Str(s=sentinel),
/home/vitor/anaconda3/envs/master/lib/python3.12/ast.py:587: DeprecationWarning: Attribute s is deprecated and will be removed in Python 3.14; use value instead
  return Constant(*args, **kwargs)
---------------------------------------------------------------------------
UnexpectedStatusCodeException             Traceback (most recent call last)
File ~/Projects/mestrado/RAG-DSPy.py:237
    234 test_example = dspy.Example(question="What do cross encoders do?")
    235 test_pred = dspy.Example(answer="They re-rank documents.")
--> 237 type(llm_metric(test_example, test_pred))
    240 # In[19]:
    243 test_example = dspy.Example(question="What do cross encoders do?")

File ~/Projects/mestrado/RAG-DSPy.py:214, in llm_metric(gold, pred, trace)
    211 overall = f"Please rate how well this answer answers the question, `{question}` based on the context.\n `{predicted_answer}`"
    213 with dspy.context(lm=metricLM):
--> 214     context = dspy.Retrieve(k=5)(question).passages
    215     detail = dspy.ChainOfThought(Assess)(context="N/A", assessed_question=detail, assessed_answer=predicted_answer)
    216     faithful = dspy.ChainOfThought(Assess)(context=context, assessed_question=faithful, assessed_answer=predicted_answer)

File ~/anaconda3/envs/master/lib/python3.12/site-packages/dspy/retrieve/retrieve.py:29, in Retrieve.__call__(self, *args, **kwargs)
     28 def __call__(self, *args, **kwargs):
---> 29     return self.forward(*args, **kwargs)

File ~/anaconda3/envs/master/lib/python3.12/site-packages/dspy/retrieve/retrieve.py:39, in Retrieve.forward(self, query_or_queries)
     33 queries = [query.strip().split('\n')[0].strip() for query in queries]
     36 # print(queries)
     37 # TODO: Consider removing any quote-like markers that surround the query too.
---> 39 passages = dsp.retrieveEnsemble(queries, k=self.k)
     40 return Prediction(passages=passages)

File ~/anaconda3/envs/master/lib/python3.12/site-packages/dsp/primitives/search.py:50, in retrieveEnsemble(queries, k, by_prob)
     47 queries = [q for q in queries if q]
     49 if len(queries) == 1:
---> 50     return retrieve(queries[0], k)
     52 passages = {}
     53 for q in queries:

File ~/anaconda3/envs/master/lib/python3.12/site-packages/dsp/primitives/search.py:9, in retrieve(query, k, **kwargs)
      7 if not dsp.settings.rm:
      8     raise AssertionError("No RM is loaded.")
----> 9 passages = dsp.settings.rm(query, k=k, **kwargs)
     10 passages = [psg.long_text for psg in passages]
     12 if dsp.settings.reranker:

File ~/anaconda3/envs/master/lib/python3.12/site-packages/dspy/retrieve/retrieve.py:29, in Retrieve.__call__(self, *args, **kwargs)
     28 def __call__(self, *args, **kwargs):
---> 29     return self.forward(*args, **kwargs)

File ~/anaconda3/envs/master/lib/python3.12/site-packages/dspy/retrieve/weaviate_rm.py:77, in WeaviateRM.forward(self, query_or_queries, k)
     71 passages = []
     72 for query in queries:
     73     results = self._weaviate_client.query\
     74         .get(self._weaviate_collection_name, ["content"])\
     75         .with_hybrid(query=query)\
     76         .with_limit(k)\
---> 77         .do()
     79     results = results["data"]["Get"][self._weaviate_collection_name]
     80     parsed_results = [result["content"] for result in results]

File ~/anaconda3/envs/master/lib/python3.12/site-packages/weaviate/gql/get.py:1905, in GetBuilder.do(self)
   1903     return results
   1904 else:
-> 1905     return super().do()

File ~/anaconda3/envs/master/lib/python3.12/site-packages/weaviate/gql/filter.py:124, in GraphQL.do(self)
    121 except RequestsConnectionError as conn_err:
    122     raise RequestsConnectionError("Query was not successful.") from conn_err
--> 124 res = _decode_json_response_dict(response, "Query was not successful")
    125 assert res is not None
    126 return res

File ~/anaconda3/envs/master/lib/python3.12/site-packages/weaviate/util.py:798, in _decode_json_response_dict(response, location)
    795     except JSONDecodeError:
    796         raise ResponseCannotBeDecodedException(location, response)
--> 798 raise UnexpectedStatusCodeException(location, response)

UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 422, with response body: {'error': [{'message': 'no graphql provider present, this is most likely because no schema is present. Import a schema first!'}]}.
/home/vitor/anaconda3/envs/master/lib/python3.12/site-packages/IPython/core/interactiveshell.py:1423: ResourceWarning: unclosed file <_io.TextIOWrapper name='faq.md' mode='r' encoding='UTF-8'>
  del ns[k]
Object allocated at (most recent call last):
  File "/home/vitor/anaconda3/envs/master/lib/python3.12/site-packages/IPython/core/interactiveshell.py", lineno 310
    return io_open(file, *args, **kwargs)

even after creating the schema as follows:

weaviate_client = weaviate.Client("http://localhost:8080")
schema = {
    "classes": [
        {
            "class": "WeaviateBlogChunk",
            "description": "A snippet from a Weaviate blogpost.",
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizeClassName": False,
                    "vectorizePropertyName": False
                },
                "generative-openai": {
                    "model": "gpt-3.5-turbo"
                }
            },
            "vectorIndexType": "hnsw",
            "vectorizer": "text2vec-openai",
            "properties": [
                {
                    "name": "content",
                    "dataType": ["text"],
                    "description": "The text content of the podcast clip",
                    "moduleConfig": {
                        "text2vec-transformers": {
                            "skip": False,
                            "vectorizePropertyName": False,
                            "vectorizeClassName": False
                        }
                    }
                },
                {
                    "name": "author",
                    "dataType": ["text"],
                    "description": "The author of the blog post.",
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": True,
                            "vectorizePropertyName": False,
                            "vectorizeClassName": False
                        }
                    }
                }
            ]
        }
    ]
}

weaviate_client.schema.create(schema)

What schema should I change it to so that it runs?
Would this require a change in the docker-compose.yml?
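
A hedged debugging step, staying with the v3 client used above: the 422 "no schema present" error means the instance the retriever talks to has no classes at all, so it is worth confirming the class creation actually reached that instance (ports and containers are easy to mix up):

import weaviate

client = weaviate.Client("http://localhost:8080")
# List whatever classes the running instance actually has; an empty list
# means the schema creation above never reached this instance
print([c["class"] for c in client.schema.get().get("classes", [])])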

Unable to follow the DSPy + Weaviate + Ollama YouTube video

Hi,

I've been failing to get this running for a while.

Service

docker run -p 8181:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:1.24.10
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-05-03T15:53:46Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-05-03T15:53:46Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-05-03T15:53:46Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-05-03T15:53:46Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-05-03T15:53:46Z"}
{"action":"telemetry_push","level":"info","msg":"telemetry started","payload":"\u0026{MachineID:e7f2e3d0-cafe-40b9-b3b1-6df88f74a27a Type:INIT Version:1.24.10 Modules: NumObjects:0 OS:linux Arch:amd64}","time":"2024-05-03T15:53:47Z"}

Client

#dspy-ai==2.4.9  
#weaviate-client==4.5.7

import dspy
llama3_ollama = dspy.OllamaLocal(model="llama3:8b-instruct-q6_K", max_tokens=4000, timeout_s=480)

import weaviate
from dspy.retrieve.weaviate_rm import WeaviateRM
weaviate_client = weaviate.connect_to_local(port=8181)
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client, k=10)

dspy.settings.configure(lm=llama3_ollama, rm=retriever_model)
llama3_ollama("say hello")
["Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?"]

Then, following the notebook:

import json

file_path = './WeaviateBlogRAG-0-0-0.json'
with open(file_path, 'r') as file:
    dataset = json.load(file)

gold_answers = []
queries = []

for row in dataset:
    gold_answers.append(row["gold_answer"])
    queries.append(row["query"])
    
data = []

for i in range(len(gold_answers)):
    data.append(dspy.Example(gold_answer=gold_answers[i], question=queries[i]).with_inputs("question"))

trainset, devset, testset = data[:25], data[25:35], data[35:]

class TypedEvaluator(dspy.Signature):
    """Evaluate the quality of a system's answer to a question according to a given criterion."""
    
    criterion: str = dspy.InputField(desc="The evaluation criterion.")
    question: str = dspy.InputField(desc="The question asked to the system.")
    ground_truth_answer: str = dspy.InputField(desc="An expert written Ground Truth Answer to the question.")
    predicted_answer: str = dspy.InputField(desc="The system's answer to the question.")
    rating: float = dspy.OutputField(desc="A float rating between 1 and 5. IMPORTANT!! ONLY OUTPUT THE RATING!!")

def MetricWrapper(gold, pred, trace=None):
    alignment_criterion = "How aligned is the predicted_answer with the ground_truth?"
    return dspy.TypedPredictor(TypedEvaluator)(criterion=alignment_criterion,
                                          question=gold.question,
                                          ground_truth_answer=gold.gold_answer,
                                          predicted_answer=pred.answer).rating

class GenerateAnswer(dspy.Signature):
    """Assess the the context and answer the question."""

    context = dspy.InputField(desc="Helpful information for answering the question.")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="A detailed answer that is supported by the context. ONLY OUTPUT THE ANSWER!!")
    
class RAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=k)
        self.generate_answer = dspy.Predict(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        pred = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(context=context, answer=pred, question=question)

And finally, run:

print(RAG()("What is binary quantization?").answer)

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/weaviate/collections/grpc/query.py:609, in _QueryGRPC.__call(self, request)
    608 res: search_get_pb2.SearchReply  # According to PEP-0526
--> 609 res, _ = self._connection.grpc_stub.Search.with_call(
    610     request,
    611     metadata=self._connection.grpc_headers(),
    612     timeout=self._connection.timeout_config.query,
    613 )
    615 return res

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/grpc/_channel.py:1198, in _UnaryUnaryMultiCallable.with_call(self, request, timeout, metadata, credentials, wait_for_ready, compression)
   1192 (
   1193     state,
   1194     call,
   1195 ) = self._blocking(
   1196     request, timeout, metadata, credentials, wait_for_ready, compression
   1197 )
-> 1198 return _end_unary_response_blocking(state, call, True, None)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/grpc/_channel.py:1006, in _end_unary_response_blocking(state, call, with_call, deadline)
   1005 else:
-> 1006     raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "extract params: no such class with name 'WeaviateBlogChunk' found in the schema. Check your schema files for which classes are available"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"extract params: no such class with name \'WeaviateBlogChunk\' found in the schema. Check your schema files for which classes are available", grpc_status:2, created_time:"2024-05-03T18:50:50.803414318+03:00"}"
>

During handling of the above exception, another exception occurred:

WeaviateQueryError                        Traceback (most recent call last)
Cell In[7], line 1
----> 1 print(RAG()("What is binary quantization?").answer)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dspy/primitives/program.py:26, in Module.__call__(self, *args, **kwargs)
     25 def __call__(self, *args, **kwargs):
---> 26     return self.forward(*args, **kwargs)

Cell In[6], line 16, in RAG.forward(self, question)
     15 def forward(self, question):
---> 16     context = self.retrieve(question).passages
     17     pred = self.generate_answer(context=context, question=question).answer
     18     return dspy.Prediction(context=context, answer=pred, question=question)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dspy/retrieve/retrieve.py:30, in Retrieve.__call__(self, *args, **kwargs)
     29 def __call__(self, *args, **kwargs):
---> 30     return self.forward(*args, **kwargs)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dspy/retrieve/retrieve.py:39, in Retrieve.forward(self, query_or_queries, k, **kwargs)
     36 # print(queries)
     37 # TODO: Consider removing any quote-like markers that surround the query too.
     38 k = k if k is not None else self.k
---> 39 passages = dsp.retrieveEnsemble(queries, k=k,**kwargs)
     40 return Prediction(passages=passages)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dsp/primitives/search.py:57, in retrieveEnsemble(queries, k, by_prob, **kwargs)
     54 queries = [q for q in queries if q]
     56 if len(queries) == 1:
---> 57     return retrieve(queries[0], k, **kwargs)
     59 passages = {}
     60 for q in queries:

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dsp/primitives/search.py:12, in retrieve(query, k, **kwargs)
     10 if not dsp.settings.rm:
     11     raise AssertionError("No RM is loaded.")
---> 12 passages = dsp.settings.rm(query, k=k, **kwargs)
     13 if not isinstance(passages, Iterable):
     14     # it's not an iterable yet; make it one.
     15     # TODO: we should unify the type signatures of dspy.Retriever
     16     passages = [passages]

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dspy/retrieve/retrieve.py:30, in Retrieve.__call__(self, *args, **kwargs)
     29 def __call__(self, *args, **kwargs):
---> 30     return self.forward(*args, **kwargs)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/dspy/retrieve/weaviate_rm.py:81, in WeaviateRM.forward(self, query_or_queries, k)
     79 for query in queries:
     80     if self._client_type == "WeaviateClient":
---> 81         results = self._weaviate_client.collections.get(self._weaviate_collection_name).query.hybrid(
     82             query=query,
     83             limit=k,
     84         )
     86         parsed_results = [result.properties[self._weaviate_collection_text_key] for result in results.objects]
     88     elif self._client_type == "Client":

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/weaviate/collections/queries/hybrid/query.py:84, in _HybridQuery.hybrid(self, query, alpha, vector, query_properties, fusion_type, limit, offset, auto_limit, filters, rerank, target_vector, include_vector, return_metadata, return_properties, return_references)
     19 def hybrid(
     20     self,
     21     query: Optional[str],
   (...)
     36     return_references: Optional[ReturnReferences[TReferences]] = None,
     37 ) -> QueryReturnType[Properties, References, TProperties, TReferences]:
     38     """Search for objects in this collection using the hybrid algorithm blending keyword-based BM25 and vector-based similarity.
     39 
     40     See the [docs](https://weaviate.io/developers/weaviate/search/hybrid) for a more detailed explanation.
   (...)
     82             If the network connection to Weaviate fails.
     83     """
---> 84     res = self._query.hybrid(
     85         query=query,
     86         alpha=alpha,
     87         vector=vector,
     88         properties=query_properties,
     89         fusion_type=fusion_type,
     90         limit=limit,
     91         offset=offset,
     92         autocut=auto_limit,
     93         filters=filters,
     94         rerank=rerank,
     95         target_vector=target_vector,
     96         return_metadata=self._parse_return_metadata(return_metadata, include_vector),
     97         return_properties=self._parse_return_properties(return_properties),
     98         return_references=self._parse_return_references(return_references),
     99     )
    100     return self._result_to_query_return(
    101         res,
    102         _QueryOptions.from_input(
   (...)
    111         return_references,
    112     )

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/weaviate/collections/grpc/query.py:207, in _QueryGRPC.hybrid(self, query, alpha, vector, properties, fusion_type, limit, offset, autocut, filters, return_metadata, return_properties, return_references, generative, rerank, target_vector)
    172 hybrid_search = (
    173     search_get_pb2.Hybrid(
    174         properties=properties,
   (...)
    191     else None
    192 )
    194 request = self.__create_request(
    195     limit=limit,
    196     offset=offset,
   (...)
    204     hybrid_search=hybrid_search,
    205 )
--> 207 return self.__call(request)

File ~/projects/sandbox/dspy-ollama/.venv/lib/python3.10/site-packages/weaviate/collections/grpc/query.py:618, in _QueryGRPC.__call(self, request)
    615     return res
    617 except grpc.RpcError as e:
--> 618     raise WeaviateQueryError(e.details(), "GRPC search")

WeaviateQueryError: Query call with protocol GRPC search failed with message extract params: no such class with name 'WeaviateBlogChunk' found in the schema. Check your schema files for which classes are available.
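
A hedged reading of the traceback: the docker run above starts a fresh, empty Weaviate on port 8181, and nothing in the snippets creates or populates the WeaviateBlogChunk collection, so the gRPC search cannot find the class. A quick v4-client check:

import weaviate

client = weaviate.connect_to_local(port=8181)
# A fresh container starts empty; the retriever needs this collection to exist
print(client.collections.list_all().keys())  # expect 'WeaviateBlogChunk' here
client.close()

If the collection is missing, running the ingestion notebook that builds WeaviateBlogChunk first should resolve it.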

Vector embeddings are not stored in Weaviate

Hi,

I tried your example in integrations/llamaindex and it seems to work based on the fact that a response is formulated from the query. However, I seem to have some trouble understanding what's happening under the hood because I see some unexpected behavior:

  1. When I list all the documents in the Weaviate DB using http://localhost:8080/v1/objects?class=BlogPost, it lists the content, but there is no vector property containing the embeddings. However, my OpenAI API usage breakdown lists:
    • text-embedding-ada-002-v2, 4 requests 18,266 prompt + 0 completion = 18,266 tokens
    • (I used other markdown files than the example, so the actual numbers may differ when using the blog posts)
  2. When I use the index as a simple retriever via index.as_retriever() and then retriever.retrieve("<some query>"), I get results, but the listed score is None, which implies that no distance function was used. This would be consistent with no vectors being stored in Weaviate. So is some other approximation used under the hood where we expect vector-based proximity?

How can I change your sample code in integrations/llamaindex to:

  1. actually store the OpenAI ada v2 embeddings in weaviate?
  2. actually use these embeddings when retrieving/querying?

Thank you,
Peter
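
One hedged note, independent of the LlamaIndex question: the /v1/objects endpoint omits vectors by default, so their absence from that listing does not by itself prove nothing was stored. You can request them explicitly, for example with the v3 Python client:

import weaviate

client = weaviate.Client("http://localhost:8080")
# Vectors must be requested explicitly via the _additional field
result = (
    client.query.get("BlogPost", ["content"])
    .with_additional("vector")
    .with_limit(1)
    .do()
)
print(result["data"]["Get"]["BlogPost"][0]["_additional"]["vector"][:5])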

Notebook 4 instantiates OpenTelemetry Resource incorrectly.

This recipe has been super useful for setting up Phoenix for me. Thanks for putting me onto it.

I have, however, noticed while digging around the OpenTelemetry Python docs that Resource is instantiated with its bare constructor in your 4th notebook. According to the docs (https://opentelemetry-python.readthedocs.io/en/latest/sdk/resources.html), it should be created with .create().

Doing RESOURCE = Resource.create(attributes={}) works just fine.

Thanks again for the notebook.
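
For reference, a minimal sketch of the fix described above (the RESOURCE name comes from the issue; the surrounding tracer setup is omitted):

from opentelemetry.sdk.resources import Resource

# Resource.create() applies default attributes and detection logic that the
# bare Resource(...) constructor skips
RESOURCE = Resource.create(attributes={})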

Error running the DSPy blog post recipe

Hi,

I ran into this error when trying the writing-blogpost-with-dspy example. Which schema should I use to initialize Weaviate for this example?

weaviate.exceptions.UnexpectedStatusCodeError: Query was not successful! Unexpected status code: 422, with response body: {'error': [{'message': 'no graphql provider present, this is most likely because no schema is present. Import a schema first!'}]}.

The command I used to run the Docker image is below; Docker itself seems to work fine.

$ sudo docker compose up
[+] Running 1/0
 ✔ Container dspy-weaviate-1 Recreated 0.1s
Attaching to dspy-weaviate-1
dspy-weaviate-1 | {"action":"startup","default_vectorizer_module":"text2vec-openai","level":"info","msg":"the default vectorizer modules is set to \"text2vec-openai\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"action":"init_state.delete_init","level":"error","msg":"disk_space: no such file or directory","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"level":"warning","msg":"Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-02-27T16:21:40Z"}
dspy-weaviate-1 | {"level":"info","msg":"Created shard blogpost_If99mDEIQECO in 2.997278ms","time":"2024-02-27T16:55:30Z"}
dspy-weaviate-1 | {"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-02-27T16:55:30Z","took":239392}

Query call with protocol GRPC search failed with message could not find class WeaviateBlogChunk in schema

I would suggest that this just be made available as a collection file that can simply be uploaded. Every DSPy tutorial uses this, but only one or two of them mention that creating WeaviateBlogChunk in Weaviate is a separate recipe. I suggest making it an uploadable collection file somehow, or providing a simpler process for creating the WeaviateBlogChunk collection; I have yet to get it working correctly. If it's so easy, why not have a single function that can be called from the other blog posts to create the collection (see the sketch below)? If it's not, then I think it's best to inform users that they really need to start with that tutorial.
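
A hedged sketch of what such a helper could look like with the v4 client. The collection name matches the recipes, but the vectorizer choice, file path, and chunk format are assumptions:

import json
import weaviate
from weaviate.classes.config import Configure, DataType, Property

def ensure_blog_chunks(client: weaviate.WeaviateClient, path: str) -> None:
    """Create and populate WeaviateBlogChunk if it does not exist yet."""
    if not client.collections.exists("WeaviateBlogChunk"):
        client.collections.create(
            name="WeaviateBlogChunk",
            vectorizer_config=Configure.Vectorizer.text2vec_openai(),
            properties=[Property(name="content", data_type=DataType.TEXT)],
        )
        chunks = client.collections.get("WeaviateBlogChunk")
        with open(path) as f:
            for chunk in json.load(f):  # assumed: a JSON list of text chunks
                chunks.data.insert({"content": chunk})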

Using DSPy with incomplete questions

Hi,

I have a customer support chatbot. One of the issues I face, and I am sure many others do too, is that users' dialogue can fall into 3 categories:

  1. Not relevant: e.g. "Hi Joel, how was your holiday?" Completely unrelated to the question the user might eventually ask.
  2. Partial context: questions that give only part of the required context, e.g. "I have a problem with my Reports." when there are 24 different types of reports, and it is unclear whether the problem is with creating or outputting the report.
  3. Full context: has everything required to answer the question, e.g. "I have an issue creating the gardening report, can you help?"

Currently I use GPT-4 to classify the initial question, and I use exemplars for the common partial-context cases, like the one above, to standardize responses. I use Weaviate as my vector store and have built a knowledge-graph schema on top of it to store all of my previous customer support conversations, including mappings from questions to their eventual answers. So I can retrieve with hybrid search and also fetch triples related to entities in the question. A rough DSPy version of the triage step is sketched below.
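
A hedged, illustrative sketch of that triage step in DSPy; the signature and labels are invented for illustration, not taken from the recipes:

import dspy

class TriageDialogue(dspy.Signature):
    """Classify a customer-support message by how much context it carries."""

    message = dspy.InputField(desc="The user's latest message.")
    category = dspy.OutputField(desc="One of: 'not relevant', 'partial context', 'full context'.")

triage = dspy.Predict(TriageDialogue)
# Example: triage(message="I have a problem with my Reports.").category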

I get the feeling that DSPy could really help with the complexity here. Maybe your next video, Connor? Enjoying your videos!

wandb.proto.wandb_internal_pb2 error when running dspy MIPRO

I'm trying to run this notebook https://github.com/weaviate/recipes/blob/main/integrations/llm-frameworks/dspy/llms/Llama3.ipynb

It works fine, but when I try to run MIPRO

from dspy.teleprompt import MIPRO

import openai
gpt4 = dspy.OpenAI(model="gpt-4o", max_tokens=4000, model_type="chat")

teleprompter = MIPRO(prompt_model=gpt4, 
                     task_model=llama3_ollama, 
                     metric=MetricWrapper, 
                     num_candidates=3, 
                     init_temperature=0.5)
kwargs = dict(num_threads=1, 
              display_progress=True, 
              display_table=0)
MIPRO_compiled_RAG = teleprompter.compile(RAG(), trainset=trainset[:5], num_trials=3, max_bootstrapped_demos=1, max_labeled_demos=0, eval_kwargs=kwargs)

I get this error:

AttributeError: module 'wandb.proto.wandb_internal_pb2' has no attribute 'Result'

I know it's a DSPy issue, but I was wondering:
a) whether the issue is with the notebook or on my end (does anyone else encounter the same error?), and
b) whether anyone has ideas on how to get MIPRO working.
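
A hedged first step, not a fix from the repo: this AttributeError usually points at a wandb/protobuf version mismatch pulled in through DSPy's optional wandb integration, so checking the installed versions narrows things down:

# Print the versions involved; mismatched wandb/protobuf releases are the
# usual suspect behind the missing 'Result' attribute
import google.protobuf
import wandb

print("wandb:", wandb.__version__)
print("protobuf:", google.protobuf.__version__)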

TypeError: 'NoneType' object is not iterable

I am using the code from Getting-Started-with-RAG-in-DSPy.ipynb.

test_example = dspy.Example(question="What do cross encoders do?")
test_pred = dspy.Example(answer="They re-rank documents.")

type(llm_metric(test_example, test_pred))

I am getting TypeError: 'NoneType' object is not iterable and can't work out the cause of the error.

The line context = dspy.Retrieve(k=5)(question).passages is what throws it.
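
A hedged guess at the cause: dspy.Retrieve consults the globally configured retriever model, so if dspy.settings.configure(rm=...) never ran (or the underlying Weaviate collection is empty), downstream code can end up iterating over None. A minimal sketch, with names mirroring the notebook:

import dspy
import weaviate
from dspy.retrieve.weaviate_rm import WeaviateRM

# Retrieve() uses dspy.settings.rm; configure it before calling Retrieve
weaviate_client = weaviate.connect_to_local()
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)
dspy.settings.configure(rm=retriever_model)

context = dspy.Retrieve(k=5)("What do cross encoders do?").passages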

Error no graphql provider present, this is most likely because no schema is present. Import a schema first!

I am trying to run the code below from Getting-Started-with-RAG-in-DSPy.ipynb.

test_example = dspy.Example(question="What do cross encoders do?")
test_pred = dspy.Example(answer="They re-rank documents.")

type(llm_metric(test_example, test_pred))

I get this error:
UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 422, with response body: {'error': [{'message': 'no graphql provider present, this is most likely because no schema is present. Import a schema first!'}]}

I am running the Docker version of Weaviate with docker run -p 8080:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:1.25.9

I can't work out how to fix this issue.
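
A hedged diagnostic, the same situation as the earlier schema issues above: that plain docker run starts an empty instance, and nothing creates the notebook's collection unless the ingestion cells run first. A REST check shows whether any classes exist:

import requests

# An empty "classes" list is exactly what the 422 is complaining about:
# no schema has been imported into this instance yet
print(requests.get("http://localhost:8080/v1/schema").json())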
