Comments (6)
The `rel_props` are not being saved to persistent storage because the `refresh_schema` method is not being called or executed properly. This method is responsible for updating the `structured_schema` dictionary, which includes the `rel_props`.

To resolve this issue, ensure that the `refresh_schema` method is called after any changes to the relationships or their properties. This can be done by explicitly calling `self.refresh_schema()` after any operation that modifies the relationships.

Here is the relevant part of the code where `refresh_schema` is called:
```python
def __init__(
    self,
    username: str,
    password: str,
    url: str,
    database: Optional[str] = "neo4j",
    refresh_schema: bool = True,
    sanitize_query_output: bool = True,
    enhanced_schema: bool = False,
    **neo4j_kwargs: Any,
) -> None:
    self.sanitize_query_output = sanitize_query_output
    self.enhanced_schema = enhanced_schema
    self._driver = neo4j.GraphDatabase.driver(
        url, auth=(username, password), **neo4j_kwargs
    )
    self._async_driver = neo4j.AsyncGraphDatabase.driver(
        url,
        auth=(username, password),
        **neo4j_kwargs,
    )
    self._database = database
    self.structured_schema = {}
    if refresh_schema:
        self.refresh_schema()
```
Make sure that `refresh_schema` is set to `True` when initializing the class, or call `self.refresh_schema()` manually after any updates to the relationships.

Additionally, ensure that the `persist` method is correctly implemented to save the graph store to a file. Here is an example of how the `persist` method can be implemented:
```python
def persist(
    self, persist_path: str, fs: Optional[fsspec.AbstractFileSystem] = None
) -> None:
    """Persist the graph store to a file."""
    if fs is None:
        fs = fsspec.filesystem("file")
    with fs.open(persist_path, "w") as f:
        f.write(self.graph.json())
```
This method ensures that the current state of the graph, including `rel_props`, is saved to the specified file.

In the `PropertyGraphStore` class, the `persist` method is defined as an abstract method, which means it must be implemented in any subclass. Ensure that your subclass correctly implements this method to handle the persistence of `rel_props`:
```python
class MyGraphStore(PropertyGraphStore):
    def persist(
        self, persist_path: str, fs: Optional[fsspec.AbstractFileSystem] = None
    ) -> None:
        """Persist the graph store to a file."""
        if fs is None:
            fs = fsspec.filesystem("file")
        with fs.open(persist_path, "w") as f:
            f.write(self.graph.json())
```
By ensuring that `refresh_schema` is called appropriately and the `persist` method is correctly implemented, you can resolve the issue of `rel_props` not being saved to persistent storage [1][2][3].
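To see the refresh-then-persist pattern in isolation, here is a minimal stand-alone sketch. `ToyGraphStore`, its sample `rel_props`, and the file name are illustrative assumptions (not the real llama-index classes); the point is only that `refresh_schema()` must rebuild `structured_schema` before `persist()` writes it out:

```python
import json
import tempfile
from pathlib import Path

# Toy stand-in for the pattern above: refresh_schema() rebuilds
# structured_schema (including rel_props), and persist() writes the
# full state to disk. Names and data are illustrative only.
class ToyGraphStore:
    def __init__(self, refresh_schema: bool = True) -> None:
        self._rel_props = {"WORKS_AT": ["since"]}  # hypothetical sample data
        self.structured_schema = {}
        if refresh_schema:
            self.refresh_schema()

    def refresh_schema(self) -> None:
        # Without this rebuild, rel_props never reach the saved state.
        self.structured_schema = {"rel_props": dict(self._rel_props)}

    def persist(self, persist_path: str) -> None:
        Path(persist_path).write_text(json.dumps(self.structured_schema))

path = str(Path(tempfile.mkdtemp()) / "toy_store.json")
store = ToyGraphStore()
store.persist(path)
saved = json.loads(Path(path).read_text())
print(saved["rel_props"])  # rel_props survive the round trip
```

If `refresh_schema=False` were passed and `refresh_schema()` never called, `structured_schema` would stay empty and the persisted file would contain no `rel_props`, which mirrors the behavior described in the answer above.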
To continue talking to Dosu, mention @dosu.
Yes, it is possible to save your `rel_props` to the persistent graph storage `rel_maps`. The `NebulaPropertyGraphStore` class provides methods to upsert triplets and manage relationships, which implies that you can store your `rel_props` directly in the graph storage.

Here is the relevant method from the `NebulaPropertyGraphStore` class:
```python
def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
    """Add triplet."""
    # Note, to enable leveraging existing knowledge graph,
    # the (triplet -- property graph) mapping
    # makes (n:1) edge_type.prop_name --> triplet.rel
    # thus we have to assume rel to be the first edge_type.prop_name
    # here in upsert_triplet().
    # This applies to the type of entity(tags) with subject and object, too,
    # thus we have to assume subj to be the first entity.tag_name
```
This method allows you to add a triplet (subject, relationship, object) to the graph, which means you can save your `rel_props` directly to the persistent graph storage without needing to retrieve them from the Nebula graph space [1][2][3].
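To illustrate how upserted triplets can back a `rel_map` without a round trip to the database, here is a self-contained toy sketch. `ToyTripletStore` and its sample triplets are assumptions for illustration, not the real `NebulaPropertyGraphStore`:

```python
from collections import defaultdict

# Each upsert records the relationship locally, so get_rel_map() can be
# answered from the stored triplets rather than a query to the graph space.
class ToyTripletStore:
    def __init__(self) -> None:
        self._triplets: list[tuple[str, str, str]] = []

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        self._triplets.append((subj, rel, obj))

    def get_rel_map(self) -> dict[str, list[tuple[str, str]]]:
        # Group (relationship, object) pairs by subject.
        rel_map: dict[str, list[tuple[str, str]]] = defaultdict(list)
        for subj, rel, obj in self._triplets:
            rel_map[subj].append((rel, obj))
        return dict(rel_map)

store = ToyTripletStore()
store.upsert_triplet("Alice", "WORKS_AT", "Acme")
store.upsert_triplet("Alice", "KNOWS", "Bob")
print(store.get_rel_map())
# → {'Alice': [('WORKS_AT', 'Acme'), ('KNOWS', 'Bob')]}
```

The real store persists triplets in NebulaGraph rather than a Python list, but the subject-keyed grouping sketched here is the shape a `rel_map` takes.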
To ensure that the `rel_map` is being populated correctly within your `index_store.json` in the persistent storage, you need to verify that the `get_rel_map` method is correctly fetching and structuring the relationship data. Here is how you can update your script to include this functionality:

- Ensure `get_rel_map` is correctly implemented: verify that the `get_rel_map` method is correctly fetching and structuring the relationship data.
- Persist the `rel_map` to `index_store.json`: serialize the `rel_map` dictionary to JSON and write it to the file.
Here is an updated version of your script with these changes:
```python
from flask import Flask, request, jsonify
import os
from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
    Document,
    Settings,
    PromptTemplate,
    KnowledgeGraphIndex
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.indices.vector_store.retrievers import VectorIndexRetriever
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
from llama_index.core.schema import QueryBundle
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.schema import TextNode
import base64, logging, json

# basicConfig only takes effect once per process, so a single call suffices.
logging.basicConfig(level=logging.DEBUG)

Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
embed_model = OpenAIEmbedding(model="text-embedding-3-large")
Settings.embed_model = embed_model
Settings.chunk_size = 512

os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"

space_name = "test9"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

def encode_string(s):
    return base64.urlsafe_b64encode(s.encode()).decode()

def decode_string(s):
    return base64.urlsafe_b64decode(s.encode()).decode()

def sanitize_and_encode(data):
    sanitized_data = {}
    for key, value in data.items():
        if isinstance(value, str):
            sanitized_data[key] = encode_string(value)
        else:
            sanitized_data[key] = value
    return sanitized_data

def decode_metadata(metadata):
    decoded_metadata = {}
    for key, value in metadata.items():
        if isinstance(value, str):
            decoded_metadata[key] = decode_string(value)
        else:
            decoded_metadata[key] = value
    return decoded_metadata

def load_json_nodes(json_directory):
    nodes = []
    for filename in os.listdir(json_directory):
        if filename.endswith('.json'):
            with open(os.path.join(json_directory, filename), 'r') as file:
                data = json.load(file)
                for node_data in data:
                    sanitized_metadata = sanitize_and_encode(node_data['metadata'])
                    node = TextNode(
                        text=encode_string(node_data['text']),
                        id_=node_data['id_'],
                        embedding=node_data['embedding'],
                        metadata=sanitized_metadata
                    )
                    nodes.append(node)
                    logging.debug(f"Loaded node ID: {node.id_}, text: {node_data['text']}, metadata: {node_data['metadata']}")
    return nodes

def save_rel_map_to_json(rel_map, file_path):
    with open(file_path, 'w') as json_file:
        json.dump(rel_map, json_file, indent=4)

def create_index():
    graph_store = NebulaGraphStore(
        space_name=space_name,
        edge_types=edge_types,
        rel_prop_names=rel_prop_names,
        tags=tags
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    json_nodes = load_json_nodes("JSON_nodes_999_large_syll")
    documents = [
        Document(
            text=decode_string(node.text),
            id_=node.id_,
            metadata=decode_metadata(node.metadata),
            embedding=node.embedding
        ) for node in json_nodes
    ]
    kg_index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        max_triplets_per_chunk=10,
        space_name=space_name,
        edge_types=edge_types,
        rel_prop_names=rel_prop_names,
        tags=tags,
        max_knowledge_sequence=15,
        include_embeddings=True
    )
    # Set the index_id for KnowledgeGraphIndex
    kg_index.set_index_id("kg_index")
    kg_index.storage_context.persist(persist_dir='./storage_graph_test10')
    logging.debug(f"KG Index created with {len(documents)} documents")

    # Create VectorStoreIndex
    vector_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
    # Set the index_id for VectorStoreIndex
    vector_index.set_index_id("vector_index")
    # Persist the storage context
    storage_context.persist(persist_dir='./storage_graph_test10')
    logging.debug(f"Vector Index created with {len(documents)} documents")

    # Get and save the rel_map
    rel_map = graph_store.get_rel_map()
    save_rel_map_to_json(rel_map, './storage_graph_test10/index_store.json')

    return kg_index, vector_index, storage_context

print("Creating Index...")
kg_index, vector_index, storage_context = create_index()
print("Index Created...")
```
This script ensures that the `rel_map` is correctly populated and saved to your `index_store.json` in the persistent storage [1][2].
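As a quick sanity check on the serialization step alone, you can round-trip a `rel_map` through JSON the same way `save_rel_map_to_json` does and confirm the structure survives. The sample data and temporary path here are made up for illustration; in the real script the dictionary would come from `graph_store.get_rel_map()`:

```python
import json
import os
import tempfile

def save_rel_map_to_json(rel_map, file_path):
    # Same serialization as in the script above.
    with open(file_path, 'w') as json_file:
        json.dump(rel_map, json_file, indent=4)

# Hypothetical rel_map shaped like {subject: [[relationship, object], ...]}.
# Lists are used instead of tuples because JSON has no tuple type.
rel_map = {"Alice": [["WORKS_AT", "Acme"], ["KNOWS", "Bob"]]}

path = os.path.join(tempfile.mkdtemp(), "index_store.json")
save_rel_map_to_json(rel_map, path)

with open(path) as f:
    loaded = json.load(f)
print(loaded == rel_map)  # True — the JSON round trip preserves the structure
```

If the file comes back empty or missing subjects, the problem is upstream in `get_rel_map`, not in the serialization.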
How can I include this in my previous script that creates both vector and graph indices and saves them to persistent storage?
@dosu
So is it possible to save my `rel_props` to the persistent graph storage `rel_maps`, or do I have to connect to my Nebula graph space to retrieve these `rel_props`?
@dosu
@dosubot
So how can I update my script to ensure the `rel_map` is being populated within my `index_store.json` in my persistent storage?
@dosu