
Comments (4)

dosubot commented on August 16, 2024

If your Knowledge Graph (KG) index returns empty responses when loaded from persistent storage, even though the persisted index files look correctly populated, the cause may be a null "text" field in the index_struct.

In the provided JSON structure, the "text" field is set to null:

{
  "index_struct": {
    "text": null,
    "doc_id": "446680de-660e-421a-8fdc-ed244bd466ce",
    "embedding": null,
    "extra_info": null,
    "table": {
      ...
    }
  }
}

The text field is crucial for the Knowledge Graph to return meaningful responses. If this field is null, the index might not have the necessary data to generate responses, leading to empty results.
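As a quick sanity check, you can inspect the persisted JSON for null or empty fields before loading it. This is a minimal, self-contained sketch (the JSON shape mirrors the snippet above; it is not a LlamaIndex API call):

```python
import json

# Hypothetical sketch: inspect a persisted index_struct for null or empty
# fields that can lead to empty query responses. The JSON shape mirrors
# the snippet shown earlier in this thread.
index_json = json.loads("""
{
  "index_struct": {
    "text": null,
    "doc_id": "446680de-660e-421a-8fdc-ed244bd466ce",
    "embedding": null,
    "extra_info": null,
    "table": {}
  }
}
""")

struct = index_json["index_struct"]
missing = [k for k in ("text", "table") if not struct.get(k)]
print("fields that are null or empty:", missing)  # -> ['text', 'table']
```

If either field shows up in the `missing` list, the index was persisted without the data the query engine needs.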

Additionally, ensure that the storage_context provided to the KnowledgeGraphQueryEngine includes a valid graph_store. The graph_store is essential for querying and retrieving data from the Knowledge Graph. If the graph_store is not correctly set up or is missing, the query engine will not be able to retrieve any data, resulting in empty responses.

To resolve this, ensure that:

  1. The text field is correctly populated with the relevant data when saving the index to persistent storage.
  2. The storage_context includes a valid graph_store when initializing the KnowledgeGraphQueryEngine.

Here is an example of how the KnowledgeGraphQueryEngine should be initialized:

from llama_index.core import StorageContext
from llama_index.core.query_engine import KnowledgeGraphQueryEngine

# Use the from_defaults constructor so the other stores are filled in for you.
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)
query_engine = KnowledgeGraphQueryEngine(storage_context=storage_context)

Make sure that your_graph_store is properly configured and contains the necessary schema and data.
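The key property a properly configured graph store needs is that its triplets survive a persist/reload round trip, so the application can run without a live graph server. Here is a minimal, self-contained sketch of that pattern (the class name `TinyGraphStore` is hypothetical, not LlamaIndex API; SimpleGraphStore follows the same idea internally):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of the persistence pattern a simple graph store uses:
# triplets live in an in-memory dict and are serialized to JSON, so the
# store can be reloaded later without a running graph server.
class TinyGraphStore:
    def __init__(self, graph_dict=None):
        self.graph_dict = graph_dict or {}

    def upsert_triplet(self, subj, rel, obj):
        self.graph_dict.setdefault(subj, [])
        if [rel, obj] not in self.graph_dict[subj]:
            self.graph_dict[subj].append([rel, obj])

    def persist(self, path):
        Path(path).write_text(json.dumps({"graph_dict": self.graph_dict}))

    @classmethod
    def from_persist_path(cls, path):
        data = json.loads(Path(path).read_text())
        return cls(graph_dict=data["graph_dict"])

# Round trip: write the store to disk, reload it, confirm the data survived.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "graph_store.json"
    store = TinyGraphStore()
    store.upsert_triplet("Software", "Mention in", "Footnotes")
    store.persist(path)

    reloaded = TinyGraphStore.from_persist_path(path)
    print(reloaded.graph_dict)  # -> {'Software': [['Mention in', 'Footnotes']]}
```

If the reloaded `graph_dict` comes back empty, the triplets were never written to the persisted store in the first place, which matches the empty-response symptom.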

Additionally, if you are using the KnowledgeGraphIndex class from the LlamaIndex framework, ensure that the graph_store is correctly populated. The class includes a mechanism to upgrade previously saved KG indexes to a new storage format if necessary:

if (
    len(self.index_struct.table) > 0
    and isinstance(self.graph_store, SimpleGraphStore)
    and len(self.graph_store._data.graph_dict) == 0
):
    logger.warning("Upgrading previously saved KG index to new storage format.")
    self.graph_store._data.graph_dict = self.index_struct.rel_map

This ensures that the graph_store is correctly populated with the relationship map from the index_struct. If this upgrade step is not performed, the graph_store might remain empty, leading to empty responses.
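The upgrade logic above can be illustrated with plain Python objects. This sketch uses hypothetical stand-in classes (`FakeIndexStruct`, `FakeGraphData` are illustrative, not LlamaIndex types) to show the condition being checked and the copy that fixes it:

```python
# Hypothetical sketch of the upgrade step: if the loaded index_struct carries
# a rel_map but the graph store's dict is empty, copy the rel_map across so
# queries have data to traverse. Class names here are illustrative only.
class FakeIndexStruct:
    def __init__(self, table, rel_map):
        self.table = table
        self.rel_map = rel_map

class FakeGraphData:
    def __init__(self):
        self.graph_dict = {}

index_struct = FakeIndexStruct(
    table={"Software": {"node-1"}},
    rel_map={"Software": [["Mention in", "Footnotes"]]},
)
graph_data = FakeGraphData()

# Mirror of the upgrade condition quoted above: non-empty table, empty graph.
if len(index_struct.table) > 0 and len(graph_data.graph_dict) == 0:
    print("Upgrading previously saved KG index to new storage format.")
    graph_data.graph_dict = index_struct.rel_map

print(graph_data.graph_dict)  # -> {'Software': [['Mention in', 'Footnotes']]}
```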

Moreover, you can refer to the test_load_graph_from_storage_simple function from the LlamaIndex tests to understand how to properly load a graph from storage. Here is a simplified version of the test:

def test_load_graph_from_storage_simple(
    documents: List[Document],
    tmp_path: Path,
    mock_service_context: ServiceContext,
) -> None:
    storage_context = StorageContext.from_defaults()

    vector_index_1 = VectorStoreIndex.from_documents(
        documents=documents,
        storage_context=storage_context,
        service_context=mock_service_context,
    )

    vector_index_2 = VectorStoreIndex.from_documents(
        documents=documents,
        storage_context=storage_context,
        service_context=mock_service_context,
    )

    summary_index = SummaryIndex.from_documents(
        documents=documents,
        storage_context=storage_context,
        service_context=mock_service_context,
    )

    graph = ComposableGraph.from_indices(
        SummaryIndex,
        children_indices=[vector_index_1, vector_index_2, summary_index],
        index_summaries=["vector index 1", "vector index 2", "summary index"],
        storage_context=storage_context,
        service_context=mock_service_context,
    )

    query_engine = graph.as_query_engine()
    response = query_engine.query("test query")

    # Persist all stores to disk, then reload into a fresh storage context.
    storage_context.persist(str(tmp_path))

    new_storage_context = StorageContext.from_defaults(persist_dir=str(tmp_path))

    new_graph = load_graph_from_storage(
        new_storage_context, root_id=graph.root_id, service_context=mock_service_context
    )

    new_query_engine = new_graph.as_query_engine()
    new_response = new_query_engine.query("test query")

    assert str(response) == str(new_response)

This test ensures that the graph is correctly loaded from storage and that the query responses remain consistent [1][2][3][4].

To continue talking to Dosu, mention @dosu.

from llama_index.

dosubot commented on August 16, 2024

To ensure your rel_map is populated when creating and storing your persistent storage, you need to call the get_rel_map method after inserting or updating your data. Here is an example of how you can update your script to include this step:

class SimpleGraphStore(GraphStore):
    """Simple Graph Store.

    In this graph store, triplets are stored within a simple, in-memory dictionary.

    Args:
        data (Optional[SimpleGraphStoreData]): data object containing
            the triplets. See SimpleGraphStoreData for more details.
    """

    def __init__(
        self,
        data: Optional[SimpleGraphStoreData] = None,
        fs: Optional[fsspec.AbstractFileSystem] = None,
        **kwargs: Any,
    ) -> None:
        """Initialize params."""
        self._data = data or SimpleGraphStoreData()
        self._fs = fs or fsspec.filesystem("file")
        self.rel_map: Dict[str, List[List[str]]] = {}

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        """Add triplet and update rel_map."""
        if subj not in self._data.graph_dict:
            self._data.graph_dict[subj] = []
        if [rel, obj] not in self._data.graph_dict[subj]:
            self._data.graph_dict[subj].append([rel, obj])

        # Refresh rel_map for this subject after upserting the triplet,
        # using update() so entries for other subjects are not discarded.
        self.rel_map.update(self.get_rel_map(subjs=[subj]))

    def get_rel_map(
        self, subjs: Optional[List[str]] = None, depth: int = 2, limit: int = 30
    ) -> Dict[str, List[List[str]]]:
        """Get depth-aware rel map."""
        return self._data.get_rel_map(subjs=subjs, depth=depth, limit=limit)

# Example usage
graph_store = SimpleGraphStore()
graph_store.upsert_triplet("Software", "Mention in", "Footnotes")
print(graph_store.rel_map)

In this example, after upserting the triplet, the get_rel_map method is called to update the rel_map with the new relationships. This ensures that rel_map is always populated with the latest data [1][2][3][4][5].
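To make the "depth-aware" part concrete, here is a self-contained sketch of how a rel_map can be derived from the flat graph_dict by walking out from each subject up to a fixed depth. The function name and traversal are illustrative (similar in spirit to `SimpleGraphStoreData.get_rel_map`, but not the LlamaIndex implementation):

```python
# Hypothetical sketch: derive a depth-aware rel_map from a flat graph_dict.
# Names are illustrative, not the LlamaIndex API.
def get_rel_map(graph_dict, subjs, depth=2, limit=30):
    rel_map = {}
    for subj in subjs:
        paths = []
        frontier = [[subj]]
        for _ in range(depth):
            next_frontier = []
            for path in frontier:
                # Follow every outgoing (rel, obj) edge from the path's tail.
                for rel, obj in graph_dict.get(path[-1], []):
                    new_path = path + [rel, obj]
                    paths.append(new_path[1:])  # drop the leading subject
                    next_frontier.append(new_path)
            frontier = next_frontier
        rel_map[subj] = paths[:limit]
    return rel_map

graph_dict = {
    "Software": [["Mention in", "Footnotes"]],
    "Footnotes": [["Part of", "Document"]],
}
print(get_rel_map(graph_dict, ["Software"]))
```

With depth 2, the subject "Software" yields both the one-hop path and the two-hop path through "Footnotes", which is the kind of multi-hop context an empty rel_map would be missing.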



jjoaqu7 commented on August 16, 2024

I'm still confused: how can I update my script to ensure my rel_map is populated when creating and storing my persistent storage?
@dosu


jjoaqu7 commented on August 16, 2024

Hey @logan-markewich, I have tried to follow dosubot's suggestions above, but I am struggling to implement them correctly. I would just like to load my vector and graph indices from the persistent directory without having to connect to my NebulaGraph server. Is this even possible?
The reason I want to populate the rel_map, which is empty in my index_store.json (graph store), is that I assume my relationships need to be present there when I query with these indices from the persistent directory.
Do you have any suggestions? My ultimate goal is to deploy this application without requiring a constant connection to my NebulaGraph server.

