
blogs's Introduction

blogs's People

Contributors

mfyuce, mneedham, tomasonjo


blogs's Issues

Ollama implementation

When changing the LLM from OpenAI to Ollama with Llama3, I get:

ValueError: The 'node_properties' and 'relationship_properties' parameters cannot be used in combination with a LLM that doesn't support native function calling.

Is there any chance of an implementation with a local model?

Thanks

DevOps RAG

Hey,

It seems this code is not working: it does not generate embeddings in the database. Is there a special configuration I need to do?
It does create the vector index:
vector_index = Neo4jVector.from_existing_graph(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    database='sss',
    index_name='tasks',
    node_label="Task",
    text_node_properties=['name', 'description', 'status'],
    embedding_node_property='embedding',
)

How can I use Mistral as my model?

Hi, I want to use Mistral as my model in your enhancing_rag_with_graph.ipynb, but I don't know how to modify the code. Please tell me the details, thanks!

Final Query Throws Error - https://github.com/tomasonjo/blogs/blob/master/llm/neo4jvector_langchain_deepdive.ipynb?ref=blog.langchain.dev

When working through your notebook, the final query:

existing_index_return.similarity_search("What do you know about LangChain?", k=1)

throws an error:
ClientError: {code: Neo.ClientError.Procedure.ProcedureCallFailed} {message: Failed to invoke procedure db.index.vector.queryNodes: Caused by: java.lang.IllegalArgumentException: 'numberOfNearestNeighbours' must be positive}

I'm not sure if this is due to a change in Neo4j's implementation. Full stack trace is below:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
Cell In[29], line 1
----> 1 existing_index_return.similarity_search("What do you know about LangChain?", k=1)

File ~/anaconda3/lib/python3.11/site-packages/langchain/vectorstores/neo4j_vector.py:530, in Neo4jVector.similarity_search(self, query, k, **kwargs)
    520 """Run similarity search with Neo4jVector.
    521
    522 Args:
   (...)
    527     List of Documents most similar to the query.
    528 """
    529 embedding = self.embedding.embed_query(text=query)
--> 530 return self.similarity_search_by_vector(
    531     embedding=embedding,
    532     k=k,
    533     query=query,
    534 )

File ~/anaconda3/lib/python3.11/site-packages/langchain/vectorstores/neo4j_vector.py:625, in Neo4jVector.similarity_search_by_vector(self, embedding, k, **kwargs)
    610 def similarity_search_by_vector(
    611     self,
    612     embedding: List[float],
    613     k: int = 4,
    614     **kwargs: Any,
    615 ) -> List[Document]:
    616     """Return docs most similar to embedding vector.
    617
    618     Args:
   (...)
    623         List of Documents most similar to the query vector.
    624     """
--> 625     docs_and_scores = self.similarity_search_with_score_by_vector(
    626         embedding=embedding, k=k, **kwargs
    627     )
    628     return [doc for doc, _ in docs_and_scores]

File ~/anaconda3/lib/python3.11/site-packages/langchain/vectorstores/neo4j_vector.py:594, in Neo4jVector.similarity_search_with_score_by_vector(self, embedding, k, **kwargs)
    585 read_query = _get_search_index_query(self.search_type) + retrieval_query
    586 parameters = {
    587     "index": self.index_name,
    588     "k": k,
   (...)
    591     "query": kwargs["query"],
    592 }
--> 594 results = self.query(read_query, params=parameters)
    596 docs = [
    597     (
    598         Document(
   (...)
    606     for result in results
    607 ]
    608 return docs

File ~/anaconda3/lib/python3.11/site-packages/langchain/vectorstores/neo4j_vector.py:241, in Neo4jVector.query(self, query, params)
    239 try:
    240     data = session.run(query, params)
--> 241     return [r.data() for r in data]
    242 except CypherSyntaxError as e:
    243     raise ValueError(f"Cypher Statement is not valid\n{e}")

File ~/anaconda3/lib/python3.11/site-packages/langchain/vectorstores/neo4j_vector.py:241, in <listcomp>(.0)
    239 try:
    240     data = session.run(query, params)
--> 241     return [r.data() for r in data]
    242 except CypherSyntaxError as e:
    243     raise ValueError(f"Cypher Statement is not valid\n{e}")

File ~/anaconda3/lib/python3.11/site-packages/neo4j/_sync/work/result.py:266, in Result.__iter__(self)
    264     yield self._record_buffer.popleft()
    265 elif self._streaming:
--> 266     self._connection.fetch_message()
    267 elif self._discarding:
    268     self._discard()

File ~/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_common.py:180, in ConnectionErrorHandler.__getattr__.<locals>.outer.<locals>.inner(*args, **kwargs)
    178 def inner(*args, **kwargs):
    179     try:
--> 180         func(*args, **kwargs)
    181     except (Neo4jError, ServiceUnavailable, SessionExpired) as exc:
    182         assert not asyncio.iscoroutinefunction(self.__on_error)

File ~/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_bolt.py:851, in Bolt.fetch_message(self)
    847 # Receive exactly one message
    848 tag, fields = self.inbox.pop(
    849     hydration_hooks=self.responses[0].hydration_hooks
    850 )
--> 851 res = self._process_message(tag, fields)
    852 self.idle_since = perf_counter()
    853 return res

File ~/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_bolt5.py:376, in Bolt5x0._process_message(self, tag, fields)
    374 self._server_state_manager.state = self.bolt_states.FAILED
    375 try:
--> 376     response.on_failure(summary_metadata or {})
    377 except (ServiceUnavailable, DatabaseUnavailable):
    378     if self.pool:

File ~/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_common.py:247, in Response.on_failure(self, metadata)
    245 handler = self.handlers.get("on_summary")
    246 Util.callback(handler)
--> 247 raise Neo4jError.hydrate(**metadata)

ClientError: {code: Neo.ClientError.Procedure.ProcedureCallFailed} {message: Failed to invoke procedure `db.index.vector.queryNodes`: Caused by: java.lang.IllegalArgumentException: 'numberOfNearestNeighbours' must be positive}

Movie_recommendations model.encoder() error

I've been following along with your notebook and came across an error; I was wondering if you might have any insights on how to resolve it? I am fairly new to PyTorch and PyG, so I wasn't sure how to fix it myself. Any advice would be greatly appreciated!

I have replaced the data with my own, but my data is very similar to the movie example you used. I have not had any other errors aside from the following cell (and the training loop).

Cell:

# Due to lazy initialization, we need to run one model step so the number
# of parameters can be inferred:
with torch.no_grad():
    model.encoder(train_data.x_dict, train_data.edge_index_dict)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

Error:

ValueError: `MessagePassing.propagate` only supports `torch.LongTensor` of shape `[2, num_messages]` or `torch_sparse.SparseTensor` for argument `edge_index`.

P.S.-Your notebook has been really awesome and all the notes are very helpful!

text2cypher model returns additions to the Cypher, which causes GraphCypherQAChain to fail

Hi,
I tried text2cypher-demo-4bit-gguf-unsloth.Q4_K_M.gguf (aka text2cypher_gguf locally) using Ollama, as follows:

llm = ChatOpenAI(
    model="text2cypher_gguf:latest",
    base_url="http://localhost:11434/v1",
    api_key="NA",
    temperature=0,
)

from langchain.chains.graph_qa.cypher import GraphCypherQAChain

graph = Neo4jGraph(
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)
chain = GraphCypherQAChain.from_llm(llm, graph=graph, verbose=True)
chain.invoke({"query": "what materials are in the project? when creating the cypher write only the cypher command without any additions that might cause error during execution"})

I get the following output:
Generated Cypher:
MATCH (p:IfcProject)<-[:HasContext]-(:IfcGeometricRepresentationContext)<-[:HasContext]-(:IfcMaterial)
RETURN DISTINCT p.name AS projectName, m.name AS materialName
ORDER BY projectName, materialName
LIMIT 10
<|im_end|>

Then, when GraphCypherQAChain tries to execute the Cypher, it complains that the Cypher is invalid (because of the <|im_end|> suffix). I suggest stripping it from the generated Cypher. See the full error below:


CypherSyntaxError Traceback (most recent call last)
File /usr/local/anaconda3/lib/python3.11/site-packages/langchain_community/graphs/neo4j_graph.py:419, in Neo4jGraph.query(self, query, params)
418 try:
--> 419 data = session.run(Query(text=query, timeout=self.timeout), params)
420 json_data = [r.data() for r in data]

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/work/session.py:314, in Session.run(self, query, parameters, **kwargs)
313 parameters = dict(parameters or {}, **kwargs)
--> 314 self._auto_result._run(
315 query, parameters, self._config.database,
316 self._config.impersonated_user, self._config.default_access_mode,
317 bookmarks, self._config.notifications_min_severity,
318 self._config.notifications_disabled_classifications,
319 )
321 return self._auto_result

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/work/result.py:221, in Result._run(self, query, parameters, db, imp_user, access_mode, bookmarks, notifications_min_severity, notifications_disabled_classifications)
220 self._connection.send_all()
--> 221 self._attach()

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/work/result.py:409, in Result._attach(self)
408 while self._attached is False:
--> 409 self._connection.fetch_message()

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_common.py:178, in ConnectionErrorHandler.getattr..outer..inner(*args, **kwargs)
177 try:
--> 178 func(*args, **kwargs)
179 except (Neo4jError, ServiceUnavailable, SessionExpired) as exc:

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_bolt.py:855, in Bolt.fetch_message(self)
852 tag, fields = self.inbox.pop(
853 hydration_hooks=self.responses[0].hydration_hooks
854 )
--> 855 res = self._process_message(tag, fields)
856 self.idle_since = monotonic()

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_bolt5.py:370, in Bolt5x0._process_message(self, tag, fields)
369 try:
--> 370 response.on_failure(summary_metadata or {})
371 except (ServiceUnavailable, DatabaseUnavailable):

File /usr/local/anaconda3/lib/python3.11/site-packages/neo4j/_sync/io/_common.py:245, in Response.on_failure(self, metadata)
244 Util.callback(handler)
--> 245 raise Neo4jError.hydrate(**metadata)

CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '|': expected "+" or "-" (line 5, column 2 (offset: 210))
"<|im_end|>"
^}

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
Cell In[33], line 1
----> 1 chain.invoke({"query": "what materials are in the project? when creating the cypher write only the cypher command without any additions that might cause error during execution" })

File /usr/local/anaconda3/lib/python3.11/site-packages/langchain/chains/base.py:166, in Chain.invoke(self, input, config, **kwargs)
164 except BaseException as e:
165 run_manager.on_chain_error(e)
--> 166 raise e
167 run_manager.on_chain_end(outputs)
169 if include_run_info:

File /usr/local/anaconda3/lib/python3.11/site-packages/langchain/chains/base.py:156, in Chain.invoke(self, input, config, **kwargs)
153 try:
154 self._validate_inputs(inputs)
155 outputs = (
--> 156 self._call(inputs, run_manager=run_manager)
157 if new_arg_supported
158 else self._call(inputs)
159 )
161 final_outputs: Dict[str, Any] = self.prep_outputs(
162 inputs, outputs, return_only_outputs
163 )
164 except BaseException as e:

File /usr/local/anaconda3/lib/python3.11/site-packages/langchain_community/chains/graph_qa/cypher.py:338, in GraphCypherQAChain._call(self, inputs, run_manager)
335 # Retrieve and limit the number of results
336 # Generated Cypher be null if query corrector identifies invalid schema
337 if generated_cypher:
--> 338 context = self.graph.query(generated_cypher)[: self.top_k]
339 else:
340 context = []

File /usr/local/anaconda3/lib/python3.11/site-packages/langchain_community/graphs/neo4j_graph.py:425, in Neo4jGraph.query(self, query, params)
423 return json_data
424 except CypherSyntaxError as e:
--> 425 raise ValueError(f"Generated Cypher Statement is not valid\n{e}")

ValueError: Generated Cypher Statement is not valid
{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '|': expected "+" or "-" (line 5, column 2 (offset: 210))
"<|im_end|>"
^}
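As a workaround until the chain strips such tokens itself, the generated Cypher can be post-processed before execution. A minimal sketch (the token pattern is my assumption, based on the `<|...|>` chat-template convention; adjust it for your model):

```python
import re

def strip_special_tokens(cypher: str) -> str:
    """Remove chat-template terminator tokens such as <|im_end|> or
    <|endoftext|> from LLM-generated Cypher, plus surrounding whitespace."""
    cleaned = re.sub(r"<\|[^|]+\|>", "", cypher)
    return cleaned.strip()

generated = """MATCH (p:IfcProject)
RETURN p.name AS projectName
LIMIT 10
<|im_end|>"""

print(strip_special_tokens(generated))
```

The cleaned string can then be passed to `graph.query()` in place of the raw generation.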

Error: built-in class

Hi

I tried running the code, but I get the following error:
<class '__main__.RebelComponent'> is a built-in class

Would really appreciate your help in sorting it out. Thanks

Do you have a survey comparing different algorithms for converting text to a graph?

RAG performance with a Neo4j knowledge graph depends on how the text was transformed into a graph. Do you have a survey or comparison of the different algorithms for converting text to a graph? For example:

1. Sparse (TF-IDF or count vectorizer) vs. dense (LLM: BERT or non-BERT, e.g. GPT) embeddings
2. Hybrid: both sparse and dense embeddings, with different weights
3. Different prompts for converting text to a dense LLM embedding
4. Big text (many files, or fewer but larger files) vs. small text
5. Synonyms; shallow node similarity (only similar to the next node) vs. deep node similarity
6. etc.

The goal is to avoid what https://contextual.ai/introducing-rag2/ describes:

"A typical RAG system today uses a frozen off-the-shelf model for embeddings, a vector database for retrieval, and a black-box language model for generation, stitched together through prompting or an orchestration framework. This leads to a 'Frankenstein's monster' of generative AI: the individual components technically work, but the whole is far from optimal."

See also https://www.linkedin.com/pulse/data-science-machine-learning-thoughts-quotes-sander-stepanov/?trackingId=IUH7lVdxTPS%2BJcZX%2FYf7oA%3D%3D

Inconsistent results when creating PropertyGraphIndex (orphaned nodes)

In the following code snippet, when processing a list of my own documents, there are usually orphan nodes (where each node is a chunk) that get created along with the other nodes. I have a well-defined schema derived from the original example, and the orphan nodes clearly do have entities and relationships that are compliant with the schema, but they are orphaned nonetheless.

I know this because I can process 10 articles, of which 1 will result in an orphaned node. Then if I process only that orphaned node, it will generate schema-based relationships and entities. So the behavior is inconsistent but frequent. In a set of 64 articles, I might get 10 orphaned nodes.

Has anyone else seen this problem or have any suggestions? BTW, I've tried with strict=True and with strict=False.

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows for values outside of the schema,
    # useful for using the schema as a suggestion
    strict=True,
)

NUMBER_OF_ARTICLES = 250

index = PropertyGraphIndex.from_documents(
    documents[:NUMBER_OF_ARTICLES],
    kg_extractors=[kg_extractor],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)

Could not use apoc procedure

I used a URL that worked with GraphDatabase, but I am getting the above error. I changed some settings in neo4j.conf. It seems it now also requires an apoc.conf, which I put in the same directory as neo4j.conf; it contains apoc.export.file.enabled=true. When I list the procedures, apoc.meta.data does not show up. Did I get a config setting wrong, or is there a way to use GraphDatabase (which works)? I am trying to use the vector database.

thanks

Getting the below error when running the PropertyGraphIndex in LlamaIndex

Traceback (most recent call last):
File "/Users/joyeed/llamaindex/llamaindex/venv/lib/python3.10/site-packages/neo4j/_sync/driver.py", line 544, in __del__
File "/Users/joyeed/llamaindex/llamaindex/venv/lib/python3.10/site-packages/neo4j/_sync/driver.py", line 640, in close
File "/Users/joyeed/llamaindex/llamaindex/venv/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py", line 480, in close
File "/Users/joyeed/llamaindex/llamaindex/venv/lib/python3.10/site-packages/neo4j/_sync/io/_pool.py", line 414, in _close_connections
File "/Users/joyeed/llamaindex/llamaindex/venv/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py", line 950, in close
File "/Users/joyeed/llamaindex/llamaindex/venv/lib/python3.10/site-packages/neo4j/_sync/io/_bolt5.py", line 327, in goodbye
AttributeError: 'NoneType' object has no attribute 'debug'

how to get bolt url for aura neo4j instance?

@tomasonjo - sorry for the n00b question, but how do I get the bolt address of my Aura Neo4j instance?

https://github.com/tomasonjo/blogs/blob/8289fc3272625de35974ed355db44fa1c58a4e09/llm/llama_index_neo4j_custom_retriever.ipynb#L93C1-L93C46

In the console, I only see

neo4j+s://myhash.databases.neo4j.io

...which doesn't work if I put it as the url in your notebook: https://github.com/tomasonjo/blogs/blob/master/llm/llama_index_neo4j_custom_retriever.ipynb
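For reference: if I'm reading Neo4j's connection schemes right, Aura serves Bolt on the same host, so the bolt+s:// form is just the neo4j+s:// URI with the scheme swapped (and the drivers should also accept neo4j+s:// directly). A small sketch of that swap; the scheme mapping is my assumption:

```python
def to_bolt_uri(neo4j_uri: str) -> str:
    """Swap a neo4j(+s/+ssc):// routing scheme for the corresponding
    bolt(+s/+ssc):// direct scheme, keeping the same host."""
    scheme, _, rest = neo4j_uri.partition("://")
    mapping = {"neo4j": "bolt", "neo4j+s": "bolt+s", "neo4j+ssc": "bolt+ssc"}
    return mapping.get(scheme, scheme) + "://" + rest

print(to_bolt_uri("neo4j+s://myhash.databases.neo4j.io"))
# bolt+s://myhash.databases.neo4j.io
```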

Solution to "AttributeError: 'NoneType' object has no attribute 'names'"

When I was testing I found the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-45-054dd3707a68>](https://localhost:8080/#) in <cell line: 1>()
----> 1 print(structured_retriever("Who is Elizabeth I?"))

[<ipython-input-44-1a77de7b1cb4>](https://localhost:8080/#) in structured_retriever(question)
     24     result = ""
     25     entities = entity_chain.invoke({"question": question})
---> 26     for entity in entities.names:
     27         response = graph.query(
     28             """CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})

AttributeError: 'NoneType' object has no attribute 'names'

In fact, you only need to change the OpenAI model from gpt-3.5-turbo-0125 to gpt-4-0125-preview to solve this problem. I don't know why this is the case, but it works. Apart from this, there is no problem with the tutorial. Thank you very much to the author!
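As an alternative to switching models, the retriever could guard against the extraction chain returning None. A minimal sketch; `Entities` here is a stand-in dataclass for the notebook's Pydantic output schema:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entities:
    """Stand-in for the notebook's Pydantic output schema."""
    names: Optional[List[str]] = None

def entity_names(entities: Optional[Entities]) -> List[str]:
    """Return a safe list of entity names even when extraction failed."""
    if entities is None or entities.names is None:
        return []
    return entities.names

# The loop in structured_retriever then degrades gracefully:
for name in entity_names(Entities(names=["Elizabeth I"])):
    print(name)  # Elizabeth I
```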

JavaNullPointer - Neo.ClientError.Procedure.ProcedureCallFailed - when running cosine similarity for embedding

https://github.com/tomasonjo/blogs/blob/master/llm/Neo4jOpenAIApoc.ipynb
I ran into an error when running the retrieve_context() function:

Neo.ClientError.Procedure.ProcedureCallFailed

After looking into it for a while, I found out that it happens with the following code:

// retrieve the embedding of the question
CALL apoc.ml.openai.embedding([$question], $apiKey) YIELD embedding
// match relevant movies
MATCH (m:Movie)
WITH m, gds.similarity.cosine(embedding, m.embedding) AS score
ORDER BY score DESC
// limit the number of relevant documents
LIMIT toInteger($k)

and then fixed it as below (adding a WHERE clause):

// retrieve the embedding of the question
CALL apoc.ml.openai.embedding([$question], $apiKey) YIELD embedding
// match relevant movies
MATCH (m:Movie)
WHERE m.embedding IS NOT NULL AND size(m.embedding) = 1536
WITH m, gds.similarity.cosine(embedding, m.embedding) AS score
ORDER BY score DESC
// limit the number of relevant documents
LIMIT toInteger($k)

Thank you very much for the helpful notebook!
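For context on why the WHERE guard helps: gds.similarity.cosine needs two non-null vectors of the same length. A plain-Python illustration of the computation and the guard (my own sketch, not the GDS implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-null vectors."""
    if a is None or b is None or len(a) != len(b):
        raise ValueError("vectors must be non-null and the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Guard mirrors: WHERE m.embedding IS NOT NULL AND size(m.embedding) = 1536
embeddings = [[1.0, 0.0], None, [0.0, 1.0, 0.0]]
query = [1.0, 0.0]
valid = [e for e in embeddings if e is not None and len(e) == len(query)]
print([round(cosine(query, e), 3) for e in valid])
# [1.0]
```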

"Node_Label" in the Neo4jVector couldn't be modified

@tomasonjo I am trying to initialize the vector store from the existing graph to perform vector similarity search, but whatever value is assigned to the "node_label" property isn't considered.

In the code example below, I assign the value "Codeexample" to "node_label". However, after the vector store initialization, I printed the value of "node_label" and it simply returns "Document" every time. This makes my vector similarity search return empty results.

from neo4j import GraphDatabase
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import Neo4jVector

driver = GraphDatabase.driver(url, auth=(username, password))

# Initialize LangChain components
embedding = OpenAIEmbeddings()
vector_store = Neo4jVector.from_existing_graph(
    embedding=embedding,
    url=url,
    username=username,
    password=password,
    node_label="Codeexample",
    text_node_properties=["id", "name", "description", "code"],
    embedding_node_property="embedding",
)

# Print debug information
print("Node Label:", vector_store.node_label)
# ----- Node Label: Document ------

# Process natural language query
query = "My code example query"
query_embedding = embedding.embed_query(query)

# Search in the knowledge graph
results = vector_store.similarity_search(query, k=1)
results

Please check this and let me know if I am missing anything here. Thanks in advance :)

How to go with HuggingFace* instead of ChatOpenAI?

Hi @tomasonjo , thank you very much for sharing this very informative material.

In this notebook, how could I change

llm = ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0)

to

from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": TEMPERATURE, "max_length": MAX_TOKENS}
)

or any other HuggingFacePipeline, and still make the tutorial work?

Of course, cypher_chain's LLMs would also have to be changed to other pipelines, but I have not gotten there yet.

The error I get is:

File ~/Projects/blogs/openaifunction_constructing_graph.py:277, in extract_and_store_graph(document, nodes, rels)
    274 
    275 extract_chain = get_extraction_chain(nodes, rels)
--> 277 data = extract_chain.run(document.page_content)
    278

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/chains/base.py:507, in Chain.run(self, callbacks, tags, metadata, *args, **kwargs)
    505     if len(args) != 1:
    506         raise ValueError("`run` supports only one positional argument.")
--> 507     return self(args[0], callbacks=callbacks, tags=tags, metadata=metadata)[
    508         _output_key
    509     ]
    511 if kwargs and not args:
    512     return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
    513         _output_key
    514     ]

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/chains/base.py:312, in Chain.__call__(self, inputs, return_only_outputs, callbacks, tags, metadata, run_name, include_run_info)
    310 except BaseException as e:
    311     run_manager.on_chain_error(e)
--> 312     raise e
    313 run_manager.on_chain_end(outputs)
    314 final_outputs: Dict[str, Any] = self.prep_outputs(
    315     inputs, outputs, return_only_outputs
    316 )

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/chains/base.py:306, in Chain.__call__(self, inputs, return_only_outputs, callbacks, tags, metadata, run_name, include_run_info)
    299 run_manager = callback_manager.on_chain_start(
    300     dumpd(self),
    301     inputs,
    302     name=run_name,
    303 )
    304 try:
    305     outputs = (
--> 306         self._call(inputs, run_manager=run_manager)
    307         if new_arg_supported
    308         else self._call(inputs)
    309     )
    310 except BaseException as e:
    311     run_manager.on_chain_error(e)

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/chains/llm.py:104, in LLMChain._call(self, inputs, run_manager)
     98 def _call(
     99     self,
    100     inputs: Dict[str, Any],
    101     run_manager: Optional[CallbackManagerForChainRun] = None,
    102 ) -> Dict[str, str]:
    103     response = self.generate([inputs], run_manager=run_manager)
--> 104     return self.create_outputs(response)[0]

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/chains/llm.py:258, in LLMChain.create_outputs(self, llm_result)
    256 def create_outputs(self, llm_result: LLMResult) -> List[Dict[str, Any]]:
    257     """Create outputs from response."""
--> 258     result = [
    259         # Get the text of the top generated string.
    260         {
    261             self.output_key: self.output_parser.parse_result(generation),
    262             "full_generation": generation,
    263         }
    264         for generation in llm_result.generations
    265     ]
    266     if self.return_final_only:
    267         result = [{self.output_key: r[self.output_key]} for r in result]

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/chains/llm.py:261, in <listcomp>(.0)
    256 def create_outputs(self, llm_result: LLMResult) -> List[Dict[str, Any]]:
    257     """Create outputs from response."""
    258     result = [
    259         # Get the text of the top generated string.
    260         {
--> 261             self.output_key: self.output_parser.parse_result(generation),
    262             "full_generation": generation,
    263         }
    264         for generation in llm_result.generations
    265     ]
    266     if self.return_final_only:
    267         result = [{self.output_key: r[self.output_key]} for r in result]

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/output_parsers/openai_functions.py:174, in PydanticAttrOutputFunctionsParser.parse_result(self, result, partial)
    173 def parse_result(self, result: List[Generation], *, partial: bool = False) -> Any:
--> 174     result = super().parse_result(result)
    175     return getattr(result, self.attr_name)

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/output_parsers/openai_functions.py:157, in PydanticOutputFunctionsParser.parse_result(self, result, partial)
    156 def parse_result(self, result: List[Generation], *, partial: bool = False) -> Any:
--> 157     _result = super().parse_result(result)
    158     if self.args_only:
    159         pydantic_args = self.pydantic_schema.parse_raw(_result)  # type: ignore

File ~/anaconda3/envs/master/lib/python3.8/site-packages/langchain/output_parsers/openai_functions.py:26, in OutputFunctionsParser.parse_result(self, result, partial)
     24 generation = result[0]
     25 if not isinstance(generation, ChatGeneration):
---> 26     raise OutputParserException(
     27         "This output parser can only be used with a chat generation."
     28     )
     29 message = generation.message
     30 try:

OutputParserException: This output parser can only be used with a chat generation.
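The traceback ends in a guard that only accepts chat-style generations. A minimal sketch of that guard (with hypothetical stand-in classes, not langchain's real ones) shows why a plain-completion LLM, such as a locally served Mistral behind a completion wrapper, trips it, while a chat wrapper does not:

```python
class Generation:
    """Stand-in for a plain text-completion result."""
    def __init__(self, text):
        self.text = text

class ChatGeneration(Generation):
    """Stand-in for a chat-completion result; carries a message."""
    def __init__(self, text):
        super().__init__(text)
        self.message = text

def parse_result(result):
    # Mirrors the isinstance check at the bottom of the traceback above.
    generation = result[0]
    if not isinstance(generation, ChatGeneration):
        raise ValueError("This output parser can only be used with a chat generation.")
    return generation.message

print(parse_result([ChatGeneration("ok")]))  # chat model: parses fine
# parse_result([Generation("ok")])           # completion model: raises
```

In practice that means using a chat-capable wrapper for the model (e.g. a chat interface to a local Mistral) rather than a raw completion endpoint when a chain expects chat generations.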

Error while running "wd = uc.Chrome(options=options)"

Hi, when I run the following code:

```python
import undetected_chromedriver.v2 as uc
from pyvirtualdisplay import Display

display = Display(visible=0, size=(800, 600))
display.start()

options = uc.ChromeOptions()
options.add_argument("--no-sandbox")
wd = uc.Chrome(options=options)
```

I encounter this error:

```
WebDriverException Traceback (most recent call last)
in <cell line: 9>()
      7 options = uc.ChromeOptions()
      8 options.add_argument("--no-sandbox")
----> 9 wd = uc.Chrome(options=options)

4 frames
/usr/local/lib/python3.10/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
    117 return_code = self.process.poll()
    118 if return_code:
--> 119     raise WebDriverException(f"Service {self.path} unexpectedly exited. Status code was: {return_code}")
    120
    121 def is_connectable(self) -> bool:

WebDriverException: Message: Service /root/.local/share/undetected_chromedriver/1b2da929686a5f20_chromedriver unexpectedly exited. Status code was: 1
```


I would be thankful if anyone could help me.
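Exit status 1 from the chromedriver service usually means the Chrome binary is missing or incompatible in the (often headless) environment. A hypothetical pre-flight check, run before calling `uc.Chrome()`, can narrow this down:

```python
import shutil
import subprocess

def find_chrome():
    # Look for common Chrome/Chromium binary names on PATH.
    for name in ("google-chrome", "google-chrome-stable",
                 "chromium-browser", "chromium"):
        path = shutil.which(name)
        if path:
            return path
    return None

binary = find_chrome()
if binary is None:
    print("No Chrome/Chromium on PATH; install one before starting the driver.")
else:
    version = subprocess.run([binary, "--version"],
                             capture_output=True, text=True).stdout.strip()
    print(f"Found {binary}: {version}")
```

If a binary is found, also check that its major version matches the chromedriver that undetected-chromedriver downloaded; a mismatch produces the same status-code-1 exit.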

Error at # Define random walk query

Following along. I have worked within Neo4j Desktop and the data appears correct.
I am getting an error trying to implement your code for word2vec.

```python
# Define random walk query
random_walks_query = """
MATCH (node)
CALL gds.alpha.randomWalk.stream('all', {
    start: id(node),
    steps: 15,
    walks: 5
})
YIELD nodeIds
// Return the names or the titles
RETURN [id in nodeIds |
    coalesce(gds.util.asNode(id).name,
             gds.util.asNode(id).title)] as walks
"""

# Fetch data from Neo4j
with driver.session() as session:
    walks = session.run(random_walks_query)

# Train the word2vec model
clean_walks = [row['walks'] for row in walks]
model = Word2Vec(clean_walks, sg=1, window=5, size=100)

# Inspect results
model.most_similar('olive oil')
```

I am getting:

TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

```
TypeError Traceback (most recent call last)
in
     20 # Train the word2vec model
     21 clean_walks = [row['walks'] for row in walks]
---> 22 model = Word2Vec(clean_walks, sg=1, window=5, size=100)
     23 # Inspect results
     24 model.most_similar('olive oil')
```
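One plausible cause, given the query above: `coalesce(...name, ...title)` yields null for nodes that have neither property, and those nulls reach gensim as `None`, which its tokenizer cannot concatenate with strings. A sketch of a guard, assuming the row shape returned by the driver:

```python
def clean_walk_rows(rows):
    # Drop None tokens (nodes with neither `name` nor `title`) and skip walks
    # that end up empty, before handing the walks to Word2Vec.
    walks = []
    for row in rows:
        walk = [token for token in row["walks"] if token is not None]
        if walk:
            walks.append(walk)
    return walks

rows = [{"walks": ["olive oil", None, "pasta"]}, {"walks": [None]}]
print(clean_walk_rows(rows))  # [['olive oil', 'pasta']]
```

Separately, note that gensim 4.x renamed the `size` keyword to `vector_size`, so on a recent gensim the call would be `Word2Vec(clean_walks, sg=1, window=5, vector_size=100)`.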

AttributeError: 'NoneType' object has no attribute 'nodes'

There are some problems when using process_response and convert_to_graph_documents:
AttributeError: 'NoneType' object has no attribute 'nodes'

in

```python
llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125")  # gpt-4-0125-preview occasionally has issues
llm_transformer = LLMGraphTransformer(llm=llm)
document = Document(page_content="Elon Musk is suing OpenAI")
print(document)
graph_document = llm_transformer.process_response(document)
```

and

```python
llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125")  # gpt-4-0125-preview occasionally has issues
llm_transformer = LLMGraphTransformer(llm=llm)
document = Document(page_content="Elon Musk is suing OpenAI")
print(document)
graph_documents = llm_transformer.convert_to_graph_documents([document])
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True
)
```

Who can help me?
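The `NoneType` error suggests the transformer returned no parseable graph for a document (the result can come back as `None` or with no nodes when the LLM's output cannot be parsed). A defensive filter, sketched with a hypothetical stand-in for the real graph-document class, applied before `graph.add_graph_documents(...)`:

```python
def keep_valid(graph_documents):
    # Drop entries the LLM failed to parse into a graph: None, or no nodes.
    return [gd for gd in graph_documents
            if gd is not None and getattr(gd, "nodes", None)]

class FakeGraphDocument:
    """Hypothetical stand-in for the real graph-document type."""
    def __init__(self, nodes):
        self.nodes = nodes

docs = [FakeGraphDocument(["Elon Musk", "OpenAI"]), None, FakeGraphDocument([])]
print(len(keep_valid(docs)))  # 1
```

If every document is filtered out, the underlying issue is the model's output format; trying a stronger model or retrying the extraction is the usual next step.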

Query regarding ms-graphrag.ipynb notebook

I recently came across your blogs repository and noticed that you have implemented the query-focused Graphrag approach. However, it appears that the implementation is incomplete.

I am currently seeking a comprehensive implementation of this approach and was excited to find your work. Could you please let me know if you had the opportunity to complete it? If so, I would greatly appreciate it if you could share your implementation. If not, any insights into the reasons for leaving it unfinished would be highly valuable.

Error when using custom retriever of LlamaIndex example of PropertyGraph

In the last part of the code provided in the example taken from
https://www.llamaindex.ai/blog/customizing-property-graph-index-in-llamaindex:

```python
response = query_engine.query("What do you know about Maliek Collins or Darragh O’Brien?")
print(str(response))
```

```
CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Unknown function 'vector.similarity.cosine' (line 3, column 21 (offset: 120))
"            WITH e, vector.similarity.cosine(e.embedding, $embedding) AS score"
                     ^}
```

I have upgraded LlamaIndex and neo4j to the latest versions with no success.
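`vector.similarity.cosine()` only exists on newer Neo4j servers (the vector similarity functions arrived partway through the 5.x line, reportedly in 5.18), so upgrading the Python packages alone does not help when the database itself is older. A sketch of a version gate, with the minimum version left as a stated assumption:

```python
def supports_vector_functions(server_version, minimum=(5, 18)):
    # minimum=(5, 18) is an assumption; confirm against your server's
    # release notes before relying on it.
    parts = tuple(int(p) for p in server_version.split(".")[:2])
    return parts >= minimum

print(supports_vector_functions("5.13.0"))  # False: upgrade the server
print(supports_vector_functions("5.20.0"))  # True
```

The server's version string can be read from the database itself (e.g. via the components/version procedure your driver exposes); if it predates the vector functions, upgrading the Neo4j server, not the client libraries, resolves the `Unknown function` error.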

openaifunction_constructing_graph language support

Hi everyone,

I am trying to generate a knowledge graph, and I can already create one from unstructured data. But I want the generated node names, relationship names, etc. to be in Turkish. I am using the "openaifunction_constructing_graph" method. How do I specify a language code for the generation process?
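As far as I can tell, neither the OpenAI-function extraction in that notebook nor langchain's LLMGraphTransformer exposes a dedicated language-code parameter; the practical lever is the prompt, since the model emits labels in whatever language the instructions ask for. A hypothetical sketch of building such an instruction, which could then be merged into the extraction prompt:

```python
def extraction_instruction(language):
    # Hypothetical helper: steer the extraction LLM toward a target language.
    return (
        "Extract entities and relationships from the text. "
        f"Write all node names, node labels and relationship types in {language}."
    )

print(extraction_instruction("Turkish"))
```

With LLMGraphTransformer, this text would go into a custom chat prompt passed via its `prompt` argument, assuming your installed version exposes one; with the raw OpenAI-function approach, it is appended to the system message.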

Embedding similarity search

I created a vector index with a Hugging Face embedding, and I can see the embeddings in the graph. However, vector_index.similarity_search always returns an empty response, even though Embeddings.embed_query does return a vector. Am I missing something?
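An empty similarity_search result is often a dimension mismatch: the index was created for one vector size (e.g. 1536, OpenAI's default) while the Hugging Face model emits another (commonly 384 or 768), so no stored vector ever matches. A sketch of the sanity check, assuming you can read the index's configured dimension:

```python
def index_matches_embedder(index_dimension, query_vector):
    # If these differ, the index must be recreated with the
    # embedder's actual output dimension.
    return index_dimension == len(query_vector)

print(index_matches_embedder(1536, [0.0] * 384))  # False: recreate the index
print(index_matches_embedder(384, [0.0] * 384))   # True
```

If the dimensions do match, the next thing to verify is that the `embedding` property was actually populated on the indexed nodes, not just on a different label.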
