
ToG

News!

Our paper has been accepted at ICLR 2024! 🥳🥳🥳

ToG has moved to a new repository: ToG.

The code for the paper "Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph".

(Figure: an illustration of ToG.)

(Figure: the pipeline of ToG.)

Project Structure

  • requirements.txt: Pip environment file.
  • data/: Evaluation datasets. See data/README.md for details.
  • CoT/: CoT methods. See CoT/README.md for details.
  • eval/: Evaluation script. See eval/README.md for details.
  • Freebase/: Freebase environment setting. See Freebase/README.md for details.
  • Wikidata/: Wikidata environment setting. See Wikidata/README.md for details.
  • tools/: Common tools used in ToG. See tools/README.md for details.
  • ToG/: Source codes.
    • client.py: Pre-defined Wikidata APIs, copied from Wikidata/.
    • server_urls.txt: Wikidata server URLs, copied from Wikidata/.
    • main_freebase.py: The main file of ToG using Freebase as the KG source. See README.md for details.
    • main_wiki.py: Same as above, but using Wikidata as the KG source. See README.md for details.
    • prompt_list.py: The prompts used by ToG for pruning, reasoning, and generation.
    • freebase_func.py: All functions used in main_freebase.py.
    • wiki_func.py: All functions used in main_wiki.py.
    • utils.py: Shared functions used throughout ToG.

Get started

Before running ToG, please ensure that you have successfully installed either Freebase or Wikidata on your local machine. The comprehensive installation instructions and necessary configuration details can be found in the README.md file located within the respective folder.

The required libraries for running ToG can be found in requirements.txt.

When using the Wikidata service, copy the client.py and server_urls.txt files from the Wikidata directory into the ToG folder.
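The copy step above can also be scripted. A minimal sketch using only the standard library, run from the repository root (the helper name is illustrative):

```python
import shutil
from pathlib import Path

def copy_wikidata_files(src="Wikidata", dst="ToG"):
    """Copy the pre-defined Wikidata client and server list into the ToG folder."""
    for name in ("client.py", "server_urls.txt"):
        shutil.copy(Path(src) / name, Path(dst) / name)
```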

How to run

See ToG/README.md.

How to eval

Upon obtaining a result file such as ToG_cwq.jsonl, use the jsonl2json.py script from the tools directory to convert ToG_cwq.jsonl to ToG_cwq.json. Then evaluate using the script in the eval folder (see README.md in the eval folder).
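For reference, the JSONL-to-JSON conversion amounts to reading one JSON object per line and dumping the collected list as a single array. A minimal sketch of what tools/jsonl2json.py does (the actual script may differ in naming and CLI handling):

```python
import json

def jsonl_to_json(src_path, dst_path):
    """Read one JSON object per line and write them out as a single JSON array."""
    with open(src_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```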

How to cite

If you are interested in or inspired by this work, you can cite us with:

@misc{sun2023thinkongraph,
      title={Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph}, 
      author={Jiashuo Sun and Chengjin Xu and Lumingyuan Tang and Saizhuo Wang and Chen Lin and Yeyun Gong and Heung-Yeung Shum and Jian Guo},
      year={2023},
      eprint={2307.07697},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Experiment:

(Figure: experiment results.)

Application:

(Figure: application.)

Claims

This project is licensed under Apache 2.0. The project assumes no legal responsibility for any of the model's output and will not be held liable for any damages that may result from the use of the resources and output.


tog's Issues

CWQ Ground Truth

Where did you obtain the ground truth answers for the CWQ dataset?
I just reran the provided SPARQL queries on my installation of Freebase and I am getting 81.11% answers correct.
Shouldn't I be getting 100%?

Prompt for Generating topic entities

Hi! Thank you very much for your work.
In your paper you write "ToG first prompts LLMs to automatically extract the topic entities in question and gets the top-N topic entities", but I could not find the prompt you used to extract those topic entities.
In the repository they seem to be already hard-coded in the dataset JSONs, presumably from a previous pre-processing step.

Could you describe how you extracted the topic entities?

How can I ensure that Freebase is installed correctly?

What's the correct result after running the Test example?
My result is {'head': {'link': [], 'vars': ['name3']}, 'results': {'distinct': False, 'ordered': True, 'bindings': []}}.
I think something went wrong, but I followed the README in the "Freebase" folder.
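For context, an empty bindings list means the query matched no data, which usually indicates the Freebase dump was not loaded (or the endpoint URL is wrong) rather than a query error. A small helper to check a SPARQL JSON result for rows (the helper name is illustrative):

```python
def has_rows(sparql_result):
    """Return True if a SPARQL JSON result contains at least one binding row."""
    return bool(sparql_result.get("results", {}).get("bindings"))

# The result quoted above has an empty bindings list:
empty = {'head': {'link': [], 'vars': ['name3']},
         'results': {'distinct': False, 'ordered': True, 'bindings': []}}
```

Here `has_rows(empty)` returns False, confirming the endpoint answered but returned no data.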

Some confusion about the data set

Has this method been validated on relatively complex knowledge graph inference datasets such as WN18RR or NELL? Recent papers generally have not tested on these datasets; is this because those datasets are too complex, or for other reasons?

Linking topic entities to KG

Hi,

Once you generate the initial topic entities using the LLM, how do you find the corresponding entities in the KG? The LLM could output entities that do not exist in the KG. Also, what is the prompt used to find the topic entities?
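For context, the released datasets ship with topic entities already linked to KG ids. If you need to link new questions yourself, one common approach is to normalize the LLM's surface form and look it up in a label-to-id index built from KG labels and aliases. A hypothetical sketch (ToG's actual preprocessing may differ; the index and function names are illustrative):

```python
def link_entity(surface_form, label_to_id):
    """Map an LLM-produced entity mention to a KG id via normalized label match.

    label_to_id is a hypothetical index from lower-cased labels/aliases to KG ids
    (e.g. Freebase MIDs or Wikidata QIDs); mentions with no match return None.
    """
    key = " ".join(surface_form.strip().lower().split())
    return label_to_id.get(key)
```

Mentions that the LLM hallucinates and that do not exist in the KG would simply return None under this scheme and could be dropped or retried.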

Thanks,
Mohammad

Question about the datasets

Hi! Thank you for your work!

I ran the test example without installing the Freebase dataset and encountered an error at "results = sparql.query().convert()", which says: "{URLError} <urlopen error [Errno 111] Connection refused". When I run main_freebase.py, it also reports "urllib.error.URLError: <urlopen error [Errno 110] Connection timed out". Can this issue be resolved by installing the Freebase dataset, or is it caused by an internet connection error? Thank you!

Additionally, is it possible to avoid installing the dataset? My server doesn't have 400GB of RAM, making it very difficult for me to run the program. Projects like RoG (Reasoning on Graph) can run on WebQSP and CWQ with only a few hundred MBs of memory. Is it possible to run ToG with a single dataset (like WebQSP) with less memory consumption? Thank you!

Question about CWQ entity linking detail steps.

Hi, thanks for your excellent work.
I see that topic entities in CWQ have already been linked to Freebase and Wikidata. The paper says "ToG first prompts LLMs to automatically extract the topic entities in question and gets the top-N topic entities," but the preprocessing details are missing from the code.

Some bugs while reimplement

Hi, I found some bugs while reimplementing ToG:

  • ./ToG/utils.py line 155: there is a typo, a missing underscore between "pre" and "head".
  • ./ToG/README.md lines 12 and 23: duplicated argument name; I think the duplicate should be temperature_reasoning.
  • ./ToG/prompt_list.py lines 14 and 20: typo, it should be Entities instead of Entites. Does this affect the LLM's performance?
  • ./ToG/utils.py lines 105-110: why does the code call the llama server when "llama" is not in engine.lower()? I think it should call GPT-3.5-turbo in that situation.
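On the last point, the intended dispatch (as I read it) would route to the local llama server only when the engine name contains "llama", and to the OpenAI API otherwise. A hedged sketch of the fix, with the two backends injected as callables for illustration (the actual request functions live in ./ToG/utils.py):

```python
def run_llm(prompt, engine, call_llama, call_openai):
    """Dispatch a prompt to the local llama server or the OpenAI API.

    call_llama / call_openai are illustrative stand-ins for the actual
    request functions; only the branching condition is the point here.
    """
    if "llama" in engine.lower():       # local llama server
        return call_llama(prompt)
    return call_openai(prompt, engine)  # e.g. gpt-3.5-turbo
```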

Request for input-output text of LLMs

Hi, there

I appreciate the work you've put into this project.

I kindly request that you upload the input-output text data from LLMs during the "Thinking" process. This will provide a deeper understanding of the model's thinking process and its responses. It can also assist in identifying any biases or limitations in the model's output.

Thank you in advance for your consideration.

About the cost

How much does it cost to call the GPT-3.5 API if one needs to replicate the experimental results?

Request for CMD to Reproduce Results on WebQSP with Chat-GPT-3.5

Hello,

I am trying to reproduce the results on WebQSP using GPT-3.5-turbo, but I am facing some challenges with the provided command. I find that the exploration process is sensitive to the temperature. Could you provide the correct command to reproduce the results?

I currently use python main_freebase.py --dataset webqsp --max_length 256 --temperature_exploration 1 --temperature_reasoning 0 --width 3 --depth 2 --remove_unnecessary_rel True --LLM_type gpt-3.5-turbo --opeani_api_keys sk-xxxx --num_retain_entity 5 --prune_tools llm

Thank you!
