
iswc-challenge's Introduction

I'm a postdoctoral researcher at the University of Edinburgh. My research focuses on combining symbolic reasoning and machine learning, or "Neuro-Symbolic Learning". I'm also interested in personal knowledge management and have developed some plugins for Obsidian. I obtained my PhD at the VU Amsterdam in 2024.

iswc-challenge's People

Contributors

dimitrisalivas, hemile, jankalo, miselico, selbaez, thiviyant


iswc-challenge's Issues

What if we use LLMs to predict aliases?

I had this idea since we are struggling with aliases. It feels less like cheating than querying Wikidata, and it fits the spirit of the competition (which revolves around LLMs).

For the NASA case, at least, it does the trick.

Prompt:

Known aliases for Stanford University include:
['stanford university', 'stanford']

Known aliases for Apple include:
['apple inc.', 'apple']

Known aliases for Hewlett-Packard include:
['hewlett-packard', 'hp']

Known aliases for NASA include:

Output:
['national aeronautics and space administration', 'nasa']
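For concreteness, here is a minimal sketch of how this could be wired up, assuming the classic (pre-1.0) `openai` Python client and the `text-davinci-002` engine; `predict_aliases` and the engine choice are illustrative, not necessarily what our code uses:

```python
import ast
import openai  # classic (pre-1.0) client, current at the time of the challenge

ALIAS_PROMPT = """Known aliases for Stanford University include:
['stanford university', 'stanford']

Known aliases for Apple include:
['apple inc.', 'apple']

Known aliases for Hewlett-Packard include:
['hewlett-packard', 'hp']

Known aliases for {} include:
"""

def predict_aliases(entity: str) -> list[str]:
    response = openai.Completion.create(
        engine="text-davinci-002",  # assumption; any GPT-3 engine fits here
        prompt=ALIAS_PROMPT.format(entity),
        temperature=0.0,            # deterministic output for extraction
        max_tokens=64,
        stop="\n\n",                # stop after the first completed list
    )
    text = response["choices"][0]["text"].strip()
    return ast.literal_eval(text)   # the examples push GPT-3 to emit a Python-style list

print(predict_aliases("NASA"))
# ['national aeronautics and space administration', 'nasa']
```

Parsing with `ast.literal_eval` works because the few-shot examples prime the model to answer in Python list syntax.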

Prompt for PersonCauseOfDeath

Prompt: How did {} die?


How did Aretha Franklin die?

['pancreatic cancer', 'cancer']

How did Bill Gates die?

['NONE']

How did Ennio Morricone die?

['femoral fracture', 'fracture']

How did Frank Sinatra die?

['myocardial infarction', 'infarction']

How did Michelle Obama die?

['NONE']

<==========================>
GPT-3's answers:

How did Johnny Cash die? ['diabetes', 'diabetes mellitus']

How did Albert Einstein die? ['abdominal aortic aneurysm', 'aneurysm']

How did Paul McCartney die? ['NONE']

Solid predictions here...

Prompt for PersonInstrument

This kind of works, but it's not great.

Which instruments does Liam Gallagher play?
['maraca', 'guitar']
Which instruments does Liam Gallagher play?
['upright piano', 'piano', 'guitar', 'harmonica']
Which instruments does Jay Park play?
['NONE']
Which instruments does Axl Rose play?
['guitar', 'piano', 'pander', 'bass']
Which instruments does Neil Young play?
['guitar']
Which instruments does Matt Bellamy play?

Validating each result with a TRUE/FALSE follow-up prompt seems to work a bit better:

[screenshots omitted]

Optimizing prompts

Just leaving it here as a perhaps stupid thing we could try.

Currently, we are finding prompts manually by trial and error. We could instead optimize the prompt itself.

This could be done by first getting a set of related words (which could be generated by the LM itself) and then optimizing the prompt based on these words, perhaps using some form of evolutionary computation.
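Very roughly, such a search could look like the sketch below; `mutate` and `score_prompt` are hypothetical helpers, and scoring a candidate means running it over the dev set (one API call per example), which is the expensive part:

```python
import random

def mutate(prompt: str, related_words: list[str]) -> str:
    """Swap a random word of the prompt for a related word (toy mutation operator)."""
    words = prompt.split()
    words[random.randrange(len(words))] = random.choice(related_words)
    return " ".join(words)

def evolve_prompt(seed: str, related_words: list[str], score_prompt,
                  generations: int = 20, population: int = 10) -> str:
    """Greedy (1+lambda)-style evolutionary search over prompt wordings."""
    best, best_score = seed, score_prompt(seed)  # score_prompt: dev-set F1 (hypothetical)
    for _ in range(generations):
        for _ in range(population):
            candidate = mutate(best, related_words)
            score = score_prompt(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best
```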

Prompt for PersonLanguage

Prompt: Which languages does {} speak?


Which languages does Aamir Khan speak?

['hindi', 'english', 'urdu']

Which languages does Pharrell Williams speak?

['english']

Which languages does Shakira speak?

['catalan', 'english', 'portuguese', 'spanish']

<============================>

Which languages does Novak Djokovic speak?

GPT-3’s answer: ['serbian', 'english', 'french', 'german']

Ground Truth

Yesterday, I came across a few examples where GPT-3 predicted correct answers that were missing from the ground truth. Since the model is evaluated on F1 score, this hurts us.
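A tiny worked example of the effect, assuming the evaluation computes set-based precision/recall/F1 per subject (our reading of the metric):

```python
def p_r_f1(pred: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision/recall/F1 for a single subject."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)

# GPT-3 predicts a correct alias that the ground truth doesn't list:
print(p_r_f1({"new york city", "nyc"}, {"nyc"}))  # (0.5, 1.0, 0.667): precision is punished
```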

Prompt for CountryBordersWithCountry

Prompt: Which countries share a border with {}?

Which countries share a border with Greece?

['albania', 'turkey', 'bulgaria']

Which countries share a border with Bosnia and Herzegovina?

['montenegro', 'croatia', 'serbia']

Which countries share a border with Morocco?

['sahara', 'western sahara', 'mauritania', 'algeria', 'spain']

<=====================>
GPT-3's answers:

Which countries share a border with Pakistan?

Which countries share a border with Mozambique? ['malawi', 'tanzania', 'zambia', 'swaziland', 'south africa']

Is fact checking worth it?

Before fact checking (precision / recall / F1):
ChemicalCompoundElement 0.482 0.457 0.465
CompanyParentOrganization 0.745 0.760 0.748
CountryBordersWithCountry 0.631 0.625 0.619
CountryOfficialLanguage 0.623 0.627 0.592
PersonCauseOfDeath 0.500 0.500 0.500
PersonEmployer 0.276 0.335 0.270
PersonInstrument 0.568 0.552 0.531
PersonLanguage 0.755 0.936 0.797
PersonPlaceOfDeath 0.520 0.520 0.520
PersonProfession 0.719 0.514 0.567
RiverBasinsCountry 0.754 0.784 0.757
StateSharesBorderState 0.301 0.232 0.257
*** Average *** 0.573 0.570 0.552

After fact checking (precision / recall / F1):
ChemicalCompoundElement 0.430 0.210 0.272
CompanyParentOrganization 0.760 0.760 0.760
CountryBordersWithCountry 0.623 0.588 0.596
CountryOfficialLanguage 0.670 0.597 0.593
PersonCauseOfDeath 0.500 0.500 0.500
PersonEmployer 0.323 0.268 0.271
PersonInstrument 0.583 0.525 0.540
PersonLanguage 0.863 0.849 0.825
PersonPlaceOfDeath 0.500 0.500 0.500
PersonProfession 0.752 0.423 0.505
RiverBasinsCountry 0.763 0.733 0.735
StateSharesBorderState 0.305 0.227 0.253
*** Average *** 0.589 0.515 0.529

It seems mostly worse: average precision ticks up slightly (0.573 → 0.589), but recall drops substantially (0.570 → 0.515), dragging F1 down (0.552 → 0.529).

Test fact checking

Fact checking just converts each prediction into a statement and asks the language model whether that statement is true.
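Roughly, per prediction (a sketch; the relation verbalization, engine, and helper name are illustrative):

```python
import openai

def fact_check(subject: str, relation_phrase: str, obj: str) -> bool:
    """Verbalize a predicted triple and ask the LM whether it is true."""
    prompt = f"{subject} {relation_phrase} {obj}. TRUE or FALSE?\nAnswer:"
    response = openai.Completion.create(
        engine="text-davinci-002",  # assumption
        prompt=prompt,
        temperature=0.0,
        max_tokens=3,
    )
    return "TRUE" in response["choices"][0]["text"].upper()

# e.g. keep only objects the model itself agrees with:
# predictions = [o for o in predictions if fact_check("Greece", "shares a border with", o)]
```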

Some interesting failure cases

Detection of errors:
[screenshots omitted]

To fix:

  • NONE results should be verbalized differently (i.e., as "this has no borders").

Prompt for CountryOfficialLanguage

Prompt: Which are the official languages of {}?


Which are the official languages of Finland?

['swedish', 'finnish']

Which are the official languages of India?

['english', 'hindi']

Which are the official languages of Norway?

['norwegian', 'nynorsk', 'sami', 'sámi', 'bokmal']

Which are the official languages of Grenada?

['grenadian creole english', 'english', 'creole', 'grenadian']

<===============================>

Which are the official languages of Russia?

GPT-3's answer: ['russian', 'belarusian', 'tatar', 'ukrainian']

Prompt for PersonPlaceOfDeath

Prompt: What is the place of death of {}?

What is the place of death of Barack Obama?

['NONE']

What is the place of death of Ennio Morricone?

['rome']

What is the place of death of Elvis Presley?

['graceland']

What is the place of death of Elon Musk?

['NONE']

What is the place of death of Prince?

['chanhassen']

<============================>

What is the place of death of Aretha Franklin?

GPT-3’s answer: ['detroit']

What is the place of death of Bill Gates?

GPT-3’s answer: ['NONE']

Check all prompts

  • Check that all relations have the same number of examples (4?)

  • Check that each relation has at least one NONE example

  • No blank line between the examples and the question

  • Check that each relation has examples with different answer lengths (a quick automated check is sketched below)
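A minimal checker, assuming the few-shot examples live in a dict like the hypothetical one here (not our actual data structure):

```python
# Hypothetical layout: relation -> list of (subject, answers) few-shot pairs.
EXAMPLES: dict[str, list[tuple[str, list[str]]]] = {
    "PersonPlaceOfDeath": [
        ("Barack Obama", ["NONE"]),
        ("Ennio Morricone", ["rome"]),
        ("Elon Musk", ["NONE"]),
        ("Prince", ["chanhassen"]),
    ],
    # ... one entry per relation
}

def check_prompts(examples=EXAMPLES, expected: int = 4) -> None:
    for relation, pairs in examples.items():
        if len(pairs) != expected:
            print(f"{relation}: {len(pairs)} examples, expected {expected}")
        if not any(answers == ["NONE"] for _, answers in pairs):
            print(f"{relation}: no NONE example")
        if len({len(answers) for _, answers in pairs}) == 1:
            print(f"{relation}: all examples have the same answer length")

check_prompts()
```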

Prompt for StateSharesBorderState

What states border San Marino?

['san leo', 'acquaviva', 'borgo maggiore', 'chiesanuova', 'fiorentino']

What states border Texas?

['chihuahua', 'new mexico', 'nuevo león', 'tamaulipas', 'coahuila', 'louisiana', 'arkansas', 'oklahoma']

What states border Liguria?

['tuscany', 'auvergne-rhône-alpes', 'piedmont', 'emilia-romagna']

What states border Mecklenburg-Western Pomerania?

['brandenburg', 'pomeranian', 'schleswig-holstein', 'lower saxony']

What states border Extremadura?

This seems to work fine-ish. Main problem: something like "Gelderland" is in the training set, but it's a province, not a state, and asking "What provinces border Gelderland?" seems to work better.

Prompt for PersonProfession

Prompt: What is {}'s profession?

What is Danny DeVito's profession?

['director', 'film director'] 


What is Christina Aguilera's profession?

 ['artist', 'recording artist']


What is Donald Trump's profession?

['businessperson', 'conspiracy theorist', 'politician']

<============================>

What is Bryan Cranston's profession?

GPT-3’s answer: ['actor', 'producer', 'writer']

Prompt for RiverBasinsCountry

What countries does the river Drava cross?

['hungary', 'italy', 'austria', 'slovenia', 'croatia']

What countries does the river Huai river cross?

['china']

What countries does the river Paraná river cross?

['bolivia', 'paraguay', 'argentina', 'brazil']

What countries does the river Oise cross?

Works alright.

Meta prompts

An idea from this paper: https://arxiv.org/pdf/2102.07350.pdf.

We can use meta prompts to get the language model to generate its own prompts for a range of tasks.

Something like “This problem asks us to complete the sentences”
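For example (a sketch; the meta-prompt wording and engine are ours, loosely following the paper):

```python
import openai

META_PROMPT = """I want GPT-3 to answer questions of the form: given a person,
list the musical instruments they play, as a Python list of strings.
A good prompt for this task would be:
"""

response = openai.Completion.create(
    engine="text-davinci-002",  # assumption
    prompt=META_PROMPT,
    temperature=0.7,            # some diversity, so we get several candidates
    max_tokens=64,
    n=5,                        # sample five candidate prompts
)
candidates = [choice["text"].strip() for choice in response["choices"]]
# Each candidate can then be scored on the dev set.
```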

Prompts for triple-based Experiment

CountryBordersWithCountry

Dominica CountryBordersWithCountry: ['Venezuela']

North Korea CountryBordersWithCountry: ['South Korea', 'China', 'Russia']

Serbia CountryBordersWithCountry: ['Montenegro', 'Kosovo', 'Bosnia and Herzegovina', 'Hungary', 'Croatia', 'Bulgaria',  'Macedonia', 'Albania', 'Romania']

Fiji CountryBordersWithCountry: []

{subject_entity} {relation}:


CountryOfficialLanguage

Suriname CountryOfficialLanguage: ['Dutch']

Canada CountryOfficialLanguage: ['English', 'French']

Singapore CountryOfficialLanguage: ['English', 'Malay', 'Mandarin', 'Tamil']

Sri Lanka CountryOfficialLanguage: ['Sinhala', 'Tamil']

{subject_entity} {relation}:

StateSharesBorderState

San Marino StateSharesBorderState: ['San Leo', 'Acquaviva', 'Borgo Maggiore', 'Chiesanuova', 'Fiorentino']

Wales StateSharesBorderState: ['England']

Liguria StateSharesBorderState: ['Tuscany', 'Auvergne-Rhône-Alpes', 'Piedmont', 'Emilia-Romagna']

Mecklenburg-Western Pomerania StateSharesBorderState: ['Brandenburg', 'Pomeranian', 'Schleswig-Holstein', 'Lower Saxony']

{subject_entity} {relation}:


RiverBasinsCountry

Drava RiverBasinsCountry: ['Hungary', 'Italy', 'Austria', 'Slovenia', 'Croatia']

Huai river RiverBasinsCountry: ['China']

Paraná river RiverBasinsCountry: ['Bolivia', 'Paraguay', 'Argentina', 'Brazil']

Oise RiverBasinsCountry: ['Belgium', 'France']

{subject_entity} {relation}:

ChemicalCompoundElement:

Water ChemicalCompoundElement: ['Hydrogen', 'Oxygen']

Bismuth subsalicylate ChemicalCompoundElement: ['Bismuth']

Sodium Bicarbonate ChemicalCompoundElement: ['Hydrogen', 'Oxygen', 'Sodium', 'Carbon']

Aspirin ChemicalCompoundElement: ['Oxygen', 'Carbon', 'Hydrogen']

{subject_entity} {relation}:

PersonLanguage:

Aamir Khan PersonLanguage: ['Hindi', 'English', 'Urdu']

Pharrell Williams PersonLanguage: ['English']

Xabi Alonso PersonLanguage: ['German', 'Basque', 'Spanish', 'English']

Shakira PersonLanguage: ['Catalan', 'English', 'Portuguese', 'Spanish', 'Italian', 'French']

{subject_entity} {relation}:

PersonProfession

Danny DeVito PersonProfession: ['Comedian', 'Film Director', 'Voice Actor', 'Actor', 'Film Producer', 'Film Actor', 'Dub Actor', 'Activist', 'Television Actor']

David Guetta PersonProfession: ['DJ']

Gary Lineker PersonProfession: ['Commentator', 'Association Football Player', 'Journalist', 'Broadcaster']

Gwyneth Paltrow PersonProfession: ['Film Actor','Musician']

{subject_entity} {relation}:

PersonInstrument:

Liam Gallagher PersonInstrument: ['Maraca', 'Guitar']

Jay Park PersonInstrument: []

Axl Rose PersonInstrument: ['Guitar', 'Piano', 'Pander', 'Bass']

Neil Young PersonInstrument: ['Guitar']

{subject_entity} {relation}:

PersonEmployer:

Susan Wojcicki PersonEmployer: ['Google']

Steve Wozniak PersonEmployer: ['Apple Inc', 'Hewlett-Packard', 'University of Technology Sydney', 'Atari']

Yukio Hatoyama PersonEmployer: ['Senshu University','Tokyo Institute of Technology']

Yahtzee Croshaw PersonEmployer: ['PC Gamer', 'Hyper', 'Escapist']

{subject_entity} {relation}:

PersonPlaceOfDeath:

Barack Obama PersonPlaceOfDeath: ['None']

Ennio Morricone PersonPlaceOfDeath: ['Rome']

Elon Musk PersonPlaceOfDeath: ['None']

Prince PersonPlaceOfDeath: ['Chanhassen']

{subject_entity} {relation}:

PersonCauseOfDeath:

André Leon Talley PersonCauseOfDeath: ['Infarction']

Angela Merkel PersonCauseOfDeath: []

Bob Saget PersonCauseOfDeath: ['Injury', 'Blunt Trauma']

Jamal Khashoggi PersonCauseOfDeath: ['Murder']

{subject_entity} {relation}:

CompanyParentOrganization:

Microsoft CompanyParentOrganization: ['None']

Sony CompanyParentOrganization: ['Sony Group']

Saab CompanyParentOrganization: ['Saab Group', 'Saab-Scania', 'Spyker N.V.', 'National Electric Vehicle Sweden', 'General Motors']

Max Motors CompanyParentOrganization: ['None']

{subject_entity} {relation}:
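All of these templates share the same shape: a few completed triples, then the bare `{subject_entity} {relation}:` line for GPT-3 to finish. A sketch of filling and querying one (helper name and engine are illustrative):

```python
import ast
import openai

def query_triples(examples: str, subject_entity: str, relation: str) -> list[str]:
    """`examples` is one of the few-shot blocks above, as a single string."""
    prompt = f"{examples}\n{subject_entity} {relation}:"
    response = openai.Completion.create(
        engine="text-davinci-002",  # assumption
        prompt=prompt,
        temperature=0.0,
        max_tokens=100,
        stop="\n",                  # one completed line per call
    )
    return ast.literal_eval(response["choices"][0]["text"].strip())
```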

Scaling experiment

It could be very interesting to make a scaling argument by also running our code on the smaller (and cheaper!) GPT-3 models.
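Sketch, where `run_evaluation` stands in for our existing evaluation loop, parameterized by engine:

```python
# Classic GPT-3 engine sizes, smallest to largest (names as of the challenge period).
for engine in ["text-ada-001", "text-babbage-001", "text-curie-001", "text-davinci-002"]:
    scores = run_evaluation(engine)  # hypothetical: per-relation P/R/F1 on the dev set
    print(engine, scores)
```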

Investigate the Prompts for (potential) leakage

Given that 64% F1 is quite (suspiciously) high, let's go over the prompts one more time to check whether we messed something up.

The scores:

[screenshot: performance_post_none_fix]


Another potential explanation: the death-related relations include a lot of NONE values (when the person is not dead yet). So, perhaps now that we predict NONE correctly, we get quite good performance on them.

Question: is the model just learning who is dead or not, rather than the place or cause of death?
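One way to check: score the NONE and non-NONE subjects separately (a sketch; the `predictions` / `ground_truth` layout is assumed: subject -> list of answers, with ['NONE'] meaning the person is alive):

```python
def split_scores(predictions: dict, ground_truth: dict) -> None:
    """Exact-match accuracy on alive (NONE) vs. dead (non-NONE) subjects."""
    for label, is_none in [("NONE (alive)", True), ("non-NONE (dead)", False)]:
        subjects = [s for s, gold in ground_truth.items() if (gold == ["NONE"]) == is_none]
        correct = sum(set(predictions[s]) == set(ground_truth[s]) for s in subjects)
        print(f"{label}: {correct}/{len(subjects)} exact matches")
```

If the non-NONE accuracy is near zero while the NONE accuracy is high, the model is mostly learning who is dead.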
