Placeholder Repo for Issues List
patentsview-disambiguation's Issues
Missing input in \pv\disambiguation\assignee\run_clustering.py
Hi Monath,
I'm sorry to bother you! I'm a beginner trying to learn your disambiguation program, and I noticed that \pv\disambiguation\assignee\run_clustering.py relies on an input that is missing: 'data/assignee/permid/permid_vectorizer.pkl'. This input is then used in model.py:
name_tfidf = SKLearnVectorizerFeatures(flgs.assignee_name_model,
                                       'name_tfidf',
                                       lambda x: clean(split(x.normalized_most_frequent)))
Would you mind sharing this file, or describing what it contains? I'm sorry if my question is a little naive. Thank you so much for your help!
Best,
Mark
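One way to confirm whether the vectorizer file is present and loadable before running clustering is sketched below. It assumes the file is an ordinary Python pickle, which the `.pkl` extension suggests but this thread does not confirm. `load_vectorizer` is a hypothetical helper written for illustration, not a function from the repository.

```python
import pickle
from pathlib import Path


def load_vectorizer(path):
    """Load a pickled object, failing with a clear message if the file is absent.

    Hypothetical helper; assumes the file is a standard Python pickle.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Expected vectorizer file not found: {p}")
    with p.open("rb") as fh:
        # Only unpickle files that come from a source you trust.
        return pickle.load(fh)
```

Calling `load_vectorizer('data/assignee/permid/permid_vectorizer.pkl')` from the repository root would then either return the stored object or report exactly which path is missing.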
assignee disambiguation: incorporate location in the measure of similarity
Hi Monath,
Sorry to bother you again!!
I was trying to learn from your program. I checked your presentation slides from the USPTO Symposium, where you mentioned that "The assignee model is based on a tf-idf character n-gram string similarity model that uses data from PermID."
Just to confirm: the program uses both location and name-spelling similarity to compute the overall similarity, right? I infer this because the program encodes three features, where locations and name_tfidf are used to compute the similarity and entity_kb_feat is used as a constraint:
triples = [(locations, FeatCalc.DOT, CentroidType.NORMED, False, False),
(entity_kb_feat, FeatCalc.NO_MATCH, CentroidType.BINARY, False, True),
(name_tfidf, FeatCalc.DOT, CentroidType.NORMED, False, False)]
Thank you so much!! I'm sorry for bothering you again!
Best,
Mark
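For readers unfamiliar with the idea quoted from the slides, a tf-idf character n-gram similarity can be sketched in plain Python as follows. This is an illustrative stand-in, not the repository's implementation: the helper names, the trigram size, the idf smoothing, and the three sample names are all assumptions made for the example. Vectors are L2-normalised so that a dot product behaves as a cosine similarity.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(names, n=3):
    """Compute a smoothed inverse document frequency for each n-gram."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(char_ngrams(name, n)))
    total = len(names)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(name, idf, n=3):
    """Build an L2-normalised tf-idf vector; n-grams unseen at fit time are ignored."""
    counts = Counter(g for g in char_ngrams(name, n) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())


# Made-up sample names, used only to exercise the functions.
names = ["International Business Machines", "Intl Business Machines Corp", "Acme Widgets"]
idf = fit_idf(names)
vecs = [tfidf_vector(name, idf) for name in names]
sim_close = dot(vecs[0], vecs[1])  # two spellings that share many trigrams
sim_far = dot(vecs[0], vecs[2])    # unrelated names
```

Because the first two sample names share most of their trigrams, `sim_close` comes out larger than `sim_far`. A location feature could be scored the same way and combined with the name score, which is the reading of the triples that the question above asks about.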
Username and password arguments in database config
I was wondering what should go into the following three arguments in config/database_config.ini:
[DATABASE]
host =
username =
password =
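A minimal sketch of how such a file could be read and checked is shown below, assuming it is a standard INI file parsed with Python's `configparser`. The values are placeholders invented for the example; the thread does not say which database server or credentials the project expects, so they would need to be replaced with the details of your own database.

```python
import configparser

# Placeholder values for illustration only; substitute your own connection details.
SAMPLE = """\
[DATABASE]
host = localhost
username = example_user
password = example_password
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)  # for a real file: parser.read("config/database_config.ini")

db = parser["DATABASE"]
# Flag any of the three settings that were left blank, as in the template above.
missing = [key for key in ("host", "username", "password") if not db.get(key, "").strip()]
if missing:
    raise ValueError(f"Missing database settings: {', '.join(missing)}")
```

Run against the unfilled template, the check would raise an error naming all three keys, which makes an incomplete configuration easy to spot before any connection is attempted.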
How to prepare the raw data for disambiguation
Hi Monath,
I want to run the build for the assignee data, but I don't understand how the SQLite database should be prepared or where to get the raw data from. I'd really appreciate your help.
Usage - Documentation / Notebook
I think you have developed a state-of-the-art corporate name disambiguation/harmonization engine, and that is pretty exciting. It could be very helpful for many research topics within finance. Have you thought about creating a Python package, or a notebook or documentation, that would give researchers a foothold in using the software, including the recent updates you have made? https://github.com/PatentsView/PatentsView-Disambiguation/tree/main/pv/disambiguation/assignee
Cheers,
Derek