iamlemec / fastpat
Parse and cluster USPTO patent data. Includes applications, grants, assignments, and maintenance.
License: MIT License
Dear authors,
Just to let you know that the package Simhash.py would perhaps be better obtained through pip than stored in master. Otherwise, I get an error (perhaps the package has been updated since then).
Edit: it seems that I need your version of simhash, because the package in the repository does not have a Cluster class defined. However, when I try to use your package, I cannot import the mmhash module. Would you happen to know why?
Edit 2: if you use Python 3, you need to install python3-mmhash, and then it works.
Thank you very much
Hi,
Testing your pipeline.
AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/xxxxxxx/Desktop/patents-master/parse_grant.py'>
Any guidance to resolve?
Andy
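One likely cause, stated as an assumption since the thread does not confirm it: multiprocessing is using the spawn start method (the default on macOS since Python 3.8, and on Windows), so each worker re-imports the script as `__mp_main__` and looks the worker function up by name. If that function only exists after the script's top-level logic has run, the lookup fails. A minimal sketch of the usual workaround follows; `parse_file`, `parse_all`, and the `opts` dictionary are hypothetical stand-ins, not the repository's actual code.

```python
from functools import partial
from multiprocessing import Pool


def parse_file(path, opts):
    """Hypothetical stand-in for the per-file parser; returns a label."""
    return f"{path}:{opts['gen']}"


def parse_all(paths, opts, workers=2):
    # Bind the options with partial() around a module-level function, so
    # the worker is pickled by reference and can be found in the child.
    worker = partial(parse_file, opts=opts)
    with Pool(workers) as pool:
        return pool.map(worker, paths)


# In a script, call parse_all() only from under the main guard:
# if __name__ == "__main__":
#     print(parse_all(["a.xml", "b.xml"], {"gen": 3}))
```

The main guard matters under spawn because the child imports the script again; anything at module level outside the guard runs once per worker.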
1. On Windows, the unzip step in fetch.py does not work.
2. An unstable network connection stops the download loop.
For the first issue, I rewrote the code, which may help:
import os
import zipfile

def fetch_file(zurl, output, overwrite=False, dryrun=False, unzip=False):
    system = print if dryrun else os.system

    if not dryrun and not os.path.exists(output):
        print(f'Creating directory {output}')
        os.makedirs(output)

    _, zname = os.path.split(zurl)
    zpath = os.path.join(output, zname)
    fetch = overwrite or not os.path.isfile(zpath)

    if fetch:
        print(f'Fetching {zname}')
        # The -x proxy flag is specific to my Clash setup; drop it or expose a proxy argument.
        system(f'curl -o {zpath} {zurl} --ssl-no-revoke -x 127.0.0.1:7890')

    if fetch or unzip:
        print(f'Unzipping {zname}')
        # Skip extraction on a dry run, since nothing was downloaded.
        if not dryrun:
            # zipfile is portable, unlike shelling out to unzip.
            with zipfile.ZipFile(zpath, 'r') as zip_ref:
                zip_ref.extractall(output)

    return fetch
For the second issue, a try...except around each download may help.
If some files are missing afterwards, rerunning the script to fill them in is fine. But leaving the whole download task unfinished because of one failure is unreasonable.
Hope it helps, and thank you for your contribution.
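The try...except suggestion for the second issue could look roughly like the sketch below. It assumes a hypothetical `fetch_one` callable that raises an exception when a download fails; note that `os.system` does not raise on a failed curl call but returns a nonzero status, so a real wrapper would need to check that status and raise. The names `fetch_all`, `retries`, and `delay` are illustrative and not part of fetch.py.

```python
import time


def fetch_all(urls, fetch_one, retries=3, delay=0.0):
    """Try every URL, retrying on failure; return the URLs that never succeeded."""
    failed = []
    for url in urls:
        for attempt in range(1, retries + 1):
            try:
                fetch_one(url)
                break  # success: move on to the next URL
            except Exception as exc:
                print(f"Attempt {attempt}/{retries} failed for {url}: {exc}")
                if attempt < retries:
                    time.sleep(delay)  # brief pause before retrying
        else:
            # The inner loop finished without a break: every attempt failed.
            failed.append(url)
    return failed
```

A failure on one file is logged and recorded rather than aborting the loop, so the remaining files still download and the returned list shows what to rerun.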
Running load_data.py script throws up this error:
Traceback (most recent call last):
  File "load_data.py", line 37, in <module>
    concat_files(args.input, args.output, c, t)
  File "load_data.py", line 16, in concat_files
    first = files[0]
IndexError: list index out of range
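The traceback shows that `files` was empty when `files[0]` was read, which suggests that no input files were found for that table. How load_data.py actually builds the list is not shown in the thread, so the following is only a sketch under the assumption that it matches a filename pattern in the input directory; `find_inputs` and its parameters are hypothetical names.

```python
import glob
import os


def find_inputs(input_dir, pattern):
    """Return sorted files matching pattern, or fail with a readable message."""
    files = sorted(glob.glob(os.path.join(input_dir, pattern)))
    if not files:
        raise FileNotFoundError(
            f"No files matching {pattern!r} in {input_dir!r}; "
            "check the input path and that the parse step produced output."
        )
    return files
```

Checking for an empty list up front turns the bare IndexError into a message that points at the missing inputs.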