fastpat's People

Contributors

iamlemec

fastpat's Issues

Simhash package

Dear authors,

Just to let you know: the package Simhash.py would perhaps be better obtained through pip than stored in the master branch. Otherwise, I get an error (perhaps the package has been updated since then).

Edit: It seems that I need your version of simhash after all, because the package in the repository does not define a Cluster class. However, when I try to use your version, I cannot import the mmhash module. Would you happen to know why?

Edit 2: If you use Python 3, you need to install python3-mmhash, and then it works.

Thank you very much.

Error on Parse Grant

Hi,

I am testing your pipeline.

  1. Fetch_grant.py (I trimmed the file in meta down to two files): this seems to work. It fetches the data and expands it into the ~/data directory.
  2. parse_grant.py gives me an error:

AttributeError: Can't get attribute 'parse_file_opts' on <module '__mp_main__' from '/Users/xxxxxxx/Desktop/patents-master/parse_grant.py'>

Any guidance on how to resolve this?

Andy
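
Without seeing the script, one likely cause is the "spawn" start method, which is the default on macOS since Python 3.8. Each worker process re-imports the script as `__mp_main__` and looks the worker function up by name, so the lookup fails if the function is defined inside the `if __name__ == '__main__':` block or only after the pool is created. The sketch below shows the usual layout. The body of `parse_file_opts` here is a hypothetical stand-in, not the repository's implementation, and `main` and `workers` are illustrative names.

```python
import multiprocessing as mp
import sys

# Hypothetical stand-in for the repository's worker function. It is defined
# at module top level so that spawned workers can find it by name after
# re-importing this module.
def parse_file_opts(path):
    return path, len(path)

def main(paths, workers=2):
    # Nothing to do: avoid starting worker processes at all.
    if not paths:
        return []
    # The pool is created only when main() is called from guarded code,
    # never as a side effect of importing the module.
    with mp.Pool(workers) as pool:
        return pool.map(parse_file_opts, paths)

if __name__ == "__main__":
    for path, size in main(sys.argv[1:]):
        print(path, size)
```

If the script already follows this layout, the cause is probably elsewhere, and the full traceback would help narrow it down.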

Some suggestions on downloading

1. In a Windows environment, the unzip function in fetch.py does not work.
2. An unstable network connection stops the loop that downloads the files.

For the first issue, I rewrote the code, which may help.

import os
import zipfile

def fetch_file(zurl, output, overwrite=False, dryrun=False, unzip=False):
    # In dry-run mode, print the command instead of executing it
    system = print if dryrun else os.system

    if not dryrun and not os.path.exists(output):
        print(f'Creating directory {output}')
        os.makedirs(output)

    _, zname = os.path.split(zurl)
    zpath = os.path.join(output, zname)
    fetch = overwrite or not os.path.isfile(zpath)

    if fetch:
        print(f'Fetching {zname}')
        # The -x option routes curl through my local Clash proxy; you can drop it or expose the proxy as another argument
        system(f'curl -o {zpath} {zurl} --ssl-no-revoke -x 127.0.0.1:7890')

    if fetch or unzip:
        print(f'Unzipping {zname}')
        # Use zipfile instead of the external unzip command; skip in dry-run mode, where nothing was downloaded
        if not dryrun:
            with zipfile.ZipFile(zpath, 'r') as zip_ref:
                zip_ref.extractall(output)

    return fetch

For the second issue, a try...except may help.

If some files are missing, simply rerunning the code to fill them in is fine. But leaving the download task unfinished is quite unreasonable.
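
A minimal sketch of that try...except idea: retry each file a few times, move on if it keeps failing, and report what is still missing so a rerun can fill it in. The names `fetch_all`, `fetch_one`, `retries`, and `delay` are hypothetical and not part of fetch.py. Note that `os.system` does not raise when curl fails, so `fetch_one` is assumed to raise on failure (for example by calling `subprocess.run(..., check=True)`).

```python
import time

def fetch_all(urls, fetch_one, retries=3, delay=1.0):
    """Try every URL; return the list of URLs that still failed."""
    failed = []
    for url in urls:
        for attempt in range(1, retries + 1):
            try:
                fetch_one(url)
                break  # success: stop retrying this URL
            except Exception as err:
                print(f'Attempt {attempt}/{retries} failed for {url}: {err}')
                if attempt < retries:
                    time.sleep(delay)  # brief pause before the next attempt
        else:
            # The retry loop never hit `break`, so every attempt failed.
            failed.append(url)
    return failed
```

A failed file no longer stops the loop; the returned list shows which files a later run still needs to fetch.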

Hope it helps, and many thanks for your contribution.

load_data.py is not executing

Running the load_data.py script throws this error:

Traceback (most recent call last):
  File "load_data.py", line 37, in <module>
    concat_files(args.input, args.output, c, t)
  File "load_data.py", line 16, in concat_files
    first = files[0]
IndexError: list index out of range
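
The traceback shows that `files` was empty when `concat_files` indexed it, which usually means no input files were found (for instance, a wrong input path or an earlier step that produced no output). How load_data.py builds that list is not shown here, so the following is only a sketch of a guard, assuming the list comes from a glob; `list_inputs` and the `*.csv` pattern are hypothetical.

```python
import glob
import os

def list_inputs(input_dir, pattern='*.csv'):
    """Return the matching files, or fail with a readable message."""
    files = sorted(glob.glob(os.path.join(input_dir, pattern)))
    if not files:
        # Fail early with the path and pattern instead of an IndexError later.
        raise FileNotFoundError(f'No files matching {pattern} in {input_dir}')
    return files
```

With a check like this, the error message points at the directory to inspect.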
