
ResearchNow Utilities

This repository contains scripts for invoking test instances of the ResearchNow data ingest pipeline and for reading results from the Amazon Kinesis streams that make up that pipeline. At present two scripts provide this monitoring; more are likely to be added.

Installation

Clone this repository and install its dependencies from requirements.txt with pip:

    pip install -r requirements.txt

The scripts support Python 3 only, up through version 3.7.

Scripts

ePub Kinesis Ingest

This script can read from or write to the Kinesis stream that triggers the Lambda function that processes ePubs and stores them in S3. At present the ingest process runs off a locally stored PostgreSQL instance, so the write mode will fail. The script can, however, be used to monitor the status of the ingest stream. Run it with:

    python3 psqlToKinesis.py GET

This exports all records put to the Kinesis stream in the past 24 hours. Be aware that, because of the throttling limits Kinesis imposes, it can take some time for the script to catch up to the present and produce results.
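The GET mode described above amounts to reading a shard from a point 24 hours back and retrying whenever Kinesis throttles the reader. The following is a minimal sketch of that pattern, not code taken from psqlToKinesis.py. It assumes a boto3 Kinesis client (boto3.client("kinesis")) is passed in; the function name read_past_day, the 1,000-record page size, and the delay and backoff values are hypothetical choices made for illustration.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Iterator


def read_past_day(
    client: Any,
    stream_name: str,
    shard_id: str,
    hours: int = 24,
    poll_delay: float = 0.2,
    max_backoff: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield records put to one shard in the last `hours` hours.

    Hypothetical helper: `client` is assumed to behave like
    boto3.client("kinesis"). Reading stops once the reader has caught
    up with the stream (MillisBehindLatest == 0) or the shard is closed.
    """
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AT_TIMESTAMP",  # start reading at `start`
        Timestamp=start,
    )["ShardIterator"]
    throttled = client.exceptions.ProvisionedThroughputExceededException
    backoff = poll_delay
    while iterator:
        try:
            page = client.get_records(ShardIterator=iterator, Limit=1000)
        except throttled:
            # Kinesis enforces per-shard read limits; wait, then retry
            # with an exponentially growing (capped) delay.
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)
            continue
        backoff = poll_delay  # reset after a successful call
        yield from page["Records"]
        if page.get("MillisBehindLatest", 0) == 0:
            break  # caught up to the present
        iterator = page.get("NextShardIterator")  # None if shard closed
        sleep(poll_delay)  # pace calls to stay under the read limit
```

With real credentials this could be driven as read_past_day(boto3.client("kinesis"), "my-stream", "shardId-000000000000"), where the stream name is a placeholder. Injecting the client and the sleep function keeps the loop testable without AWS access.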

ePub Ingest Result

This script reports on the results most recently stored in S3, covering up to the past 24 hours. By default it lists the files stored in the past hour; this window can be adjusted by passing a timestamp to the script. Invoke it with:

    python3 getIngestresults.py [optional-timestamp]

The timestamp should be formatted as 2018-11-09T12:00:00Z. The report lists the zipped ePub files and links to the content.opf file of each exploded (unzipped) ePub. These links can then be checked for access, fidelity, and accuracy.
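One way to handle the time window and timestamp format described above is sketched here. It is an illustration, not the logic of getIngestresults.py: the function names are hypothetical, the 24-hour limit is assumed to be enforced by clamping the start time, and the object dictionaries mirror the Key and LastModified fields of the Contents entries returned by boto3's S3 list_objects_v2.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

DEFAULT_WINDOW = timedelta(hours=1)  # report the past hour by default
MAX_WINDOW = timedelta(hours=24)     # assumed cap on how far back to look


def parse_since(arg: Optional[str], now: datetime) -> datetime:
    """Turn the optional CLI timestamp (e.g. 2018-11-09T12:00:00Z) into a
    timezone-aware start time, clamped to at most 24 hours before `now`."""
    if arg is None:
        return now - DEFAULT_WINDOW
    since = datetime.strptime(arg, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return max(since, now - MAX_WINDOW)


def recent_keys(objects: Iterable[dict], since: datetime) -> List[str]:
    """Keys of S3 objects (list_objects_v2 'Contents' entries) modified at or after `since`."""
    return sorted(o["Key"] for o in objects if o["LastModified"] >= since)


def opf_links(keys: Iterable[str]) -> List[str]:
    """Select the content.opf file of each exploded ePub."""
    return [k for k in keys if k.endswith("content.opf")]
```

In practice the object entries could come from paginating list_objects_v2 over the results bucket with client.get_paginator("list_objects_v2"); boto3 returns LastModified as a timezone-aware datetime, so it compares directly with the start time computed here.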

Contributors

mwbenowitz
