wipacrepo / file-catalog-indexer
Indexing package and scripts for the File Catalog
License: MIT License
REST requesting is (A LOT) faster than check-summing a file. So, first, check if the file already exists in the FC before collecting its metadata.
Only proceed with collecting metadata if --patch is included. (depends on #23)
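A minimal sketch of this ordering, assuming the existence check and the file read are passed in as callables. The names index_file, already_in_fc and read_bytes, and the record layout, are hypothetical and not the indexer's actual API; the point is only that the cheap lookup runs before the expensive checksum.

```python
import hashlib
from typing import Callable, Optional


def index_file(
    path: str,
    already_in_fc: Callable[[str], bool],  # hypothetical: one cheap REST lookup
    read_bytes: Callable[[str], bytes],    # hypothetical: reads the file contents
    patch: bool = False,
) -> Optional[dict]:
    """Return a metadata record, or None if the file is skipped."""
    # Cheap check first: ask the File Catalog whether the file is already there.
    if already_in_fc(path) and not patch:
        return None  # skip without reading or check-summing the file
    # Expensive step: only reached for new files, or when patching was requested.
    checksum = hashlib.sha512(read_bytes(path)).hexdigest()
    return {"path": path, "sha512": checksum}
```

With this shape, a file that is already catalogued costs a single lookup unless patch=True is passed.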
Replace --no-patch with --patch. Since not patching is the most common usage, it shouldn't require a command-line option.
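An opt-in flag of this kind could be declared with argparse roughly as follows. This is a sketch: the positional paths argument and the help strings are assumptions, and only the --patch flag itself comes from the text above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Index files into the File Catalog (sketch)"
    )
    # Hypothetical positional argument, included only to make the example complete.
    parser.add_argument("paths", nargs="*", help="files or directories to index")
    # Opt-in flag: absent means "do not patch", so the common case needs no option.
    parser.add_argument(
        "--patch",
        action="store_true",
        default=False,
        help="replace/overwrite existing File Catalog entries",
    )
    return parser
```

Calling build_parser().parse_args([]) leaves patch as False; passing --patch sets it to True.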
We can now use:
- wipac-file-catalog (pending WIPACrepo/file_catalog#127)
- iceprod (pending WIPACrepo/iceprod#307)
- wipac-rest-tools[telemetry]
We will only need the flake8 and mypy jobs in wipac-cicd.yml, at a minimum.
Packaging the repo (which means publishing to PyPI, etc.) could have benefits. This is not strictly necessary, though tempting, so let's do it! (Required for @blinkdog's new disk pipeline.)
See WIPACrepo/wipac-dev-tools#20
Edit: Indexer will now be published as a package (see #43)
Currently, "content_status" is solely based on whether the .i3 file can be read.
There are also the "good runs" list files. Do we want to consider these? This could be a new field in the FC record.
Alternatively, we could wait until we have an event-based store, since this matches the "good run" granularity.
Currently, the latest supported year is 2020 (IC86-10). We will need to extend this list, and potentially expand outward. Unrecognized years/seasons will cause this indexer to fail fast.
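The fail-fast behavior can be illustrated with a small lookup table. The names YEAR_TO_SEASON, UnknownSeasonError and season_for_year are hypothetical, and the table contains only the single year/season pairing stated above; the real list lives in the indexer.

```python
# Hypothetical table: only the one pairing mentioned in the text is included.
YEAR_TO_SEASON = {2020: "IC86-10"}


class UnknownSeasonError(ValueError):
    """Raised when a year has no known season."""


def season_for_year(year: int) -> str:
    try:
        return YEAR_TO_SEASON[year]
    except KeyError:
        # Fail fast: refuse to guess rather than index with a wrong season.
        raise UnknownSeasonError(
            f"no season known for year {year}; extend YEAR_TO_SEASON"
        ) from None
```

Supporting a new season then amounts to adding one entry to the table; until that happens, the lookup raises instead of indexing with a wrong or missing season.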
There's a potential race condition when indexing L2 files, if the client script is using index_file() directly and sharing a single MetadataManager instance between threads. This isn't an issue when using index().
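One way to make a shared instance safe can be sketched with a lock guarding a per-directory cache. ToyMetadataManager, its attributes, and the loader callable are hypothetical stand-ins and do not reproduce the indexer's MetadataManager.

```python
import threading
from typing import Callable, Dict


class ToyMetadataManager:
    """Hypothetical stand-in for a manager shared between threads."""

    def __init__(self, load_dir_metadata: Callable[[str], dict]) -> None:
        self._load = load_dir_metadata
        self._lock = threading.Lock()
        # One entry per directory, instead of a single "current directory" slot.
        self._l2_dir_data: Dict[str, dict] = {}

    def dir_metadata(self, dir_path: str) -> dict:
        # The lock makes the check-then-load sequence atomic, so two threads
        # cannot interleave and read metadata belonging to another directory.
        with self._lock:
            if dir_path not in self._l2_dir_data:
                self._l2_dir_data[dir_path] = self._load(dir_path)
            return self._l2_dir_data[dir_path]
```

Holding the lock while loading serializes directory loads, which is a simplification made here for clarity; each directory is still loaded only once.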
file-catalog-indexer/indexer/metadata_manager.py
Lines 117 to 120 in 4923e60
Solutions include:
- a threading.Lock() context manager around the above code
- a mapping (self.L2_dir_data) keyed on dir_path (instead of a single self.dir_path & self.real_l2_dir_metadata)
Find metadata fields for files in /data/sim and index into File Catalog.
Check these directories:
resources/path_collector/
resources/filename_patterns/