cccatalog-api's Issues

Some tags are empty in our database

Example: https://ccsearch.creativecommons.org/photos/15621775

Error:

Traceback (most recent call last):                                                                                    
  File "/usr/local/lib/python3.7/site-packages/rest_framework/fields.py", line 441, in get_attribute                  
    return get_attribute(instance, self.source_attrs)                                                                 
  File "/usr/local/lib/python3.7/site-packages/rest_framework/fields.py", line 98, in get_attribute                   
    instance = instance[attr]                                                                                         
KeyError: 'name' 

Tags for this image (note the empty tag):

15621775 | [{}, {"name": "business", "accuracy": 0.97734, "provider": "clarifai"}, {"name": "man", "accuracy": 0.93226, "provider": "clarifai"}, {"name": "paper", "accuracy": 0.89205, "provider": "clarifai"}, {"name": "presentation", "accuracy": 0.89086, "provider": "clarifai"}, {"name": "people", "accuracy": 0.86843, "provider": "clarifai"}, {"name": "contemporary", "accuracy": 0.85604, "provider": "clarifai"}, {"name": "identity", "accuracy": 0.84483, "provider": "clarifai"}, {"name": "facts", "accuracy": 0.81679, "provider": "clarifai"}, {"name": "blank", "accuracy": 0.81582, "provider": "clarifai"}, {"name": "intelligence", "accuracy": 0.81183, "provider": "clarifai"}, {"name": "bill", "accuracy": 0.80434, "provider": "clarifai"}, {"name": "achievement", "accuracy": 0.7914, "provider": "clarifai"}, {"name": "horizontal", "accuracy": 0.78989, "provider": "clarifai"}, {"name": "education", "accuracy": 0.77418, "provider": "clarifai"}, {"name": "template", "accuracy": 0.77406, "provider": "clarifai"}, {"name": "fine-looking", "accuracy": 0.77116, "provider": "clarifai"}, {"name": "indoors", "accuracy": 0.76115, "provider": "clarifai"}, {"name": "company", "accuracy": 0.75583, "provider": "clarifai"}, {"name": "looking", "accuracy": 0.75398, "provider": "clarifai"}, {"name": "banner", "accuracy": 0.74502, "provider": "clarifai"}]
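
One possible mitigation, sketched here under assumptions, is to drop tag objects that lack a "name" before they reach the serializer. The helper name drop_malformed_tags is hypothetical and not part of the project; where such a filter would actually live (API layer or data pipeline) is left open.

```python
def drop_malformed_tags(tags):
    """Return only the tag objects that carry a non-empty 'name'.

    Hypothetical helper: empty objects such as {} are what trigger the
    KeyError: 'name' shown in the traceback above.
    """
    return [
        tag for tag in (tags or [])
        if isinstance(tag, dict) and tag.get("name")
    ]


# Shortened version of the tag list from the example image.
tags = [{}, {"name": "business", "accuracy": 0.97734, "provider": "clarifai"}]
cleaned = drop_malformed_tags(tags)
print(cleaned)
```

This only hides the symptom; the empty objects would still need to be cleaned up at the source.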

Build PostgreSQL --> Elasticsearch syncer

Check Postgres for updates on tables every few seconds. Whenever there is a sizable change to a table, use the bulk insert API to push the data to Elasticsearch.

This should improve our indexing speed considerably over the old implementation based on Django signals.
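
A rough sketch of one polling step, assuming changes can be detected by comparing primary keys against the last synced id. The names sync_once, fetch_rows_after, and push_bulk are hypothetical stand-ins for a Postgres query and an Elasticsearch bulk call. The action dicts use the _index/_id/_source shape that the Python client's bulk helper accepts. Detecting updated rows (rather than new ones) would need something else, such as a timestamp column, which is not covered here.

```python
def to_bulk_actions(rows, index_name):
    """Turn table rows into bulk-style action dicts."""
    for row in rows:
        yield {"_index": index_name, "_id": row["id"], "_source": row}


def sync_once(fetch_rows_after, push_bulk, last_id, min_batch=2, index_name="image"):
    """Run one polling step and return the new high-water mark.

    fetch_rows_after and push_bulk are injected callables (assumptions),
    so the sketch runs without a database or a cluster.
    """
    rows = fetch_rows_after(last_id)
    if len(rows) < min_batch:
        return last_id  # change is not sizable yet; try again on the next poll
    push_bulk(list(to_bulk_actions(rows, index_name)))
    return max(row["id"] for row in rows)


# In-memory stand-ins for the table and the search index.
table = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}, {"id": 3, "title": "c"}]
pushed = []
fetch = lambda last_id: [r for r in table if r["id"] > last_id]

last_id = sync_once(fetch, pushed.extend, last_id=0)
print(last_id, len(pushed))
```

In a real syncer this step would sit inside a loop that sleeps for a few seconds between polls.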

Use something besides primary key to identify objects exposed from the API

Primary keys are an implementation detail that can change without notice (such as if we were to migrate to a new database). As such, we need to expose an alternative identifier besides the primary key.

This means we will have to set an alternative unique identifier on each image when it is loaded into the database. For entities that are created on the API server, such as lists, this can be done upon creation, but for entities created by the data pipeline (images), the unique identifier must be assigned when they are loaded into the database. See uuslug; we would have to perform a similar operation in the data pipeline.

@sclachar, any thoughts on this? Could we use the "identifier" column for this purpose (after adding a unique and not null constraint)?
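
As an illustration only, the load step could fill in a random UUID wherever a record has no identifier yet. This assumes a UUID is an acceptable format for the "identifier" column, which the discussion above has not settled; assign_identifiers is a hypothetical name.

```python
import uuid


def assign_identifiers(records):
    """Give each record a unique 'identifier' if it does not have one.

    Existing identifiers are left untouched so that re-running the load
    step does not change what the API has already exposed.
    """
    for record in records:
        if not record.get("identifier"):
            record["identifier"] = str(uuid.uuid4())
    return records


images = [{"id": 1}, {"id": 2, "identifier": "already-set"}]
assign_identifiers(images)
print(images)
```

The unique and not null constraints would still be enforced by the database; this sketch only covers generating the values.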

Search by tag query in ES

Once the machine-generated tags have been denormalized and stored in the 'image' table, I will get started on this.

Use `removed_from_source` field to filter dead links from Elasticsearch

We routinely crawl the web and consume APIs to find CC-licensed works. As a search engine, we have to deal with link rot; images are often deleted or moved from their original location. During crawling, if we detect that an image has been deleted, we will mark it with the removed_from_source field in our database.

ingestion-server, our tool for building an Elasticsearch index, will need to be updated to discard images where removed_from_source=True.
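
A minimal sketch of that filter, assuming rows arrive as dictionaries and that a missing removed_from_source value should be treated as False. live_images is a hypothetical name, not ingestion-server's actual code; the same effect could equally be achieved in the SQL query that selects rows for indexing.

```python
def live_images(rows):
    """Yield only rows whose source image has not been marked as removed."""
    for row in rows:
        if not row.get("removed_from_source", False):
            yield row


rows = [
    {"id": 1, "removed_from_source": False},
    {"id": 2, "removed_from_source": True},
    {"id": 3},  # flag missing; treated as still available (assumption)
]
indexable = list(live_images(rows))
print([row["id"] for row in indexable])
```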

Return authorization token upon list creation

Users need to be able to update and delete lists anonymously. To accommodate this, we can return an authorization token, which can then be stored on the user's machine (either in a cookie or localStorage).

The token will be sent in plain text over SSL. An attacker with local access to the user's machine could steal this token and make updates to other lists, but this is not a serious concern because the stakes are low.
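
One way this could look, as a sketch under assumptions: generate a random token when the list is created, return it to the client once, and keep only a hash of it on the server. Storing a hash rather than the raw token is an assumption added here, not something the issue specifies, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_list_token():
    """Create a token for a new list.

    Returns (token, digest): the token goes back to the client, and the
    digest is what the server would store alongside the list.
    """
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, digest


def token_matches(presented, stored_digest):
    """Check a presented token against the stored digest in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)


token, digest = issue_list_token()
print(token_matches(token, digest))
print(token_matches("not-the-token", digest))
```

An update or delete request would then be accepted only if the token it carries matches the digest stored for that list.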
