ormsbee commented on September 1, 2024

@bradenmacdonald, @symbolist: The high-level proposal for flagging content for removal is ready for review.

from blockstore.

symbolist commented on September 1, 2024

@ormsbee Thanks for the proposal!

A few observations:

  1. If we had a blacklist of files that applied to a range of bundle versions and an author then edited a course Bundle, how would those files be treated? Would they be copied over, with the bundle version ranges on the takedown objects updated, or would they be skipped in the next bundle version?
  2. Instead of blacklisting files, would storing a blacklist of file hashes (which can be transformed into the appropriate API response representation) make things simpler if we wanted the former behavior?
  3. If we do need to delete data, at a minimum we will need to delete the files in S3, so clients should be able to handle a file-not-found error.
  4. When do we completely delete offending data, as opposed to merely blocking access to it?

Maybe the API and the moderation interface should allow specifying that? It could happen at the initial step or later.

bradenmacdonald commented on September 1, 2024

Even though clients are trusted, I think it might be better to return an error when trying to download the actual data of a blacklisted file in the default case, maybe being bypass-able (for the trusted client only) with an ignore_blacklist flag. That seems easier (change code in one place in blockstore) than changing the client code to check the blacklist in every place it's necessary. (I suppose clients may want to check the blacklist anyways in order to display an appropriate message in the UI, though that could also be done by having blockstore return HTTP 451 Unavailable For Legal Reasons.)
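A minimal sketch of that default-deny behavior, assuming a hypothetical `download_file` helper and in-memory stand-ins for the blacklist and file storage (none of these names come from Blockstore itself):

```python
from http import HTTPStatus

# Hypothetical stand-ins for the real blacklist table and file storage.
BLACKLISTED_HASHES = {"abc123"}
FILE_DATA = {"abc123": b"offending bytes", "def456": b"fine bytes"}


def download_file(file_hash, ignore_blacklist=False):
    """Return a (status, body) pair for a file download request.

    Blacklisted files yield HTTP 451 unless the trusted client
    explicitly passes ignore_blacklist=True.
    """
    if file_hash not in FILE_DATA:
        return HTTPStatus.NOT_FOUND, b""
    if file_hash in BLACKLISTED_HASHES and not ignore_blacklist:
        return HTTPStatus.UNAVAILABLE_FOR_LEGAL_REASONS, b""
    return HTTPStatus.OK, FILE_DATA[file_hash]
```

With the check in one place, a client that forgets about the blacklist fails closed, and a moderation tool can still pass `ignore_blacklist=True` to review the content.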

I like @symbolist's idea of blacklisting hashes, since a nice feature of Blockstore is that data hashes are tracked for everything. (Not sure what the odds of collision are, but I assume negligible.) However, for something like a video XBlock OLX, it may not work effectively: there can be many variations of an OLX file that reference the same external video URL, and minor changes to the subtitle settings or other fields can change the hash of the OLX file without changing the fact that it references the video we need to take down.

ormsbee commented on September 1, 2024

(@symbolist) If we had a blacklist of files that applied to a range of bundle versions and an author then edited a course Bundle, how would those files be treated? Would they be copied over, with the bundle version ranges on the takedown objects updated, or would they be skipped in the next bundle version?

My inclination would be to copy the references, extend the range it applies to (or have a range that is essentially "every version after N"). But this is fuzzy, and having ranges at all certainly increases complexity.
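As a rough illustration of the open-ended range idea, here is a sketch assuming a hypothetical `Takedown` record in which a missing upper bound means "this version and every later one" (the field names are invented for the example, and the lower bound is treated as inclusive):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Takedown:
    """Hypothetical takedown record scoped to a range of bundle versions."""
    path: str
    first_version: int
    last_version: Optional[int] = None  # None: applies to all later versions

    def applies_to(self, path, version):
        """True if this takedown covers the given file path and version."""
        if path != self.path or version < self.first_version:
            return False
        return self.last_version is None or version <= self.last_version
```

An open-ended record needs no update when an author saves a new version, which avoids rewriting ranges on every edit, at the cost of the takedown following the path rather than the content.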

(@symbolist) Instead of blacklisting files, would storing a blacklist of file hashes (which can be transformed into the appropriate API response representation) make things simpler if we wanted the former behavior?

(@bradenmacdonald) I like @symbolist's idea of blacklisting hashes, since a nice feature of Blockstore is that data hashes are tracked for everything. (Not sure what the odds of collision are, but I assume negligible.) However, for something like a video XBlock OLX, it may not work effectively: there can be many variations of an OLX file that reference the same external video URL, and minor changes to the subtitle settings or other fields can change the hash of the OLX file without changing the fact that it references the video we need to take down.

Yeah, I'm liking the hash approach more and more.

I don't think we should worry about many subtle variations of OLX. If people are being malicious and just trying to get around the system, I think that bringing a hammer down on the Bundle or account as a whole is justified, and we shouldn't try to fight it at this layer.

The odds of hash collision (for differing content) are negligible even at a global level, and the only realistic scenario in which I could imagine it happening is that sometime ten years from now, somebody has figured out how to engineer malicious hash collisions for BLAKE2. FWIW, from a practical point of view, I think it's still best to store bad hashes with a Bundle association so that we can query for it efficiently. If we have 10K file hashes in a BundleVersion, it's going to be a lot cheaper to do a single query for all blacklisted hashes associated with this Bundle than to see if any of the 10K show up on a global list.
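To make the per-Bundle lookup concrete, here is a small sketch using Python's `hashlib.blake2b`. The in-memory `blacklist_by_bundle` mapping and the helper names are hypothetical, and the default digest size is an assumption rather than whatever Blockstore actually stores:

```python
import hashlib
from collections import defaultdict

# Hypothetical store: bundle_id -> set of blacklisted content hashes.
blacklist_by_bundle = defaultdict(set)


def content_hash(data):
    """Hex BLAKE2b digest of the file contents (digest size is an assumption)."""
    return hashlib.blake2b(data).hexdigest()


def flag(bundle_id, data):
    """Record the hash of offending content against one Bundle."""
    blacklist_by_bundle[bundle_id].add(content_hash(data))


def blocked_paths(bundle_id, version_files):
    """Paths in a BundleVersion ({path: hash}) whose hash is blacklisted.

    Only this Bundle's blacklist is fetched, once, and then compared
    against every file hash in the version.
    """
    bad = blacklist_by_bundle.get(bundle_id, set())
    return sorted(path for path, h in version_files.items() if h in bad)
```

Because the blacklist is keyed by content, a flagged file stays blocked when it is copied unchanged into a new BundleVersion. Any edit to the file, however small, produces a different hash, which is the OLX variation case raised earlier in the thread.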

(@bradenmacdonald) Even though clients are trusted, I think it might be better to return an error when trying to download the actual data of a blacklisted file in the default case, maybe being bypass-able (for the trusted client only) with an ignore_blacklist flag. That seems easier (change code in one place in blockstore) than changing the client code to check the blacklist in every place it's necessary. (I suppose clients may want to check the blacklist anyways in order to display an appropriate message in the UI, though that could also be done by having blockstore return HTTP 451 Unavailable For Legal Reasons.)

Yeah, it's the introspection reason which made me think that maybe we need to keep it accessible. So for instance, say it's a complaint that's lodged against a certain file, and we automatically blacklist it for the moment, but somebody has to review it, etc. Do we have more of a writeup on the product-side requirements of this feature?

Also, it seems like we'd need clients to handle receiving the update that something was blacklisted, since that content may have been published and would need to be removed downstream, say from the modulestore.

I admit, another reason I want to keep it as an additive layer (as opposed to altering the output of the snapshot itself) is because I'm attracted to the idea of implementing it with a plugin relationship. It can manage its own tables about what Bundles are flagged and what hashes are bad and have its own API, and have a small touchpoint where it gets to add its own metadata to the GET view for a single BundleVersion. Once it starts altering the core output of the BundleVersion data, it either has to be wrapped up in the bundles app itself, or else we have to define a more complex plugin model that lets plugins transform values as opposed to having their own namespace to manage.

Maybe that's unavoidable because we need to delete data anyway. But I'd like to try to keep that separation if possible.
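The additive, namespaced approach could look roughly like the sketch below. The `bundle_version_response` function, the `"plugins"` key, and the `takedown_plugin` callable are all hypothetical names chosen for illustration, not an existing Blockstore plugin API:

```python
def bundle_version_response(core_data, plugins):
    """Build a GET response for one BundleVersion.

    The core snapshot data is passed through untouched; each plugin
    only contributes metadata under its own namespace.
    """
    response = dict(core_data)
    response["plugins"] = {
        name: add_metadata(core_data) for name, add_metadata in plugins.items()
    }
    return response


def takedown_plugin(core_data):
    """Hypothetical plugin: report which file hashes in this version are bad."""
    bad_hashes = {"abc123"}  # would come from the plugin's own tables
    flagged = sorted(
        path for path, h in core_data["files"].items() if h in bad_hashes
    )
    return {"blacklisted_paths": flagged}
```

Here the plugin never rewrites `files`, so the snapshot output is identical with or without the plugin installed. A model where plugins transform the core values would need a more involved contract.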

bradenmacdonald commented on September 1, 2024

Some minor follow-up notes:

  • If it's a hard takedown, everyone who was using the asset should be notified, in addition to the content being immediately blocked everywhere it was used.
  • If a legal or moderation process has resulted in a fix to the content, but not a takedown, we want to notify users of that content (in other bundles). But we will already have some general mechanism for showing authors when a newer version of content they used is available (and allowing them to update or not), so likely no special treatment is needed.

bradenmacdonald commented on September 1, 2024

This is old and most likely no longer relevant to the near-term development of Blockstore, so I'm going to close it. The discussion will be preserved on GitHub for future reference if useful, e.g. to the learning core work.
