
Comments (15)

rclakmal commented on June 12, 2024

@shivam-tripathi Sure go ahead :-) I did a basic comparison already on MD5 and SHA1. MD5 was faster and looks suitable for our use case. Let me know if any assistance is needed from my side.

from briefcase.

icemc commented on June 12, 2024

@shivam-tripathi Sure, you can look into it. I don't think anyone is currently working on it. From a quick look, the issue seems to come from the method emitSubmissionCsv in ConvertToCSV.java, in the branch for case org.javarosa.core.model.Constants.DATATYPE_BINARY. Correct me if I'm wrong. Hope that helps.

shivam-tripathi commented on June 12, 2024

@joeflack4 Hi!
Presently we are trying to remove redundancy at export time, that is, when you export data collected with Briefcase after pulling it from the server. This is essentially an offline process.
However, duplication while pulling the data from the server remains an issue. I hope a solution surfaces in the future. If I am correct, it needs to be handled at the Aggregate level, because in Briefcase we cannot determine whether a media file is a duplicate before fetching it.

rclakmal commented on June 12, 2024

@yanokwa Tested on OS X 10.12 with Birds form

There were two images with the same timestamp in two different instance folders.

snip20170325_3

After exporting, in the media folder, one was renamed with the suffix -2

snip20170325_4

shivam-tripathi commented on June 12, 2024

@yanokwa: Confirmed the same thing as @rclakmal on Linux Ubuntu 16.04.
selection_016
selection_017

shivam-tripathi commented on June 12, 2024

But I noticed something weird:
selection_018
selection_019

It appears that when it encounters two separate instance media files with the same timestamp, it creates a duplicate of only the one already copied.
I renamed one instance's media file to match one existing in another instance. It also appears that when re-fetching the form, if the instance media data has been tampered with, Briefcase doesn't verify the contents of the instance media folder and skips it.

shivam-tripathi commented on June 12, 2024

I looked at the code and I was mistaken (see the crossed-out remarks in the comment above). Making changes to the instances folder makes no difference, as the file names are read from the XML response. If a file is not found, it is skipped.
The duplicate image shown in that comment, which unfortunately caused the confusion, was there because the file was actually present twice.
The code handles files with the same timestamp by adding a suffix.

icemc commented on June 12, 2024

I confirm what @shivam-tripathi said earlier. When the code encounters files with the same time stamp it solves the problem by adding an incremental suffix.
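The suffix behavior confirmed above can be illustrated with a minimal sketch. This is not the Briefcase code: the class SuffixNaming and the method nextAvailableName are hypothetical names, and the exact suffix format (a hyphen and a counter starting at 2, placed before the extension) is an assumption based on the "-2" and "image-2.jpg" examples in this thread.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not taken from Briefcase.
final class SuffixNaming {
    /**
     * Returns fileName if it is not yet in taken; otherwise returns the first
     * free name of the form base-2.ext, base-3.ext, and so on.
     * The returned name is recorded in taken.
     */
    static String nextAvailableName(Set<String> taken, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        String candidate = fileName;
        // Assumed format: counter starts at 2, as in "image-2.jpg".
        for (int n = 2; taken.contains(candidate); n++) {
            candidate = base + "-" + n + ext;
        }
        taken.add(candidate);
        return candidate;
    }
}
```

Calling nextAvailableName three times with "image.jpg" on the same set yields "image.jpg", "image-2.jpg" and "image-3.jpg", regardless of whether the file contents are identical.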

yanokwa commented on June 12, 2024

Thanks so much for the confirmation, gentlemen! I'm closing this issue because this is exactly the behavior we want.

yanokwa commented on June 12, 2024

Actually. Instead of adding image-2.jpg, I wonder if we can check the MD5 hash and only append a number if those files are actually different. What do you think @shivam-tripathi @icemc @rclakmal?
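One way the proposal could look is sketched below. It hashes only when the target file name already exists, reuses the existing file when the MD5 digests match, and appends a counter only when the contents differ. The class MediaExportSketch and its methods are hypothetical and do not reflect how ConvertToCSV.java is structured. Reading whole files into memory is a simplification; large media would call for streaming.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch, not the Briefcase implementation.
final class MediaExportSketch {
    static byte[] md5(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Exports source into mediaDir and returns the path that holds its content. */
    static Path export(Path source, Path mediaDir) throws IOException {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        byte[] sourceHash = null; // computed only if a name collision occurs
        Path target = mediaDir.resolve(name);
        for (int n = 2; Files.exists(target); n++) {
            if (sourceHash == null) {
                sourceHash = md5(source);
            }
            if (Arrays.equals(sourceHash, md5(target))) {
                return target; // same content already exported, skip the copy
            }
            target = mediaDir.resolve(base + "-" + n + ext);
        }
        return Files.copy(source, target);
    }
}
```

With this shape, two instances that share an identical image.jpg produce a single exported file, while a different image with the same name still ends up as image-2.jpg.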

rclakmal commented on June 12, 2024

@yanokwa We can store MD5 hashes and file paths in a HashTable, with the MD5 hash as the key. This will allow us to skip the duplicates. As far as I know, there is a theoretical possibility, however small, that two different files could produce the same hash. I think we can ignore this for practical purposes?

Don't you think we should provide this as an option? It introduces extra overhead to the export process, and with a big form result set the delay could be noticeable.
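The hash table idea might be sketched roughly as follows, using a HashMap keyed by the hex MD5 digest. The class Md5Index and its methods are hypothetical names for illustration, and the content is passed in as a byte array to keep the example self-contained.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index of exported media, keyed by MD5 digest.
final class Md5Index {
    private final Map<String, String> pathByHash = new HashMap<>();

    /** Returns the MD5 digest of content as 32 lowercase hex characters. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Registers content under path if no content with the same hash is known.
     * Returns the previously registered path for that hash (a duplicate),
     * or null if the content is new.
     */
    String register(String path, byte[] content) {
        return pathByHash.putIfAbsent(md5Hex(content), path);
    }
}
```

If the collision risk mentioned above is a concern, a byte-for-byte comparison could be run whenever register returns a non-null path, so that a hash match is treated only as a hint that two files are equal.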

icemc commented on June 12, 2024

@yanokwa Following what @rclakmal said, an MD5 hash will solve the problem (in most cases) but will decrease performance when exporting forms with a large number of instances. It is up to the community to decide whether this extra cost is worth it.

yanokwa commented on June 12, 2024

I'd rather we decide what is best than add options to the app. I think this should be pretty fast because you'd only be doing the MD5 check when you have a matching filename, no? Either way, this is something that can be tested empirically if either of you are up for it.
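A rough way to test the cost empirically is to time the digest over a buffer of representative size. The sketch below is a crude wall-clock measurement, not a rigorous benchmark (no warm-up or statistical treatment). The class DigestTiming and its method are hypothetical, and the buffer size and round count in the usage note are arbitrary assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical timing helper for comparing digest algorithms.
final class DigestTiming {
    /** Returns the elapsed nanoseconds for hashing data the given number of times. */
    static long timeNanos(String algorithm, byte[] data, int rounds) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            digest.update(data);
            digest.digest(); // completes the hash and resets the digest
        }
        return System.nanoTime() - start;
    }
}
```

For example, comparing DigestTiming.timeNanos("MD5", new byte[4 << 20], 50) with the same call for "SHA-1" gives a first impression of the per-image cost for files of about 4 MB. Since hashing would only happen on a file name match, the total overhead also depends on how often names collide in a real export.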

shivam-tripathi commented on June 12, 2024

With everyone's permission, I would be glad to look into this.

joeflack4 commented on June 12, 2024

Thanks for your efforts with this. Some of our partners have very limited connections, so duplicates can be a huge problem, particularly with our fork of Collect, which uses form linking, as some images are shared between our household and individual questionnaires.
