Joint Permute-and-Flip

This repository contains the Python code and supplemental results for our experiments on the accuracy, rank error, and run time of our Joint Permute-and-Flip mechanism.

In this study, we also propose pseudo-SHD scores and a new score function for the "joint" approach, both suitable for genomic statistical analysis.

The procedure for generating the simulation data used in our experiments can be found in the SimulationData folder.

In the RunTime folder, we provide supplemental results for $K = 1$, $5$, and $7$, in addition to the results for $K = 3$ shown in the paper. These results indicate that the advantage of our method over the standard exponential mechanism grows as $K$ increases.
Furthermore, we provide results on computing our pseudo-SHD scores for $\chi^2$-statistics and TDT statistics, which demonstrate their efficiency.

Together with the results on accuracy and rank error, these findings suggest that our Joint Permute-and-Flip is well suited to publishing the top $K$ significant SNPs in large-scale genomic statistical analysis.
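
For readers unfamiliar with the underlying mechanism, the following is a minimal sketch of the simple permute-and-flip for selecting a single candidate, as described by McKenna and Sheldon (2020). It is not the code in this repository and it is not the Joint Permute-and-Flip itself. The function name `permute_and_flip` and the parameters `scores`, `epsilon`, `sensitivity`, and `rng` are names chosen here for illustration.

```python
import math
import random


def permute_and_flip(scores, epsilon, sensitivity=1.0, rng=None):
    """Select one index from `scores` with the simple permute-and-flip.

    Illustrative sketch only: `scores` holds one utility value per candidate,
    and `sensitivity` is the assumed sensitivity of the score function.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = rng or random.Random()
    best = max(scores)

    # Visit the candidates in a uniformly random order.
    order = list(range(len(scores)))
    rng.shuffle(order)

    for i in order:
        # Accept candidate i with probability exp(eps * (q_i - q_max) / (2 * sensitivity)).
        accept_prob = math.exp(epsilon * (scores[i] - best) / (2.0 * sensitivity))
        if rng.random() < accept_prob:
            return i

    # Not reachable in practice: a best-scoring candidate has accept_prob == 1.0
    # and rng.random() is always strictly below 1.0.
    raise RuntimeError("no candidate was accepted")


if __name__ == "__main__":
    example_scores = [12.3, 48.1, 47.9, 5.0]  # made-up statistics, for illustration
    print(permute_and_flip(example_scores, epsilon=1.0, rng=random.Random(0)))
```

Because a best-scoring candidate is always accepted once it is reached, the loop terminates within one pass over the permutation.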

Important Notes

・This study considers pure $\epsilon$-differentially private algorithms for top-K selection that make no assumptions about the characteristics of the dataset or about data indistinguishability. Therefore, existing methods for genomic statistical analysis that rely on such assumptions [Yamamoto and Shibuya, 2022, Yamamoto and Shibuya, 2023] are not discussed in this study. For those methods, we intend to develop better variants that remove the assumptions and to conduct further research for a fair comparison with the Joint Permute-and-Flip (and future state-of-the-art methods).

(・This study also largely aims to propose a new accurate and efficient method for general top-K selection tasks, not limited to the context of genomic statistical analysis.)

・In our experiments, we did not focus on the simple permute-and-flip. This is mainly because the joint exponential mechanism achieved lower rank error than the simple permute-and-flip [Gillenwater et al., 2022], and the expected error of the simple permute-and-flip is never higher than that of the simple exponential mechanism [McKenna and Sheldon, 2020]. From these two existing studies, we can reasonably expect that the Joint Permute-and-Flip achieves higher accuracy than the simple permute-and-flip (and the joint exponential mechanism).

・Beyond the previous point, we believe that in genomic statistical analysis it is also essential to provide a collective implication of the $K$ outputs through a "joint" approach. As one example, this study proposes a "joint" score that aims to extract a set of SNPs in which even the worst element has a high rank.

・Based on the above considerations, the experiments in the main paper focus on evaluating the usefulness of the Joint Permute-and-Flip and do not include the simple permute-and-flip.

・For reference, we provide supplemental results on the accuracy and rank error of the simple and Joint Permute-and-Flip mechanisms in the corresponding folders. The results show that the "joint" approach can increase the quality of the top $K$ outputs. (Please note that when $K = 1$, these two mechanisms are identical.)

(・In Algorithm 2, which releases the top $K$ elements using the simple permute-and-flip, break out of the for loop at step 8.)

・In our experiments and discussion on $\chi^2$-statistics based on a $2 \times 2$ contingency table, for the sake of simplicity, we define ${\it neighboring}$ datasets as two datasets whose tables differ by only one or two alleles. This is because when one individual in the dataset changes, the information of at most two alleles changes.
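
To make the notion of neighboring tables concrete, the sketch below computes the standard Pearson $\chi^2$-statistic for a $2 \times 2$ table and builds one neighboring table by moving one or two alleles within a row. The row/column layout (rows as case and control, columns as the two alleles) and the helper `move_alleles` are assumptions made for illustration. They are not taken from the code in this repository, and moving alleles within a row is only one way in which two tables can differ by one or two alleles.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin makes the statistic undefined; return 0.0 by convention here.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def move_alleles(table, row, k):
    """Hypothetical helper: move k alleles (k = 1 or 2) from column 0 to column 1 in `row`.

    Returns a new table and leaves the input unchanged. Row totals are preserved,
    which corresponds to changing one individual's genotype within a group.
    """
    if k not in (1, 2):
        raise ValueError("a neighboring table differs by one or two alleles")
    if table[row][0] < k:
        raise ValueError("not enough alleles to move")
    new_table = [list(r) for r in table]
    new_table[row][0] -= k
    new_table[row][1] += k
    return new_table


# Made-up allele counts, for illustration only.
table = [[30, 10], [20, 20]]
neighbor = move_alleles(table, row=0, k=2)

print(chi2_2x2(*table[0], *table[1]))
print(chi2_2x2(*neighbor[0], *neighbor[1]))
```

Comparing the two printed values shows how much the statistic can move between one pair of neighboring tables, which is the quantity a sensitivity analysis has to bound over all such pairs.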

Possible Future Directions

・Conducting a theoretical analysis of the output accuracy of various "joint" permute-and-flip mechanisms. (The accuracy is likely to vary depending on how the "joint" score is constructed and on the characteristics of the dataset.)

・Exploring the "best" score (and how to construct it) for the joint mechanism in genomic statistical analysis (or in other applications).

Note

For details of our methods and discussion, please see our paper entitled "A Joint Permute-and-Flip and Its Enhancement for Large-Scale Genomic Statistical Analysis" (https://doi.org/10.1109/ICDMW60847.2023.00034) presented at TrustKDD at IEEE ICDM 2023.

Contact

Akito Yamamoto

Division of Medical Data Informatics, Human Genome Center,

the Institute of Medical Science, the University of Tokyo

[email protected]
