
FedFPM: A Unified Federated Analytics Framework for Collaborative Frequent Pattern Mining

Note that FedFPM is named FFPA in this implementation.

Introduction

FedFPM is a unified federated analytics framework for frequent pattern mining on distributed data. It satisfies local differential privacy (LDP), a strong scheme for preserving each client's privacy locally. It achieves higher data utility (F1 score) than the benchmarks while requiring fewer participating clients. This implementation supports frequent item mining (FIM), frequent itemset mining (FIsM), and frequent sequence mining (FSM), and FedFPM can potentially be extended to other scenarios.
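For reference, this is the standard definition of ε-local differential privacy, written in the usual textbook notation (the symbols are not taken from this implementation): a randomized mechanism M run by each client satisfies ε-LDP if, for every pair of possible inputs x and x' and every possible output y,

```latex
\Pr[\mathcal{M}(x) = y] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(x') = y]
```

A smaller ε gives stronger privacy. The example commands below pass --epsilon=2, which presumably sets this privacy budget.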

Features

  • functionality of FedFPM on a simulated environment
  • three datasets: Kosarak for FIM, MovieLens for FIsM, and MSNBC for FSM
  • two benchmarks: basic RAPPOR (with one-hot encoding) for FIM and FIsM, and SFP for FSM
  • ground-truth frequent pattern discovery with the Apriori algorithm
  • multiprocessing in the simulation (especially helpful for SFP)
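The ground truth is computed with Apriori. As an illustration of that technique only, here is a minimal textbook Apriori for itemsets, not the repository's code: the function name `apriori` is hypothetical, and `k` is a relative support threshold in the spirit of the --k flag. FSM would need a sequence-aware variant of the join step.

```python
from itertools import combinations


def apriori(transactions, k):
    """Return {frozenset(itemset): relative support} for every itemset whose
    support (fraction of transactions containing it) is at least k."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in sets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= k}
    result = dict(frequent)

    size = 2
    while frequent:
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != size:
                continue
            # Prune step: every (size-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent for sub in combinations(union, size - 1)):
                candidates.add(union)

        # Count the surviving candidates with one pass over the data.
        counts = {c: 0 for c in candidates}
        for t in sets:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c / n for s, c in counts.items() if c / n >= k}
        result.update(frequent)
        size += 1
    return result
```

The prune step relies on the Apriori property: any superset of an infrequent itemset is also infrequent, so candidates with an infrequent subset can be discarded without counting them.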

Installation

python==3.8
numpy==1.20.1
multiprocess==0.70.11.1

Preparation of datasets and groundtruths

We provide the datasets and the ground-truth frequent patterns at https://drive.google.com/file/d/1AnFFPENc2SpHKwmofhkpHhpxwFyZ_7Xs/view?usp=sharing

You can download the folders "./data" and "./groundtruth" and place them in the project root directory.

You can also prepare them yourself.

To prepare the datasets, write data structures like those in "./models/DataSet.py". You need to implement two structures: one holds the local data of a single client (like class Trajectory), and the other holds the global dataset of all clients (like class SeqDataSet). All methods present in "./models/DataSet.py" must be implemented. Then pickle the global structure and load it in "./main.py".
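A minimal sketch of the two-level layout and the pickling step is shown below. The class names `ClientBasket` and `BasketDataSet` and the helpers `save_dataset`/`load_dataset` are hypothetical stand-ins for Trajectory and SeqDataSet; the methods that "./models/DataSet.py" actually requires are not reproduced here and must still be implemented.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClientBasket:
    """Hypothetical local data of one client (analogous to Trajectory)."""
    items: List[str]


@dataclass
class BasketDataSet:
    """Hypothetical global dataset of all clients (analogous to SeqDataSet)."""
    clients: List[ClientBasket] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.clients)

    def __getitem__(self, idx: int) -> ClientBasket:
        return self.clients[idx]


def save_dataset(dataset: BasketDataSet, path: str) -> None:
    # Pickle the global structure so that a script like main.py can load it.
    with open(path, "wb") as f:
        pickle.dump(dataset, f)


def load_dataset(path: str) -> BasketDataSet:
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Round-trip a toy dataset through a temporary pickle file.
    toy = BasketDataSet([ClientBasket(["x", "y"]), ClientBasket(["y"])])
    path = os.path.join(tempfile.mkdtemp(), "toy.pkl")
    save_dataset(toy, path)
    print(len(load_dataset(path)))  # prints 2
```

Pickle stores a reference to each class rather than its code, so the class definitions must be importable when the file is loaded. Keeping them in a module such as "./models/DataSet.py", which main.py imports, satisfies this.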

The ground-truth frequent patterns are required to calculate the F1 score. To prepare them, run the code with --mode=groundtruth, and the ground-truth patterns will be saved automatically. The saved file includes all patterns whose frequency exceeds the threshold (--k), along with their frequencies. It can therefore be reused by any later run with a higher threshold: e.g., if you have run the code with "--mode=groundtruth --dataset=msnbc --k=0.01", you can directly run "--mode=ffpa --dataset=msnbc --k=0.02" and the F1 score will be calculated properly.
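The reuse works because filtering a lower-threshold ground truth by a higher threshold yields exactly the patterns frequent at the higher threshold. A minimal sketch, assuming the saved ground truth can be read as a dict mapping each pattern to its frequency (the actual file format and the helper names `patterns_above` and `f1_score` are hypothetical):

```python
def patterns_above(groundtruth, k):
    """Keep the ground-truth patterns whose frequency is at least k.
    A file saved at a lower threshold (e.g. 0.01) already contains every
    pattern that is frequent at a higher one (e.g. 0.02)."""
    return {p for p, freq in groundtruth.items() if freq >= k}


def f1_score(estimated, truth):
    """F1 of an estimated set of frequent patterns against the true set."""
    estimated, truth = set(estimated), set(truth)
    tp = len(estimated & truth)  # patterns found correctly
    if tp == 0:
        return 0.0
    precision = tp / len(estimated)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, with two correct patterns out of three reported and three true patterns, precision and recall are both 2/3, so F1 is 2/3.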

Run the code

The following commands reproduce some of our experimental results.

Run FedFPM on the MSNBC dataset (FSM) with target frequency 0.02

python main.py --dataset=msnbc --verbose --mode=ffpa --k=0.02  --process=14 --xi=0.01 --num_candidate=1 --num_participants=100000 --epsilon=2 --max_support=100000

Run SFP on the MSNBC dataset (FSM) with target frequency 0.02

python main.py --dataset=msnbc --verbose --mode=sfp --k=0.02  --process=14 --num_participants=26000000 --epsilon=2

Run FedFPM on the Kosarak dataset (FIM) with target frequency 0.02

python main.py --dataset=kosarak --verbose --mode=ffpa --k=0.02  --process=14 --xi=0.01 --num_candidate=1 --num_participants=1000000 --epsilon=2 --max_support=100000

Run RAPPOR on the Kosarak dataset (FIM) with target frequency 0.02

python main.py --dataset=kosarak --verbose --mode=rappor --k=0.02  --process=14 --num_participants=17000000 --epsilon=2
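To illustrate what the RAPPOR benchmark does with one-hot encoding, here is a minimal sketch of one-hot randomized response in the style of basic RAPPOR (permanent randomized response only). It is not the repository's rappor mode, whose parameters and details may differ; `perturb_one_hot` and `estimate_counts` are hypothetical names.

```python
import math
import random


def keep_probability(epsilon):
    # Two bits differ between any two one-hot inputs, so each bit gets eps/2.
    return math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)


def perturb_one_hot(value, domain_size, epsilon, rng):
    """Client side: one-hot encode value, then keep each bit with
    probability p and flip it otherwise. The report satisfies eps-LDP."""
    p = keep_probability(epsilon)
    bits = [1 if i == value else 0 for i in range(domain_size)]
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_counts(reports, epsilon):
    """Server side: unbiased estimate of how many clients hold each item."""
    n = len(reports)
    p = keep_probability(epsilon)
    q = 1 - p
    sums = [sum(col) for col in zip(*reports)]
    # E[sum_i] = c_i * p + (n - c_i) * q, hence c_i = (sum_i - n * q) / (p - q).
    return [(s - n * q) / (p - q) for s in sums]
```

With --epsilon=2 as in the command above, each bit is kept with probability about 0.73. Because every client sends one bit per domain item, the estimation noise grows with the number of clients and the domain size, which is consistent with the much larger --num_participants used for the benchmarks.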

Citation format

Z. Wang, Y. Zhu, D. Wang, and Z. Han, "FedFPM: A Unified Federated Analytics Framework for Collaborative Frequent Pattern Mining," in Proceedings of IEEE International Conference on Computer Communications (INFOCOM), 2022.

@inproceedings{wang2022fedfpm,
  title={FedFPM: A Unified Federated Analytics Framework for Collaborative Frequent Pattern Mining},
  author={Wang, Zibo and Zhu, Yifei and Wang, Dan and Han, Zhu},
  booktitle={Proceedings of IEEE International Conference on Computer Communications (INFOCOM)},
  year={2022}
}

Contributors

huskyw, inslab-ji