
droidmalwaredetectionresearch's Introduction

This project contains virus samples. Turn off your antivirus software before syncing!

Introduction

Collect Android malware detection tools, try each one, and report on it before New Year's Day.

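Testing requirements

  1. If a tool is backed by a paper, download the paper into the papers folder, named according to the format described below
  2. If a tool has an offline version, clone/copy it into the tools folder, named according to the format described below
  3. Test the tool and write a test report that covers the tool's functionality, its source, the steps for using it, and the results; save the report in the reports folder, following the format described below
  4. Fill in the tool's row in Android恶意软件检测工具.xlsx. The main fields are: status ("状态", either "跳过" for skipped or "完成" for done), reviewer (enter your own name, so you can be found when results are consolidated), tool input (list every input if there are several), tool output (list every output type if there are several), and keywords for the main techniques (to help with later optimization)

Note: for the 2016-2017 papers, the spreadsheet contains only titles. More detailed information is available by querying the database crawler/db/db.sqlite3, but do not modify the data in it.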

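Directory structure

  • Android恶意软件检测工具.xlsx: test the tools listed in it, or add new tools to it
  • tools: directory for tools. Clone/copy each tool into this folder, using the tool name as the file name; web services are excluded
  • reports: directory for tool usage reports. Each tool gets one folder (named after the tool) that holds the report file. Use docx for the report, since it makes screenshots and the like easier. The report mainly covers an introduction, the source, the usage steps, and the results. If a tool generates its own report, put a sample of that report in the same folder
  • apks: directory for test APKs (partial; a separate APK repository is expected later)
  • docs: requirement documents
  • papers: directory for papers. Each file is named with the paper title in camel case, with colons replaced by "-" and commas replaced by "_"
  • crawler: a Google Scholar crawler that starts from the papers already collected and finds more papers related to mobile malware detection. It can currently crawl papers indexed by IEEE Xplore, ACM, Springer, arXiv, Elsevier, WileyOnlineLibrary, Semantic Scholar, and Inderscience. The database is at db/db.sqlite3

Note: if a paper does not give its tool a name, use the first letter of each word in the paper title as the tool name.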
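
The two naming rules above can be sketched in a few lines of Python. This is only an illustration: the function names `paper_filename` and `tool_initials` are hypothetical, and the choices to capitalize the first letter of every word and to upper-case the initials are assumptions, since the rules do not spell them out.

```python
import re


def paper_filename(title: str) -> str:
    """Camel-case a paper title, then replace ':' with '-' and ',' with '_'."""
    # Capitalize the first letter of each whitespace-separated word and
    # leave the remaining letters untouched (assumption: upper camel case).
    camel = "".join(word[:1].upper() + word[1:] for word in title.split())
    return camel.replace(":", "-").replace(",", "_")


def tool_initials(title: str) -> str:
    """Fallback tool name: the first letter of each word in the title."""
    # Words are runs of ASCII letters/digits; punctuation is ignored.
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(word[0].upper() for word in words)
```

For a made-up title such as `"Some tool: detecting malware, fast"`, `paper_filename` gives `SomeTool-DetectingMalware_Fast` and `tool_initials` gives `STDMF`.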

Crawler

The crawler finds papers that cite a known paper.

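File descriptions & usage

  1. crawler.py: given one paper, finds other papers that cite it
  2. fetch_all.py: finds further papers starting from the papers listed in the curated Excel spreadsheet; the related papers of each paper are saved to a separate csv file
  3. merge.py: merges the information from the separate csv files and writes it to the database
  4. publisher.py: searches for and scrapes detailed paper information from the individual publishers
  5. compete_paper.py: calls publisher.py to fill in missing paper information in the database
  6. crawl_ccf.py: crawls the CCF recommendation list, which serves as the reference for a paper's rank
  7. match.py: matches each publication to the published_in field of paper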
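
As a rough illustration of the merge step described for merge.py, the sketch below loads every csv file in a directory into a `paper` table and skips titles that are already present. It is not the project's actual code: the function name `merge_csv_dir` and the csv column names `title` and `year` are assumptions made for this example.

```python
import csv
import sqlite3
from pathlib import Path


def merge_csv_dir(csv_dir: str, conn: sqlite3.Connection) -> int:
    """Insert rows from every csv in csv_dir into paper; return the number added."""
    before = conn.total_changes
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # Assumed csv header: title,year
            for row in csv.DictReader(handle):
                # INSERT OR IGNORE skips rows whose unique title already exists.
                conn.execute(
                    "INSERT OR IGNORE INTO paper (title, year) VALUES (?, ?)",
                    (row["title"], row.get("year") or None),
                )
    conn.commit()
    # total_changes counts only rows that were actually inserted.
    return conn.total_changes - before
```

Relying on the unique constraint on `title` keeps the merge idempotent, so running it twice over the same csv files adds nothing the second time.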

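Filtering

There are too many papers (1,800+) to review them all, so they must be filtered. They are currently split into the batches below. More batches can be added, but the query results of different batches should not overlap.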

  1. Title contains "detect" or "malware"; venue: rank A, conference
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in IN
	(SELECT name FROM "publication"
	WHERE rank='A' AND type='conference');
  2. Title contains "detect" or "malware"; venue: rank A, journal
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in IN
	(SELECT name FROM "publication"
	WHERE rank='A' AND type='journal');
  3. Title contains "detect" or "malware"; venue: rank B, conference
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in IN
	(SELECT name FROM "publication"
	WHERE rank='B' AND type='conference');
  4. Title contains "detect" or "malware"; venue: rank B, journal
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in IN
	(SELECT name FROM "publication"
	WHERE rank='B' AND type='journal');
  5. Chinese-language publications
SELECT * FROM "paper"
WHERE include LIKE '%cn%'
	OR include='inforsec.org';
  6. Title contains "detect" or "malware"; venue: arXiv
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in='arXiv';
  7. Title contains "detect" or "malware"; venue: rank C, conference
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in IN
	(SELECT name FROM "publication"
	WHERE rank='C' AND type='conference');
  8. Title contains "detect" or "malware"; venue: rank C, journal (TODO)
SELECT * FROM "paper"
WHERE (title LIKE '%etect%'
	OR title LIKE '%alware%')
	AND published_in IN
	(SELECT name FROM "publication"
	WHERE rank='C' AND type='journal');
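
Since the batches are meant to be disjoint, it may help to check that automatically. The following is a minimal sketch using Python's sqlite3 module. The helper names `ranked_batch` and `find_overlaps` are hypothetical, and the code assumes the `paper` table has an `id` primary key (Django adds one by default, but this has not been verified against the actual database).

```python
import sqlite3
from itertools import combinations

# Same title condition as the batch queries above.
TITLE_FILTER = "(title LIKE '%etect%' OR title LIKE '%alware%')"


def ranked_batch(rank: str, kind: str) -> tuple:
    """Return (sql, params) selecting paper ids for one rank/type batch."""
    sql = (
        f"SELECT id FROM paper WHERE {TITLE_FILTER} AND published_in IN "
        "(SELECT name FROM publication WHERE rank = ? AND type = ?)"
    )
    return sql, (rank, kind)


def find_overlaps(conn: sqlite3.Connection, batches: dict) -> dict:
    """Map each pair of batch names to the paper ids they share, if any."""
    ids = {
        name: {row[0] for row in conn.execute(sql, params)}
        for name, (sql, params) in batches.items()
    }
    return {
        (a, b): ids[a] & ids[b]
        for a, b in combinations(ids, 2)
        if ids[a] & ids[b]
    }
```

An empty result from `find_overlaps` means no paper appears in more than one batch. A batch without parameters, such as the Chinese-language one, can be passed as `(sql, ())`.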

Database

The database is stored in crawler/db/db.sqlite3. In principle, only the crawler may modify it; researchers may only query it.
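
One way to respect the query-only rule is to open the file in SQLite's read-only mode, so that an accidental write fails instead of changing the data. This is a minimal sketch; the helper name `open_readonly` is hypothetical.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt raises an error."""
    # mode=ro in a file: URI tells SQLite to open the database read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection from `open_readonly("crawler/db/db.sqlite3")`, SELECT statements work as usual, while INSERT, UPDATE, or DELETE raise `sqlite3.OperationalError`.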

Data model

Paper: stores paper information

from django.db import models

class Paper(models.Model):
    title = models.CharField(max_length=255, unique=True)   # title
    inc_url = models.URLField(null=True)    # URL provided by the indexing publisher
    pdf_url = models.URLField(null=True)    # download URL of the paper's PDF
    year = models.IntegerField(null=True)   # year of publication
    include = models.CharField(max_length=127, null=True)   # source as listed by Google Scholar
    published_in = models.CharField(max_length=255, null=True)  # name of the publication
    doi = models.CharField(max_length=255, null=True)   # DOI
    keywords = models.CharField(max_length=255, null=True)  # keywords
    abstract = models.TextField(null=True)    # abstract

Publication: (reputable) publications

class Publication(models.Model):
    name = models.CharField(max_length=255, unique=True)    # name of the publication
    abbv = models.CharField(max_length=32, null=True, unique=True)  # abbreviation of the publication
    pub_house = models.CharField(max_length=32)   # name of the publisher
    rank = models.CharField(max_length=2)     # CCF recommendation rank
    cls = models.CharField(max_length=128)    # research area
    type = models.CharField(max_length=32, null=True)  # journal or conference

Android恶意软件检测工具.xlsx

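Because this file is not plain text, git diff cannot be used when a commit runs into a conflict.

Workaround (MS Office only)

Add a diff tool for Excel (Office ships with a built-in comparison tool):

  1. Copy the diffxls batch file (xlscompare.bat) to C:\Program Files\Git\cmd
  2. Edit the commands in the batch file to match your Office version (default: office16) and Windows version (default: 64-bit Win7/8/10)
  3. Make sure C:\Program Files\Git\cmd is on the system PATH
  4. Edit the .gitconfig file in your home directory %HOMEPATH% to add a mergetool and a difftool, as follows (leave your own user name and email unchanged)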
[user]
    name = anemone0
    email = [email protected]

[mergetool "diffxls"]
    cmd = xlscompare.bat "$LOCAL" "$REMOTE"

[difftool "diffxls"]
    cmd = xlscompare.bat "$LOCAL" "$REMOTE"

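When git push fails because of a conflict:

  1. Pull the remote version with git pull
  2. Merge the versions with the custom diff tool: git mergetool --tool=diffxls opens Spreadsheet Compare, where you can compare the remote and local versions and merge them into a new xlsx
  3. Delete the *.orig files that were created as backups
  4. Confirm that the merge is complete, then commit the merged version with git add . && git commit
  5. Push the new version to the remote with git push

Note: this method works only with MS Office. Developers who use other office suites can use it, together with excel.bat in diffxls, as a reference for writing a simple difftool of their own.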

droidmalwaredetectionresearch's People

Contributors

anemone95, big-grass, eumenide
