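Malicious Code Detection

Definition of malicious code

Malicious code, also called malware, is an umbrella term for hostile or intrusive software of all kinds. It includes computer viruses, worms, Trojan horses, ransomware, spyware, adware, and other malicious programs.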

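Types of malicious code

Computer virus: a program that resides in a computer system and, when executed under certain conditions, damages system or program functionality and data, affects other programs on the system, and replicates itself.

Worm: often regarded as a kind of virus. It replicates itself and consumes limited resources by placing load on computers and networks.

Trojan horse: often shortened to Trojan; the name comes from ancient Greek legend. A Trojan is a program that lies dormant on a computer to achieve a specific goal, such as stealing the user's private information or taking control of the user's system. Its biggest difference from a virus is that a virus replicates itself, whereas a Trojan does not replicate and does not infect other programs.

Rootkit: originally a set of tools that helped a user gain system privileges. Here it means a malicious program that, once privileges on the target host have been obtained, hides the traces of the attacker's access so that the attacker stays undetected and can keep administrator privileges for a long time. It is stealthy, persistent, and hard to detect.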

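Malicious code features (the feature information used to tell malicious programs apart)

System call features
Normalized code features
N-gram features
Control flow graph (CFG) features
Instruction sequence features
File format and other features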

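Malicious code feature extraction

Byte n-gram Features: byte n-grams are extracted from a file's binary code. For each class in the training set, the L most frequent n-grams are selected to represent that class's profile.

Opcode n-gram Features: first, every executable in the dataset is disassembled and its opcodes are extracted. An opcode (short for operation code) is the part of an assembly-language instruction that specifies the operation to perform. An instruction consists of an opcode and the operands the operation acts on. Depending on the CPU architecture, operands may be registers, values stored in memory, the stack, and so on. Opcodes carry out arithmetic, logical, and data-manipulation operations. Opcode statistics can capture the variability between malicious and legitimate software.

Portable Executables: these features are extracted from certain parts of an EXE file. Static analysis uses the structural information of the executable to extract its features. These meaningful features indicate whether a file has been manipulated or infected to perform malicious activity.

String Features: these features are based on plain text encoded in the executable, such as windows, getversion, getstartupinfo, getmodulefilename, messagebox, library names, and so on. The strings are runs of consecutive printable characters encoded in both PE and non-PE executables.

Function Based Features: these features are extracted from the run-time behavior of the program file. Function-based features take the functions that reside in the file to be executed and use them to generate various attributes that represent the file.

Hybrid Analysis Features: a combination of static analysis and dynamic analysis.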
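The byte n-gram profile described above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the function names `byte_ngrams` and `class_profiles`, the toy byte strings, and the choices n=2 and L=1 are all assumptions made for the example.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping run of n consecutive bytes in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def class_profiles(samples, n=2, top_l=3):
    """samples: iterable of (label, bytes) pairs.

    Returns {label: list of the top_l most frequent n-grams of that class}.
    """
    totals = {}
    for label, data in samples:
        totals.setdefault(label, Counter()).update(byte_ngrams(data, n))
    return {
        label: [gram for gram, _ in counts.most_common(top_l)]
        for label, counts in totals.items()
    }

# Toy training set (hypothetical bytes, not real malware).
samples = [
    ("malicious", b"\x90\x90\x90\x90\xcc"),
    ("benign", b"\x55\x8b\xec\x55\x8b"),
]
profiles = class_profiles(samples, n=2, top_l=1)
print(profiles)
```

A new file could then be compared against each class profile, for example by counting how many of its own frequent n-grams appear in the profile.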
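As a rough sketch of the opcode n-gram step, the code below assumes the disassembly has already been produced by some external tool and is available as text with one instruction per line and `;` starting a comment. That listing format and the helper names `opcodes` and `opcode_ngrams` are assumptions for illustration only.

```python
from collections import Counter

def opcodes(listing: str) -> list:
    """Return the mnemonic (first token) of each instruction line."""
    ops = []
    for line in listing.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if line:
            ops.append(line.split()[0].lower())
    return ops

def opcode_ngrams(ops, n=2) -> Counter:
    """Count overlapping sequences of n consecutive opcodes."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))

# Hypothetical disassembly listing.
listing = """
push ebp
mov ebp, esp
mov eax, [ebp+8]   ; first argument
pop ebp
ret
"""
ops = opcodes(listing)
grams = opcode_ngrams(ops, n=2)
print(ops)
print(grams)
```

Operands are discarded on purpose here, so that only the sequence of operations contributes to the feature counts.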
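To make "structural information" concrete, here is a minimal sketch that reads a few COFF header fields from a PE file using only the standard library. The choice of fields and the function name `pe_header_features` are assumptions for illustration; the sample buffer is a synthetic header, not a complete executable.

```python
import struct

def pe_header_features(data: bytes) -> dict:
    """Read a few COFF file-header fields from the start of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }

# Synthetic header: 64-byte DOS stub whose e_lfanew points right after it.
dos = b"MZ" + b"\0" * 58 + struct.pack("<I", 64)
coff = struct.pack("<HHIIIHH", 0x14C, 3, 0, 0, 0, 0xE0, 0x0102)
sample = dos + b"PE\0\0" + coff
features = pe_header_features(sample)
print(features)
```

A fuller extractor would also walk the optional header and section table; this sketch stops at the file header to stay short.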
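Extracting runs of consecutive printable characters can be sketched with a regular expression, much like the Unix `strings` utility. The minimum length of 4 and the name `printable_strings` are assumptions chosen for this example, and the sample bytes are made up.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes, decoded to str."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Hypothetical fragment of an executable's data section.
blob = b"\x00\x01GetVersion\x00\xffMessageBoxA\x00ab\x00"
found = printable_strings(blob)
print(found)
```

The resulting strings can be used as binary features (present or absent) or counted across a training set to keep only the most discriminative ones.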
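One simple way to combine the two kinds of analysis is feature-level fusion: merge the static and dynamic features of a sample into one vector. The sketch below assumes each analysis has already produced a name-to-value dictionary; the helper names, the prefixes, and the feature names are hypothetical, and the source does not prescribe this particular scheme.

```python
def fuse_features(static: dict, dynamic: dict) -> dict:
    """Merge two feature dictionaries, prefixing names to avoid collisions."""
    fused = {f"static:{name}": value for name, value in static.items()}
    fused.update({f"dynamic:{name}": value for name, value in dynamic.items()})
    return fused

def to_vector(fused: dict, vocabulary: list) -> list:
    """Order features by a fixed vocabulary; missing features become 0."""
    return [fused.get(name, 0) for name in vocabulary]

# Hypothetical features for one sample.
static = {"number_of_sections": 3, "api": 12}
dynamic = {"api": 40, "files_written": 2}
fused = fuse_features(static, dynamic)
vocabulary = sorted(fused) + ["dynamic:registry_writes"]
vector = to_vector(fused, vocabulary)
print(vocabulary)
print(vector)
```

The prefixes keep a feature such as `api` distinct in its static and dynamic forms, so a classifier can weight the two separately.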

Malicious code detection

Detection techniques based on static features

Classification feature	References
Byte code n-grams	Kolter J Z, Maloof M A. Learning to detect malicious executables in the wild[C]. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2004: 470-478. Santos I, Penya Y K, Devesa J, et al. N-grams-based File Signatures for Malware Detection. Proceedings of the 2009 International Conference on Enterprise Information Systems (ICEIS), 2009, 9: 317-320.
File format	Shafiq M Z, Tabish S M, Mirza F, et al. PE-Miner: Mining structural information to detect malicious executables in realtime. Recent advances in intrusion detection, Springer Berlin Heidelberg, 2009: 121-141. Bai J, Wang J, Zou G. A Malware Detection Scheme Based on Mining Format Information. The Scientific World Journal, 2014.
Gray image	Nataraj L, Karthikeyan S, Jacob G, et al. Malware images: visualization and automatic classification[C]. Proceedings of the 8th international symposium on visualization for cyber security. ACM, 2011: 4. Han Xiao-guang, Qu Wu, Yao Xuan-xia, et al. Research on malicious code variants detection based on texture fingerprint. Journal on Communications, 2014, 35(8): 125-135.
Function call graph	Kong D, Yan G. Discriminant malware distance learning on structural information for automated malware classification[C]. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013: 1357-1365.
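The "Gray image" feature in the table above treats each byte of a binary as one pixel intensity in the range 0 to 255 and lays the bytes out in rows of a fixed width. A minimal sketch of that layout step follows; the name `bytes_to_gray_rows`, the zero padding of the last row, and the tiny width are assumptions for illustration, and the texture descriptors computed on the image in the cited work are not shown.

```python
def bytes_to_gray_rows(data: bytes, width: int) -> list:
    """Arrange bytes as rows of `width` pixels (0-255), zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # pad an incomplete final row
        rows.append(row)
    return rows

# Hypothetical five-byte "binary" rendered two pixels wide.
image = bytes_to_gray_rows(bytes([0, 255, 16, 32, 7]), width=2)
print(image)
```

The nested list can be passed to an imaging or array library to render the picture or to compute texture features.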

Detection techniques based on dynamic features

Classification feature	References
Variable length	Nair V P, Jain H, Golecha Y K, et al. MEDUSA: MEtamorphic malware dynamic analysis using signature from API[C]. Proceedings of the 3rd International Conference on Security of Information and Networks. ACM, 2010: 263-269. Chen F, Fu Y. Dynamic detection of unknown malicious executables base on api interception[C]. Database Technology and Applications, 2009 First International Workshop on. IEEE, 2009: 329-332. Firdausi I, Lim C, Erwin A, et al. Analysis of machine learning techniques used in behavior-based malware detection[C]. Advances in Computing, Control and Telecommunication Technologies (ACT), 2010 Second International Conference on. IEEE, 2010: 201-203.
API subsequences	Nair V P, Jain H, Golecha Y K, et al. MEDUSA: MEtamorphic malware dynamic analysis using signature from API[C]. Proceedings of the 3rd International Conference on Security of Information and Networks. ACM, 2010: 263-269.
Operation code n-grams	Shabtai A, Moskovitch R, Feher C, et al. Detecting unknown malicious code by applying classification techniques on opcode patterns. Security Informatics, 2012, 1(1): 1-22. Pai S, Di Troia F, Visaggio C A, et al. Clustering for malware classification. Journal of Computer Virology and Hacking Techniques, 2016: 1-13.
Graph	Bonfante G, Kaczmarek M, Marion J Y. Architecture of a morphological malware detector. Journal in Computer Virology, 2009, 5(3): 263-270. Cesare S, Xiang Y, Zhou W. Control flow-based malware variant detection. IEEE Transactions on Dependable and Secure Computing, 2014, 11(4): 307-317.

Detection techniques based on fused features (detection methods that integrate several feature types)

Classification features (dynamic / static)	References
Dynamic API / operation code	Santos I, Devesa J, Brezo F, et al. Opem: A static-dynamic approach for machine learning based malware detection[C]. International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions. Springer Berlin Heidelberg, 2013: 271-280.
Program behavior / Static DLL, API	Lu Y B, Din S C, Zheng C F, et al. Using multi-feature and classifier ensembles to improve malware detection. Journal of CCIT, 2010, 39(2): 57-72.
API call sequence / PE format	Guo S, Yuan Q, Lin F, et al. A malware detection algorithm based on multi-view fusion. Neural Information Processing, Models and Applications, Springer Berlin Heidelberg, 2010: 259-266. Krawczyk B, Woźniak M. Evolutionary Cost-Sensitive Ensemble for Malware Detection. International Joint Conference SOCO'14-CISIS'14-ICEUTE'14, Springer International Publishing, 2014: 433-442.
Dynamic API / Static API	Ozdemir M, Sogukpinar I. An Android Malware Detection Architecture based on Ensemble Learning. Transactions on Machine Learning and Artificial Intelligence, 2014, 2(3): 90-106.
operation code / byte code	Bai J, Wang J. Improving malware detection using multiview ensemble learning. Security and Communication Networks, 2016, 9(17): 4227-4241.

References:

[1] Bo Yun Zhang. Survey on Malicious Code Intelligent Detection Techniques.

[2] Smita Ranveer, Swapnaja Hiray. Comparative Analysis of Feature Extraction Methods of Malware Detection.
