Flashcards covering key concepts from the lecture on malware detection using machine learning, focusing on Android malware analysis.
Why is machine learning useful for malware detection?
Manual analysis is slow, expensive, and not scalable, while the amount of malware is increasing exponentially.
How can we use static analysis of the App Java source code to detect malware with an Android Malware Detector?
Reverse engineer the compiled Java bytecode back into a set of Java classes, then analyze these classes to detect whether they use particular Android API calls, commands, and permission requests. Build a feature vector from the detections and input it into a malware classifier.
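The feature-vector step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the decompiled classes and the manifest are already available as text; the `FEATURES` list, the `extract_feature_vector` helper, and the toy inputs are hypothetical and not taken from the lecture.

```python
# Hypothetical feature list: a few API calls, commands and one permission.
# A real detector would look for many more properties.
FEATURES = [
    "getSubscriberId",
    "getDeviceId",
    "abortBroadcast",
    "chmod",
    "android.permission.SEND_SMS",
]

def extract_feature_vector(decompiled_source: str, manifest_text: str) -> list[int]:
    """Return a binary vector: 1 if the feature string occurs, else 0."""
    text = decompiled_source + "\n" + manifest_text
    return [1 if feature in text else 0 for feature in FEATURES]

# Toy strings standing in for decompiled Java classes and a manifest.
source = 'String id = tm.getDeviceId(); Runtime.getRuntime().exec("chmod 777 x");'
manifest = '<uses-permission android:name="android.permission.SEND_SMS"/>'

vector = extract_feature_vector(source, manifest)
print(vector)  # one 0/1 entry per feature, in FEATURES order
```

Plain substring matching is the simplest possible detector; it is used here only to show how detections become a fixed-length vector that a classifier can consume.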
What are the key components of an Android .apk package?
Manifest (AndroidManifest.xml), Dalvik executable file (classes.dex), and the /assets and /res folders.
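Because an .apk is a ZIP archive, these components can be listed with Python's standard `zipfile` module. The sketch below builds a toy archive in memory so that it runs without a real app; `build_toy_apk` and `list_components` are hypothetical helper names, and the file contents are placeholders (a real manifest is stored in a binary XML format).

```python
import io
import zipfile

def build_toy_apk() -> bytes:
    """Create an in-memory ZIP that mimics the layout of an .apk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("AndroidManifest.xml", "<manifest/>")  # placeholder content
        z.writestr("classes.dex", b"placeholder")         # placeholder content
        z.writestr("assets/config.txt", "x")
        z.writestr("res/layout/main.xml", "<LinearLayout/>")
    return buf.getvalue()

def list_components(apk_bytes: bytes) -> dict[str, list[str]]:
    """Group archive entries into the key components of an .apk."""
    groups = {"manifest": [], "dex": [], "assets": [], "res": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as z:
        for name in z.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)
            elif name.endswith(".dex"):
                groups["dex"].append(name)
            elif name.startswith("assets/"):
                groups["assets"].append(name)
            elif name.startswith("res/"):
                groups["res"].append(name)
            else:
                groups["other"].append(name)
    return groups

components = list_components(build_toy_apk())
print(components)
```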
What types of property detectors are applied to extract features for training the Machine learning model?
Permission detectors and code detectors.
What type of API calls are detected by API call detectors?
API calls such as the Telephony Manager APIs for accessing the IMSI and IMEI, and calls for sending/receiving SMS, listing/installing other packages, etc.
What type of commands are detected by command detectors?
System commands such as ‘chmod’, ‘mount’, ‘/system/bin/su’, ‘chown’, etc.
What is the purpose of the feature ranking and selection function in malware detection?
Ranks features according to relevance for detecting suspicious activity.
What does Mutual Information (MI) measure in the context of feature ranking?
Measures how much one random variable tells us about another. In this case, how much does seeing Feature F1 tell us about the probability this sample is malware or benign?
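For a binary feature F and a class label C (malware or benign), the standard definition is I(F; C) = sum over f, c of p(f, c) * log2( p(f, c) / (p(f) * p(c)) ). The sketch below estimates this from sample counts; the function name `mutual_information` and the toy data are illustrative assumptions, not values from the lecture.

```python
from collections import Counter
from math import log2

def mutual_information(feature: list[int], label: list[int]) -> float:
    """Estimate I(F; C) in bits from paired observations."""
    n = len(feature)
    joint = Counter(zip(feature, label))  # counts of (f, c) pairs
    f_counts = Counter(feature)           # counts of each feature value
    c_counts = Counter(label)             # counts of each class
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        mi += p_fc * log2(p_fc / ((f_counts[f] / n) * (c_counts[c] / n)))
    return mi

# Toy data: 1 = malware, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]    # present exactly in the malware samples
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]  # equally common in both classes

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
print(mi_informative, mi_uninformative)
```

A feature that perfectly separates two equally likely classes carries one full bit about the class, while a feature that appears equally often in both classes carries none, so ranking by MI puts the first feature above the second.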
Name some of the top ranked features used in Android malware detection.
getSubscriberId, getDeviceId, getSimSerialNumber, .apk, intent.action.BOOT_COMPLETED, chmod, Runtime.exec(), abortBroadcast, getLine1Number, /system/app. (Note: the list continues from the original)
How is the malware classifier trained?
The classifier is trained with real malware and benign software samples. It learns a probability model for each extracted feature and combines the probabilities of the different features to give an overall score for the probability of benign and the probability of malware.
How does the classifier make the final classification decision?
If the probability of malware is greater than the probability of benign, classify the app as malware.
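The cards do not name the classifier. One model that fits the description (per-feature probabilities combined into a score for each class) is Bernoulli naive Bayes, which is assumed in the sketch below. The functions `train`, `log_score`, and `classify`, the Laplace smoothing, and the toy samples are all illustrative choices.

```python
from math import log

def train(samples: list[list[int]], labels: list[int]) -> dict:
    """Estimate P(class) and P(feature=1 | class) with Laplace smoothing."""
    model = {}
    n_features = len(samples[0])
    for c in (0, 1):  # 0 = benign, 1 = malware
        rows = [s for s, y in zip(samples, labels) if y == c]
        prior = len(rows) / len(samples)
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        p_on = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)]
        model[c] = (prior, p_on)
    return model

def log_score(model: dict, x: list[int], c: int) -> float:
    """Log of P(c) times the product of per-feature probabilities."""
    prior, p_on = model[c]
    score = log(prior)
    for xj, p in zip(x, p_on):
        score += log(p if xj else 1 - p)
    return score

def classify(model: dict, x: list[int]) -> str:
    """Decision rule: malware if its score exceeds the benign score."""
    return "malware" if log_score(model, x, 1) > log_score(model, x, 0) else "benign"

# Toy training set of binary feature vectors (1 = malware, 0 = benign).
samples = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]
model = train(samples, labels)

print(classify(model, [1, 1, 0]))
print(classify(model, [0, 0, 0]))
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied, and it does not change which class has the higher score.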
What are the processes involved in classifier training?
Extract features, train classifier
Why is static analysis coupled with a classifier an effective tool for filtering apps?
It can detect previously unknown Android malware, with a detection rate above 90% obtainable at a low false positive rate.