Malware Collection
- We collected malware samples from security blogs of the following companies:
- Blackberry Cylance < https://www.cybereason.com/blog/ >
- Checkpoint < https://blog.checkpoint.com/, https://research.checkpoint.com/ >
- Comodo < https://blog.comodo.com/ >
- Cybereason < https://www.cybereason.com/blog/ >
- ESET < https://www.welivesecurity.com/ >
- FireEye < https://www.fireeye.com/blog/ >
- Fortinet < https://www.fortinet.com/blog/ >
- F-Secure < https://labsblog.f-secure.com/ >
- Kaspersky < https://www.kaspersky.com/blog/, https://securelist.com/ >
- MalwareBytes < https://blog.malwarebytes.com/ >
- McAfee < https://securingtomorrow.mcafee.com/ >
- Microsoft < https://blogs.microsoft.com/blog/ >
- PaloAltoNetworks < https://unit42.paloaltonetworks.com/ >
- PandaSecurity < https://pandasecurity.com/mediacenter/ >
- SentinelOne < https://sentinelone.com/blog/ >
- Sophos < https://nakedsecurity.sophos.com/ >
- Symantec < https://www.symantec.com/connect/blogs/ >
- TrendMicro < https://blog.trendmicro.com/trendlabs-security-intelligence/ >
- VMware CarbonBlack < https://blog.trendmicro.com/trendlabs-security-intelligence/ >
The source code of our blog crawler is available on GitHub.
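As an illustration of the kind of step a blog crawler performs, the sketch below extracts candidate post links from one already-downloaded listing page. It is not the crawler published on GitHub. The names `PostLinkParser` and `extract_post_links`, and the rule of keeping only links under the blog's base URL, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PostLinkParser(HTMLParser):
    """Collects the absolute URL of every <a href=...> link on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the blog's base URL.
                self.links.append(urljoin(self.base_url, value))


def extract_post_links(html, base_url):
    """Return unique links that stay under base_url, in page order.

    Assumption: posts live under the blog's base URL; real blogs may
    need site-specific rules.
    """
    parser = PostLinkParser(base_url)
    parser.feed(html)
    unique = []
    for link in parser.links:
        if link.startswith(base_url) and link not in unique:
            unique.append(link)
    return unique
```

Downloading the pages, following pagination, and rate limiting are left out on purpose, since they differ for each blog.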
- We further categorized the identified blog posts into:
- Google Play Malware - posts that describe malware that penetrated the Google Play store
- Non-Google Play Malware - posts that describe Android malware from alternative markets
- Non-Android Malware - posts that describe malware for systems other than Android (e.g., iOS, PC)
- Different Language - posts that are not in English
- Technology/News/Promotions - posts that describe current technologies, trends, or product promotions
The table below lists our categorization results for all 6,377 posts that we identified:

| Category | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
|---|---|---|---|---|---|---|---|
| Google Play Malware | 56 | 96 | 51 | 48 | 24 | 39 | 314 |
| Non-Google Play Malware | 93 | 76 | 67 | 44 | 52 | 24 | 356 |
| Non-Android Malware | 212 | 284 | 238 | 237 | 221 | 112 | 1,304 |
| Different Language | 10 | 24 | 17 | 21 | 122 | 71 | 265 |
| News/Promotions | 660 | 798 | 742 | 778 | 748 | 412 | 4,138 |
| All | 1,031 | 1,278 | 1,115 | 1,128 | 1,167 | 658 | 6,377 |

The full list of blog posts, and the assigned category for each, can be found in the “Blog Categorization” sheet of this Excel file.
We identified malware samples and their families based on the information in the blog posts. A post can describe one or more families, and two separate posts can describe the same family. We marked families as duplicates when one post directly references another post or when both posts describe indicators that point to the same apps.
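The two duplicate rules (a direct reference between posts, or indicators pointing to the same apps) can be sketched as a grouping problem. The following is a minimal illustration, not the procedure we actually ran. It assumes each record describes a single family and carries a hypothetical `refs` set (ids of referenced records) and an `indicators` set (e.g., package names or hashes). The function name `group_duplicate_families` is also hypothetical.

```python
def group_duplicate_families(records):
    """Group family records that the two duplicate rules link together.

    records: dict mapping a record id to
             {"refs": set of record ids, "indicators": set of str}
    Returns a sorted list of sorted id groups; each group is one family.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Rule 1: a record directly references another record.
    for rid, info in records.items():
        for ref in info["refs"]:
            if ref in records:
                union(rid, ref)

    # Rule 2: two records share at least one indicator.
    first_seen = {}
    for rid, info in records.items():
        for indicator in info["indicators"]:
            if indicator in first_seen:
                union(rid, first_seen[indicator])
            else:
                first_seen[indicator] = rid

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(group) for group in groups.values())
```

In this sketch, every group with more than one member contributes one unique family, and the remaining members count as duplicates.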
The table below lists the unique and duplicated families we identified:

| Category | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
|---|---|---|---|---|---|---|---|
| Identified Families | 30 | 71 | 49 | 39 | 19 | 27 | 235 |
| Duplicate Families | 3 | 12 | 8 | 5 | 3 | 20 | 51 |
| Unique Families | 27 | 59 | 41 | 34 | 16 | 7 | 184 |

The full list of the identified families can be found in the “Malware Families” sheet of this Excel file.
We searched malware repositories (e.g., VirusTotal, VirusShare, Contagio), Android alternative markets (e.g., APKMonk, APKPure), and Android app repositories (e.g., AndroZoo) for samples using indicators described in the posts.
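Searching a repository starts with pulling indicators out of the post text. The sketch below assumes the indicators are SHA-256 hashes and Android package names. Posts may also list other indicator types (e.g., MD5 hashes, URLs) that it does not cover. The regular expressions and the name `extract_indicators` are illustrative assumptions, and the package pattern will also match some domain names, so the results need manual review.

```python
import re

# Assumption: hashes are written as 64 hexadecimal characters.
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
# Assumption: package names have at least three dot-separated lowercase parts.
PACKAGE_RE = re.compile(r"\b(?:[a-z][a-z0-9_]*\.){2,}[a-z][a-z0-9_]*\b")


def extract_indicators(text):
    """Return candidate SHA-256 hashes and package names found in text."""
    hashes = sorted({h.lower() for h in SHA256_RE.findall(text)})
    packages = sorted(set(PACKAGE_RE.findall(text)))
    return {"sha256": hashes, "packages": packages}
```

The extracted hashes can then be looked up in a malware repository, and the package names can be searched in alternative markets or app repositories.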
The table below gives the number of families with detected samples and the total number of detected samples for each year:

| Category | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
|---|---|---|---|---|---|---|---|
| Found Families | 20 | 47 | 28 | 21 | 15 | 3 | 134 |
| Found Samples | 89 | 636 | 301 | 166 | 35 | 11 | 1,238 |

We manually analyzed one sample from each of 105 distinct families. We could not analyze samples from the remaining families because of packers, reflection, obfuscation, and other obstacles. The next two tables show the distribution of analyzed samples per year and the distribution of samples we could not analyze, together with the underlying reasons:
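
| Category | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
|---|---|---|---|---|---|---|---|
| Samples Analyzed | 13 | 34 | 27 | 17 | 11 | 4 | 105 |

| Category | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
|---|---|---|---|---|---|---|---|
| Packer | 2 | 3 | 0 | 3 | 0 | 1 | 9 |
| Reflection | 0 | 4 | 0 | 1 | 0 | 0 | 5 |
| Cannot Find Malicious Behavior | 2 | 2 | 0 | 0 | 0 | 0 | 4 |
| No Decryption Key | 0 | 0 | 1 | 0 | 0 | 1 | 2 |
| Obfuscation | 3 | 4 | 0 | 0 | 2 | 0 | 9 |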
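
The reported totals are related by simple arithmetic: the families with found samples, minus the families we could not analyze, give the analyzed families. The short check below uses only the "Total" column values reported above; the variable names are illustrative.

```python
# Totals taken from the tables above ("Total" column only).
found_families = 134
not_analyzed_by_reason = {
    "Packer": 9,
    "Reflection": 5,
    "Cannot Find Malicious Behavior": 4,
    "No Decryption Key": 2,
    "Obfuscation": 9,
}

# Families whose samples could not be analyzed, over all reasons.
total_not_analyzed = sum(not_analyzed_by_reason.values())

# Families for which one sample was manually analyzed.
analyzed_families = found_families - total_not_analyzed
```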