Sign up for my newsletter to see more interviews with the biggest names in cybersecurity.
This week I was lucky enough to talk to Zachary Hanif, the Director of Applied Science at Novetta, a company that provides advanced analytics for cybersecurity and other fields. Prior to Novetta, Zach worked at Endgame for over six years, and he recently announced that he will be advising startups as part of Eastern Foundry. A lot of Zach’s work has focused on machine learning and its applications to security.
This is a longer interview but it’s packed with fascinating insight from a real expert in the field. Check it out below!
You’ve developed the tool TOTEM. Can you describe the problem that TOTEM solves?
To contextualize TOTEM and the surrounding work we (Tamas Lengyel, George Webster, and I) presented at Black Hat, you need to understand what motivated the creation of BinaryPig. Several years ago, the BinaryPig authors (Jason Trost, Telvis Calhoun, and I) were going through reports on malware that analyzed relatively small sample sets. One reason the sample sets were small was that the researchers working with them weren’t able to process very large amounts of malware efficiently; they had to rely on homebrew solutions that were tangential to the research they intended to perform. We released BinaryPig to give researchers a robust system for easily processing such large malware sets. After BinaryPig was released, we received a lot of feedback from academia and the commercial sector, and when I started at Novetta over a year ago, I began working on TOTEM. One of the main instigators of TOTEM’s creation was Operation SMN and our identification of the industry’s lack of historical malware processing capabilities. Like BinaryPig, TOTEM lets analysts run analysis on large malware sets, but unlike BinaryPig, it can quickly and efficiently process sample sets that defended machines picked up a very long time ago. Also unlike BinaryPig, TOTEM supports streaming analytics, so users can analyze malware samples as they are picked up by machines on their defended networks.
Why is it so important to be able to analyze streaming malware?
It’s important to analyze malware in a streaming ingest fashion because samples don’t only appear in batches in the real world. As an organization trying to protect against threats in real time, you have to be able to see and analyze malware in real time. Likewise, if you need to run historical malware sets, your streaming system needs to be able to process the ongoing flow of daily malware while easily expanding its processing to handle historical sets as well.
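The requirement described here, one pipeline serving both live arrivals and a historical backlog, can be sketched with a simple shared work queue. This is a hedged illustration of the pattern, not TOTEM’s actual architecture, and the `analyze` step is a toy stand-in for real analysis engines:

```python
import hashlib
import queue
import threading

def analyze(sample: bytes) -> dict:
    """Toy static 'analysis': hash and size. A real system runs many engines."""
    return {"sha256": hashlib.sha256(sample).hexdigest(), "size": len(sample)}

def worker(q: queue.Queue, results: list, stop: threading.Event) -> None:
    # Drain the queue until told to stop and no work remains.
    while not stop.is_set() or not q.empty():
        try:
            sample = q.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(analyze(sample))
        q.task_done()

# One shared queue serves both streaming and batch samples;
# scaling out is just a matter of adding workers.
q: queue.Queue = queue.Queue()
results: list = []
stop = threading.Event()
threads = [threading.Thread(target=worker, args=(q, results, stop)) for _ in range(4)]
for t in threads:
    t.start()

for s in [b"live-sample-1", b"live-sample-2"]:       # streaming arrivals
    q.put(s)
for i in range(100):                                  # historical backlog replay
    q.put(b"historic-%d" % i)

q.join()     # wait until every queued sample has been analyzed
stop.set()
for t in threads:
    t.join()
print(len(results))  # 102
```

The design point is that the workers don’t care whether a sample arrived a second ago or years ago; expanding to a historical set is just enqueueing more work.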
Is this similar to what sandboxes do?
In broad strokes, sandboxes perform dynamic analysis and TOTEM performs static analysis. In addition to static analysis, however, TOTEM orchestrates dynamic analysis by telling those dynamic analysis systems what to analyze and passing along other relevant features of samples. Also, not all sandboxes can handle malware at scale. Cuckoo is an open source project that can analyze streaming samples; however, it doesn’t hold up well at the scale we need. Tamas Lengyel, who also presented at Black Hat this year, developed a system called Drakvuf as part of his PhD work; it uses libVMI to perform agentless dynamic analysis. It has some very interesting properties, and we’re working on getting it and TOTEM to work together.
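The division of labor described here, static analysis inline with dynamic analysis delegated to a sandbox, can be sketched as a small dispatcher. All names below are invented for illustration; this is not TOTEM’s or any sandbox’s real API:

```python
import hashlib

def static_pass(sample: bytes) -> dict:
    """Cheap inline static checks (hypothetical heuristics for illustration)."""
    return {
        "sha256": hashlib.sha256(sample).hexdigest(),
        # Crude packed-PE heuristic: MZ header plus a UPX marker.
        "looks_packed": sample.startswith(b"MZ") and b"UPX" in sample,
    }

def orchestrate(samples: list, sandbox_queue: list) -> list:
    """Run static analysis on everything; forward only suspicious samples
    to the (mocked) dynamic-analysis sandbox queue."""
    reports = []
    for s in samples:
        report = static_pass(s)
        if report["looks_packed"]:
            # Dynamic analysis is expensive, so be selective about it.
            sandbox_queue.append(report["sha256"])
        reports.append(report)
    return reports

sandbox: list = []
reports = orchestrate([b"MZ...UPX0...", b"plain text file"], sandbox)
print(len(reports), len(sandbox))  # 2 1
```

The sketch captures the orchestration idea: every sample gets static analysis, and the static results decide which samples are worth a full dynamic run.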
TOTEM is an open source tool right now, have you ever considered commercializing it?
It’s something we’ve discussed, and there are commercial systems out there that do aspects of what TOTEM does. However, I’m committed to the open source community, and I designed TOTEM from the start to be open sourced. I agree with my peers who feel very strongly that having more open source security technology is a good thing for the world. I think that advanced systems like Drakvuf exemplify this ideal. With TOTEM specifically, we wanted to go a little beyond that: George and I worked together to define and publish SKALD, a logical framework for implementing systems like TOTEM. In this way, we’ve released not only software but also a blueprint for others to build from.
Several years ago you proposed creating a kind of clearinghouse for domain names that would work to identify malicious domains. Did anything ever come of that?
The domain (and other indicators) clearinghouse idea was a way to facilitate threat information sharing between security professionals. In the security industry, there are a few different “currencies.” Depending on your profession, your currency is either malware samples or indicators of compromise. Organizations sell this information, and if you don’t have the budget, you can’t protect your environment as well as others can.
In other risk-related industries there are vetted clearinghouses that distribute threat information to the larger community. If you’re able to share this information anonymously, you suddenly have the capacity for sharing on a large scale. VirusTotal has cornered the market on this in the malware sample space. There are certainly a number of startups and other companies whose products center around the development of information collating and sharing systems.
Though Chris and I weren’t by any means the first people to argue for this kind of system (or to develop one), we hoped our presentation helped highlight to the ICANN audience the need for threat sharing systems between service providers. I know that Chris Davis went on to develop the Secure Domain Foundation to put these ideas into practice.
What about the state of these existing threat sharing tools like that of AlienVault?
I know of three companies with threat sharing tools: AlienVault, ThreatStream, and ThreatConnect. Though I’ve only worked with ThreatConnect, I’ve heard positive things about all of them. ThreatConnect focuses mostly on sharing structured information about organized malware campaigns and other malware intelligence. Its approach is oriented toward SOC analysts, and I’ve found the tool very useful. The ability to transmit, manage, collect, and react to threat information is definitely a positive.
Isn’t this type of thing done within organizations? Were you proposing doing this on a larger scale?
You’d be surprised at how few organizations actually have threat intelligence teams; few have the internal maturity for that. You’d also be surprised at how little threat information sharing happens within organizations. The ability to collaborate with peers didn’t exist in traditional AV tools, and we’re only now starting to see it in newer solutions.
The point is we need better sharing and contextualization of threat information. More sophisticated technology helps, but it is not the whole answer. Only humans can contextualize threat information and use security events to shape and establish policy.
Machine learning as applied to security seems to be the new buzzword. What do you think about the way machine learning is being used in security? Is it everything that it’s made out to be?
Machine learning in the technology industry as a whole is probably a buzzword, and the security industry is no stranger to buzzwords. That said, machine learning is an incredibly valuable tool because it gives network defenders the ability to classify activity quickly and at scale. If a system can look at events and determine whether or not they are false positives, that saves human analysts a lot of time. But the idea that machine learning is the be-all and end-all isn’t accurate. There are a number of techniques I’m actively using and implementing that help intrusion prevention systems weed out false positives. Machine learning can also expand the ability of AV products to tell good from bad beyond heuristics, signatures, and the like. Additionally, it can help researchers working with large malware sets discover patterns. All of these, however, are pieces of a larger puzzle. In the AV world, for instance, once machine learning helps us identify something bad, we still have to remediate that threat. In the IPS world, once you identify that something is bad, you still have to block it. The point is, while machine learning is helpful, it’s not the answer to everything.
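The false-positive triage idea described above can be sketched with a tiny categorical Naive Bayes classifier built from scratch. The feature names, labels, and training examples below are all invented for illustration; a real deployment would use far richer features and a proper ML library:

```python
import math
from collections import defaultdict

def train(examples):
    """examples: list of (feature_dict, label). Returns count tables."""
    label_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    for feats, label in examples:
        label_counts[label] += 1
        for k, v in feats.items():
            feat_counts[label][(k, v)] += 1
    return label_counts, feat_counts

def classify(model, feats):
    """Pick the label maximizing log P(label) + sum of log P(feature|label)."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, n in label_counts.items():
        lp = math.log(n / total)
        for k, v in feats.items():
            # Laplace smoothing so unseen feature values don't zero out a class.
            lp += math.log((feat_counts[label][(k, v)] + 1) / (n + 2))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical labeled IDS alerts: signature fired and whether the source
# address was internal to the defended network.
training = [
    ({"signature": "port-scan", "internal_src": True}, "false_positive"),
    ({"signature": "port-scan", "internal_src": True}, "false_positive"),
    ({"signature": "c2-beacon", "internal_src": False}, "true_positive"),
    ({"signature": "c2-beacon", "internal_src": True}, "true_positive"),
]
model = train(training)
verdict = classify(model, {"signature": "port-scan", "internal_src": True})
print(verdict)  # false_positive
```

A filter like this sits in front of human analysts: alerts scored as likely false positives can be deprioritized, which is the time savings the answer above describes.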
Are there any other applications of machine learning in security that we’re not seeing now but that you think we’ll see in the future?
I’d like to see more of it in many different areas. A lot of backend research could benefit from machine learning. I’d like to see more techniques capable of making use of local, on-host signals; one of the purposes of dynamic analysis is seeing what malware does on a local machine. Malware has to install itself somewhere. It has to do things besides just exist. Understanding what malware looks like while it’s executing on a machine could be an interesting application of machine learning. Additionally, machine learning could help us find patterns across different areas; it could bring together static and dynamic malware analysis, for instance.
You seem very entrepreneurial, have you thought at all about leaving Novetta to start your own thing?
I’ve definitely thought about it. With the field in the state it’s in right now, I’d like to have as much of an effect on it as I possibly can, and if I left to start my own company, I’d have to focus on a lot of important things that don’t necessarily have to do with the specific problems I’m passionate about. One day, however, that equation might change.
What is the current state of security?
There’s an incredible amount of flux in the space, and there’s no shortage of funding, interest, or talent. There’s also an incredible number of people in it, so any time you start a company you need to cut through the amount of sound, both signal and noise, that already exists. Interest in the space is only getting bigger. There’s some noise, but there’s also lots of signal, and I’m trying to amplify Novetta’s signal. There’s always the excitement of starting something new, but the efficacy of working from an established platform is hard to deny.
- The list includes some well known names such as RedOwl Analytics and Netskope, and some lesser known companies such as Sqrrl Data and TaaSera.
- There are plenty of non-financial applications of the blockchain, including tracking ownership of high-value goods such as art to minimize fraud. Blockverify and Verisart are two companies working on these applications.
- Palo Alto Networks CEO and Chairman Mark McLaughlin claims that a big part of their culture and success comes from the fact that all of the engineering for Palo Alto Networks is done in Silicon Valley. McLaughlin claims that the best products get built when engineers are able to exchange ideas and collaborate.
- Startups that are building IoT products don’t necessarily have the security backgrounds to ensure the security of their devices.
- The company added another $30 million to its last round, bringing the total round size to nearly $150 million.
- The round is being led by ORR Partners. Dome9 provides security and compliance for AWS, Azure, and other public cloud products.
- Funding comes from TechOperators and Blackstone. Phantom aims to automate enterprise security to cut back on the necessary staff.
- Experian is one of the big three credit rating agencies. Among other things, hackers gained access to “identification numbers” – in this case, driver’s license, military ID, or passport numbers.
- Though the company hasn’t downgraded anyone due to cybersecurity concerns yet, they said they would if they thought the bank was ill-equipped to handle a data breach.