3 Ways Natural Language Processing is Changing Cybersecurity
If you’ve used a chatbot, predictive text to finish a thought in an email, or pressed “0” to speak to an operator, you’ve come across natural language processing (NLP). As more enterprises adopt NLP, the sub-field is developing beyond those popular use cases of machine-human communication to machines interpreting both human and non-human language. This creates an exciting opportunity for organizations to stay ahead of evolving cybersecurity threats.
NLP combines linguistics, computer science, and AI to help machines learn human language. Human language is astonishingly complex, and relying on structured rules leaves machines with an incomplete understanding of it. NLP enables machines to contextualize and learn instead of relying on rigid encoding, so they can adapt to different dialects, new expressions, or questions the programmers never anticipated.
NLP research has helped drive the evolution of AI techniques such as neural networks, which are instrumental to machine learning across many fields and use cases. So far, NLP has been applied primarily to machine-to-human communication, simplifying interactions for enterprises and consumers.
NLP for Cybersecurity
NLP was designed to enable machines to learn to communicate like humans, with humans. Many services we use today rely on machine communication, either between machines or translated into a form humans can understand. Cybersecurity is a perfect example of a field where IT analysts can feel like they speak to more machines than people.
NLP can be applied in cybersecurity workflows to assist in breach protection, breach identification, and analysis of a breach's scale and scope. Here are three ways it does that:
1) In the short term, NLP can strengthen and simplify breach protection against phishing attempts. Here, NLP can be used to recognize “bot” or “spam” behavior in email text sent by a machine posing as a human, and to analyze the internal structure of the email itself to identify patterns in spammers and the types of messages they send. This is a first extension of NLP beyond its original purpose: a technique designed to understand human language alone is now applied to human language mixed with machine-level headers.
2) In the medium term, NLP can be used to parse logs, a cyBERT use case. In current rules-based systems, the mechanisms required to parse raw logs and prepare them for analysts are brittle and demand significant development and maintenance resources. With NLP, parsing of raw logs becomes more flexible and less prone to breaking when log generators and sensors change. Going further, the neural networks used for parsing can generalize beyond the logs they saw during training, transforming raw data into rich content ready for an analyst without anyone having to write explicit rules for new or changed log types.
As a result, NLP models can parse logs more accurately than traditional rules while being more flexible and more fault-tolerant.
3) In the longer term, entirely synthetic languages can be created that represent machine-to-machine and human-to-machine communications. If two machines can create an entirely new language, that language can then be analyzed using NLP techniques to identify errors in grammar, syntax, and composition, all of which can be interpreted as anomalies and contextualized for analysts. This approach can help identify known issues or attacks when they occur, as well as completely unknown misconfigurations and attacks, making analysts more efficient and effective.
These applications are just the beginning for NLP.
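To make the third idea more concrete, here is a minimal sketch of treating machine events as a "language" and scoring how grammatical a new sequence looks. It is an illustration under stated assumptions, not the method the article describes: it swaps in a simple add-one smoothed bigram language model where a production system would more likely use a neural model, and the event names (CONNECT, AUTH_OK, and so on), the training sessions, and the threshold rule are all hypothetical.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"


class BigramModel:
    """Add-one smoothed bigram language model over event-token sequences."""

    def __init__(self, sequences):
        self.unigrams = Counter()  # how often each token appears as a predecessor
        self.bigrams = Counter()   # how often each (previous, current) pair appears
        self.vocab = {END}
        for seq in sequences:
            tokens = [START, *seq, END]
            self.vocab.update(seq)
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def surprise(self, seq):
        """Average negative log-probability per transition (higher = more unusual)."""
        tokens = [START, *seq, END]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v)
            total -= math.log(p)
        return total / (len(tokens) - 1)


# Hypothetical "sentences" in a machine-to-machine language: one list per session.
normal_sessions = [
    ["CONNECT", "AUTH_OK", "READ", "READ", "CLOSE"],
    ["CONNECT", "AUTH_OK", "WRITE", "CLOSE"],
    ["CONNECT", "AUTH_OK", "READ", "WRITE", "CLOSE"],
    ["CONNECT", "AUTH_FAIL", "CLOSE"],
]

model = BigramModel(normal_sessions)

# Assumed threshold rule: anything more surprising than the most surprising
# training session is flagged for an analyst.
threshold = max(model.surprise(s) for s in normal_sessions)


def is_anomalous(seq):
    return model.surprise(seq) > threshold


typical = ["CONNECT", "AUTH_OK", "READ", "CLOSE"]
odd = ["CONNECT", "WRITE", "WRITE", "AUTH_FAIL", "READ"]  # writes before auth, never closes

print(f"typical: {model.surprise(typical):.2f} anomalous={is_anomalous(typical)}")
print(f"odd:     {model.surprise(odd):.2f} anomalous={is_anomalous(odd)}")
```

The odd session breaks the learned "grammar" (it writes before authenticating and never closes), so its transitions are rare or unseen and its average surprise is higher than that of any training session. A real deployment would need far more data and a tuned threshold, but the shape of the approach is the same: learn what normal sequences look like, then surface the ones that do not fit.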