A Survey Report on Big Data Security Analytics Tools
Introduction
In a data-driven era, big data is ubiquitous and has become a unique and
valuable asset for organizations because of the meaningful information it
contains. Beyond its complexity, the massive size and storage requirements of
big data challenge many organizations, which usually set up an analytical
process to distill big data into knowledge or wisdom (Sakr & Gaber, 2014).
Recently, cloud computing with its service models (e.g., IaaS, PaaS, SaaS) has
emerged as a computing model for processing big data because of its
flexibility, availability, and affordability. However, clouds have major
security issues concerning the confidentiality, integrity, availability, and
privacy of the data and applications outsourced to the cloud.
Cloud computing systems, like other distributed computing systems, face
complex, large-scale security attacks such as eavesdropping, masquerading,
message tampering, message replay, and denial of service (Khan et al., 2014).
Existing first-generation security information and event management (SIEM)
tools such as ManageEngine, BlackStratus, and SolarWinds share common
functionality while each offers specific capabilities of its own.
Nevertheless, they are unable to detect and resolve complex advanced
persistent threats (APTs) (Gartner, 2016). In addition, there is no good
survey of the existing scalable and intelligent security analytics tools that
covers their new technical functions and the security problems to which they
are best applied.
This Unit 5 Individual Project document provides a research survey of four
advanced, scalable, and intelligent big data security analytics tools. It
describes their main functions for APT detection in the cloud computing
environment, outlines the key design considerations for scalability, and
discusses the pros and cons of each tool.
Scalable and intelligent security analytics tools
The urgent need for early detection, prevention, or at least reduction of APTs
and other complex cyberattacks on organizational databases, particularly in
cloud computing systems, drives a new generation of advanced big data security
analytics tools. Gartner’s report “Magic Quadrant for Security Information and
Event Management,” published in August 2016, placed 14 big data security
analytics tools in the magic quadrant for SIEM technologies (Figure 1). The
lowest-ranked players include ManageEngine, BlackStratus, and SolarWinds, and
the leaders include IBM QRadar, Splunk, and LogRhythm.
Figure 1: Magic quadrant for SIEM technologies.
Source: Adapted from Gartner’s SIEM Report, 2016.
The four advanced, scalable, and intelligent security analytics tools chosen
for this survey report are IBM QRadar, Splunk, LogRhythm, and HPE ArcSight.
Main functions
SIEM technologies aggregate event data such as log data, NetFlow, and network
packets. In the SIEM market, event data is combined with contextual
information about users, assets, threats, and vulnerabilities. Organizations
in both the public and private sectors need to analyze event data in real time
for early detection of data breaches and targeted attacks, and this need has
created a new security information and event management market. SIEM
activities include capturing, collecting, storing, investigating, and
reporting on log data for forensics, incident response, and regulatory
compliance. The main functions of the top advanced big data security analytics
tools are described as follows:
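Before the individual tools are described, the idea of analyzing event data in
real time can be illustrated with a minimal sketch of a correlation rule. The
sketch is not taken from any of the surveyed products; the event layout
(timestamp, source, outcome), the function name detect_bursts, and the
threshold and window values are assumptions chosen only for illustration.

```python
from collections import defaultdict, deque


def detect_bursts(events, threshold=5, window_seconds=60):
    """Flag sources that produce `threshold` failed logins within the window.

    `events` is an iterable of (timestamp_seconds, source, outcome) tuples.
    This layout is hypothetical; real SIEM products normalize events into
    their own schemas.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    alerts = []
    for ts, source, outcome in sorted(events):
        if outcome != "failure":
            continue
        window = recent[source]
        window.append(ts)
        # Discard failures that fell out of the sliding time window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((ts, source, len(window)))
            window.clear()  # start fresh so one burst yields one alert
    return alerts


if __name__ == "__main__":
    sample = [(t, "10.0.0.5", "failure") for t in range(0, 50, 10)]
    sample.append((25, "10.0.0.9", "success"))
    for alert in detect_bursts(sample):
        print(alert)
```

A production SIEM correlates many event types across many sources; the sliding
window above shows only the basic pattern of counting related events over time.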
IBM QRadar’s main functions
QRadar is IBM’s security intelligence platform. It can be deployed as physical
or virtual appliances, as infrastructure as a service (IaaS) in the cloud, or
as a service fully managed by the IBM Managed Security Services team. QRadar
can collect and process log data, security events, and NetFlow, and it can
monitor network traffic using deep-packet inspection and full-packet capture.
It can perform behavior analysis for all supported data sources (Gartner,
2016). QRadar collects log data from across the enterprise, including
operating systems (OS), host assets, applications, and user activities, and
performs real-time analysis for malicious activity in order to stop, prevent,
or mitigate damage to the company (Scarfone, 2017).
Splunk’s main functions
Splunk is a security intelligence platform. Splunk Enterprise (SE) provides
log and event collection, search, and visualization in the Splunk query
language, and Splunk Enterprise Security (ES) adds security-specific SIEM
features. SE enables data analysis for IT operations, business intelligence,
and application performance management, and it can be combined with ES for
security event monitoring and analysis. Splunk ES supports predefined
dashboards, searches, correlation rules, visualizations, and reports for
real-time security monitoring and incident response. Splunk ES and SE can be
deployed in public or private clouds, or as a hybrid with software as a
service (SaaS) (Gartner, 2016). Splunk can identify potential attacks quickly
and trigger a human or automated response to stop them before they are
completed.
LogRhythm’s main functions
LogRhythm’s SIEM platform can be deployed as an appliance, as software, or as
a virtual instance, and it supports an n-tier, scalable, decentralized
architecture. LogRhythm provides endpoint and network forensic capabilities,
including system process monitoring, file integrity monitoring, NetFlow
monitoring, and full-packet capture. It also combines event, endpoint, and
network monitoring with integrated incident response and automated response
capabilities. The Platform Manager, AI Engine, Data Processors, Data Indexers,
and Data Collectors can be consolidated into an all-in-one deployment for high
performance in APT detection.
HPE ArcSight’s main functions
The HPE ArcSight SIEM platform consists of (1) the ArcSight Data Platform
(ADP) for log collection, management, and reporting; (2) ArcSight Enterprise
Security Management (ESM) software for large-scale security monitoring
deployments; and (3) ArcSight Express, an appliance-based all-in-one offering
designed for preconfigured monitoring and reporting as well as simplified data
management. Within the ArcSight Data Platform, ArcSight Connectors, Management
Center, and Logger provide log management and data collection. Additional
ArcSight modules perform entity behavior analytics, malware detection, and
threat intelligence.
Tool assessment
Cloud computing is the ability to control computation dynamically and
cost-effectively, and to let users consume that computation without managing
the underlying complexity of the technology (Open Cloud Manifesto, 2012).
Because of special system features such as virtual machines, trust asymmetry,
and a semitransparent system architecture, cloud computing systems have
special security issues. Sakr and Gaber (2014) recognized at least ten
security issues, listed below:
1. Exploitation of Co-tenancy
2. Secure Architecture for the Cloud
3. Accountability for Outsourced Data
4. Confidentiality of Data and Computation
5. Privacy
6. Verifying Outsourced Computation
7. Verifying Capability
8. Cloud Forensics
9. Misuse Detection
10. Resource Accounting and Economic Attacks
Are the four advanced big data security analytics tools, IBM QRadar, Splunk,
LogRhythm, and HPE ArcSight, suitable to address these security issues in the
cloud computing environment?
IBM QRadar’s assessment
IBM QRadar includes QRadar Log Manager, Data Node, SIEM, Risk Manager,
Vulnerability Manager, QFlow and VFlow Collectors, and Incident Forensics.
These components address several security needs, such as monitoring security
events and log information for anomaly detection, cloud forensics, misuse
detection, and confidentiality of data and computation. IBM X-Force Exchange
is used to share threat intelligence, and the QRadar Application Framework
supports the IBM Security App Exchange. For incident response, QRadar uses
Resilient Systems. It can also monitor and respond to a single security event.
QRadar covers eight of the ten cloud security issues (list numbers 1, 2, 3, 4,
6, 8, 9, and 10). Thus, IBM QRadar is suitable for use in the cloud computing
environment.
Splunk’s assessment
Splunk’s architecture comprises streaming input, Forwarders that ingest data,
Indexers that index and store raw machine logs, and Search Heads that provide
data access through the Web-based GUI. Splunk provides incident management,
workflow capabilities, improved visualization, and monitoring of IaaS and SaaS
providers. It focuses on security event monitoring, analysis use cases, and
threat intelligence. Splunk covers six cloud security issues (list numbers 2,
4, 5, 7, 8, and 10). Therefore, Splunk is a good security analytics tool for
cloud computing.
LogRhythm’s assessment
LogRhythm provides log processing and indexing, and it expands unstructured
search with clustered full data replication. It has improved its risk-based
prioritization scoring algorithm and the protocol support of its Network
Monitor. It supports cloud services such as AWS, Box, and Okta, and it
integrates with cloud access security broker solutions such as Microsoft’s
Cloud App Security and Zscaler. LogRhythm covers six cloud security issues
(list numbers 2, 3, 5, 6, 8, and 10). LogRhythm is also a popular security
analytics tool for cloud computing.
HPE ArcSight’s assessment
HPE ArcSight is a SIEM platform for midsize organizations, enterprises, and
service providers. It provides DNS malware detection and threat intelligence,
which extend the SIEM’s capabilities. Its SIEM architecture and licensing
model use an interface to control incoming events, incidents, and traffic
analysis. It suits midsize SIEM deployments that need extensive third-party
connector support. HPE ArcSight covers six cloud security issues (list numbers
2, 4, 6, 7, 8, and 9), and it is a good fit for large-scale deployments such
as cloud computing environments.
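The coverage claims in the four assessments above can be compared side by side
with a short sketch. The issue numbers are copied from the assessments in this
report; the dictionary layout and the helper names (covered_by_all,
tools_covering, uncovered) are only illustrative assumptions.

```python
# Issue numbers follow the ten-item list from Sakr and Gaber (2014) above.
ISSUES = {
    1: "Exploitation of Co-tenancy",
    2: "Secure Architecture for the Cloud",
    3: "Accountability for Outsourced Data",
    4: "Confidentiality of Data and Computation",
    5: "Privacy",
    6: "Verifying Outsourced Computation",
    7: "Verifying Capability",
    8: "Cloud Forensics",
    9: "Misuse Detection",
    10: "Resource Accounting and Economic Attacks",
}

# Coverage as stated in this report's assessments.
COVERAGE = {
    "IBM QRadar": {1, 2, 3, 4, 6, 8, 9, 10},
    "Splunk": {2, 4, 5, 7, 8, 10},
    "LogRhythm": {2, 3, 5, 6, 8, 10},
    "HPE ArcSight": {2, 4, 6, 7, 8, 9},
}


def covered_by_all(coverage):
    """Issues that every tool is reported to cover."""
    return set.intersection(*coverage.values())


def tools_covering(issue, coverage):
    """Sorted names of the tools reported to cover one issue."""
    return sorted(tool for tool, issues in coverage.items() if issue in issues)


def uncovered(coverage, issues):
    """Issues that no tool is reported to cover."""
    return set(issues) - set.union(*coverage.values())


if __name__ == "__main__":
    for number, name in ISSUES.items():
        tools = ", ".join(tools_covering(number, COVERAGE))
        print(f"{number:>2}. {name}: {tools}")
    print("Covered by all tools:", sorted(covered_by_all(COVERAGE)))
    print("Covered by no tool:", sorted(uncovered(COVERAGE, ISSUES)))
```

Read this way, the assessments imply that every issue is covered by at least
one tool, while only issues 2 and 8 are covered by all four.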
Targeted applications
To cope with sophisticated attacks, a big data analytics approach applies
advanced analytical methods to the huge data sets collected from various
systems. There are at least three reasons why big data analytics helps solve
hard-to-solve security problems. First, the attacker has no way to know
exactly what is stored in big data storage such as cloud repositories. As a
result, the attacker cannot be sure how to conduct attacks on the data sets.
Second, analytical methods may change over time, which prevents the attacker
from learning how to evade them. Third, some forensic marks or traces of an
attack cannot be removed from the system, and these traces serve as evidence
that security analytics tools can detect (Elovici, 2014). Therefore, the
advanced big data security analytics tools can be applied in most fields, for
example, Web applications, financial applications, insider threat analysis,
full-spectrum fraud detection, and Internet-scale botnet discovery.
The advanced big data security
analytics tools should address the problems of security and analysis. Figure 2
illustrates the non-linear relationship between effectiveness and cost of the
security analytics tools.
Figure 2: Effectiveness versus cost of security analytics tools.
Source: Adapted from Dr. Cross’s live chat lecture, 2017.
IBM QRadar’s applications
IBM’s QRadar Security Intelligence Platform, which runs on physical and
virtual appliances and on IaaS, can be deployed with multitenant capabilities
in applications such as cloud computing systems, Web applications, healthcare
fraud detection, and health monitoring and patch management within a health
check framework (ScienceSoft, 2016). QRadar is an advanced security
intelligence platform that detects about 80% to 90% of APTs.
Splunk’s applications
The Splunk security intelligence platform, with its specific SIEM features,
provides accurate data analysis on premises and in hybrid (public and private)
clouds. Its applications include Web applications, insider threat analysis,
data stream processing engines, and monitoring of IaaS and SaaS providers.
Splunk’s SIEM platform is flexible across data sources and offers analytics
capabilities based on machine learning and UEBA functionality. Like IBM
QRadar, Splunk’s effectiveness is in the range of 80% to 90%.
LogRhythm’s applications
LogRhythm is a SIEM solution for midsize and large enterprises, with
applications in forensics, file integrity monitoring, and integrated incident
response. LogRhythm is well suited to Web applications, insider threat
analysis, routine security automation, and APT detection, and it supports AWS,
Box, Okta, and cloud access security broker solutions. Its effectiveness is
likely in the same range as that of QRadar and Splunk.
HPE ArcSight’s applications
The HPE ArcSight SIEM platform is used by midsize organizations and service
providers. It can be used in a security operations center (SOC). Its
applications include Web applications, insider threats, Internet-scale botnet
discovery, user behavior analysis, out-of-the-box third-party technology
connectors and integration, financial services, and DNS malware analytics.
Primary design for scalability
Scalability is a dominant subject in distributed computing, particularly cloud
computing. A distributed system is scalable if it remains effective when the
number of users, the volume of data, or the amount of resources increases
significantly. For example, a cloud system can expand its hardware by adding
more commodity computers. A system can scale in two ways: scale-up or
scale-out. Most cloud computing systems prefer to scale out because doing so
accommodates more data, users, hardware, software, and services, improves
system bandwidth and performance, and reduces bottlenecks, delay, and latency.
A scalable design must contend with two obstacles: (1) some parts of programs,
such as initialization, can never be parallelized, and (2) load imbalance
among tasks is likely to be high. The key design considerations for
scalability of the chosen advanced big data security analytics tools are
discussed below:
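As a brief aside before the individual tools, the effect of program parts that
can never be parallelized can be made concrete with Amdahl's law, a standard
result that the report itself does not name. The sketch below assumes a
workload in which a fixed fraction of the work is serial; the function name
amdahl_speedup and the sample fractions are illustrative only.

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot be
    parallelized and the rest is split evenly across `workers` nodes."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # With 5% serial work, adding nodes yields diminishing returns and the
    # speedup can never reach 1 / 0.05 = 20.
    for nodes in (1, 4, 16, 64, 256):
        print(nodes, round(amdahl_speedup(0.05, nodes), 2))
```

The bound ignores load imbalance, the second obstacle mentioned above, so
speedups observed in practice are typically lower still.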
IBM QRadar’s scalability
IBM QRadar’s architecture includes event processors for processing event data
and event collectors for capturing and forwarding data. For scalability, its
deployment options range from all-in-one appliance implementations to scaled
implementations that use separate appliances for discrete functions. The
integrated appliance 3105 provides up to 5,000 events per second, 200,000
flows per minute, and 6.2 terabytes of storage. The integrated appliance 3128
provides up to 15,000 events per second, 300,000 flows per minute, and 40
terabytes of storage (Scarfone, 2017). QRadar can also collect log events and
network flow data from cloud-based applications.
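The appliance figures quoted above (Scarfone, 2017) allow a rough sizing
estimate. The sketch below assumes, purely for illustration, that throughput
adds up linearly across identical appliances; actual QRadar sizing depends on
licensing and on how processors and collectors are distributed, so the result
is only a lower bound. The function name and dictionary layout are
hypothetical.

```python
import math

# Per-appliance limits as quoted in this report (Scarfone, 2017).
APPLIANCES = {
    "3105": {"eps": 5_000, "flows_per_min": 200_000, "storage_tb": 6.2},
    "3128": {"eps": 15_000, "flows_per_min": 300_000, "storage_tb": 40.0},
}


def appliances_needed(model, peak_eps, peak_flows_per_min):
    """Minimum appliance count for a peak load, assuming linear scale-out."""
    spec = APPLIANCES[model]
    by_events = math.ceil(peak_eps / spec["eps"])
    by_flows = math.ceil(peak_flows_per_min / spec["flows_per_min"])
    # The tighter of the two limits decides; at least one appliance is needed.
    return max(1, by_events, by_flows)


if __name__ == "__main__":
    for model in APPLIANCES:
        count = appliances_needed(model, peak_eps=12_000,
                                  peak_flows_per_min=100_000)
        print(f"Model {model}: at least {count} appliance(s)")
```

For a peak of 12,000 events per second, the sketch suggests at least three
3105 appliances or a single 3128, which illustrates the trade-off between
scaling out with smaller units and scaling up to a larger one.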
Splunk’s scalability
Splunk applies a modern big data platform that enables users to scale and to
solve a wide range of security use cases for security operations and
compliance. It provides flexible deployment options, on premises, in the
cloud, or in hybrid environments, depending on the workloads and use cases.
Splunk’s scalability is somewhat limited because Splunk UBA requires a
separate infrastructure and a license model different from those of Splunk
Enterprise and Enterprise Security (Scarfone, 2015).
LogRhythm’s scalability
LogRhythm helps customers detect and respond quickly to cyber threats before a
material breach occurs. It provides compliance automation and IT predictive
intelligence. Its architecture is decentralized and scalable across n tiers.
The components of LogRhythm’s architecture are the Platform Manager, AI
Engine, Data Processors, Data Indexers, and Data Collectors, which can be
scaled into more tiers to handle more workloads (Symtrex, 2016). This
scalability gives iterative access to the data for analytics and reporting
during incident investigation.
HPE ArcSight’s scalability
HPE ArcSight focuses on big data security analytics and intelligence software
for SIEM and log management solutions. It is designed to help customers
identify and prioritize security threats, simplify audit and compliance
activities, and organize incident response activities (Morgan, 2010). For
midsize SIEM deployments that need extensive third-party connector support,
HPE ArcSight is a good fit for organizations building a dedicated SOC.
Pros and cons
The most important characteristics of SIEM products are the ability to access
and combine data and information from multiple sources and the ability to
perform intelligent queries on that data and information (IT Central Station,
2016). According to Gartner (2016), the advanced big data security analytics
tools QRadar, Splunk, LogRhythm, and ArcSight each have pros and cons in
security information and event management.
IBM QRadar
For the pros, QRadar supports a visual view of log and event data together
with network flow and packets, asset and vulnerability data, and threat
intelligence. It can analyze network traffic behavior for correlation through
NetFlow and log events. Its modular architecture is designed to support
security event and log monitoring in IaaS environments, including AWS
CloudTrail and SoftLayer. QRadar can be deployed and maintained easily as an
all-in-one appliance or in a large tiered or multisite environment. The IBM
Security App Exchange can integrate capabilities from third-party technologies
into the SIEM dashboards and the investigation and response workflow. Users
find the dashboards most helpful for an overview of traffic flow and issues.
The built-in rules and reports are comprehensive and work as expected
(Gartner, 2016).
For the cons, QRadar can monitor endpoints for threat detection and response,
or for basic file integrity, only with the help of third-party technologies.
The integration of the IBM vulnerability management add-on has had mixed
success with users. The IBM sales engagement process is complex and requires
persistence. Needing multiple Java versions for deployment setup is
inconvenient (IT Central Station, 2016).
Splunk
For the pros, Splunk’s security monitoring use cases drive significant
visibility for users. Splunk UBA provides advanced security analytics
capabilities through native machine learning functionality and integration. It
includes the essential features for monitoring threat detection and insider
threat use cases. In IT operations, Splunk’s monitoring solutions give
security teams in-house experience with the existing infrastructure and data.
Applying log events to business needs is helpful. Operational intelligence is
fast and available across several servers.
For the cons, Splunk Enterprise Security supports only basic predefined
correlations for user monitoring and reporting requirements, while other
vendors provide richer content for use cases. Splunk licenses are based on the
data volume, in gigabytes, indexed per day. Where high data volumes are
expected, this solution costs more than other SIEM products. Sufficient
planning and prioritization of data sources is recommended to avoid
overconsuming the licensed data volume. Splunk UBA requires a separate
infrastructure and uses a license model different from those of Splunk
Enterprise and Enterprise Security. Setting up and adding new data sources
should be improved. The operational workflow and ticketing systems should be
made more suitable for a security operations center.
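Because licensing is based on the gigabytes indexed per day, the planning and
prioritization of data sources mentioned above can be sketched as a simple
volume estimate. The event rates, average event sizes, source names, and the
greedy admission rule below are all assumptions made for illustration; they do
not describe Splunk's actual pricing or tooling.

```python
SECONDS_PER_DAY = 86_400


def daily_indexed_gb(events_per_second, avg_event_bytes):
    """Estimated daily indexed volume in gigabytes (1 GB = 10**9 bytes)."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / 1e9


def plan_sources(sources, licensed_gb_per_day):
    """Admit data sources in priority order while they fit under the limit.

    `sources` holds (name, priority, events_per_second, avg_event_bytes)
    tuples, where a lower priority number means more important. A source
    that does not fit is deferred and smaller later ones may still fit.
    """
    admitted, deferred, total = [], [], 0.0
    for name, _priority, eps, size in sorted(sources, key=lambda s: s[1]):
        volume = daily_indexed_gb(eps, size)
        if total + volume <= licensed_gb_per_day:
            admitted.append(name)
            total += volume
        else:
            deferred.append(name)
    return admitted, deferred, total


if __name__ == "__main__":
    # Hypothetical sources: (name, priority, events/s, average bytes/event).
    candidates = [
        ("firewall", 1, 1_000, 500),
        ("proxy", 2, 2_000, 500),
        ("dns", 3, 500, 200),
    ]
    admitted, deferred, total = plan_sources(candidates, 100)
    print("admitted:", admitted)
    print("deferred:", deferred)
    print(f"estimated volume: {total:.2f} GB/day")
```

An estimate of this kind makes it easier to see which sources push the daily
volume past the licensed amount before they are onboarded.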
LogRhythm
For the pros, LogRhythm combines SIEM capabilities, endpoint monitoring,
network forensics, UEBA, and incident management to support security
operations and advanced threat monitoring use cases. LogRhythm provides
dynamic context integration and security monitoring workflows with a highly
interactive user experience. Its emerging automated response capabilities can
execute actions on remote devices. LogRhythm’s solution is easy to deploy and
maintain, with effective out-of-the-box use cases and workflows. It remains
very visible in the competitive SIEM market. LogRhythm creates a good feedback
loop that allows users to see off-limits activities. The AI Engine is its most
valuable feature. The out-of-the-box setup is very easy and intuitive to get
started with.
For the cons, even though LogRhythm has integrated security capabilities such
as System Monitor and Network Monitor, which enable synergies across IT
through deeper integration with the SIEM, users with critical IT and network
operations should evaluate them against related point solutions. Its custom
report engine requires improvement. LogRhythm has fewer sales and channel
resources than other leading SIEM vendors, which may lead to fewer choices of
resellers. The client must be installed on the computer for all of the
functions to work.
HPE ArcSight
For the pros, HPE ArcSight supplies a complete set of SIEM capabilities that
can support a large-scale SOC. Its User Behavior Analytics provides full UBA
capabilities along with SIEM. It has various out-of-the-box third-party
technology connectors and integrations. HPE ArcSight reduces the time spent on
investigations, and it is one of the best tools at ingesting events. Its
system components, such as connectors, Logger, and correlation engines, are
very stable.
For the cons, HPE ArcSight proposals include more professional services than
comparable offerings. Its ESM is more complex and expensive to deploy,
configure, and operate than IBM QRadar, Splunk, and other competitors.
ArcSight ranks among the top four advanced big data security analytics tools,
but its visibility in new installs is decreasing and competitive replacements
are increasing. HPE has started a development effort to redo the core
technology platform, so users should take these development plans into account
when evaluating ArcSight for their needs.
Deployment of HPE ArcSight is complicated and expensive. For custom
applications, users need some expertise in configuring the ArcSight software.
The correlation rules should be simplified.
Conclusion
This Unit 5 Individual Project presented the concept of SIEM technologies in
advanced big data security analytics for detecting, preventing, and mitigating
advanced persistent threats and data breaches by increasingly capable
attackers. It makes sense to integrate security data, improve incident
detection and response and security operations, and then move on to
integrating the myriad of security management consoles, middleware, and
enforcement points afterward (Oltsik, 2014).
The leading security analytics tools, i.e., IBM QRadar, Splunk, LogRhythm, and
HPE ArcSight, were selected from the Gartner Magic Quadrant (2016). Their
primary functions, such as anomaly detection, event correlation, and real-time
analytics, were discussed for each tool. The tools were assessed for their
suitability for use in the cloud computing environment. The targeted
applications and security problems for which the tools are effective were
studied. The key design considerations for scalability were examined for each
tool. Finally, the strengths and weaknesses of each advanced big data security
analytics tool were briefly compared.
REFERENCES
Elovici, Y. (2014). Detecting cyberattacks using big data security analytics.
Retrieved March 1, 2017, from https://www.youtube.com/watch?v=bP_1X-392pU
Gartner (2016). Gartner’s complete analysis in the SIEM 2016 magic quadrant.
Retrieved March 7, 2017, from
https://logrhythm.com/2016-gartner-magic-quadrant-siem-report/?utm_source=google&utm_medium=cpc&utm_campaign=SIEM-Search-US&AdGroup=SIEM-General&utm_region=NA&utm_language=en
IT Central Station (2016). Compare IBM Security QRadar, Splunk, HPE ArcSight,
and LogRhythm. Retrieved March 13, 2017, from
http://www.csoonline.com/article/3067716/network-security/siem-review-splunk-arcsight-logrhythm-and-qradar.html?upd=1489463430396
Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z., Ali, W. K. M., Alam, M.,
Shiraz, M., & Gani, A. (2014). Big data: Survey, technologies, opportunities,
and challenges. The Scientific World Journal, 2014. Retrieved from
http://www.hindawi.com/journals/tswj/2014/712826/
Morgan, T. (2010). HP eyes $1.46bn ArcSight security buy: Hey, Dell. Wanna bid
higher? The Register.
Oltsik, J. (2014). Big data security analytics can become the nexus of
information security integration. NetworkWorld. Retrieved from
http://www.networkworld.com/article/2361840/security0/big-data-security-analytics-can-become-the-nexus-of-information-security-integration.html
Open Cloud Manifesto (2012). Open cloud manifesto Google group. Retrieved
March 12, 2017, from https://groups.google.com/forum/#!forum/opencloud
Sakr, S., & Gaber, M. (Eds.). (2014). Large scale and big data: Processing and
management. Boca Raton, FL: CRC Press.
Scarfone, K. (2015). Splunk Enterprise: SIEM product overview. Retrieved
March 13, 2017, from
http://searchsecurity.techtarget.com/feature/Splunk-Enterprise-SIEM-product-overview
Scarfone, K. (2017). IBM Security QRadar: SIEM product review. Retrieved
March 13, 2017, from
http://searchsecurity.techtarget.com/feature/IBM-Security-QRadar-SIEM-product-overview
ScienceSoft (2016). Health check framework. Retrieved March 12, 2017, from
https://www.scnsoft.com/services/security-intelligence-services/health-check-framework-for-ibm-qradar-siem
Symtrex (2016). LogRhythm. Retrieved March 13, 2017, from
http://www.symtrex.com/security-solutions/logrhythm/?gclid=Cj0KEQjwhpnGBRDKpY-My9rdutABEiQAWNcslI_0E0TnnNdgbPw24A3vq_b_YZ35qRPH2WBYbZg3TugaAhfY8P8HAQ