Q1. What is Splunk and where will you use it?
A1. Splunk is an enterprise-grade software tool for collecting, indexing, and analyzing “machine data” such as log files, feed files, and other big data, often terabytes of it. You can upload logs from your websites, let Splunk index them, and produce reports with graphs to analyze the results. This is very useful for capturing start and finish times from asynchronous processes to calculate elapsed times. For example, here are the basic steps required.
Step 1: Use log4j MDC logging to output context-based log statements, and then
Step 2: upload those logs into Splunk to index them and report elapsed times for monitoring application performance.
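The two steps above can be sketched in plain Java. A real application would use the MDC class from Log4j or Logback; this self-contained sketch imitates the idea with a thread-local map, and the class name, field names, and message are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the MDC idea using only the JDK: a thread-local
// context map whose entries are prepended to every log line, so Splunk
// can later extract fields such as groupId and jobType with rex.
// (Real code would use the MDC class from Log4j or Logback instead.)
class MdcSketch {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(LinkedHashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // Prepend the context map to the message, mimicking a pattern layout.
    static String format(String message) {
        StringBuilder sb = new StringBuilder("[");
        CONTEXT.get().forEach((k, v) -> {
            if (sb.length() > 1) sb.append(',');
            sb.append(k).append('=').append(v);
        });
        return sb.append("] - ").append(message).toString();
    }

    public static void main(String[] args) {
        put("groupId", "48937");
        put("jobType", "CASH_FEED");
        long start = System.currentTimeMillis();
        // ... do the cash-forecast work here ...
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(format(
                "Total Time spent on group level cashforecast feed - " + elapsed + " ms"));
    }
}
```

Once lines in this shape are indexed, the `key=value` pairs can be pulled out as Splunk fields with a simple `rex` extraction.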
Q2. What are the different ways to get data into Splunk?
A2. There are several options:
- Uploading a log file via Splunk’s web interface.
- Getting Splunk to monitor a local directory or file.
- Splunk can index data from any network port. For example, Splunk can index remote data from syslog-ng or any other application that transmits via TCP. Splunk can also receive and index SNMP events.
- Splunk also supports other kinds of data sources, such as FIFO queues and scripted inputs, for getting data from APIs, remote data interfaces, and message queues. A scripted input is configured in the inputs.conf file as attribute/value pairs:
<attribute1> = <val1>
<attribute2> = <val2>
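For illustration, a scripted-input stanza in inputs.conf might look like the following; the script path, interval, and sourcetype are assumptions for the sketch, not values from a real deployment:

```
[script://$SPLUNK_HOME/etc/apps/myapp/bin/poll_queue.sh]
interval = 60
sourcetype = queue_depth
index = main
```

Here `interval = 60` tells Splunk to run the script every 60 seconds and index whatever the script writes to standard output.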
Here is an example of using Splunk to write queries against log files to monitor performance.
Step 1: Configure the Java application to output log statements as shown below, using the MDC (Mapped Diagnostic Context) feature offered by Logback or log4j.
2013-05-31 19:36:03,617 INFO [Camel (MyApp) thread #24 - Multicast] c.j.w.a.c.s.i.MyAppForecastServiceImpl - [groupId=48937,jobType=CASH_FEED,ValuationDate=2013-01-04] - Total Time spent on group level cashforecast feed - 225 ms
Step 2: Upload the test-myapp.log file via the Splunk interface.
Step 3: Once the file is uploaded, you can write search queries to suit your requirements.
The query for the log entries shown above is
search "Total Time" | search "group level cashforecast"
The Splunk search language is very powerful. Here is an extended query that extracts the elapsed time as a field and aggregates it:
search "Total Time" | search "group level cashforecast" | rex field=_raw "feed - (?<timeTaken>\d+)" |
bucket _time span=5d | stats avg(timeTaken) as AvgDur, max(timeTaken) as MaxDur by _time
Q3. How does Splunk work with Nagios to add value to your business?
A3. Nagios monitors your network services and host resources and alerts you when something is wrong, which is great. But then you still have to get to the root of the problem, and that is where Splunk comes in.
Splunk uses powerful algorithms to automatically organize any type of IT data into events. It then classifies these events and discovers relationships between events of different kinds. Events are indexed by time, terms and relationships.
This is handy for detecting DoS attacks, bad SQL queries taking a long time to execute, a JMS queue depth reaching its maximum capacity, and so on.
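For instance, building on the elapsed-time logging described earlier, a search like the following could back an alert on slow operations; the 1000 ms threshold and the extracted field name timeTaken are assumptions for the sketch:

```
search "Total Time" | rex field=_raw "feed - (?<timeTaken>\d+)"
| where tonumber(timeTaken) > 1000
```

Saved as an alert, this search would fire whenever a cash-forecast feed takes longer than a second.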
Q4. How does Splunk assist in resolving issues relating to performance, availability, operational, and security?
A4. Complex distributed applications can introduce many points of failure, and problems are hard to find and fix. Splunk enables rapid problem investigation by letting you write queries from a central location, integrating logs, configurations, messages, traps, and metrics in a single place.
Splunk lets you search, alert, and report on network data in real time, and navigate from symptom to root cause quickly with syslog, SNMP trap, configuration, and NetFlow data all in one place.
Splunk runs in physical, virtual and cloud infrastructures and scales from a single server to the largest distributed environments.
In summary, use Splunk
— to examine data
— to correlate issues relating to performance or availability
— to analyse operational and security issues
with both real time and historical data.