What do data analysts, engineers & scientists do?

Today’s world runs on data, and few organisations could survive without data-driven decision making and strategic planning. Several roles in the industry deal with data, including data analysts, data engineers, data scientists & business analysts. Some of the required skills overlap among these roles. For example, SQL, Microsoft Excel, data visualisation & basic data collection & management skills are must-haves for all of these roles. Data engineers must also have solid programming & software engineering skills, whereas data scientists must have skills in maths, statistics & algorithms.

Data Analyst, Engineer & Scientist

Reference: https://k21academy.com/microsoft-azure/data-science-vs-data-analytics-vs-data-engineer/

Q01. What do you understand by the terms Descriptive, Predictive & Prescriptive Analytics?
A01. Businesses analyse various data points (e.g. historical, social media, IoT, etc) to derive insights that help executives, managers and operational employees make better, more informed business decisions.

Descriptive Analytics

Descriptive analytics answers “what has happened?” by using historical data that is collected, organised and then presented in a way that is easily understood. It is NOT used to draw inferences or make predictions from its findings. It is used for reporting KPIs (i.e. Key Performance Indicators) like Sales Volumes, Gross Adds, Revenues, Churn Rates, Attrition Rates, Profit Margins, etc. It uses simple maths and statistical tools to calculate averages, percentages, sums, cumulative totals, etc, and visual tools such as graphs & charts.
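As a minimal sketch of the calculations mentioned above (averages, percentages, sums and cumulative totals), the Python snippet below summarises a small set of monthly sales figures. The figures and variable names are made up purely for illustration.

```python
from itertools import accumulate
from statistics import mean

# Hypothetical monthly sales volumes (illustrative data only).
monthly_sales = {"Jan": 120, "Feb": 150, "Mar": 90, "Apr": 140}

# Sum: total sales over the period.
total_sales = sum(monthly_sales.values())

# Average: mean sales per month.
average_sales = mean(monthly_sales.values())

# Percentage: each month's share of the total, rounded to one decimal place.
share_by_month = {
    month: round(100 * value / total_sales, 1)
    for month, value in monthly_sales.items()
}

# Cumulative total: running sum in month order.
cumulative_sales = list(accumulate(monthly_sales.values()))

print("Total:", total_sales)
print("Average:", average_sales)
print("Share (%):", share_by_month)
print("Cumulative:", cumulative_sales)
```

Nothing here predicts or recommends anything. It only summarises what has already happened, which is the defining trait of descriptive analytics.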

Predictive Analytics

Predictive analytics answers “what could happen?” based on probabilities. Typical uses include predicting the likelihood of customers purchasing another product, detecting fraudulent behaviours, identifying potential security breaches, predicting resourcing requirements, etc. It makes use of techniques such as statistical modelling and machine learning algorithms like classification, regression and clustering. Predictive analytics attempts to forecast possible future outcomes and the likelihood of those events. To make predictions, machine learning algorithms take existing data and attempt to fill in the missing data with the best possible guesses. For example, next-best-offer & next-best-action recommendation engines use predictive analytics to identify the products or services your customers are most likely to be interested in for their next purchase. This methodology empowers executives and managers to take a more proactive approach to business strategy and decision making.

Machine learning, which is a subset of artificial intelligence (i.e. AI), automates predictive modelling by training algorithms to look for patterns and behaviours in data without being explicitly told what to look for.
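To make the idea of regression concrete, here is a minimal sketch that fits a straight line to historical data with ordinary least squares and uses it to forecast an unseen value. The data set, `fit_line` and `predict` are hypothetical names and values chosen for illustration. Real projects would typically use a statistics or machine learning library instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical history: advertising spend (in $k) vs. units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)


def predict(spend):
    """Forecast units sold for a spend level using the fitted line."""
    return slope * spend + intercept


# Forecast for a spend level that is not in the historical data.
forecast = predict(6.0)
print(f"slope={slope}, intercept={intercept}, forecast={forecast}")
```

The model learns the relationship from existing data and then fills in the unknown value with its best guess.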

Prescriptive Analytics

Prescriptive analytics answers “what should happen?” It builds on what has been learned through descriptive and predictive analysis and goes a step further by recommending the best possible courses of action for a business. It is used for assessing risks, improving patient care, price modelling, improving equipment management, etc. This is the most complex stage of the business analytics process and requires more specialised analytics knowledge, hence it is rarely used in day-to-day business operations.

Both predictive & prescriptive analytics require a large volume of data.
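As a rough sketch of the step from prediction to recommendation, the snippet below takes a demand forecast and picks the price with the highest expected revenue. The `predicted_demand` function stands in for the output of a predictive model. Its formula and the candidate prices are assumptions made up for this example.

```python
def predicted_demand(price):
    """Hypothetical predictive model: expected units sold at a given price."""
    return max(0, 100 - 5 * price)


# Hypothetical price points the business is willing to consider.
candidate_prices = [6, 8, 10, 12, 14]

# Expected revenue for each option, based on the predicted demand.
expected_revenue = {
    price: price * predicted_demand(price) for price in candidate_prices
}

# The prescriptive step: recommend the action with the best expected outcome.
recommended_price = max(expected_revenue, key=expected_revenue.get)

print("Expected revenue by price:", expected_revenue)
print("Recommended price:", recommended_price)
```

Real prescriptive systems usually weigh many more constraints (costs, stock levels, risk), but the shape is the same: score each possible action with a predictive model, then recommend the best one.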

Q02. What do data analysts & scientists do?
A02. Data analysts & scientists analyse the data prepared by the data engineers. Data analysts analyse numeric data and use it to help companies make better decisions, whereas data scientists analyse and interpret complex & big data. Data scientists must sift through very large volumes of data to identify hidden patterns & refine business metrics by developing & testing hypotheses. Data analysts often perform Descriptive Analytics, whereas data scientists perform Predictive & Prescriptive Analytics.

The data analysts & scientists are often responsible for the tasks below, with the help of business analysts, data engineers & DevOps engineers.

1) Identifying the data that they want to use.

2) Collecting the data from the different source systems by building the automated data pipelines.

3) Cleaning the data in preparation for analysis.

4) Modelling the data based on the usage patterns.

5) Analysing the data & interpreting the results of the analysis.

The data engineers work together with the analysts & scientists.

Q03. What tools do you use to analyse the data & interpret the results?
A03. A wide variety of tools are used to analyse the data & interpret the results.

1) Firstly, and most importantly, SQL.

2) Microsoft Excel using functions, pivot tables, etc.

3) Business Intelligence (i.e. BI) reporting tools like Tableau, Microsoft Power BI, SAS, etc.

4) Notebooks like Jupyter, Databricks & Zeppelin notebooks, using programming languages like Python, Scala, etc & frameworks like Apache Spark.
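The following is a small sketch of how SQL and a notebook-style Python session can work together. It uses Python's built-in `sqlite3` module with an in-memory database. The `sales` table and its rows are hypothetical and exist only to give the query something to summarise.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical sample rows (illustrative data only).
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("North", 100.0),
        ("North", 150.0),
        ("South", 80.0),
        ("South", 120.0),
        ("South", 40.0),
    ],
)

# Aggregate per region: order count, total revenue and average order value.
rows = conn.execute(
    """
    SELECT region,
           COUNT(*)    AS orders,
           SUM(amount) AS revenue,
           AVG(amount) AS avg_order
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()

for region, orders, revenue, avg_order in rows:
    print(region, orders, revenue, avg_order)
```

The same `GROUP BY` query would run largely unchanged against most relational databases, which is one reason SQL sits at the top of the list.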

Q04. What does a data engineer do?
A04. Data engineers build automated data pipelines to collect the raw data from various source systems, manage the data (e.g. data governance, metadata management, data lineage, data privacy, access control, etc) and convert the raw data into useful information for business analysts & data scientists. Data engineers rely on a variety of programming languages (e.g. Python, R, Scala, Java, etc), ETL tools (e.g. Alteryx, Informatica, Talend, etc), data management tools (e.g. file storage like HDFS, object storage like AWS S3, SQL, NoSQL, Apache Spark, Apache Kafka, etc) and automation tools (e.g. Jenkins, Control-M, Apache Airflow, shell scripting, etc) for building and managing data lakes & data warehouses.

Data Lake Vs Data Warehouse

Source: https://orzota.com/2018/02/16/enterprise-data-lake/

Data engineers work with raw data sets that may contain all sorts of reliability issues & errors, and that offer little value to data analysts & scientists in their raw form. To make the data usable, a data engineer needs to build reliable data pipelines that cleanse, transform & enrich it. Pipelines move data between systems and convert it from one format into another. Engineers also need to refine the pipelines continually to make sure the data is accurate.
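The cleanse, transform & enrich stages can be sketched in a few lines of Python. This is a toy illustration using only the standard library, not a production pipeline. The raw CSV extract, the `COUNTRY_NAMES` lookup and all function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical raw extract with typical quality problems: stray whitespace,
# inconsistent casing, a missing amount and a duplicate row.
RAW_CSV = """customer_id,country,amount
 101 ,au, 25.50
102,NZ,
101,au,25.50
103,Au,10
"""

# Hypothetical reference data used for enrichment.
COUNTRY_NAMES = {"AU": "Australia", "NZ": "New Zealand"}


def cleanse(rows):
    """Trim whitespace, drop rows without an amount, drop exact duplicates."""
    seen = set()
    for row in rows:
        cleaned = {key: value.strip() for key, value in row.items()}
        if not cleaned["amount"]:
            continue
        fingerprint = tuple(cleaned.values())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        yield cleaned


def transform(rows):
    """Convert text fields to proper types and normalise the country code."""
    for row in rows:
        yield {
            "customer_id": int(row["customer_id"]),
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        }


def enrich(rows):
    """Add the full country name from the reference data."""
    for row in rows:
        name = COUNTRY_NAMES.get(row["country"], "Unknown")
        yield {**row, "country_name": name}


def run_pipeline(raw_csv):
    """Read CSV text, run the three stages and emit JSON text."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    records = list(enrich(transform(cleanse(reader))))
    return json.dumps(records, indent=2)


output_json = run_pipeline(RAW_CSV)
print(output_json)
```

Each stage is a generator, so records flow through one at a time and stages can be added, removed or tested in isolation. The pipeline also converts the data from one format (CSV) into another (JSON).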
