What is BI? What is data analytics? What is big data?
What is BI? What is data analytics? What is big data and what can I use it for?
The reality is that most businesses have been using business intelligence (BI) and data systems in some form for decades, but many probably had fairly silo’d, manual setups. As time has progressed, data has become so fundamental we’ve come up with new terminology for it - and new job roles, like Business Intelligence Analyst and Data Scientist. Here we’ll cover the basics and answer a few of the most frequently asked questions about data and BI…
What do you know about your business?
- Can you see at a glance if you're meeting performance goals?
- Do you know who your best clients are? Do you know who your worst clients are?
- Do you know which products or services are your most successful?
- Can you predict your busiest and quietest times?
- Do you know the characteristics of your most profitable or most challenging clients?
- Which is your most successful target audience?
- Are you expediting your services in the required time frame?
- Do you have the necessary information available to you when you need it, to make the best decisions for the future of your business?
I could go on. You know what information you ideally need to be able to make your business more successful. Your historic and live business data holds many stories and a wealth of information that could help you fine tune your future business activities to be more informed, efficient, and successful. Do you have this information at your fingertips? Can you access it without technical support?
BI and data analytics provide the answers
A Business Intelligence & Data Analytics solution will deliver valuable business information and insight, when and where you need it. It doesn’t matter where your live data is located, it can all be accessed from a central BI solution.
The BI world is rich with jargon, so you may be familiar with:
- Business Intelligence
- Data Analytics
- Data Warehouses
- Data Marts
- Staging Databases
- Relational Databases
- Multidimensional Databases
- Cubes
- Big Data
- ETL
- Dashboards
- Data Visualisations
- Insights
- Impressions
- Data Governance
- Data Quality
- Self-Service BI
- Team BI
- Corporate BI
- Data Mining
- Pivot Tables & Charts
- On-Premise
- Cloud
- Analytics Ecosystem
What can BI and data analytics do for my business?
A well-designed BI and Data Analytics solution will:
- Eliminate guesswork
- Deliver faster answers to business questions
- Provide customer behaviour insight
- Identify cross-selling and up-selling opportunities
- Improve efficiency
- Show the true costs of your business
- Show where your business has been, where it is now and where it might be going
- Help you make faster, more informed, and more effective business decisions
- Highlight new insights
- Reduce fraud
- Help you measure risk more accurately
- Implement accurate predictive analysis
- Support better budget management
- Increase revenue
- Ensure you focus resources and reduce costs
What is BI?
This involves extracting data from data sources, transforming the data to a more useful format, loading the data into a more suitable repository and then analysing and mining the transformed data.
In short, stored information will be used to analyse historic and current data to:
- Identify trends in business activities
- Predict future business activities
- Assess the business performance
- Measure successes and failures.
Business Data may be located in a variety of data sources from complex relational database structures to more straightforward flat file style data structures. These data sources could be distributed across different systems and different networks. Today we also face the challenge of “Big Data” resources containing vast quantities of unstructured (No SQL) data – I will talk more about this later. Here are a few examples of data sources:
- Relational (SQL) Databases
- NoSQL Databases
- Hadoop
- Spreadsheet
- Document Management System
- Proprietary Business Systems
- Legacy Systems
- Third Party Systems
- Exported Text Files
Once all relevant data has been gathered and stored centrally, the real magic can begin. Relationships must be formed between entities within the business and the numeric values that we are measuring, new business values derived from stored facts will need to be generated. What do we mean by numeric values? Well here are a few examples:
-
Financial Values
- Sales Revenue
- Cost of Sale
- Running Costs
- Income
-
Measurements
- Length
- Volume
- Area
- Temperature
-
Counts of records/events
- Number of customers
- Number of orders
- Number of callouts
- Number of incidents
- Average values
- Highest values
- Lowest values
And what do we mean by entities? Well these are the things that we are measuring by, for example:
- Time
- Location
- Product
- Service
- Customer
- Supplier
- Employee
- Vehicle
- Device
- Promotion
- Tables
- Matrices
- Charts
- Dashboards
- Reports
- Excel Pivot Tables & Charts
- Slicing and Dicing
BI components explained
Business Intelligence Systems will generally comprise some or all of the following components:
-
Data
- OLTP/OLAP Databases, XML, CSV, Excel Workbooks, Access Databases Data Sources
- Data Warehouse
- Data Marts
- Multidimensional Cubes or Models
- Data Mining Structures
- Reporting Systems
- Client Analysis Tools such as Excel Pivot Tables & Charts, PowerBI, PowerPivot
- On Premise or Cloud-based Repositories
To support and implement the above components the following services will be required:
- ETL Service
- Database Management System (DBMS)
- Multidimensional Service
- Reporting Service
- On Premise or Cloud-based storage location
In many cases technical authors will design the reports and schedule them for refresh with subscriptions enabling users to keep up to date with the latest versions of reports. They would have the option of viewing reports through an easy to use web front end, a shared network location or via email.
The BI system will also enable more skilled staff to implement Data Mining to analyse business data and analyse business trends, and make future predictions on the basis of these trends.
Data sources
We have established already that our data is varied in structure as well as location. Here I will define the main categories of data sources:
- Relational (SQL) database
- No SQL Database
- Flat File Data
- Multidimensional Database
Relational Database
A Relational Database stores data in tables where each table holds records that represent a single object, for example: a customer, a product, an order, a device. Tables consist of columns and contain records (or rows). Relational databases are perfect for working with structured data where the bulk of relationships are of a one to many nature:
- one customer places many orders
- one device generates many status messages
- one vehicle makes many journeys
The structure of the data is known prior to loading data and therefore tables with columns can be created providing a rigid structure before data is loaded. The structure can be changed, but this invariably needs changes to applications and the loading or creation of missing data, which may result in some downtime.
No SQL Database
A No SQL Database addresses the challenge of a massive volume of data that is largely unstructured and prone to rapid change. A relational database cannot cope with such demands and if we try to use them for this purpose performance will be far short of acceptable given the volume of data involved and changes to the structure of the data would take far too long to roll out.
The key features of a No SQL Database are:
- More scalable than Relational Databases and offer a much better level of performance.
- Can be distributed over many geographic locations rather than being stored on one single location. Cloud computing provides the storage infrastructure for this.
- Do not require that a schema is rigidly defined before data is loaded as is the case with a Relational Database. This means that as new attributes are identified the schema can be changed on the fly rather than having to add a new column to a table which might then upset existing applications.
- Allows application changes to be made in real-time, whereas changes to relational database applications tend to require some downtime.
Flat File Data
Flat File Data is stored as a single flat record (no tables and lots of data duplication for certain attributes). Flat file data could include any of the following:
- CSV, TSV or Delimited File
- Fixed Width Text File
- Excel workbook/Spreadsheet
- XML file
Where the business data source is a legacy or proprietary system, data can generally be exported or extracted from these systems as text. The text may be delimited by any chosen character – the most common being a comma (CSV) or a tab (TSV) – and these text files can then be imported into a central repository.
Another common export and import format for proprietary business systems is XML, a tagged language.
Excel workbooks may be completely flat file in nature or may be more like a table, with data distributed over several worksheets.
Multidimensional Database
Multidimensional databases are designed around analysis requirements as opposed to the business process requirements that a relational database might provide. They are capable of holding millions or billions of records and calculating aggregate values very quickly.
In a multidimensional database we will work with Cubes, Measures and Dimensions, and Dimensions will consist of Hierarchies, Levels and Members. The term Multidimensional database is sometimes used to describe a Relational Database that has been denormalised and designed around aggregate analysis needs. Such databases would be termed also as Online Analytical Processing (OLAP) Databases and would adopt a star-like schema consisting of Fact and Dimension Tables. This database would then be a suitable feed for a Cube based multidimensional database.
Whereas with Relational Databases and NoSQL Databases records may change, data in a Multidimensional is predominantly historic and not changed after it is loaded, although this may not always be the case.
Extract, Transform, Load (ETL)
Data warehousing
A BI system enables global data to be centralised, merged and consolidated in one central location known as the data warehouse. The data warehouse stores static, read-only facts (although sometimes it might change over time it is predominantly read-only historic facts) often pulled from or derived from many different business sources. The underlying structure of the Data Warehouse database may be quite complex, although in general not as complicated as the highly normalised OLTP (Online Transaction Processing) databases that they are constructed from, and doesn’t present the data in a suitable format for business analysts or decision makers.
There are different styles of Data Warehouse design with one of the most common designs being the Multidimensional Database.
The Data Warehouse database (OLAP Database) will typically aggregate tables from OLTP databases implementing a star or snowflake schema, making the design less complex with fewer tables, but more repeated data. The tables will also be split into Fact Tables and Dimension Tables as mentioned earlier.
Populating data warehouses
Data Warehouses will generally be populated through an Extract Transform Load (ETL) system. This system will take data exported from a data source and will transform it to enable it to be imported into a Data Warehouse structure. This invariably involves cleansing data, converting data types, converting data from relational OLTP structures to Data Warehouse structures such as star schemas, and verifying the loaded data.
Data analysis
Data analysis is the process of building aggregations on the Data Warehouse data and drilling down into the business data to present management information, business trends and business predictions.
Reporting
Reporting is the process of presenting information to the end user. Business Information can be deployed to those that need it via email, network share locations or web interfaces to reports.
Mining
Data Mining is the art of looking beyond the surface and finding the stories that your historic data can tell.
Server solutions
Corporate BI solutions will require a centrally deployed server based solution.
Client tools
A common term used today when discussing BI solutions is Self Service BI. This term refers to a tool that enables a user to find out the stories behind their business data without the need to become a technical architect or specialist.
What is cloud computing?
As Microsoft helpfully explains, cloud computing is "the delivery of computing services - including servers, storage, databases, networking, software, analytics, and intelligence - over the Internet. There's much more on that here, if you're interested - Cloud computing explained.
What is data analytics?
Data Analytics describes the process of asking questions about your business. Think about those questions I listed at the beginning of this article, such as “Do you know which is your most successful product or service?”. A Data Analytics solution is part of a BI solution. It comprises client tools that enable us to ask these questions and delivers the answers often in a visual and succinct way.
What is big data?
Businesses utilise a growing number of cloud-based services which generate massive volumes of data, all of which is valuable to the business and needs to be analysed in some way. Big data is often in need of cleansing before analysis as we are probably trying to use this data in a way for which it was not originally intended to be used.
Big data is as it says – very large volumes of data. The challenging thing about big data is:
- It's often external to the company
- It's often cloud-based
- It's very dynamic
-
It's generally unstructured data. For example...
- Text messages
- Social Media data
- Website content
- It's not intended to be a data analysis source
- Standard traditional BI tools simply cannot cope with the sheer size of the data
A large amount of Big data is text based.
Some examples of system data that would be classified as Big data:
-
Equipment D=data
- Equipment status messages
-
Social Media
- Google+
- Linked-In
-
Search Engine data
- Google Analytics
A traditional Data Warehouse is not appropriate for centralising and hosting big data. Some data analytics tools such as Microsoft’s Power BI provide integration for big data sources so the client can pull data from many sources into a suite of dashboard reports, but given that Big Data sources often require a certain amount of cleansing before they are suitable for analysing this may not be the best approach.
Many people will also include traditional transactional database data as a part of the big data picture.
In that sense big data is an all encompassing term for all of our data :
- Structured (relational database), semi-structured (XML) and unstructured (Word, PDF, Text)
- Internal (On-Premise) and external (Cloud).
Big data platforms are available from vendors such as Amazon, IBM, Microsoft and Oracle, and fall into a couple of categories:
- Operational
- Analytical
Operational big data systems
Operational big data systems create data in real time with user interaction. The data is unstructured data such as that produced by Social media services and is hosted in a No SQL Database. These systems have the following features:
- Read and write requests.
- Many concurrent sessions (potentially hundreds of thousands)
- Selective queries
- Accessed by users/customers
MongoDB is an example of an Operational Big Data system.
Analytical big data systems
Analytical big data systems provide complex analysis services, services that are complementary to analysis services provided through SQL based systems. These systems have the following features:
- Read requests only
- Few concurrent users
- Unselective queries
- Accessed by Data Scientists
Massively Parallel Processing (MPP) and MapReduce systems are examples of Analytical Big Data Systems.
Hadoop
Hadoop is an Open Source offering written for GNU/Linux (although it is also available for Windows) that provides a Hadoop File System (HDFS), the MapReduce engine and a suite of tools and is fast becoming a key component in a corporate Business Intelligence solution, sitting alongside the traditional Data Warehouse rather than replacing it.
Hadoop is a framework consisting of four key components:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Being open source it is an economical solution enabling unstructured data as well as structured data to be brought into a central HDFS resource. Schemas are interpreted when read rather than a pre-existing and rigid structure being required (as in the case of a relational database Data Warehouse), which is perfect for unstructured data, ever changing, such as social media data. HDFS can potentially be distributed over thousands of computers, thus distributing load.
In its native open source form a high degree of programming will be required to configure and implement a Hadoop solution, but there are a number of suppliers who are providing a more shrink wrapped offering of Hadoop.
Suppliers such as SAS, Tableau, Microsoft and Oracle provide integrations with hadoop to enable their analytic tools to be used against a Hadoop data source.
What is self-service BI?
A full blown Corporate BI solution can take years to research, plan, design and implement and requires highly skilled staff with expertise in many areas. Many of the larger organisations will have dedicated teams focussing on a Corporate BI strategy with centralised data, analytics and reporting and highly skilled technical staff and statistical analysts.
For many organisations, though, they need to get going a little quicker with analysing their data and they don’t have statistical analysts or BI staff at their disposal. For them a Self-Service BI solution is more appropriate.
A self-service BI tool will:
- Provide an easy to use user interface
- Integrate with the back end data sources containing business data
- Enable the user to produce dashboards
- Provide a front end to producing queries against business data sources
A self-service BI solution should enable a user to design, create and publish their own reports and analytical findings with no need for the intervention of an IT team that can prove to be a bottleneck in churning the required information out.
Common self-service BI tools include PowerPivot & Power BI from Microsoft, and Tableau Desktop from Tableau.
BI vendors
Many vendors offer products to satisfy Business Intelligence Needs so choosing the right tool set for your business can be a challenge. Here are some of the vendors who offer BI technology solutions:
- Oracle – Oracle Business Intelligence 12c
- Microsoft
- Qlik
- Tibco
- Tableau
- MicroStrategy
- OpenText
- IBM
- Information Builders
- SAP
- SAS
Their offerings range from Personal (self-service) BI solutions to full corporate BI solutions.
I think the first question to ask yourself is “Do I already have any BI software or servers in my business?” If you have any Microsoft SQL Server databases in your organisation then the answer will be yes. Microsoft is one of the few vendors who include their Business Intelligence services with their Database Engine platform.
Many businesses are turning to Microsoft for a Business Intelligence solution as it offers a budget friendly solution if your company already uses SQL Server.
Summary
We've simply scratched the surface of BI, data analytics and big data here, but hopefully you now have a clearer idea of what it's all about.
It's what businesses have been doing for years, but now instead of trying to do all the clever analytical stuff in an Excel spreadsheet that needs to be updated weekly, monthly, quarterly, or annually, there are fantastic tools at our fingertips giving us live dashboards that update with our data. This gives businesses the power to make much faster and more effective decisions, fully informed and back up by data.
Instead of business users having to gather Excel workbooks or CSV files from various parts of the business to import into our magic Excel workbooks we have centralised repositories automatically updated and permanently available.
Discover lots more information elsewhere in our blog.
References
With so many conflicting definitions of Business Intelligence and Data Analytics terms, and so many offerings from different vendors it is a confusing world for anyone trying to assess their needs for their business. I have come across a few helpful documents/articles that are worth a read:
Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms
Business Intelligence, Data Analytics and Data Mining Solutions Directory
The Best Self-Service Business Intelligence (BI) Tools of 2016
Eight Considerations for Utilizing Big Data Analytics with Hadoop from SAS.