The Challenges with Visualization Tools
In the last few years, many enterprises have excelled at building large-scale data platforms that ingest and process huge volumes of structured and unstructured data and deliver information to the data consumers. The business value of all this information comes from using the data to generate insights for better decision-making and to automate decisions by embedding models into operational systems.
We have witnessed many architectural shifts in building data platforms, one of which is the rise of self-service tools such as Tableau and Looker. These have enabled non-technical users to undertake analysis independently by connecting to any data source and doing data preparation on the fly. In addition, these tools have resulted in the creation of pixel-perfect dashboards with a beautiful presentation of charts and tables.
As these self-service visualization tools become widely available, an interesting paradox emerges: this ocean of readily available information does not lead to higher usage or better decisions.
There could be many reasons for this.
First is the sheer number of dashboards, causing information overload. We typically see enterprises having thousands of dashboards, many of them unused. Many of these have hidden data pipelines and non-standard metrics & attributes. This reduces user trust in the information available in these dashboards.
Another important reason could be that there’s too much information fluency. Research has shown that very high levels of information fluency do not always lead to better actions or any action at all. The easier the data is to read (well-designed tables and charts), the less you’ll end up acting on the information. People only understand information if they work to get it. The secret to getting things done with data is to play with it. To get people to act on information, there needs to be a little bit of information disfluency (not too much).
This is somewhat subtle, and it doesn’t mean that you need to make it challenging for users to get the data they want. However, you must provide an environment for users to build their own insights on top of clean, integrated, and harmonized foundational information. Today’s visualization tools are excellent for executive storytelling, but they are not the answer for operational users. With increasing features, non-technical users find that these tools are becoming quite complex for them to handle.
A Simple Tool for Operational Analytics
At LatentView Analytics, we developed a self-service data platform and successfully deployed it to our clients. Rather than providing access to hundreds of dashboards, we delivered a tool that helps users specify various data segments they require, using the business semantics they understand. In addition, the application is simple to use, and the visualizations are all tabular, which can be exported to any other tool of the client’s choice for further analysis.
The tool has been a smashing hit: it has seen viral adoption at all levels, resulting in better data usage, with anecdotal evidence suggesting improved decision-making and better access to information. In addition, there is tremendous internal demand from users to make it available to other parts of the business, especially those that are drowning in too many dashboards and have SQL access to data lakes.
Within six months of deployment in a Fortune 100 company, it has helped them save millions of dollars every year by delivering the promise of data democratization at scale.
Basic Design Principles
We designed the tool with the following design principles in mind:
- Easy to use for the non-technical user: A basic interface of a browser-based application, with logically organized tabs and intelligent defaults customized to various groups of users
- Minimalistic interface for operational users: No fancy charts, graphs, or other visualizations, but simple tables delivered through a browser interface
- Sophisticated but minimalist interface: Extremely user-friendly, with many advanced features that put the power of analytics in the hands of users, so they can arrive at valuable, relevant insights rather than rely on pre-defined metrics dashboards
- Transparent data model: A single, logical view of all the necessary data for each bounded context (examples include customer relationship management, revenue management, and category management), with no messy joins or partitions
- Scalable: The ability to support a large and growing number of concurrent users (tens of thousands) within the enterprise ecosystem, since the tool is built to leverage the cloud
- Fast: Low latency for bulk data analysis starting from TBs of data compared to traditional reporting platforms
- Cost-effective: No complicated licensing costs, low cost of operations, and ongoing upgrades. We plan to keep the tool simple, with a minimal feature set
- Rapid deployment: Once the use case is finalized, the product prototype can be customized and deployed in 2-3 weeks
A Quick Tour
Let’s take a quick tour of what the operational analytics tool looks like. The functionality in the analysis tool is grouped into various tabs for ease of navigation. A top panel shows the selections made across tabs. Each of these tabs is configurable by the administrator (through an admin user interface), based on user roles and the domain context that the user has access to.
The Dimensions tab allows the user to select the key dimensions to be presented in the final report. Dimensions are non-numerical data (such as people, places, and products) that can be viewed, broken down, and compared. These are entirely configurable.
Metrics are numerical values that can be displayed in the report. These values (such as Sales $, Quantity Cases, GP%, and Cost) can be viewed, broken down, and used in calculations. If you can perform math on it, it is probably a metric.
Use Basic Calculations to apply math operators (such as add or multiply) to build customized metrics of your choice. Calculations and SQL operations can be applied to any metric field.
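To make the idea of a customized metric concrete, here is a minimal sketch in Python. It is not the tool's implementation: the function name `apply_calculation`, the operator table, and the sample rows are all hypothetical, and it works on in-memory rows only to illustrate how a derived metric is a math operator applied to two existing metric fields.

```python
import operator

# Hypothetical mapping from operator symbols to Python functions.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def apply_calculation(rows, left, op, right, name):
    """Return copies of `rows` with a derived metric `name` = left <op> right."""
    func = OPERATORS[op]
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        try:
            new_row[name] = func(row[left], row[right])
        except ZeroDivisionError:
            new_row[name] = None  # the metric is undefined for this row
        result.append(new_row)
    return result


# Illustrative data only; field names are assumptions.
rows = [
    {"product": "A", "sales": 200.0, "cost": 150.0},
    {"product": "B", "sales": 80.0, "cost": 100.0},
]
with_profit = apply_calculation(rows, "sales", "-", "cost", "gross_profit")
```

Here `gross_profit` is built from `sales` and `cost` without the user writing any SQL; a real deployment would more likely translate the same selection into a SQL expression.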
The Report sharing feature allows users to create a report and share it instantly with a specific user or a user group. Strict authentication and authorization principles are applied to ensure data privacy and security as the reports are shared.
The tool also gives end users a single view of all the data definitions for dimensions and metrics.
There are a few other features, especially for administrators. For example, the tool allows administrators to view usage and validate data quality based on daily loads. Below are some of the available key features.
- Save Reports: Once the report is built and output is generated, it can be saved and reloaded for later use
- Filters/Thresholds: Add filters to narrow down the analysis based on the requirement or thresholds to limit the results
- Download Reports: Once the query is built and submitted, the results can be exported to Excel or CSV for further analysis. It also allows exporting the SQL for the built query.
- Schedule Reports: Saved reports can also be scheduled to run at a set frequency, such as daily, weekly, or monthly. The reports are emailed to the users according to the schedule.
How We Built It
Behind the simple design lies a sophisticated architecture. It combines the strengths of data lakes and data warehouses to create a clean, integrated, and scalable data foundation. The data model is denormalized and can be optimized for common consumption patterns. Moreover, the application is partitioned by bounded contexts, and everything is customized for each context.
The data platform serves as DaaS (Data as a Service) in a secure and scalable fashion in the following ways:
- It is designed to serve different bounded contexts within the enterprise domains. This enables us to deliver data as a service into the hands of business leaders, functional analysts, salespersons, and data scientists (no surprise)
- It is built as a serverless interactive querying stack using AWS components such as AWS Lambda, Amazon S3, Amazon Cognito, Amazon API Gateway, Amazon Simple Queue Service (SQS), Amazon DynamoDB, and Amazon Redshift
- It delivers this using a dual-mode architecture (simple queries and complex queries). If the expected run-time of a query (calculated based on historical data and query parsing) is under 30 seconds, it kicks off the simple stack. If not, then it goes to the complex stack.
- An ANSI SQL-compliant query is constructed from the request JSON, ready to be run against the database. The query is submitted to a highly available FIFO (first-in-first-out) queue in SQS (Simple Queue Service), which decouples query execution from the HTTP connection with the client
- Intelligent consumers of the query message identify the right time to hit the database (Redshift). The right time is defined by the availability of Redshift and IPs in the VPC (virtual private cloud). This is where a lot of engineering magic happens
- The dispatched queries are executed in a Redshift cluster, and the results are retrieved by the client from S3 using a pre-signed URL
- While the platform is currently built on AWS, it can also be deployed, with some modification, on any other cloud service platform of choice, such as Azure or GCP
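Two of the steps above, constructing a query from the request JSON and choosing between the simple and complex stacks, can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual code: the request shape (`view`, `dimensions`, `metrics`, `filters`), the function names, and the `%s` placeholder style are all hypothetical, and the expected run time is simply passed in rather than estimated from history and query parsing.

```python
import json

SIMPLE_STACK_LIMIT_SECONDS = 30  # threshold stated in the text


def _ident(name):
    """Allow only plain identifiers so request fields cannot inject SQL."""
    if not name.isidentifier():
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_query(request_json):
    """Build a parameterized SELECT from a hypothetical request payload."""
    req = json.loads(request_json)
    dims = [_ident(d) for d in req["dimensions"]]
    metrics = [_ident(m) for m in req["metrics"]]
    select_parts = dims + [f"SUM({m}) AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select_parts)} FROM {_ident(req['view'])}"
    params = []
    filters = req.get("filters", [])
    if filters:
        clauses = []
        for f in filters:
            clauses.append(f"{_ident(f['field'])} = %s")  # value bound later
            params.append(f["value"])
        sql += " WHERE " + " AND ".join(clauses)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql, params


def choose_stack(expected_seconds):
    """Route to the simple stack only when the estimate is under the limit."""
    if expected_seconds < SIMPLE_STACK_LIMIT_SECONDS:
        return "simple"
    return "complex"


# Illustrative request; all field and view names are assumptions.
request = json.dumps({
    "view": "revenue_view",
    "dimensions": ["region"],
    "metrics": ["sales"],
    "filters": [{"field": "year", "value": 2023}],
})
sql, params = build_query(request)
stack = choose_stack(12.5)
```

Keeping filter values as bound parameters and validating every identifier is one reasonable way to build SQL from user selections safely; the source does not say how the platform handles this.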
An exciting part of the architecture is that it can handle the limitation of Lambda and API timeouts efficiently using a constant polling mechanism, thereby creating the possibility for executing long-running queries.
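The polling idea can be sketched as below. The source does not describe the mechanism in detail, so this assumes one common pattern: the client receives a job ID right away and then repeatedly asks a short-lived status call whether the result is ready, so no single request has to outlive a Lambda or API timeout. The names `poll_until_done` and `make_fake_status`, the state strings, and the example URL are all hypothetical, and the status function is a stand-in for a real API call.

```python
import time


def poll_until_done(get_status, job_id, interval=1.0, timeout=900.0,
                    sleep=time.sleep):
    """Call get_status(job_id) until the job finishes or the timeout expires."""
    waited = 0.0
    while waited <= timeout:
        status = get_status(job_id)  # each call is short, well within timeouts
        if status["state"] == "SUCCEEDED":
            return status["result_url"]
        if status["state"] == "FAILED":
            raise RuntimeError(status.get("error", "query failed"))
        sleep(interval)
        waited += interval
    raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")


def make_fake_status(pending_polls):
    """Stand-in for a status endpoint: RUNNING for N polls, then SUCCEEDED."""
    calls = {"n": 0}

    def get_status(job_id):
        calls["n"] += 1
        if calls["n"] <= pending_polls:
            return {"state": "RUNNING"}
        return {
            "state": "SUCCEEDED",
            "result_url": f"https://example.com/results/{job_id}",
        }

    return get_status


# The no-op sleep keeps the demonstration instant.
url = poll_until_done(make_fake_status(2), "job-123",
                      interval=0.0, sleep=lambda seconds: None)
```

In a real client, the returned URL would play the role of the pre-signed S3 link from which the results are downloaded.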
Let us know what you think of the architecture. If you’d like to understand what we do and how we can help your business, get in touch with LatentView Analytics’ Data Engineering team at firstname.lastname@example.org