Exploding the Myths of Big Data – #1 (IT Toolbox Blogs)

Many companies have implemented big data applications. These applications consist of a very large data store, hybrid hardware and software to store and access the data, and a sophisticated software interface that accepts the queries of business analysts, accesses the data store, and provides answers that can be used to understand customer needs, simplify business transactions, and increase profitability.


As success stories (and failures) have appeared in the news and technical publications, several myths have emerged about big data. This article explores a few of the more significant myths and how they may negatively affect your own big data implementation.


Myth #1: Big Data Applications Can Stand Alone.


False. Your big data application certainly contains a lot of data, but the analytics software used to query that data is equally important. Analyzing business data is already common, especially in companies that have a data warehouse. The data warehouse contains time-dependent snapshots of operational data, and your current data marts and analytical reports depend upon dimensions in the warehouse.


Dimensions are entities by which an analyst would subset or categorize information. These include time, geography, customer type, store, department, and so forth.  A query that sums customer purchases of electronic items for retail stores in several states during the Christmas holiday season includes dimensions of product type (electronic items), stores, geography (state), and time (Christmas holidays). Each dimension gives a different way to summarize data, and may provide clues regarding customer preferences, item availability in stores, or profitability.
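The holiday query described above can be sketched in a few lines of Python using the standard library's sqlite3 module. This is only an illustration under stated assumptions: the table names (sales, store), the column names, and the sample rows are all hypothetical, and a real warehouse would have its own schema and a far larger fact table.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory star schema with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, state TEXT);
        CREATE TABLE sales (
            store_id INTEGER, product_type TEXT, sale_date TEXT, amount REAL
        );
        INSERT INTO store VALUES (1, 'TX'), (2, 'OK'), (3, 'CA');
        INSERT INTO sales VALUES
            (1, 'electronics', '2015-12-20', 200.0),
            (1, 'electronics', '2015-12-24', 50.0),
            (2, 'electronics', '2015-12-22', 75.0),
            (2, 'clothing',    '2015-12-22', 30.0),
            (3, 'electronics', '2015-12-23', 500.0),
            (1, 'electronics', '2015-11-01', 999.0);
        """
    )
    return conn


def holiday_electronics_by_state(conn, states, start, end):
    """Sum electronics purchases per state within a date range.

    Dimensions used: product type, store, geography (state), and time.
    """
    placeholders = ",".join("?" for _ in states)
    sql = f"""
        SELECT s.state, SUM(f.amount)
        FROM sales AS f
        JOIN store AS s ON s.store_id = f.store_id
        WHERE f.product_type = 'electronics'
          AND s.state IN ({placeholders})
          AND f.sale_date BETWEEN ? AND ?
        GROUP BY s.state
        ORDER BY s.state
    """
    return conn.execute(sql, [*states, start, end]).fetchall()


if __name__ == "__main__":
    conn = build_demo_warehouse()
    print(holiday_electronics_by_state(conn, ["TX", "OK"], "2015-12-15", "2015-12-31"))
```

Changing the GROUP BY column (for example, to the store or the sale date) summarizes the same purchases along a different dimension.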


Big data applications require such dimensions as well. Since this data is already stored and maintained in your data warehouse, it is natural to integrate the data models of your warehouse and your big data application.
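As a rough sketch of what this integration means in practice, the Python below enriches raw records from a big data store with a customer dimension taken from the warehouse, then summarizes by customer type. The event records, the customer_dim lookup, and all field names are hypothetical and stand in for whatever your own data models define.

```python
from collections import Counter

# Hypothetical customer dimension, as it might be exported from the warehouse.
customer_dim = {
    101: {"customer_type": "retail", "state": "TX"},
    102: {"customer_type": "wholesale", "state": "OK"},
}

# Hypothetical raw events from the big data store.
events = [
    {"customer_id": 101, "action": "view"},
    {"customer_id": 101, "action": "purchase"},
    {"customer_id": 102, "action": "view"},
    {"customer_id": 999, "action": "view"},  # no matching dimension row
]


def count_events_by_customer_type(events, customer_dim):
    """Count events per customer type, using the warehouse dimension as a lookup."""
    counts = Counter()
    for event in events:
        dim_row = customer_dim.get(event["customer_id"])
        # Events whose key is missing from the dimension are reported, not dropped.
        key = dim_row["customer_type"] if dim_row else "unknown"
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_events_by_customer_type(events, customer_dim))
```

The "unknown" bucket shows why the two data models need to agree: any key present in the big data store but absent from the warehouse cannot be categorized by that dimension.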


A natural outcome of this integration is that you will upgrade your data warehouse so that analytical queries can encompass the warehouse data alongside the big data store. A good enterprise data model and a comprehensive data dictionary are necessities.


Warehouse upgrades will include adding new dimensions, including data from new operational systems, and storing large objects such as scanned images and XML. The last of these is especially important, and was mentioned earlier in the discussion on budgeting. Large, complex objects may not be directly analyzable by your business intelligence (BI) software package, but basic information about them may be stored in the data warehouse. For example, some database management systems can decode XML documents and store their contents in a database as tables. This table data may then be analyzed by the BI software.
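To illustrate the idea of turning XML into table data, here is a minimal Python sketch that parses a small order document and stores its line items as rows. Note that it does the decoding in application code with the standard library rather than inside the database management system, which is a substitution for illustration only. The XML layout, the order_item table, and the column names are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document; real documents will have their own structure.
ORDER_XML = """<order id="A-1001" customer="101">
  <item sku="TV-55" qty="1" price="499.99"/>
  <item sku="HDMI-2M" qty="2" price="9.50"/>
</order>"""


def make_db():
    """Create an in-memory database with one table for decoded line items."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_item (order_id TEXT, sku TEXT, qty INTEGER, price REAL)"
    )
    return conn


def shred_order(conn, xml_text):
    """Parse one order document and insert one row per <item> element."""
    root = ET.fromstring(xml_text)
    rows = [
        (root.get("id"), item.get("sku"), int(item.get("qty")), float(item.get("price")))
        for item in root.findall("item")
    ]
    conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    conn = make_db()
    shred_order(conn, ORDER_XML)
    # Once the data is in rows, ordinary SQL (and therefore BI tools) can analyze it.
    print(conn.execute("SELECT SUM(qty * price) FROM order_item").fetchone()[0])
```

Once the items are rows in a table, they can be joined to warehouse dimensions like any other data.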



Source: SANS ISC SecNewsFeed @ May 31, 2016 at 10:09AM