Lately, the topic of Big Data has grown by leaps and bounds on the web, in the media, in academia, and in industry white papers. If you are in the business of data management, data visualization, or analytics, I am sure you have read success stories, challenges, opportunities, and judgment statements related to Big Data. While there have been many success stories, organizations still struggle to develop a winning plan that outlines the business benefits of a Big Data solution, including the initial size of their Big Data environment. The business benefit depends on the nature of your business. In online media, it may be to increase online fan engagement and interactions, or to analyze web traffic and consumption to determine premium content. In the cable and telecom industry, it may be to increase customer value by analyzing call transactions, or to provide superior customer service by tracking customer location. Before you invest in a Big Data solution, consider deconstructing this complex topic by taking the following steps.
The journey starts with defining the business opportunities that may drive a need for Big Data. This requires a good understanding of your business model, the growth strategies data can drive, and the limitations of your current data strategy. Start by engaging stakeholders within or across business units who may benefit from larger data sets in the short term (next 6 months) and the long term (18-24 months), and determine the value proposition.
The next step is to conduct a data assessment of each data source. The purpose of this step is to understand current data consumption and determine whether it is still valid given changing business needs. For example, if the system was designed to keep 5 years of historical data, but you are using only 13 months of data for forecasting, then there is no need to keep that much history. The additional data not only takes up valuable storage capacity but also creates bottlenecks in application processing. As you may know, data redundancy and duplication grow over the lifetime of your system. And while some duplication is deliberate and necessary, assessing your current system, data, and business drivers still makes a lot of sense. You can use the following model in your data assessment to make sure you address the key elements of data consumption. Once each data source is analyzed through this model, you should compile the information to get a broader and deeper view of the system. The initial exercise may be time-consuming, depending on the level of documentation available and the availability of subject matter experts, but it will produce a reusable framework to leverage in the future.
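As a rough illustration of the retention example above, the gap between retained and consumed history can be estimated with a few lines of Python. This is only a sketch: the `DataSource` class and its field names are hypothetical, and the share calculation assumes data volume is spread evenly across months, which may not hold for a growing business.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical record for one data source in the assessment."""
    name: str
    months_retained: int  # history the system was designed to keep
    months_used: int      # history the business actually consumes

    def unused_months(self) -> int:
        # Months of history that are stored but never consumed.
        return max(self.months_retained - self.months_used, 0)

    def unused_share(self) -> float:
        # Fraction of retained history that is unused,
        # assuming an even volume per month (an assumption).
        if self.months_retained == 0:
            return 0.0
        return self.unused_months() / self.months_retained


# The example from the text: 5 years kept, 13 months used for forecasting.
forecasting = DataSource("forecasting_history", months_retained=60, months_used=13)
print(forecasting.unused_months())           # 47
print(f"{forecasting.unused_share():.0%}")   # 78%
```

Running the same calculation for every source gives a quick first ranking of where history could be trimmed or moved to cheaper storage.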
Once you have the above information ready, meet with stakeholders to align their needs with your findings, and streamline the data and processes around them. This takes you one step closer to understanding which data should be included in your Big Data solution.
Following the data assessment, it’s important to assess the current infrastructure. This will help you re-scope existing systems and applications, which can be a smart way to leverage existing investments. It will also help you plan how to integrate a Big Data solution into the existing infrastructure. You can use the following model in your infrastructure assessment. This model encourages a deeper look at each infrastructure component:
- Database technology: RDBMS used (MS SQL Server, Oracle, DB2, etc.), database version, utilities/applications used for data management
- Server: hardware, operating system (32 or 64 bit), memory, RAID configuration, etc.
- System Architecture: inbound/outbound interfaces, operational data store, data mart, web applications, business intelligence application, and analytics application
- Storage technology: storage type, capacity, and consumption for each application
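One way to keep this assessment consistent across systems is to record each component in a simple structure and flag what has not been documented yet. The sketch below is illustrative only: the field names in `REQUIRED_FIELDS`, the `missing_fields` helper, and the sample values are all assumptions made for this example, not part of any standard model.

```python
# Hypothetical checklist derived from the four components listed above.
REQUIRED_FIELDS = {
    "database": ["rdbms", "version", "management_tools"],
    "server": ["hardware", "os_bits", "memory_gb", "raid"],
    "architecture": ["interfaces", "data_stores", "applications"],
    "storage": ["type", "capacity_tb", "used_tb"],
}


def missing_fields(inventory: dict) -> dict:
    """Return, per component, the checklist fields not yet documented."""
    gaps = {}
    for component, fields in REQUIRED_FIELDS.items():
        recorded = inventory.get(component, {})
        # A field counts as missing if it is absent or left empty.
        absent = [f for f in fields if recorded.get(f) in (None, "", [])]
        if absent:
            gaps[component] = absent
    return gaps


# Made-up sample inventory for one system (values are placeholders).
example = {
    "database": {"rdbms": "Oracle", "version": None,
                 "management_tools": ["vendor utility"]},
    "server": {"hardware": "x86", "os_bits": 64,
               "memory_gb": 128, "raid": "RAID 10"},
    "storage": {"type": "SAN", "capacity_tb": 40, "used_tb": 31},
}

print(missing_fields(example))
# {'database': ['version'], 'architecture': ['interfaces', 'data_stores', 'applications']}
```

The output doubles as a to-do list for the subject matter experts before the assessment is considered complete.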
And please don’t forget to evaluate your organization’s future data needs. In the coming months, how will the sources of your data evolve (web, call center transactions, mobile applications, etc.), how will formats change (structured and unstructured), and do you anticipate changes in volume?
Once you complete the above steps, I recommend selecting the top 2-3 initiatives and developing an execution plan. Starting small is the key: it helps you sell the plan to upper management and build the solution sooner, so that power users can start using the system. In return, you gain support and credibility from management and the user community on your Big Data journey. In terms of technology selection, there are a few parameters to consider. If the need is to ingest 100+ million rows of structured and unstructured data per day, with an increasing frequency of data loading, traditional database technology may not keep up. In a recent blog post, my colleague, Roman Lenzen, highlighted the Big Data technology stack.
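To put the 100+ million rows per day figure in perspective, it helps to convert it into a sustained load rate. The snippet below is a back-of-the-envelope sketch; the function name and the 4-hour load window are assumptions chosen for illustration, and real throughput also depends on row width, indexing, and transformation work.

```python
def required_rows_per_second(rows_per_day: int,
                             load_window_hours: float = 24.0) -> float:
    """Sustained ingest rate needed to load a day's rows within the window."""
    if load_window_hours <= 0:
        raise ValueError("load_window_hours must be positive")
    return rows_per_day / (load_window_hours * 3600)


# Loading continuously around the clock.
print(round(required_rows_per_second(100_000_000)))        # 1157

# Loading the same volume in a hypothetical 4-hour batch window.
print(round(required_rows_per_second(100_000_000, 4.0)))   # 6944
```

Comparing these rates with what your current database loads today gives a first indication of whether the existing platform can keep up.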
In a recent client engagement, we developed a Big Data solution combining an existing Netezza platform with Hadoop. This allowed our client to leverage the best of both technologies, which we seamlessly integrated using Quaero’s Big Data Management Platform (BDMP) software. The Netezza appliance is used to run analytics on recent data sets, data visualization, and reporting; Hadoop is leveraged for analytics on broader data sets, sentiment analysis, and online storage of historical data. As this client’s business needs and data volumes grow, the integrated solution scales linearly.
Are you on a Big Data journey? Please share your thoughts and drop me an email.