What You Need to Know About…Big Data
Posted By Administration, Friday, February 27, 2015

with Eric Stephens and Michael Cavaretta

By the AEA


In the third installment of the “What EAs Need to Know About…” blog series we talk with Eric Stephens, Enterprise Architect at Oracle, and Michael Cavaretta, Technical Leader in Predictive Analytics / Data Science at Ford Motor Company, about one of the most hyped trends of the past few years: Big Data.


In separate interviews transcribed together here, Stephens and Cavaretta both lent their perspectives on how Big Data is affecting organizations today, the difference between needing Big Data and needing analytics, how to architect for it, and the challenge of knowing how best to use what you’ve got.


The volume of data being generated today is having a huge impact on enterprises. How is Big Data affecting Enterprise Architecture?

Eric Stephens: I see it affecting Enterprise Architecture in a negative way: people are getting very focused on Big Data problems. I think there needs to be a broader discussion around Information Architecture (IA), with a focus on the capabilities within IA, including the ‘Big Data’-specific capabilities. By first organizing these capabilities and aligning them with what they are trying to accomplish, companies, public and private, can start leveraging them. Look at the landscape and identify the sources of all data in an enterprise, and how that informs a landscape, a flow (a supply chain, if you will) of data that eventually turns itself into information through action. That’s what’s important when we get into this conversation about Big Data, or more correctly, Information Architecture.


To quote Stephen Covey, EAs should begin with the end in mind. What do you want your stakeholders to see? What actionable intelligence do you want to gather and respond to? That applies whether it’s human-consumable insights or machine-readable intelligence (think IoT).


Mike Cavaretta: I think the biggest way that Big Data technologies are affecting Enterprise Architecture really has to do with the way the technology is implemented within the company. Take a look at a lot of the work around building Data Lakes, and at how people are using the technology to sweep together a lot of different data sources to try to break across different data silos. Those have really been the things that, in particular, are challenging in the Enterprise Architecture space.


On one side you’ve got this big push to really get value from the data, and on the other side you’ve got lower-level Hadoop implementations, with people looking at Big Data and saying, ‘We’ve got all these data warehouses; should we even be having data warehouses anymore?’ A few years ago, if you wanted a Business Intelligence (BI) implementation, wanted to get value from the data, or perhaps wanted to use predictive analytics, the first place you’d go was a data warehouse. I really think that has changed in the last few years. Now it is very much ‘make sure we’re looking at both sides.’ If we need a data warehouse and we need that structure, that’s fine, but maybe the first place we should be looking is Hadoop. I think that’s a really good thing.


Is there a role for Enterprise Architects to help their companies think about Big Data more strategically then?

MC: Oh, definitely. I think one of the biggest things about people who work in Enterprise Architecture is that it’s really their responsibility to come back and push back in an appropriate way. Somebody comes in and says, ‘I want to build a Data Lake. We should take all of our data, and we’re going to look at these four different surveys and match them up with these two different transactional systems. We’re going to use Hadoop because I heard Hadoop’s good.’ Well, that probably doesn’t make any sense. The survey responses are low volume compared to the transactional systems; it makes no sense to throw all of that into Hadoop. You would need to go back and say, ‘Look, this is where Hadoop works. If you’re talking about getting better value from the data, instead of spending the money on a Big Data solution, maybe you need to be spending it on an analytics solution, some kind of categorization software, or natural language processing: something that can bring out the value of the textual, unstructured data and then match it up with the transactional data.’ Enterprise Architects need to understand the business problem well enough to feel empowered to come back and say to the organization, ‘You’re asking for a technological solution, but the business problem really dictates something totally different.’


Are there industries that might be more affected by Big Data or is this a universal problem?

ES: I think it’s fairly universal. It won’t be limited to the industries with the typical use cases, where you think about a lot of unstructured data. There are lots of industries, such as telecommunications and finance, that have enormous volumes of data. It may be structured, but nonetheless they need strategies to harness the information, gain insights, and take action upon it, sometimes immediately. I’m also thinking of healthcare, back to this Internet of Things idea where you get into smart health. Once we start (in fact, we already are) instrumenting physical objects, from pill to windmill as I like to say, we start to produce larger and larger amounts of data. Whether it’s structured or unstructured, it’s still high velocity and high volume.


MC: I really like this question, because it gets at the root of something I see as a really big problem. A lot of times when the business is looking at a Big Data solution, what they’re really asking for is an analytics solution to answer, ‘How do I get value out of the data?’ Technically, Big Data is just a way to store and process data. That’s not getting the insights out of the data; you need different tools and resources to do that. Both are really important, but you need to make sure you have the right motivation. Are we talking about storing lots and lots of data and then processing it? Or are we talking about really getting insights from the data that we can reasonably process? The answer will be different across industries. I think all industries really need efforts in the analytics space, all the way from the regular BI dashboards and reporting (the basic stuff of knowing what’s going on) up to the more predictive analytics.


The way we think about it here at Ford is that we divide things into three major categories. We call it Hindsight, Insight and Foresight. You can think of this as ‘what happened?’, ‘what’s happening now?’, and ‘what’s going to happen in the future?’


If you want to know which industries will capture the value, particularly with Big Data, it’s mostly large companies that have the resources and are generating a lot of data to begin with. Those companies are probably going to be the ones that get the most value out of Big Data. Some start-ups are also finding value with Big Data, in particular those with a narrow niche, because they found a spot where they can tap into that stream and do something with it. That’s been enabled by the fantastic reduction in costs for Cloud processing, which is amazing, and I’m totally enthusiastic about it. But if you’re talking about analytics, I don’t think there’s an industry out there that wouldn’t benefit from a better understanding of the data it currently has.
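To make the Hindsight, Insight and Foresight split concrete, here is a minimal Python sketch over a hypothetical series of daily counts. It is not Ford’s implementation: the function names, the seven-day window, and the straight-line trend standing in for a predictive model are all illustrative assumptions.

```python
from statistics import mean


def hindsight(history: list[float]) -> float:
    """'What happened?': total over the whole historical window."""
    return sum(history)


def insight(history: list[float], window: int = 7) -> float:
    """'What's happening now?': average of the most recent `window` points."""
    return mean(history[-window:])


def foresight(history: list[float], steps_ahead: int = 1) -> float:
    """'What's going to happen?': naive least-squares straight-line forecast.

    A stand-in for a real predictive model; needs at least two points.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_bar = (n - 1) / 2
    y_bar = mean(history)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


if __name__ == "__main__":
    daily_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical data
    print(hindsight(daily_counts), insight(daily_counts), foresight(daily_counts))
```

With the counts 1 through 10, hindsight gives 55, insight gives 7 (the mean of the last seven days), and foresight projects 11 for the next day. Each question needs progressively more modeling, which is why the three categories call for different tools.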


What are some of the considerations EAs need to make when incorporating Big Data into their Architectures?

ES: I think the key consideration is that they need a fundamental understanding of the overall Information Architecture capability within the enterprise. Then they should look at evolving Big Data capabilities as an incremental step in that Information Architecture capability. Naturally, there is a whole new set of skills that can be added, and there are roles to be addressed, such as data scientists.

One thing we can’t forget is information security, especially if we’re talking about very sensitive information that, in some cases, is safety critical for these applications. So having the necessary controls at all levels, and a traditional depth-and-breadth approach to all the information and access, is going to be critical.


MC: I think it’s a very different space from top to bottom. The biggest challenges I see in this space have to do with making sure that you’re getting value for the hardware, the software, and the resources being put into it. From a technology perspective, it’s working through how to get started in the area: what technologies you want to start with, what vendors you want to look at, whether you want to use open source. Those are all pieces.


I also think there’s a role for trying to understand what the expectations for the organization are. A lot of organizations are talking about ‘This sounds like great stuff. We know that we have Big Data, I hear that we should be using it, let’s go do it,’ as opposed to ‘We have these business questions, these use cases, these areas where we believe today’s data can help us. Let’s go out and build the infrastructure that allows us to go about doing it.’ I think the latter is more important and has a higher degree of success.


How much is the need to plan around large Data Architectures and Data infrastructure affecting the process of doing Enterprise Architecture right now?

ES: I think it’s picking up fairly well. Enterprise Architects typically (but not always) come out of the IT realm, and Big Data is up there with the usual suspects among hot technologies, like Cloud and IoT, so they’re naturally going to gravitate in this direction. I’m also observing that it’s not just the traditional technical literature that’s talking about Big Data and gaining insights; the B-school journals are also looking at this seemingly IT-centric topic. I say seemingly because it’s not about the IT mechanism. It’s about treating data as an asset, per the TOGAF® principle. In some cases it’s treating it as a competitive asset or, for lack of a better term, a competitive weapon if you’re in a private sector context.


MC: I think the difficulty in this situation is the practical aspects vs. the hype. If someone truly needs to be planning for a Big Data architecture and a Big Data effort, most of the time the upfront work should be significantly less than with a traditional data warehouse. You’re talking relatively simple stuff.


The more complex piece is on the backend when you try and pull across different data sets that are within that Big Data structure and then look at the regular BI stuff or the more advanced analytics. That’s where the work comes in. But the benefit you get from that is you can go through the data sets without having to worry about the initial cost of setting up the structure, which is one of the detriments of the traditional data warehouse model.
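Pulling data sets together without first setting up the structure is often called schema on read. Below is a minimal Python sketch of the idea, assuming two hypothetical silos that land raw JSON-lines records with different key names (customer_id in one, cust in the other). The key mapping is applied only at query time, and records with missing fields are tolerated rather than rejected. It is not tied to Hadoop or any particular product.

```python
import json

# Hypothetical raw extracts from two silos, landed as JSON lines with no
# schema agreed in advance (note the different key names for the customer).
ORDERS_JSONL = """\
{"customer_id": "C1", "order_total": 250.0, "region": "EU"}
{"customer_id": "C2", "order_total": 80.0}
"""
TICKETS_JSONL = """\
{"cust": "C1", "topic": "billing"}
{"cust": "C1", "topic": "delivery"}
{"cust": "C3", "topic": "login"}
"""


def read_jsonl(text: str) -> list[dict]:
    """Parse records exactly as landed; nothing is validated against a schema."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def join_silos(orders: list[dict], tickets: list[dict]) -> list[dict]:
    """Left-join support tickets onto orders, reconciling key names at query time."""
    tickets_by_customer: dict[str, list[dict]] = {}
    for ticket in tickets:
        tickets_by_customer.setdefault(ticket["cust"], []).append(ticket)
    return [
        {
            "customer_id": order["customer_id"],
            "order_total": order.get("order_total"),
            "region": order.get("region"),  # missing fields become None, not errors
            "ticket_count": len(tickets_by_customer.get(order["customer_id"], [])),
        }
        for order in orders
    ]


if __name__ == "__main__":
    for row in join_silos(read_jsonl(ORDERS_JSONL), read_jsonl(TICKETS_JSONL)):
        print(row)
```

The upfront cost is close to zero: the data lands as-is. The work moves to the query side, where the join logic has to know that cust and customer_id mean the same thing. That is the trade-off Cavaretta describes against the traditional data warehouse model.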


Are there system design problems that need to be considered when it comes to incorporating Big Data into the Enterprise Architecture?

ES: I think it’s the usual set of quality attributes (or non-functional requirements) that continue to need to be addressed, such as performance, scalability, and security. One nuance when we talk about Big Data is that we think about it in an analytical context: I’ve got 100 terabytes of data, and I’ve got time to crunch it and then derive some insights from it. But there are other contexts where we need to think about fast data, where you’re using a similar genre of tools to process data that’s happening in ‘real real-time,’ as we see in financial markets (trades) or telecommunications (call detail records, or CDRs). It’s not about seeing where people are going or what people are buying. In the financial context it could be fraud detection in real time. It could be detecting a cyber attack. On the telecom side, you’re also looking at the health of your network. So the goal is being able to process that information in a very short time period and then turn it around into actionable insight in a deterministic way.


There are other contexts where this is applicable as well. I think about the Google car, or the America’s Cup yacht that Oracle sponsored. They were using the same technologies and principles in the Google car and in the yacht: you’re collecting large amounts of data in real time, deriving insights, and taking action in a matter of minutes. In the case of the Google car, it’s fractions of a second. When Big Data moves into a real-time or safety-critical context, a hard systems engineering approach may be necessary for nailing down your performance characteristics. Like other software-intensive projects over the last 50 to 60 years, it comes down to identifying the functional and non-functional requirements and how the information is going to be used.
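As a rough illustration of the fast-data case, the sketch below applies a simple velocity rule to a stream of transactions. This is one common fraud-screening heuristic, swapped in here for illustration. It flags an account that makes more than a set number of transactions inside a sliding time window. The class name, thresholds, and event format are hypothetical, and the code assumes events arrive in timestamp order for each account. A production system would run far more sophisticated logic inside a stream-processing engine.

```python
from collections import defaultdict, deque


class VelocityDetector:
    """Flags an account that makes more than `max_events` transactions
    within any sliding window of `window_seconds` (a simple velocity rule)."""

    def __init__(self, max_events: int = 3, window_seconds: float = 60.0) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        # Per-account timestamps still inside the window; assumes in-order arrival.
        self._recent: defaultdict[str, deque] = defaultdict(deque)

    def observe(self, account: str, timestamp: float) -> bool:
        """Record one transaction and return True if the account should be flagged."""
        recent = self._recent[account]
        recent.append(timestamp)
        # Drop timestamps that have aged out of the sliding window.
        while timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


if __name__ == "__main__":
    detector = VelocityDetector(max_events=3, window_seconds=60)
    stream = [("acct-1", 0), ("acct-1", 10), ("acct-1", 20),
              ("acct-1", 30), ("acct-2", 30), ("acct-1", 200)]
    for account, ts in stream:
        if detector.observe(account, ts):
            print(f"flag {account} at t={ts}")
```

Each event is handled in amortized constant time, and memory is bounded by the window. That is what makes the response deterministic enough for the real-time contexts described above, in contrast to batch crunching of 100 terabytes.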


What are some best practices EAs can follow when incorporating Big Data and Data Architectures?

ES: To put it simply, follow your process. Follow your process around requirements engineering, follow your process around enterprise architecture development, and follow your process around technology adoption and standardization. Allow for some experimentation and “tire-kicking” with the technology. While I advocate a disciplined approach, there still needs to be room to experiment with proofs of concept and so on. Partner with your vendors to determine the best approach (P.S. they’ll want to partner with you), and ensure that your vendors are providing that capability. But treat it like any other technology adoption: look at it as an incremental addition to your overall technology portfolio and your information architecture capability portfolio. I would also add: don’t forget about the Internet of Things. There’s crossover when we talk about Big Data, or when we get into fast data and how it might work in some of these operational contexts. Finally, remember the security aspects: we’re producing lots and lots of additional information. Make sure the right audits are performed and the right controls are in place so that you don’t experience breaches like the mass exfiltration at Sony or the recent hack at Anthem.


MC: I think the biggest thing, if you’re going into a space where you haven’t worked with any of the Big Data products before, is to start with the lowest-cost solution you can. There are great low-cost open source tools available. You’d be surprised at how cheap hardware, really cheap stuff, can be used to process fairly large volumes of data. Facebook, for example, has opened up its designs for using commodity hardware. There’s stuff out there that you can use to get your feet wet, get some experience with the tools, and start to derive value. That’s where I’d suggest you start, if you’ve got the time.


The key piece there is to keep your iterations small. Every few weeks, look at releasing or designing something that provides some value somewhere, and then you can bootstrap your way along. At some point you can decide that you want to go a little bit bigger and want that extra support, and maybe move to paid, supported versions of the software and work at it from that perspective. Particularly in a space where you can spend a whole lot of money and collect a whole lot of data without actually showing any value, those quick iterations can make a big difference.


What are the challenges and benefits of architecting for Big Data?

ES: I think some of the challenges aren’t so much around the technology and getting the information, but rather around understanding what to do with it later. Some areas go beyond Big Data, like discovery tools: tools where you point these technologies at this mound of Big Data, not necessarily to give you answers, but to help you generate the right questions of the data. I think the challenge is determining the right questions to ask once you accumulate the data. I don’t think, in 2015, getting the data in any context is a challenge. The challenge is knowing how to filter out the noise, whether it’s my newsfeed in the morning or data coming in off a cell tower or from the financial markets. It’s knowing how to separate the noise and understanding what your business stakeholders want. It is beginning with the end in mind and focusing on solving the business problem(s) at hand. It’s understanding what business problem you’re trying to solve. Otherwise any Big Data (Information Architecture) effort will be reduced to a fruitless IT science fair project, with Big Data wrongly positioned to solve nearly all IT-related problems.


I think the benefit is that we can gain incredible insights into consumer behavior, whether in finance, telecommunications, retail, or health care. The ability to instrument the human body may enable us to detect various ailments or monitor vital signs in real time. Other hardware and software vendors, such as Apple, are starting to prepare for that. In the end, we must determine how Big Data helps business stakeholders improve profits, improve customer experience, or improve health outcomes. We get excited about Big Data, but what matters is what your shareholders, customers, and other stakeholders think about: business outcomes. Period.


MC: Overall, Big Data implementations can provide value in two primary ways. The first is going across different data silos in the company. A lot of times, just taking data sets from two different silos and putting them together provides a lot of value, and you can do that a lot quicker with Big Data technology because you don’t have to worry about the structure up front.


The other piece is that Big Data technologies can generally handle unstructured and semi-structured data much better than a Data Warehouse, as well as much higher volumes. I remember talking with someone at T-Mobile, and she was describing how they decide how much data they store. They have a tiered process. The first tier is 90 days: they store everything, absolutely everything. Based on that, they can do their studies and see which data points are the most important and what analysis they want to do. If that turns out to be valuable to the company, they roll it into the more structured data and build KPIs off it over a rolling six months to two years’ worth of data. The further out in time you go, and the longer the time series and history, the more aggregated the data is. But it’s very valuable to keep some data at the most raw level, so when you have those ideas and want to do those experiments, you have that raw data to start with. In those types of circumstances, especially when you’re talking about data exhaust, log files, machine data, or sensor data, Big Data technologies are really beneficial.
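A tiered process like the one described at T-Mobile could be expressed as a simple retention policy. In this sketch, the 90-day raw tier comes from the interview. The two-year KPI boundary, the daily and monthly granularities, and the function names are assumptions chosen for illustration, not T-Mobile’s actual design.

```python
from datetime import date

RAW_DAYS = 90       # raw tier: keep absolutely everything (from the interview)
KPI_DAYS = 2 * 365  # KPI tier upper bound (assumed: two years)


def retention_tier(record_day: date, today: date) -> str:
    """Return the storage tier that a record from `record_day` belongs in."""
    age_days = (today - record_day).days
    if age_days <= RAW_DAYS:
        return "raw"
    if age_days <= KPI_DAYS:
        return "daily_kpi"
    return "monthly_summary"


def daily_kpi(events: list[tuple[date, float]]) -> dict[date, float]:
    """Roll raw events up into one total per day, the first level of aggregation."""
    totals: dict[date, float] = {}
    for day, value in events:
        totals[day] = totals.get(day, 0.0) + value
    return totals


def monthly_summary(daily: dict[date, float]) -> dict[tuple[int, int], float]:
    """Coarsen daily KPIs further into (year, month) totals for the oldest tier."""
    totals: dict[tuple[int, int], float] = {}
    for day, value in daily.items():
        key = (day.year, day.month)
        totals[key] = totals.get(key, 0.0) + value
    return totals
```

The further a record is from today, the coarser the key it is stored under. This mirrors the idea that longer histories are kept at higher levels of aggregation while the most recent window stays raw for experiments.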


What advice would you have for EAs as they plan around Big Data?

ES: My advice is to not plan for Big Data. Plan for Information Architecture, and incorporate the Big Data capabilities (high-volume, less-structured data acquisition) and how they fit in with an overall catalog of Information Architecture capabilities. Focus on the end game. What are you looking to get out of it? Not you as the IT person, not you as the Enterprise Architect, but what are your CEO, CFO, CMO, CDO, and ultimately your customer looking to get out of it?


MC: Of course, the first thing would be to push back on the hype. Make sure there’s a need, and that you understand the need inside the company for using these types of tools and technologies. That’s going to mean making sure you have good connections with the business, and partly it’s going to mean having good connections with the analytic resources. Depending on the size of the business, maybe you are the analytic resource, in which case it will be a lot easier.


As things move along within the company and the thinking turns to adding more data resources, the first thing to think about is where you’re going to place this stuff. There’s going to be a lot of pressure to throw everything into a Data Lake and Hadoop from the very beginning. That’s probably going to be right most of the time, but not every single time. Think about that up front so that when you’re asked those questions, you’ve got a good response.


Just like in the data warehouse arena, when those were popular and really getting off the ground 10 to 15 years ago, there was the idea that ‘We’re going to take a pot of money and we’re going to build this giant, monolithic thing, and it’s going to be fantastic and gorgeous and everyone’s going to say this is the best thing ever.’ Most of the time, those didn’t work out. With the best of intentions, everybody wanted to make sure that everything was battened down, tight, and perfect, but by the time you delivered it there were probably three other data sources you would have put in if you had known about them, but you didn’t. The value you were going to deliver to the business didn’t happen, because the business changed and probably some of the people changed, too.


When you think about the Big Data technologies, the same things are still true. At the very core, it’s all just data. The last piece, when you’re thinking about planning for Big Data, has to do with where the data is coming from. It’s one thing to say we’ve got these transactional systems, and maybe we’re aggregating data by the day, or maybe we’re not going to aggregate by the day and instead take the most raw point-of-sale data we can get. That’s going to be a much different problem than an Internet of Things implementation, where you’re installing devices and sensors and putting in a mesh network. Those are very different situations.


As an Enterprise Architect, you’ve got to be thinking: Where is the data coming from? What’s the data latency going to be? Where am I going to store it? What’s my landing zone going to look like? Eventually, what kind of aggregations, reporting, and analytics do I need? How much of the data do I keep, across what time frame? All of those pieces need to be thought about. But the key piece is always going to be that it’s better to just start collecting the data as fast as you can and get it off the ground than to spend months and months trying to figure out every nuance of it before you turn on the switch. It’s easier to get rid of data and purge it than it is to say, ‘I wish I had started six months ago.’
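One way to capture these planning questions is a small record per incoming data source. This Python sketch is hypothetical: the class and field names are not from any standard or tool. Its one design choice reflects the advice to start collecting early. As an assumption made here, only the source and the landing zone must be known before collection starts, while latency, storage, aggregation and retention questions can stay open.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataSourcePlan:
    """Hypothetical planning record for one incoming data source."""
    source: Optional[str] = None        # Where is the data coming from?
    latency: Optional[str] = None       # What's the data latency going to be?
    storage: Optional[str] = None       # Where am I going to store it?
    landing_zone: Optional[str] = None  # What's my landing zone going to look like?
    aggregations: Optional[str] = None  # What aggregations, reporting, analytics?
    retention: Optional[str] = None     # How much data, across what time frame?

    def open_questions(self) -> list[str]:
        """Names of the planning questions that are still unanswered."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def ready_to_collect(self) -> bool:
        """Assumed rule: knowing the source and landing zone is enough to start."""
        return self.source is not None and self.landing_zone is not None


if __name__ == "__main__":
    plan = DataSourcePlan(source="point-of-sale transactions",
                          landing_zone="raw, append-only landing area")
    print(plan.ready_to_collect(), plan.open_questions())
```

A plan can report itself ready to collect while still listing its open questions. This keeps the unanswered nuances visible without letting them delay turning on the switch.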



Eric Stephens is an Oracle Enterprise Architect and Oracle Business Architect at Oracle. Comments expressed by Eric are his own and not those of Oracle.


Michael Cavaretta is Technical Leader in Predictive Analytics / Data Science in Research and Advanced Engineering at Ford Motor Company.




