
IRM UK | Unified Data Delivery: From Data Lake to Enterprise Data Marketplace


When: 10 - 11 March 2020
Tuesday and Wednesday
Where: etc.venues Marble Arch
Garfield House
86 Edgware Rd
London W2 2EA
United Kingdom
Presenter: Mike Ferguson
Contact: +44 (0)20 8866 8366


IRM UK                                      

Unified Data Delivery: From Data Lake to Enterprise Data Marketplace

Use code AEA10 to receive a 10% AEA member discount when registering!

Register On-line:
10 - 11 March 2020, London

Seminar Fee 
£1,295 + VAT (£259) = £1,554


Governing and Integrating Data Across Hadoop, Cloud Storage, Data Warehouses, MDM & NoSQL Data Stores

Most organisations today are dealing with multiple silos of information. These include cloud and on-premises transaction processing systems, multiple data warehouses, data marts, reference data management (RDM) systems, master data management (MDM) systems, NoSQL databases, cloud storage, big data platforms such as Hadoop and enterprise content management (ECM) systems. In addition, the number of data sources is increasing dramatically. Given this situation it is not surprising that many companies struggle to know what data is available, whether they can trust it and how to go about integrating it. Many have ended up managing information in silos, with different tools being used to prepare and integrate data across different systems and with varying degrees of governance. In addition, it is not just IT that is now integrating data. Business users are also doing it, using self-service data preparation tools. It seems that everyone is blindly integrating data with no attempt to share what they create. The question is: do we let this chaos continue, or is there another way to govern and unify data across an increasingly complex data landscape that can speed up development and shorten time to value?

This 2-day seminar looks at the challenges faced by companies trying to deal with an exploding number of data sources, collecting data in multiple data stores (cloud and on-premises) and multiple analytical systems, and at the requirements for defining, governing, managing, unifying and sharing trusted high-quality data products in a distributed and hybrid computing environment. It also explores a new approach to organising your data in a logical data lake, and how IT data architects, business users and IT developers can work together to build ready-made trusted data products that can be published in a data marketplace for others to consume and use to drive value. This new DataOps approach to unifying data includes data ingestion, automated data discovery, data profiling, tagging and publishing data in an information catalog. It also involves refining raw data to produce trusted ‘data products’, available as a service, that can be published in a data marketplace (catalog) for consumption across your company. The objective is to reduce time to value by making ready-made data components available for rapid assembly to deliver new value, rather than expecting every project to start from scratch with raw data. The seminar also introduces multiple data lake configurations and shows how data governance and data unification jobs can be implemented across multiple data stores. It emphasises the need for a common collaborative approach to governing, managing and producing ready-made trusted data sets for mass consumption across the enterprise.

Learning Objectives

Attendees will learn:

  • How to define a strategy for producing trusted data as-a-service in a distributed environment of multiple data stores and data sources
  • How to organise data in a centralised or distributed data environment to overcome complexity and chaos
  • How to design, build, manage and operate a logical or centralised data lake within their organisation
  • The critical importance of an information catalog in understanding what data is available as a service
  • How data standardisation and business glossaries can help make sure data is understood
  • An operating model for effective distributed information governance
  • What technologies and implementation methodologies they need to get their data under control and produce ready-made trusted data products
  • Collaborative curation of trusted, ready-made data products and publishing them in a data marketplace for people to shop for data
  • How to apply methodologies to get master and reference data, big data, data warehouse data and unstructured data under control irrespective of whether it be on-premises or in the cloud.
  • Fuelling rapid ‘last mile’ analytical development to reduce time to value

Course Outline

Module 1: Establishing a Data Strategy for Rapid Unification of Trusted Data Assets

This session introduces the data lake together with the need for a data strategy and looks at the reasons why companies need one. It looks at what should be in your data strategy, the operating model needed to implement it, the types of data you have to manage and the scope of implementation. It also looks at the policies and processes needed to bring your data under control.

  • The ever-increasing distributed data landscape
  • The siloed approach to managing and governing data
  • IT data integration, self-service data preparation or both? – data governance or data chaos?
  • Key requirements for data management
    • Structured data – master, reference and transaction data
    • Semi-structured data – JSON, BSON, XML
    • Unstructured data – text, video
    • Re-usable services to manage data
  • Dealing with new data sources – cloud data, sensor data, social media data, smart products (the internet of things)
  • Understanding scope of your data lake
    • OLTP system sources
    • Data Warehouses
    • Big Data systems, e.g. Hadoop
    • MDM and RDM systems
    • Data virtualisation
    • Streaming data
    • Enterprise Content Management
  • Building a business case for distributed data management
  • Defining an enterprise data strategy
  • A new collaborative approach to governing, managing and curating data
  • Introducing the data lake and data refinery
  • Data lake configurations – what are the options?
    • Centralised, distributed or logical data lakes
  • Establishing a multi-purpose data lake and Information Supply Chain to produce data products for the enterprise
  • DataOps – a component-based approach to curating trusted data products
  • The rising importance of an Information catalog and its role as a data marketplace
  • Key technology components in a data lake and information supply chain – including data fabric software
  • Using Cloud storage or Hadoop as a data staging area and why it is not enough
  • Implementation run-time options – the need to curate data in multiple environments
  • Integrating a data lake into your enterprise analytical architecture

Module 2: Information Production Methodologies

Having understood strategy, this session looks at why information producers need to use multiple methodologies in a data lake information supply chain to produce trusted structured and multi-structured data that information consumers can use to drive business value.

  • Information production and information consumption
  • A best-practice step-by-step methodology for structured data governance
  • Why the methodology has to change for semi-structured and unstructured data
  • Methodologies for structured vs multi-structured data

Module 3: Data Standardisation, the Business Glossary and the Information Catalog

This session looks at the need for data standardisation of structured data and of new insights derived from processing unstructured data. The key to making this happen is to create common data names and definitions for your data to establish a shared business vocabulary (SBV). The SBV should be defined and stored in a business glossary and is important for helping information consumers understand published data in a data lake. The session also looks at the emergence of more powerful information catalog software and how business glossaries have become part of what a catalog offers.

  • Semantic data standardisation using a shared business vocabulary within an information catalog
  • The role of a common vocabulary in MDM, RDM, SOA, DW and data virtualisation
  • Why is a common vocabulary relevant in a data lake, data marketplace and a Logical Data Warehouse?
  • Approaches to creating a common vocabulary
  • Business glossary products storing common business data names
  • Alteryx Connect Glossary, ASG, Collibra, Informatica, IBM Information Governance Catalog, Microsoft Azure Data Catalog Business Glossary, SAP Information Steward Metapedia, SAS Business Data Network and more
  • Planning for a business glossary
  • Organising data definitions in a business glossary
  • Key roles and responsibilities – getting the operating model right to create and manage an SBV
  • Formalising governance of business data names, e.g. the dispute resolution process
  • Business involvement in SBV creation
  • Beyond structured data – from business glossary to information catalog
  • What is an Information Catalog?
  • Why are information catalogs becoming critical to data management?
  • Information catalog technologies, e.g. Alation, Alteryx Connect, Amazon Glue, Apache Atlas, Collibra Catalog, Cambridge Semantics ANZO Data Catalog, Denodo Data Catalog, Google Data Catalog, IBM Information Governance Catalog & Watson Knowledge Catalog, Informatica EDC & Live Data Map, Microsoft Azure Data Catalog, Qlik Data Catalyst, Waterline Data, Zaloni Data Platform
  • Information catalog capabilities
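To make the SBV idea concrete, here is a minimal sketch (not from the seminar materials; all system, column and term names are hypothetical) of how a business glossary inside a catalog maps physical column names from different source systems onto common business data names:

```python
# Hypothetical shared business vocabulary (SBV) sketch: common business
# terms with agreed definitions, plus mappings from physical columns
# in source systems to those terms.

# Common business terms with agreed definitions
GLOSSARY = {
    "Customer Name": "The full legal name of a customer.",
    "Date of Birth": "The customer's date of birth (ISO 8601).",
}

# Physical-to-business name mappings gathered from source systems
MAPPINGS = {
    ("crm_db", "cust_nm"): "Customer Name",
    ("billing_db", "CUSTOMER_NAME"): "Customer Name",
    ("crm_db", "dob"): "Date of Birth",
}

def business_term(system: str, column: str):
    """Look up the common business name and definition for a physical column."""
    term = MAPPINGS.get((system, column))
    return (term, GLOSSARY.get(term)) if term else None

# Two differently named physical columns resolve to the same business term
print(business_term("crm_db", "cust_nm"))
print(business_term("billing_db", "CUSTOMER_NAME"))
```

A real glossary product adds stewardship workflow, versioning and dispute resolution around exactly this kind of mapping.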

Module 4: Organising and Operating the Data Lake

This session looks at how to organise data so that it can still be managed in a complex data landscape. It looks at zoning, versioning, the need for collaboration between business and IT, and the use of an information catalog in managing the data.

  • Organising data in a centralised or logical data lake
  • Creating zones to manage data
  • New requirements for managing data in centralised and logical data lakes
  • Creating collaborative data lake projects
  • Hadoop or cloud storage as a staging area for enterprise data cleansing and integration
  • Core processes in data lake operations
  • The data ingestion process
  • Tools and techniques for data ingestion
  • Implementing automated disparate data and data relationship discovery using Information catalog software
  • Using domains and machine learning to automate and speed up data discovery and tagging
  • AI in the catalog – Alation, IBM Watson Knowledge Catalog, Informatica CLAIRE, Silwood, Waterline Data Smart Data Catalog
  • Automated profiling, PII detection, tagging and cataloguing of data
  • Automated data mapping and lineage discovery
  • The data governance classification and policy definition processes
  • Manual and automated data governance classification to enable governance
  • Using tag-based policies to govern data
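As an illustration of the tag-based policy idea above, here is a minimal sketch (not from the seminar materials; column names, tags and the masking rule are hypothetical) of masking columns that a catalog has tagged as PII before data is served to consumers:

```python
# Hypothetical tag-based governance sketch: columns tagged "PII" in a
# catalog are masked before records are released to consumers.

# Catalog-style tags assigned to columns (e.g. by automated PII detection)
COLUMN_TAGS = {
    "customer_id": set(),
    "email": {"PII"},
    "date_of_birth": {"PII"},
    "country": set(),
}

# Tag-based policy: any column carrying one of these tags must be masked
MASK_TAGS = {"PII"}

def mask(value: str) -> str:
    """Replace all but the first character with asterisks."""
    return value[:1] + "*" * (len(value) - 1) if value else value

def apply_policy(record: dict) -> dict:
    """Return a copy of the record with policy-controlled columns masked."""
    return {
        col: mask(str(val)) if COLUMN_TAGS.get(col, set()) & MASK_TAGS else val
        for col, val in record.items()
    }

row = {"customer_id": 42, "email": "ann@example.com", "country": "UK"}
print(apply_policy(row))  # email is masked; untagged columns pass through
```

The point of the pattern is that the policy is attached to the tag, not to individual data stores, so one classification governs the same data wherever it lives.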

Module 5: The Data Refinery Process

This session looks at the process of refining and curating data in an information supply chain to produce trusted data products.

  • What is a data refinery?
  • Key requirements for refining data
  • The need for multiple execution engines to run in multiple environments
  • Options for refining data – ETL versus self-service data preparation
  • Key approaches to scalable ETL data integration using Apache Spark
  • Self-service data preparation tools for Spark and Hadoop, e.g. Alteryx Designer, Informatica Intelligent Data Lake, IBM Data Refinery, Paxata, Tableau Prep, Tamr, Talend, Trifacta
  • Automated data profiling using analytics in data preparation tools
  • Executing data refinery jobs in a logical data lake using Apache Beam to run anywhere
  • Approaches to integrating IT ETL and self-service data preparation tools
  • ODPi Egeria for metadata sharing
  • Joined up analytical processing from ETL to analytical pipelines
  • Publishing data and data integration jobs to the information catalog
  • Mapping produced data products into your business vocabulary
  • Data provisioning – publishing trusted, ready-made data products into an Enterprise Data Marketplace
  • The Enterprise Data Marketplace – enabling information consumers to shop for data
  • Provisioning trusted data using data virtualisation, a logical data warehouse and on-demand information services
  • Consistent data management across cloud and on-premises systems
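The automated data profiling mentioned above can be sketched in a few lines. This is an illustration only (plain Python standing in for the Spark- or tool-based implementations the session covers), showing the kind of per-column statistics a data preparation tool computes on ingested data:

```python
# Hypothetical column-profiling sketch: row counts, null counts,
# distinct values and the most common value for one column of raw data.
from collections import Counter

def profile_column(values):
    """Compute simple profile statistics for one column of raw data."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

ages = ["34", "34", None, "52", "", "34"]
print(profile_column(ages))
# {'rows': 6, 'nulls': 2, 'distinct': 2, 'most_common': ('34', 3)}
```

Profiles like these feed the catalog: high null counts or unexpected distinct values flag quality problems before a data set is published as a trusted product.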

Module 6: Unifying Big Data, Master Data and Data Warehouse Data to Drive Business Value

This session looks at how the data refining processes can be applied to governing, unifying and provisioning data across big data, MDM and traditional data warehouse environments to drive new business value. How do you deal with very large data volumes and different varieties of data? How do you load and process data in Hadoop? How should low-latency data be handled? Topics that will be covered include:

  • A walk through of end-to-end data lake operation to create a Single Customer View
  • Types of big data & small data needed for single customer view and the challenge of bringing it together
  • Connecting to Big Data sources, e.g. web logs, clickstream, sensor data, unstructured and semi-structured content
  • Ingesting and analysing clickstream data
  • The challenge of capturing external customer data from social networks
  • Dealing with unstructured data quality in a Big Data environment
  • Using graph analysis to identify new relationships
  • The need to combine big data, master data and data in your data warehouse
  • Matching big data with customer master data at scale
  • Governing data in a Data Science environment
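A toy sketch of the matching step above (not from the seminar materials; the master records, threshold and use of stdlib string similarity are illustrative stand-ins for the probabilistic matching engines used at scale):

```python
# Hypothetical fuzzy-matching sketch: link an inbound big data record
# to a customer master data record by name similarity.
from difflib import SequenceMatcher

# A tiny stand-in for a customer MDM hub
MASTER = {
    "C001": "Jane Smith",
    "C002": "Rajesh Kumar",
}

def best_match(name: str, threshold: float = 0.8):
    """Return (master_id, score) of the closest master record, or None."""
    scored = [
        (cid, SequenceMatcher(None, name.lower(), master.lower()).ratio())
        for cid, master in MASTER.items()
    ]
    cid, score = max(scored, key=lambda t: t[1])
    return (cid, score) if score >= threshold else None

print(best_match("jane smyth"))   # matches C001 despite the misspelling
print(best_match("unknown user")) # no match above the threshold
```

Real single-customer-view pipelines do this at scale with blocking, scoring models and survivorship rules, but the core idea is the same: score candidate pairs and accept links above a confidence threshold.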

Module 7: Information Audit & Protection – Governing Data Across a Distributed Data Landscape

Over recent years we have seen many major brands suffer embarrassing publicity due to data security breaches that have damaged their brand and reduced customer confidence. With data now highly distributed and so many technologies in place that offer audit and security, many organisations end up with a piecemeal approach to information audit and protection. Policies are everywhere, with no single view of the policies associated with securing data across the enterprise. The number of administrators involved is often difficult to determine, and regulatory compliance now demands that data is protected and that organisations can prove this to their auditors. So how are organisations dealing with this problem? Are the same data privacy policies enforced everywhere? How is data access security co-ordinated across portals, processes, applications and data? Is anyone auditing privileged user activity? This session defines the problem, looks at the requirements for enterprise data audit and protection, and then looks at what technologies are available to help you integrate this into your data strategy.

  • What is Data Audit and Security and what is involved in managing it?
  • Status check – Where are we in data audit, access security and protection today?
  • What are the requirements for enterprise data audit, access security and protection?
  • What needs to be considered when dealing with the data audit and security challenge?
  • Automatic data discovery and the information catalog – a huge help in identifying sensitive data
  • What about privileged users?
  • Using a data management platform and information catalog to govern data across multiple data stores
  • Securing and protecting data using tag-based policies in an information catalog
  • What technologies are available to protect data and govern it? – Apache Knox, Cloudera Sentry, Dataguise, IBM (Watson Knowledge Catalog, Optim & Guardium), Informatica Secure@Source, Imperva, Micro Focus, Privitar
  • Can these technologies help in GDPR?
  • How do they integrate with Data Governance programs?
  • How to get started in securing, auditing and protecting your data

Who It's For

This seminar is intended for business data analysts doing self-service data integration, data architects, chief data officers, master data management professionals, content management professionals, database administrators, big data professionals, data integration developers, and compliance managers who are responsible for data management. This includes metadata management, data integration, data quality, master data management and enterprise content management. The seminar is not only for ‘Fortune 500 scale companies’ but for any organisation that has to deal with Big Data, small data, multiple data stores and multiple data sources. It assumes that you have an understanding of basic data management principles as well as a high level of understanding of the concepts of data migration, data replication, metadata, data warehousing, data modelling, data cleansing, etc.



Managing Director, Intelligent Business Strategies

Mike Ferguson is Managing Director of Intelligent Business Strategies Limited.  As an analyst and consultant he specialises in business intelligence / analytics, data management, big data and enterprise architecture.  With over 35 years of IT experience, Mike has consulted for dozens of companies on business intelligence strategy, technology selection, enterprise architecture and data management.  He has spoken at events all over the world and written numerous articles.  Formerly he was a principal and co-founder of Codd and Date Europe Limited – the inventors of the Relational Model, a Chief Architect at Teradata on the Teradata DBMS and European Managing Director of Database Associates.  He teaches popular master classes in Big Data, Predictive and Advanced Analytics, Fast Data and Real-time Analytics, Enterprise Data Governance, Master Data Management, Data Virtualisation, Building an Enterprise Data Lake and Enterprise Architecture.  Follow Mike on Twitter @mikeferguson1.

IRM UK Public Course  – Designing, Managing & Operating a Multi-Purpose Data Lake

Enterprise Data & BI & Analytics Keynote: Unified Data Delivery – Shortening Time To Value in a Digital Enterprise

Data Ed Week Europe 2019 One Day Course: Data Management in a Hybrid and Multi-Cloud Computing Environment

