UFCF8H-15-M Big Data
Overview
The theory and practice of big data management and how it connects to organisational goals.
Objectives
- Understand the importance of data for business applications and the difference between data, information and knowledge in terms of their uses
- Understand the challenges in storage and retrieval of small and large amounts of data, and the difference between SQL and NoSQL databases
- Apply the problem-solving skills needed to identify when an organisation needs to employ a SQL or a NoSQL database
- Understand the four dimensions of Big Data (volume, velocity, variety and veracity), which pose important challenges to the delivery of business benefits from Big Data
- Be able to apply problem-solving skills to address the challenges of extracting useful data and applying data quality checks
- Master various ways to improve data quality by understanding why data quality is a business problem
- Apply knowledge modelling skills to generate ontologies to define domain knowledge and relationships between entities, and use them for information retrieval purposes
- Demonstrate knowledge of Big Data management using Cloud computing and associated privacy and trust issues
Curriculum
Data Storage and Retrieval: Importance of data for business. Understand the difference between data, information and knowledge. Traditional ways to store and retrieve data. Big Data challenges and opportunities.
Introduction to Big Data: Defining Big Data: Sources of Big Data. The four dimensions of Big Data - volume, velocity, variety, veracity. Introducing storage and MapReduce. Business application of Big Data: Big Data applications/examples in business. Delivering business benefit from Big Data. Establishing the business importance of Big Data. Addressing the challenge of extracting useful data/knowledge. Integrating Big Data with traditional data.
SQL Databases vs. NoSQL Databases: Understanding the growing amounts of data. Relational database management systems (RDBMS). Capabilities of traditional RDBMSs. Overview of structured query languages (e.g. SQL). Introduction to NoSQL databases. Understanding the difference between a relational DBMS and a NoSQL database. Identifying the need to employ a NoSQL DB.
Storing Big Data: Analysing data characteristics: Selecting data sources for analysis. Introduction to selected Big Data stores from the following list: Hadoop, Cassandra, Amazon S3, BigTable, etc.
Achieving Data Quality: Introduction to data quality. Why is data quality a business problem? Problems when data is not “fit for purpose”. Preparing data. Ways to improve data quality. Understand ETL (Extract, Transform, Load) procedures to improve data quality.
Knowledge-based Information Retrieval: Introduction to knowledge-based information retrieval. Use of ontologies for knowledge modelling. Learn how to build an ontology to link knowledge with data. Using ontologies for information retrieval – case study. Machine learning for knowledge acquisition: Introduction to machine learning and pattern recognition. Capabilities of different modelling, analysis and algorithmic techniques.
Big Data and Cloud Computing (technology, challenges and trends): Cost of storing Big Data. Is cloud computing a solution? Issues: privacy and trust. Future of Big Data and cloud computing. Future research trends in Big Data.
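Illustrative example (not part of the official module materials): the MapReduce model named in the Introduction to Big Data topic can be sketched in a few lines of plain Python. This is a minimal single-machine sketch. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical, and a real framework such as Hadoop would distribute these phases across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, chosen only for illustration.
documents = ["big data big value", "data quality matters"]
counts = word_count(documents)
print(counts)
```

The map step works on each document independently and the reduce step works on each key independently, which is why both phases could in principle be parallelised across machines.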
Assessment
Report (75%)
Presentation (25%)