WHAT IS BIG DATA?

Introduction:

This assignment provides a summative assessment of your understanding of Big Data Systems and related technologies. It consists of three mini-tasks with the following aims:

Introduce Big Data in the context of a given organisation (See Task 1)
Understand the problems of working with Big Data and describe technologies designed to handle it (See Task 2)
Use a software package that is designed for Big Data Systems to perform a simple analytical task (See Task 3)

Task 1 – Introduce Big Data

Context: Amazon is an online book retailer that has expanded its retail offering far beyond books over the last decade (www.amazon.com).

Define Big Data in terms of the four V’s. Describe how each V could apply to Amazon. (E.g. ‘Volume’ is one of the V’s. What data would Amazon likely be capturing to qualify?)
Give an example from Amazon to illustrate your point for each of the four V’s discussed above. (12 Marks)

Task 2 – Big Data Technologies

Hadoop is a technological framework that enables processing of large datasets at the scale of Big Data. Your task is to research and understand Hadoop. Your description should include:

What is Hadoop?
What are the technological challenges of working with Big Data?
How does the Hadoop framework overcome the above-mentioned challenges? (10 Marks)
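
As a starting point for the research, the MapReduce programming model that Hadoop is known for can be sketched in a few lines. The following is a minimal single-process illustration in Python, not Hadoop's own API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big systems"]))
```

Because each line is mapped independently and each key is reduced independently, the same logic can in principle be split across many workers, which is the property a distributed framework exploits.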

Task 3 – Big Data Analytics with Orange Software Package

The dataset is in the file Titanic.tab, which is available on CloudDeakin under Resources->Assignment 3->Titanic.tab

This is by no means a Big Data set. To simplify the analytical task (as promised in lectures), we will settle for a smaller and simpler dataset. Your task is to:

Analyse the full dataset using Orange and report any insights you find.
Take a random sample of 200 records and perform the same analysis. State your findings. Are your conclusions similar to what you have found previously? Explain why or why not.
Under what circumstances would it be permissible to use a random sample from a full dataset for analysis? Under what circumstances would it raise red flags? (11.3 Marks)
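
The comparison between the full dataset and a random sample of 200 records is meant to be done in Orange, but the idea can also be sketched outside it. The following Python sketch uses only the standard library and synthetic records; the column names ("sex", "survived"), the value "yes", and the generated proportions are assumptions for illustration and are not taken from Titanic.tab.

```python
import random
from collections import Counter


def rate_by_group(records, group_key, outcome_key="survived", positive="yes"):
    """Return the share of positive outcomes within each group."""
    totals = Counter()
    positives = Counter()
    for row in records:
        totals[row[group_key]] += 1
        if row[outcome_key] == positive:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def compare_full_and_sample(records, group_key, sample_size=200, seed=0):
    """Compute group rates on the full data and on a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(records, sample_size)  # sampling without replacement
    return rate_by_group(records, group_key), rate_by_group(sample, group_key)


def make_synthetic_records(n=2000, seed=1):
    """Build hypothetical records; the probabilities here are made up."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        sex = rng.choice(["female", "male"])
        chance = 0.7 if sex == "female" else 0.2
        survived = "yes" if rng.random() < chance else "no"
        records.append({"sex": sex, "survived": survived})
    return records


if __name__ == "__main__":
    data = make_synthetic_records()
    full_rates, sample_rates = compare_full_and_sample(data, "sex")
    print("full:  ", full_rates)
    print("sample:", sample_rates)
```

Rerunning with different seed values shows how much the sample estimates move around the full-data rates, which is a useful way to think about when a sample of 200 is or is not adequate.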
