WHAT IS BIG DATA?

Introduction:

 

This assignment provides a summative assessment of your understanding of Big Data Systems and related technologies. It consists of three mini-tasks with the following aims:

 

  • Introduce Big Data in the context of a given organisation (See Task 1)
  • Understand the problems of working with Big Data and describe technologies designed to handle it (See Task 2)
  • Use a software package that is designed for Big Data Systems to perform a simple analytical task (See Task 3)

 

Task 1 – Introduce Big Data

 

In the context of Amazon: Amazon is an online book retailer that has expanded its retail offering far beyond books over the last decade (www.amazon.com).

 

  • Define Big Data in terms of the four V’s. Describe how each V could apply to Amazon. (E.g. ‘Volume’ is one of the V’s. What data is Amazon likely to be capturing that would qualify?)
  • Give an example from Amazon to illustrate your point for each of the four V’s discussed above. (12 Marks)

 

Task 2 – Big Data Technologies

 

Hadoop is a technological framework that enables processing of large datasets at the scale of Big Data. Your task is to research and understand Hadoop. Your description should include:

 

  • What is Hadoop?
  • What are the technological challenges of working with Big Data?
  • How does the Hadoop framework overcome the above-mentioned challenges? (10 Marks)
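
As a starting point for the research above, note that Hadoop's processing model is commonly described as MapReduce: input is split into chunks, a map step emits key-value pairs from each chunk, the pairs are grouped by key, and a reduce step combines each group. The sketch below imitates those three stages in plain Python on a single machine. It does not use Hadoop or any Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map, shuffle and reduce over a list of text chunks.

    In a real cluster each chunk would be mapped on a different node;
    here the chunks are simply processed one after another.
    """
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big systems"]))
```

Because each chunk is mapped independently of the others, the map step can in principle be spread across many machines, which is the property that lets this style of processing scale to datasets too large for one computer.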

 

Task 3 – Big Data Analytics with Orange Software Package

 

The dataset that we will be using is contained in the file Titanic.tab, which is available on CloudDeakin under Resources->Assignment 3->Titanic.tab

 

This is by no means a Big Data set. To simplify the analytical task (as promised in lectures), we will settle for a smaller and simpler dataset. Your task is to:

 

  • Analyse the full dataset using Orange and report any insights you find.
  • Take a random sample of 200 records and perform the same analysis. State your findings. Are your conclusions similar to what you found previously? Explain why or why not.
  • Under what circumstances would it be permissible to use a random sample from a full dataset for analysis? Under what circumstances would it raise red flags? (11.3 Marks)
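
The comparison between a full dataset and a random sample of 200 records can also be sketched outside Orange. The example below is a minimal illustration in plain Python and rests on several assumptions: it generates synthetic records instead of reading Titanic.tab, the field name "survived" with values "yes"/"no" is a hypothetical stand-in for whatever column you analyse, and the 2000-record size and 0.3 survival probability are arbitrary.

```python
import random


def make_synthetic_records(n, p_survive, seed):
    """Build n fake records; NOT the Titanic data, purely illustrative."""
    rng = random.Random(seed)
    return [
        {"survived": "yes" if rng.random() < p_survive else "no"}
        for _ in range(n)
    ]


def survival_rate(records):
    """Fraction of records whose 'survived' field equals 'yes'."""
    survivors = sum(1 for record in records if record["survived"] == "yes")
    return survivors / len(records)


def compare_sample_to_full(records, sample_size, seed):
    """Return (rate on the full data, rate on a simple random sample)."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no record appears twice.
    sample = rng.sample(records, sample_size)
    return survival_rate(records), survival_rate(sample)


if __name__ == "__main__":
    full = make_synthetic_records(2000, 0.3, seed=1)
    full_rate, sample_rate = compare_sample_to_full(full, 200, seed=2)
    print(f"full dataset: {full_rate:.3f}, sample of 200: {sample_rate:.3f}")
```

Running the comparison with several different seeds shows how much the sample estimate moves from draw to draw, which is useful evidence when arguing whether a sample of this size supports the same conclusions as the full dataset.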
