What is BIG Data | Hadoop and its architecture – TechnicalTeacher

In this tutorial, you will first learn what Big Data is, then what HDFS is, and finally Hadoop and its architecture.


What is BIG DATA?

Data that keeps growing in size and can't be stored or processed on a single machine is called Big Data. In other words, Big Data is a volume of data too large for one machine to handle.

Where is Big Data used?

Big Data is used in social networking sites, healthcare, banking, education, and many other domains.

A real-life example: when you search for a product on Amazon, it shows recommendations and similar products based on your search criteria.


Big Data cluster-

In a Big Data cluster, the machines are connected to each other over a network so that together they act as a single system.

The machines are commodity hardware (CPU + RAM), stacked together in racks. These racks are installed in physical locations called data centers.
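To make the layout concrete, here is a toy model of the hierarchy just described: commodity machines grouped into racks, and racks grouped into a data center. It sums the CPU cores and RAM of every machine to show how the cluster can be viewed as one larger system. This is only an illustrative sketch; the class names, machine names and hardware numbers are hypothetical, and real clusters are managed by software such as Hadoop rather than a structure like this.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    # A commodity machine; only CPU cores and RAM are modeled here.
    name: str
    cpu_cores: int
    ram_gb: int


@dataclass
class Rack:
    name: str
    machines: list[Machine] = field(default_factory=list)


@dataclass
class DataCenter:
    name: str
    racks: list[Rack] = field(default_factory=list)

    def total_resources(self) -> tuple[int, int]:
        """Sum CPU cores and RAM over every machine, as if the cluster were one system."""
        machines = [m for rack in self.racks for m in rack.machines]
        return sum(m.cpu_cores for m in machines), sum(m.ram_gb for m in machines)


# Hypothetical cluster: 2 racks x 3 machines, each with 8 cores and 32 GB RAM.
dc = DataCenter("dc1", [
    Rack(f"rack{r}", [Machine(f"rack{r}-node{n}", 8, 32) for n in range(3)])
    for r in range(2)
])
cores, ram = dc.total_resources()
print(f"{cores} cores, {ram} GB RAM")  # 48 cores, 192 GB RAM
```

Six machines with 8 cores and 32 GB each add up to 48 cores and 192 GB of RAM, which no single commodity machine in this example could offer on its own.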

Big Data pipeline-

A typical Big Data pipeline has the following steps:

1-Data ingestion (Sqoop/Flume)

Data is collected from multiple, different sources.

2-Data validation, cleanup and processing (Spark)

In this phase, we validate and clean up the data and then process it.

3-Data analysis (Hive)

In this phase, we analyze the data according to business requirements.

4-Data visualization (Tableau)

We create reports that communicate the information clearly to users.
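The four pipeline steps can be sketched end to end in plain Python. This is a minimal, single-machine illustration under stated assumptions: the two in-memory CSV "sources", the column names and the function names are all hypothetical, and each function only stands in for the role that Sqoop/Flume, Spark, Hive and Tableau would play in a real pipeline; it does not use those tools.

```python
import csv
import io
from collections import defaultdict

# Step 1: ingestion. Two hypothetical sources stand in for what Sqoop/Flume would collect.
SOURCE_A = "user,amount\nalice,10\nbob,abc\n"
SOURCE_B = "user,amount\nalice,5\n,7\ncarol,20\n"


def ingest(sources):
    records = []
    for text in sources:
        records.extend(csv.DictReader(io.StringIO(text)))
    return records


# Step 2: validation and cleanup. Drop rows with a missing user or a non-numeric amount.
def clean(records):
    cleaned = []
    for row in records:
        user = (row.get("user") or "").strip()
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        if user:
            cleaned.append({"user": user, "amount": amount})
    return cleaned


# Step 3: analysis. Total amount per user (the kind of aggregation a Hive query might do).
def analyze(rows):
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)


# Step 4: visualization. A plain-text bar chart as a stand-in for a Tableau report.
def visualize(totals):
    for user, total in sorted(totals.items()):
        print(f"{user:<6} {'#' * int(total)} {total:g}")


totals = analyze(clean(ingest([SOURCE_A, SOURCE_B])))
visualize(totals)
```

The row for "bob" is dropped because its amount is not a number, and the row with an empty user is dropped during cleanup, so only "alice" (10 + 5) and "carol" reach the report. Keeping each step as a separate function mirrors the idea that each stage of a real pipeline is handled by a separate tool.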


What is HDFS?

It stands for Hadoop Distributed File System.

1-Primary data storage

HDFS is the primary data storage system used by Hadoop applications.

2-Distributed file system

When do we use a distributed file system?

When data becomes too large to fit on a single machine, it becomes necessary to split it and distribute it across multiple machines.

3-Block size (128MB)

HDFS splits every file into blocks and stores those blocks.

The default size of a block in HDFS is 128MB.
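A short sketch of how a file of a given size maps onto fixed-size blocks, assuming the 128 MB default. The function name is hypothetical and the code only does the arithmetic; it does not call any Hadoop API. Note that the last block holds only the leftover data, so it can be smaller than 128 MB.

```python
BLOCK_SIZE_MB = 128  # HDFS default block size


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size (in MB) of each block a file of file_size_mb would be split into."""
    if file_size_mb <= 0:
        return []
    full, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full
    if remainder:
        blocks.append(remainder)  # the last block only holds the leftover data
    return blocks


print(split_into_blocks(1024))  # [128, 128, 128, 128, 128, 128, 128, 128]
print(split_into_blocks(300))   # [128, 128, 44]
```

A 300 MB file therefore becomes two full 128 MB blocks plus one 44 MB block.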

4-Fault tolerance

HDFS also replicates (creates exact copies of) those blocks to provide fault tolerance in case of failures.

The default replication factor is 3.
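Why replication gives fault tolerance can be shown with a simplified placement scheme: each block is copied to 3 distinct data nodes, so losing any one machine still leaves at least two copies of every block. This sketch uses plain round-robin placement as an assumption for illustration; real HDFS uses a rack-aware placement policy, and the node names here are hypothetical.

```python
REPLICATION_FACTOR = 3  # HDFS default


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION_FACTOR) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct data nodes, round-robin.

    A simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many data nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + i) % len(datanodes)]
                               for i in range(replication)]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(4, nodes)

# Simulate losing one machine: every block still has surviving copies.
failed = "dn2"
for block_id, replicas in placement.items():
    survivors = [n for n in replicas if n != failed]
    print(block_id, replicas, "->", survivors)
```

Because the three copies of each block always sit on three different machines, any single failure removes at most one copy per block.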

For Example-

Suppose you have 1 GB of data. With a block size of 128 MB, HDFS creates 8 blocks.

1 GB = 1024 MB

1024 MB / 128 MB = 8 blocks

Replication factor: 3 (HDFS keeps 3 identical copies of each block, so 8 × 3 = 24 block replicas)

Hadoop provides default values for both the block size and the replication factor.

You can change the block size and replication factor to suit your needs.
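The arithmetic in this example can be wrapped in a small helper that also shows the raw disk space used once replication is taken into account. It is a sketch that ignores metadata and other overhead; the function name is hypothetical. In Hadoop itself, the defaults are typically changed through the `dfs.blocksize` and `dfs.replication` properties in `hdfs-site.xml`.

```python
import math


def hdfs_storage(file_size_mb: int, block_size_mb: int = 128,
                 replication: int = 3) -> dict[str, int]:
    """Block count, replica count and raw disk usage for one file (ignoring overhead)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        "raw_storage_mb": file_size_mb * replication,
    }


print(hdfs_storage(1024))
# {'blocks': 8, 'block_replicas': 24, 'raw_storage_mb': 3072}
print(hdfs_storage(1024, block_size_mb=256, replication=2))
# {'blocks': 4, 'block_replicas': 8, 'raw_storage_mb': 2048}
```

With the defaults, 1 GB of data becomes 8 blocks, 24 block replicas and about 3 GB of raw storage; a larger block size or a lower replication factor reduces those numbers, at the cost of less parallelism or less fault tolerance.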

Architecture in Hadoop-

Hadoop uses a master/slave architecture.

1-Name node- stores metadata.

It knows which block of a file is stored on which machine.

The name node manages how each file is divided into blocks and stores all of the metadata.

2-Data node-

Data nodes store the actual data blocks.

A cluster has one name node that acts as the master node and several data nodes that act as slave nodes.
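The division of work between the two node types can be sketched as a toy "name node" that keeps only metadata: which block ids make up a file, and which data nodes hold each block. This is an illustrative model, not Hadoop's implementation; the class, method and path names are hypothetical, and replicas are placed round-robin rather than with HDFS's rack-aware policy.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy master: holds only metadata (which blocks make up a file, and where they live)."""
    datanodes: list[str]
    block_size_mb: int = 128
    replication: int = 3
    files: dict[str, list[int]] = field(default_factory=dict)            # file -> block ids
    block_locations: dict[int, list[str]] = field(default_factory=dict)  # block id -> data nodes
    _next_block: int = 0

    def add_file(self, path: str, size_mb: int) -> None:
        if self.replication > len(self.datanodes):
            raise ValueError("need at least as many data nodes as replicas")
        num_blocks = -(-size_mb // self.block_size_mb)  # ceiling division
        ids = []
        for _ in range(num_blocks):
            block_id = self._next_block
            self._next_block += 1
            start = block_id % len(self.datanodes)
            self.block_locations[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            ids.append(block_id)
        self.files[path] = ids

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        """Answer 'which block of this file is on which machine?'"""
        return [(b, self.block_locations[b]) for b in self.files[path]]


nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
nn.add_file("/data/sales.csv", 300)
for block_id, nodes in nn.locate("/data/sales.csv"):
    print(block_id, nodes)
```

The toy master never stores file contents, only the mapping from files to blocks and from blocks to data nodes; the data itself would live on the data nodes.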

I hope you now understand the basic concepts of Big Data, as well as Hadoop and its architecture.

Thank you!! Keep Reading.
