Hadoop is an open source framework for storing big data and running computations over it. Hadoop uses a programming model called MapReduce to process data in parallel across multiple machines, scaling to clusters with thousands of nodes. Although Hadoop overcomes many of the challenges of processing big data, it also has several limitations. Without weighing the pros and cons, you cannot make the right decision about whether Hadoop fits your needs.
In this article, let's look at 6 advantages and 6 disadvantages of Hadoop. Through this post, you will know the pros and cons of using Hadoop.
Let's get started,
Advantages of Hadoop
1. Performance
With its distributed file system (HDFS), Hadoop can process data at a much higher speed than traditional database management systems. HDFS breaks large files into smaller blocks.
These blocks are stored across the nodes of a Hadoop cluster so that they can be processed in parallel. As a result, performance is generally higher: Hadoop can process terabytes of data within minutes.
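The block-splitting idea can be sketched with some simple arithmetic. This is an illustration in Python, not Hadoop code; 128 MB is the default HDFS block size in Hadoop 2.x/3.x (it is configurable via dfs.blocksize).

```python
import math

# Default HDFS block size in Hadoop 2.x/3.x (configurable via dfs.blocksize).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB

def split_into_blocks(file_size_bytes):
    """Return the number of HDFS blocks a file of the given size occupies."""
    return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

# A 1 TB file splits into 8192 blocks that can be processed in parallel,
# one map task per block.
one_tb = 1024 ** 4
print(split_into_blocks(one_tb))  # 8192
```

Because each block can be handled by a separate task on a separate node, an 8192-block file can in principle be read by thousands of workers at once.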
2. Cost Effective
The storage solution used in Hadoop is much more cost effective. If you store a large data set in a traditional relational database system, you need to spend heavily to scale up the infrastructure.
To keep expenses down, that approach often forces you to delete old data from time to time. In Hadoop, by contrast, the entire raw data set is stored.
Therefore, companies can still refer to this raw data in the future whenever they want to make important business decisions.
3. Availability
Hadoop 2.x supports one active NameNode and one standby NameNode. Hadoop 3.0 goes further and supports more than one standby NameNode.
The purpose of these NameNodes is to make the system highly available. Even if the active NameNode crashes or stops functioning, a standby NameNode takes over the job.
4. Scalability
Hadoop is highly scalable through the use of clusters: if there is a requirement to expand the cluster, new nodes can be added without bringing the system down. This approach is known as horizontal scaling. It differs entirely from the traditional approach of vertical scaling, which means installing more powerful components such as CPU, RAM and hard disks on a single machine.
5. Flexibility
The design of Hadoop allows it to gather information in the form of both structured and unstructured data. Whatever form the data comes in, tables exported from MySQL, XML, JSON, images, videos and so on, Hadoop can store all of it inside HDFS.
Regardless of the data type, Hadoop can be used to process it. This kind of flexibility is important for organizations that need to process large data sets such as social media feeds, clickstream data and email conversations.
6. Compatibility
Hadoop can be used as a storage system for other frameworks like Spark and Flink, whose processing engines are compatible with Hadoop. The list extends to file systems such as Azure Storage, FTP file systems and Amazon S3, so you can combine HDFS and these stores with those processing engines.
Disadvantages of Hadoop
1. Security
Organizations handling sensitive data must implement appropriate security measures. Hadoop's security features are disabled by default, so without extra configuration all your data could be at risk.
Like many other big data frameworks, Hadoop is written in Java, a platform frequently targeted by cybercriminals. Therefore, before starting to use Hadoop, the data analytics team needs to implement preventive measures such as authentication and encryption.
2. Learning Curve
The language most developers are familiar with is SQL, but Hadoop relies on Java instead. Developers and data analysts who want to program with Hadoop need a detailed understanding of the Java language.
In addition, they must understand the MapReduce programming model to use Hadoop's capabilities fully.
3. Data Processing
Hadoop relies on MapReduce, so it supports batch processing only: a job reads a large input, processes it with predefined map and reduce steps, and produces output only when the whole batch is done.
The problem with this method is that the output is produced with high latency, so results are always delayed. Hadoop is not suited to real-time or stream processing.
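The map and reduce steps mentioned above can be sketched with a classic word count. This is a minimal in-memory Python simulation of the MapReduce model, not code that runs on a Hadoop cluster; in real Hadoop the shuffle happens across the network between nodes.

```python
from collections import defaultdict

def map_phase(line):
    # Like a Hadoop Mapper: emit a (word, 1) pair for every word.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Like Hadoop's shuffle: group all values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Like a Hadoop Reducer: aggregate the values for each key.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data big ideas", "big data tools"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

Note that nothing is emitted until every line has been mapped, shuffled and reduced; that all-or-nothing structure is exactly why a batch job's output arrives with high latency.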
4. Small Data Issue
Hadoop stores data in blocks with a default size of 128 MB, and HDFS is designed for a small number of large files rather than a large number of small files. Every file, directory and block occupies an entry in the NameNode's memory.
If you try to store a very large number of small files, the NameNode will get overloaded, and eventually the cluster can stop functioning.
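The NameNode pressure can be estimated with back-of-the-envelope arithmetic. The commonly cited figure of roughly 150 bytes of NameNode heap per file-system object (file or block) is an assumption in this Python sketch, and the helper function is invented for illustration.

```python
# Assumption: each file-system object (file entry or block entry) costs
# roughly 150 bytes of NameNode heap. This is a rule-of-thumb estimate.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Rough NameNode heap needed: one entry per file plus one per block."""
    objects = num_files + num_files * blocks_per_file
    return objects * BYTES_PER_OBJECT

# 100 million small files (one block each) -> 200 million objects,
# about 30 GB of NameNode heap just for metadata.
heap = namenode_heap_bytes(100_000_000)
print(round(heap / 1024**3, 1))  # 27.9 (GiB)
```

Under these assumptions, the same 100 million files' worth of data packed into large 128 MB-block files would need a small fraction of that heap, which is why many small files overload the NameNode long before the DataNodes run out of disk.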
5. Processing Overhead
All read and write operations in Hadoop MapReduce go through the disk, including intermediate results between processing stages. Because the data size is so large, this disk I/O makes the operations inefficient. MapReduce also offers no in-memory computation, which is one of the main reasons for its processing overhead.
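The disk round-trip between stages can be sketched as follows. This is an illustrative Python simulation, not Hadoop code; the file name and the two trivial stages are invented for the sketch.

```python
import json
import os
import tempfile

# Sketch: MapReduce persists intermediate results to disk between stages,
# unlike in-memory engines such as Spark that keep them in RAM.
def stage_one(data, workdir):
    result = [x * x for x in data]
    path = os.path.join(workdir, "stage1.json")
    with open(path, "w") as f:
        json.dump(result, f)  # intermediate output hits the disk
    return path

def stage_two(path):
    with open(path) as f:
        result = json.load(f)  # read back from disk before the next stage
    return sum(result)

with tempfile.TemporaryDirectory() as workdir:
    print(stage_two(stage_one([1, 2, 3], workdir)))  # 14
```

Every extra stage adds another write-then-read cycle against the disk, and at terabyte scale that I/O, rather than the computation itself, dominates the job's running time.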
6. Data Storage
As mentioned earlier, Hadoop compromises data security in many respects. Therefore, the development team needs to be extra cautious when storing confidential and crucial data. If it is not handled properly, there is a high possibility of losing or leaking this sensitive information.