1) Explain how Hadoop is different from other parallel computing solutions.
2) What are the modes Hadoop can run in?
3) What is a NameNode and what is a DataNode?
4) What is Shuffling in MapReduce?
5) What are the functions of the TaskTracker and the JobTracker in Hadoop? How many instances of each can run on a single Hadoop cluster?
6) How does NameNode tackle DataNode failures?
7) What is InputFormat in Hadoop?
8) What is the purpose of RecordReader in Hadoop?
9) What are the points to consider when moving from an Oracle database to Hadoop clusters? How would you decide the correct size and number of nodes in a Hadoop cluster?
10) If you want to analyze 100TB of data, what is the best architecture for that?
11) What is InputSplit in MapReduce?
12) In Hadoop, if a custom partitioner is not defined, how is data partitioned before it is sent to the reducer?
13) What is the replication factor in Hadoop, and what is the default replication factor Hadoop comes with?
14) What is a SequenceFile in Hadoop, and why is it important?
15) What is Speculative execution in Hadoop?
16) If you are the user of a MapReduce framework, then what are the configuration parameters you need to specify?
17) How do you benchmark your Hadoop Cluster with Hadoop tools?
18) Explain the difference between ORDER BY and SORT BY in Hive.
19) What is WebDAV in Hadoop?
20) How many daemon processes run on a Hadoop system?
21) Hadoop attains parallelism by isolating tasks across various nodes, so a few slow nodes can rate-limit the rest of the program and slow it down. What method does Hadoop provide to combat this?
22) How are HDFS blocks replicated?
23) What will a Hadoop job do if developers try to run it with an output directory that is already present?
24) What happens if the number of reducers is 0?
25) What is meant by Map-side and Reduce-side join in Hadoop?
26) How can the NameNode be restarted?
27) When doing a join in Hadoop, you notice that one reducer is running for a very long time. How will you address this problem in Pig?
28) How can you debug your Hadoop code?
29) What is distributed cache and what are its benefits?
30) Why would a Hadoop developer write a MapReduce job with the reduce step disabled?
31) Explain the major difference between an HDFS block and an InputSplit.
32) Are there any problems that can only be solved by MapReduce and cannot be solved by Pig? In which kinds of scenarios are MapReduce jobs more useful than Pig?
33) What is the need for having a password-less SSH in a distributed environment?
34) Give an example scenario for the use of counters.
35) Does HDFS make block boundaries between records?
36) What is streaming access?
37) What do you mean by “Heartbeat” in HDFS?
38) Suppose 10 HDFS blocks are to be copied from one machine to another, but the other machine can copy only 7.5 blocks. Is it possible for the blocks to be broken down during replication?
39) What is the significance of conf.setMapperClass?
40) What are combiners and when are these used in a MapReduce job?
41) Which command is used to do a file system check in HDFS?
42) Explain the different parameters of the mapper and reducer functions.
43) How can you set an arbitrary number of mappers and reducers for a Hadoop job?
44) Have you ever built a production process in Hadoop? If yes, what was the process when a Hadoop job failed for any reason? (Open-ended question)
45) Explain how the master-slave architecture functions in Hadoop.
46) What is fault tolerance in HDFS?
47) Give some examples of companies that are using Hadoop architecture extensively.
48) How does a DataNode know the location of the NameNode in a Hadoop cluster?
49) How can you check whether the NameNode is working or not?
50) Explain the different types of “writes” in HDFS.