Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis
Afzal Godil
Information Access Division (IAD), ITL, NIST
Thursday, July 21, 2016, 15:00-16:00
In this talk, I will give an introduction to Apache Hadoop and Spark for developing applications that process large amounts of data on computer clusters. I will discuss the components of Hadoop: HDFS, MapReduce, and YARN. Then, I will describe how to set up a Hadoop cluster and the steps for developing applications with it. I will then discuss the capabilities of Spark and how it compares with a typical MapReduce solution. Finally, I will present a few Spark use cases for data analysis.
Speaker Bio: Afzal Godil is a researcher in the Information Technology Laboratory at the National Institute of Standards and Technology (NIST), where he has been for over 19 years. Prior to that, he worked at the NASA Langley and Lewis Research Centers as a contractor. His research and development focuses on shape analysis and retrieval, computer vision, computational methods, graphics/visualization, digital human modeling, and machine learning. He has published over 100 papers in conferences, reports, and journals. He was also a principal investigator of the Shape Metrology IMS and the SHARP project.
Contact: W. Griffin
Note: Visitors from outside NIST must contact Cathy Graham, (301) 975-3800, at least 24 hours in advance.