Applied Computational Mathematics Division



Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis

Afzal Godil
Information Access Division (IAD), ITL, NIST

Thursday, July 21, 2016, 15:00-16:00
Building 101, Portrait Room
Gaithersburg
Thursday, July 21, 2016, 13:00-14:00
Room 4072
Boulder

Abstract:

In this talk, I will introduce Apache Hadoop and Spark for developing applications that process large amounts of data on computer clusters. I will describe the main components of Hadoop: HDFS, MapReduce, and YARN. I will then explain how to set up a Hadoop cluster and the steps for developing applications with it. Next, I will cover the capabilities of Spark and how it compares with a typical MapReduce solution. Finally, I will present a few Spark use cases for data analysis.

Speaker Bio: Afzal Godil is a researcher in the Information Technology Laboratory at the National Institute of Standards and Technology (NIST), where he has worked for over 19 years. Prior to that, he worked at the NASA Langley and Lewis Research Centers as a contractor. His research and development focuses on shape analysis and retrieval, computer vision, computational methods, graphics/visualization, digital human modeling, and machine learning. He has published over 100 papers in conferences, reports, and journals. He was also a principal investigator of the Shape Metrology IMS and the SHARP project.

Presentation Slides: PPT

Contact: W. Griffin

Note: Visitors from outside NIST must contact Cathy Graham, (301) 975-3800, at least 24 hours in advance.