2023-2024 Academic Catalog

IT 533. Big Data Technologies. (3).

More and more organizations are collecting large amounts of data, much of it unstructured. Big data technologies can be used to store, process and analyze these large data sets in a distributed environment. This course introduces students to the world of big data and its associated technologies. The focus of the course is Apache Hadoop, an open-source software project that enables distributed processing of large data sets across clusters of commodity servers. The objective of this course is to provide students with a foundation for understanding big data technologies, and Hadoop in particular. Topics include Hadoop system architecture, the Hadoop Distributed File System (HDFS), the MapReduce programming model and its design patterns, and technologies in the Hadoop ecosystem such as Pig, Hive and Oozie. The course also introduces big data science concepts and NoSQL database technologies.