Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large datasets in a distributed computing environment. It is a top-level project sponsored by the Apache Software Foundation.
Hadoop makes it possible to run applications on systems with thousands of commodity hardware nodes and to handle thousands of terabytes of data. Its distributed file system facilitates rapid data transfer between nodes and allows the system to continue functioning in the event of a node failure. This approach reduces the risk of catastrophic system failure and unexpected data loss, even if a significant number of nodes become inoperative. Consequently, Hadoop quickly emerged as a foundation for large-scale data processing tasks, such as scientific analysis, business and sales planning, and processing huge volumes of sensor data, including data from internet of things (IoT) sensors.
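To make the fault-tolerance point concrete, the following is a minimal sketch of a client writing a file to HDFS through Hadoop's Java FileSystem API, with the block replication factor set to three so that the data survives individual node failures. The NameNode address and file path are illustrative placeholders, not values from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");
        // Each block of the file is copied to three DataNodes, so the file
        // remains readable even if a node holding one copy goes down.
        conf.set("dfs.replication", "3");

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/user/demo/sample.txt"))) {
            out.writeUTF("Hello HDFS");
        }
    }
}
```

In practice the replication factor is usually set cluster-wide in hdfs-site.xml rather than per client, but the effect is the same: blocks are spread across nodes so no single failure loses data.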
Hadoop was created by computer scientists Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine. It was inspired by Google's MapReduce, a software framework in which an application is divided into numerous small parts. Any of these parts, which are also called fragments or blocks, can be run on any node in the cluster. After years of development within the open source community, Hadoop 1.0 was made available to the public in November 2012.
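To illustrate how an application is split into many small parts that can run on any node, here is a sketch of the classic word-count job written against Hadoop's MapReduce Java API, following the structure of the standard example in the Hadoop documentation. The class names and paths are illustrative; a real job would be configured for a specific cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each mapper processes one fragment (input split) of the data and emits
  // (word, 1) pairs; mappers run in parallel on whichever nodes hold the data.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducers sum the counts for each word; like mappers, they can be
  // scheduled on any node in the cluster.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

On a configured cluster, such a job would typically be packaged as a JAR and submitted with a command along the lines of hadoop jar wordcount.jar WordCount /input /output, with the input and output paths pointing into HDFS.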