- Stream Processing with BigData: SSS-MapReduce
[abst]
-
Hidemoto Nakada, Hirotaka Ogawa, Tomohiro Kudoh
, IEEE CloudCom 2012 (poster)
, 2012
We propose a MapReduce-based stream processing system, called SSS, which is capable of processing streams along with large-scale static data. Unlike existing stream processing systems, which can work only on relatively small in-memory data sets, SSS can process incoming streamed data while consulting stored data. SSS processes streamed data with continuous Mappers and Reducers, which are periodically invoked by the system. It also supports a merge operation on two sets of data, which enables stream processing combined with large static data.
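The idea of periodically invoked Mappers/Reducers plus a merge over streamed and stored data can be sketched as follows. This is a toy Python illustration of the model only, not the SSS API; all names are hypothetical.

```python
from collections import defaultdict

# Toy sketch of the SSS model (hypothetical names, not the real API):
# periodically invoked Mappers/Reducers process one stream window, then a
# merge step combines the window's result with a large static data set.

def mapper(record):
    # Emit (word, 1) pairs from one streamed line.
    for word in record.split():
        yield word, 1

def reducer(key, values):
    return key, sum(values)

def process_window(window):
    # One periodic invocation over the current stream window.
    groups = defaultdict(list)
    for record in window:
        for k, v in mapper(record):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())

def merge(static_counts, window_counts):
    # Merge operation on two sets of data: stream result + stored data.
    merged = dict(static_counts)
    for k, v in window_counts.items():
        merged[k] = merged.get(k, 0) + v
    return merged

static_counts = {"cloud": 10, "data": 5}        # large stored data (toy)
window = ["data stream data", "cloud stream"]   # incoming stream window
static_counts = merge(static_counts, process_window(window))
print(static_counts)  # {'cloud': 11, 'data': 7, 'stream': 2}
```

The point of the sketch is that the stored data set outlives any single window: each periodic invocation folds a small stream result into it, rather than keeping everything in memory.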
- Stream Processing with Bigdata by SSS-MapReduce
[PDF]
[abst]
-
Hidemoto Nakada, Hirotaka Ogawa, Tomohiro Kudoh
, IEEE International Conference on eScience (eScience 2012) (poster)
, 2012
We propose a MapReduce-based stream processing system, called SSS, which is capable of processing streams along with large-scale static data. Unlike existing stream processing systems, which can work only on relatively small in-memory data sets, SSS can process incoming streamed data while consulting stored data. SSS processes streamed data with continuous Mappers and Reducers, which are periodically invoked by the system. It also supports a merge operation on two sets of data, which enables stream processing combined with large static data.
- An Implementation of Sawzall on Hadoop
[PDF]
[Slides]
[abst]
-
Hidemoto Nakada, Tatsuhiko Inoue, Tomohiro Kudoh
, CUTE 2011 (The 6th International Conference on Ubiquitous Information Technologies & Applications)
, 2011
Sawzall is a scripting language designed for batch processing of large amounts of data, based on the MapReduce parallel execution model, and was introduced by Google in 2006. Sawzall requires programmers to write only mappers, easing their burden; it provides a set of built-in aggregators that supply the reducing functionality, from which programmers can pick and use. We have implemented a Sawzall compiler and runtime that allows Sawzall scripts to run in parallel on Hadoop. We employed the Scala language to leverage its parser combinator libraries for Sawzall syntax parsing, which enabled easy implementation of the parser and allows for future extension of the language. This paper provides a detailed description of the system's implementation. We evaluated the system against Java programs that use the native Hadoop API and against szl, Google's open-source Sawzall implementation. We confirmed that the overhead imposed by our system is small enough, and that its execution speed is comparable with szl.
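The mapper-only programming model with built-in aggregators can be sketched as follows. This is a hypothetical Python analogue of the Sawzall style, not szl or the paper's system; the `SumTable` class stands in for a built-in aggregator.

```python
# Toy sketch of the Sawzall model (hypothetical Python analogue, not szl):
# the programmer writes only the per-record mapper logic; "aggregators"
# (here, a sum table) play the role of built-in reducers the runtime owns.

class SumTable:
    """Minimal analogue of a built-in sum aggregator."""
    def __init__(self):
        self.totals = {}

    def emit(self, key, value):
        # The runtime, not the user, accumulates emitted values.
        self.totals[key] = self.totals.get(key, 0) + value

def user_script(line, count, total_len):
    # The only code a Sawzall-style user writes: mapper logic with emits.
    for word in line.split():
        count.emit(word, 1)
        total_len.emit("chars", len(word))

count, total_len = SumTable(), SumTable()
for line in ["to be or not to be"]:
    user_script(line, count, total_len)
print(count.totals)      # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
print(total_len.totals)  # {'chars': 13}
```

In real Sawzall the aggregation tables are declared in the script and the runtime merges them across machines; the sketch only shows why users never write reducers.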
- Implementation and evaluation of SSS: a Key-Value Store based MapReduce Framework
[abst]
-
Hidemoto Nakada, Hirotaka Ogawa, Tomohiro Kudoh
, CloudCom 2011 poster
, 2011
We describe the design and implementation of a MapReduce framework called SSS. MapReduce is considered a promising parallel programming model for a broad range of applications. For such applications, a flexible MapReduce framework is required that enables programmers to easily combine Mappers and Reducers into workflows that may involve iterations. Hadoop, the most widely used MapReduce framework, is not flexible enough, however: its iteration overhead is too large for fine-grained iterations, and a Hadoop job is always composed of one Mapper and one Reducer, limiting the shape of workflows. We propose a MapReduce framework based on a distributed KVS, called SSS. In SSS, Mappers and Reducers have the same data access pattern, making flexible combinations of Mappers and Reducers possible. Furthermore, SSS employs the Owner Computes Rule, which enables faster iteration. Here, we provide the detailed design and implementation of SSS, and demonstrate its performance using a k-means clustering application.
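The "same data access pattern" point can be sketched as follows: when both Mappers and Reducers read key-values from one store and write key-values to another, any sequence of stages can be chained or iterated. This is a toy Python illustration, not the SSS code; all names are hypothetical.

```python
from collections import defaultdict

# Toy sketch of a KVS-based MapReduce (hypothetical, not the SSS code):
# Mappers and Reducers are run by the same driver, since both read
# multi-value key-values from one store and write to another.

def run_stage(func, src):
    """Apply a Mapper or Reducer uniformly: KVS in, KVS out."""
    dst = defaultdict(list)
    for key, values in src.items():
        for out_key, out_val in func(key, values):
            dst[out_key].append(out_val)
    return dst

def word_mapper(key, values):
    for line in values:
        for word in line.split():
            yield word, 1

def sum_reducer(key, values):
    yield key, sum(values)

kvs = defaultdict(list, {"doc1": ["a b a"], "doc2": ["b"]})
kvs = run_stage(word_mapper, kvs)   # same access pattern as the reducer
kvs = run_stage(sum_reducer, kvs)   # so stages chain (or iterate) freely
print(dict(kvs))  # {'a': [2], 'b': [2]}
```

Because every stage has the shape KVS → KVS, a workflow is just a list of stage functions, including loops for iterative algorithms such as k-means; in a distributed setting the Owner Computes Rule would additionally pin each stage's work to the node that owns the keys.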
- SSS: a MapReduce Framework based on Distributed Key-value Store
[PDF]
[abst]
-
Hidemoto Nakada, Hirotaka Ogawa, Tomohiro Kudoh
, SC11 Poster
, 2011
MapReduce has been very successful in implementing large-scale data-intensive applications. Because of its simple programming model, MapReduce has also begun to be utilized as a programming tool for more general distributed and parallel HPC applications. However, its applicability is often limited by relatively inefficient runtime performance and hence insufficient support for flexible workflows. In particular, the performance problem is not negligible in iterative MapReduce applications. We implemented a new MapReduce framework, SSS, based on a distributed key-value store, that supports flexible workflows. Mappers and reducers read key-value pairs only from their local storage, enjoying high throughput and low latency. We evaluated SSS against Hadoop using a synthetic benchmark and a real application. The results showed that SSS is significantly faster than Hadoop, especially for shuffle-intensive and iterative jobs.
- Implementing a Key-Value Store based MapReduce Framework
[PDF]
[abst]
-
Hirotaka Ogawa, Hidemoto Nakada, Tomohiro Kudoh
, Proceedings of 9th USENIX Conference on File and Storage Technologies
, 2011
MapReduce has been very successful in implementing large-scale data-intensive applications. Because of its simple programming model, MapReduce has also begun to be utilized as a programming tool for more general distributed and parallel applications. However, its applicability is often limited by relatively inefficient runtime performance and hence insufficient support for flexible workflows. In particular, the performance problem is not negligible in iterative MapReduce applications. To resolve this, we have been developing a new MapReduce prototype system called “SSS”, which is based on a distributed key-value store (KVS). In this poster, we present the design and implementation of SSS and tentative benchmark results.
- SSS: An Implementation of Key-value Store based MapReduce Framework
[PDF]
[abst]
-
Hirotaka Ogawa, Hidemoto Nakada, Ryousei Takano, Tomohiro Kudoh
, Proceedings of 2nd International Conference on Cloud Computing Technology and Science
, pp. 745-761
, 2010
MapReduce has been very successful in implementing large-scale data-intensive applications. Because of its simple programming model, MapReduce has also begun to be utilized as a programming tool for more general distributed and parallel applications, e.g., HPC applications. However, its applicability is limited by relatively inefficient runtime performance and hence insufficient support for flexible workflows. In particular, the performance problem is not negligible in iterative MapReduce applications. On the other hand, the HPC community can now utilize very fast and energy-efficient Solid State Drives (SSDs) with 10 Gbit/sec-class read/write performance. This fact leads us to the possibility of developing so-called “High-Performance MapReduce”. From this perspective, we have been developing a new MapReduce framework called “SSS” based on a distributed key-value store (KVS). In this paper, we first discuss the limitations of existing MapReduce implementations and then present the design and implementation of SSS. Although our implementation of SSS is still at a prototype stage, we conducted two benchmarks comparing the performance of SSS and Hadoop. The results indicate that SSS performs 1-10 times faster than Hadoop.