BDEv is a tool for evaluating Big Data processing solutions in terms of performance and resource efficiency. It includes several ready-to-use frameworks (e.g. Hadoop, Spark, Flink) and manages the configuration needed to exploit the available computational resources, such as CPU, memory and network interfaces. These frameworks can be evaluated with the benchmarks included in the BDEv distribution (e.g. TeraSort, WordCount), and custom commands can also be executed. In addition, BDEv simplifies running experiments and collecting their results by automatically generating graphs.
MarDRe is a de novo MapReduce-based parallel tool for removing duplicate and near-duplicate DNA reads from large-scale FASTQ/FASTA datasets. Duplicate reads can be seen as identical or nearly identical sequences that differ by a few mismatches, so removing them reduces the memory requirements and computational time of downstream analyses without losing biological information. MarDRe is written in Java and built upon Apache Hadoop.
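The idea of near-duplicate removal can be illustrated with a minimal single-machine sketch in Python. This is not MarDRe's actual implementation: the function names, the prefix-based grouping, and the `prefix_len` and `max_mismatches` parameters are assumptions chosen for illustration. Reads are first grouped by a short prefix (a map-like step), and then compared pairwise within each group (a reduce-like step), keeping only the first read of each set of near-duplicates.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def remove_near_duplicates(reads, prefix_len=4, max_mismatches=1):
    """Keep the first read of each group of near-duplicates.

    Hypothetical helper for illustration only; parameter names are assumptions.
    """
    # Map-like step: group read indices by prefix so that only reads
    # sharing the same first bases are compared with each other.
    buckets = defaultdict(list)
    for idx, seq in enumerate(reads):
        buckets[seq[:prefix_len]].append(idx)

    # Reduce-like step: within each bucket, drop any read that is within
    # max_mismatches of an already-kept read of the same length.
    kept = []
    for indices in buckets.values():
        representatives = []
        for idx in indices:
            seq = reads[idx]
            is_duplicate = any(
                len(seq) == len(reads[r])
                and hamming(seq, reads[r]) <= max_mismatches
                for r in representatives
            )
            if not is_duplicate:
                representatives.append(idx)
        kept.extend(representatives)

    # Return the surviving reads in their original input order.
    return [reads[i] for i in sorted(kept)]


if __name__ == "__main__":
    sample = ["ACGTACGT", "ACGTACGA", "ACGTTTTT", "TTTTACGT", "ACGTACGT"]
    print(remove_near_duplicates(sample))
```

A limitation of this sketch is that two reads whose mismatches fall inside the prefix land in different groups and are never compared. Setting `max_mismatches=0` restricts the filter to exact duplicates.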
BDWatchdog is a framework for in-depth, real-time analysis of the execution of Big Data frameworks and applications. It supports per-process resource monitoring using time series, and mixed system and JVM profiling using flame graphs. By focusing on applications rather than on hosts, it helps identify both resource and code bottlenecks. BDWatchdog has been successfully tested on both Docker containers and virtual machines.