Abstract:
We propose a variation of round-robin ordering in a multi-threaded pipeline to increase system throughput and the fairness of resource distribution. We show that round robin with a typical arbitrary thread ordering uses shared resources inefficiently and leads to thread starvation. To address this while retaining a simple round-robin approach, we optimally and dynamically re-sort the round-robin order periodically at run time. We show that, for 4-threaded workloads, sorting the thread order at run time improves throughput by over 9% and harmonic throughput by over 3%. We experiment with multiple stages of the pipeline and show consistent results across several experiments using the SPEC CPU 2006 benchmarks. Furthermore, since the technique is still a simple round robin, the performance gain requires little implementation overhead.
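The periodic re-sorting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the sort key, so `stall_cycles` is a hypothetical per-thread metric, and `ThreadState`, `SortedRoundRobin`, and `resort_every` are names invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int
    # Hypothetical metric: the abstract does not say what the threads are sorted by.
    stall_cycles: int = 0


class SortedRoundRobin:
    """Plain round robin whose thread order is re-sorted every few rounds."""

    def __init__(self, threads, resort_every):
        self.threads = list(threads)
        self.resort_every = resort_every  # assumed re-sort period, in rounds
        self.order = list(self.threads)   # initial (arbitrary) order
        self.round = 0

    def next_round(self):
        """Return the thread ids in the order they are served this round."""
        if self.round % self.resort_every == 0:
            # Assumed policy: serve threads with fewer stall cycles first.
            self.order = sorted(self.threads, key=lambda t: t.stall_cycles)
        self.round += 1
        return [t.tid for t in self.order]


threads = [ThreadState(0, 30), ThreadState(1, 10), ThreadState(2, 20), ThreadState(3, 5)]
rr = SortedRoundRobin(threads, resort_every=2)
first = rr.next_round()         # sorted at round 0
threads[3].stall_cycles = 100   # the metric changes at run time
second = rr.next_round()        # no re-sort yet, same order as before
third = rr.next_round()         # re-sorted, thread 3 moves to the back
```

Every thread is still served exactly once per round, so the scheme remains a round robin; only the order within a round changes.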
Abstract:
The vector fitting method has been widely used to generate accurate and passive time-domain macromodels for circuits or structures characterized by frequency-domain data such as scattering parameters. Nevertheless, very few approaches to determining the vector fitting order have been discussed in the literature. In this paper, an iterative method to automatically identify the number of vector fitting poles is proposed. It achieves acceptable performance and accuracy by capitalizing on an empirical observation drawn from experiments on different types of models: the number of vector fitting poles lies between the number of resonance peaks in the model and double that number. The algorithm's awareness of the model's frequency response, together with confining the search for an acceptable order to this range, proved to be the key factors that enable it to outperform its MATLAB counterpart in most cases, as shown in the experimental results. Furthermore, a multithreaded implementation of the proposed method is introduced that distributes the vector fitting passes across multicore processors to obtain a better speedup factor. Experiments demonstrating the positive effect of the multithreaded version on performance are also presented.
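The bounded order search can be sketched as below. The abstract gives only the search range (the number of resonance peaks up to double that number), so the rest is assumed for illustration: peaks are counted as strict local maxima of a sampled magnitude response, the search stops at the first order whose error meets a tolerance, and `fit_error` is a hypothetical callback standing in for one vector fitting pass.

```python
def count_resonance_peaks(magnitude):
    """Count strict local maxima in a sampled magnitude response (assumed peak test)."""
    return sum(
        1
        for i in range(1, len(magnitude) - 1)
        if magnitude[i - 1] < magnitude[i] > magnitude[i + 1]
    )


def select_order(magnitude, fit_error, tolerance):
    """Search orders from the peak count up to twice the peak count.

    `fit_error(order)` is a hypothetical callback that runs one fitting
    pass with `order` poles and returns its error. The search stops at the
    first order whose error is within `tolerance`; otherwise the best
    order seen in the range is returned.
    """
    peaks = max(count_resonance_peaks(magnitude), 1)
    best_order, best_error = None, float("inf")
    for order in range(peaks, 2 * peaks + 1):
        error = fit_error(order)
        if error < best_error:
            best_order, best_error = order, error
        if error <= tolerance:
            break
    return best_order, best_error


# Toy response with three peaks and a made-up error model for illustration.
response = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
order, error = select_order(response, lambda n: 1.0 / n**2, tolerance=0.05)
```

Because each call to `fit_error` is independent of the others, the passes in the range could also be evaluated concurrently, which is the kind of work the multithreaded variant is said to distribute.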
Abstract:
We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features that allow us to process all of the preparation steps defined by the GATK Best Practices pipelines for variant calling. These include new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework to vastly improve runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data and up to 7.4x faster on WGS data than the same pipeline run with GATK 4, while utilizing fewer compute resources.
Abstract:
The requirements of the Large Hadron Collider (LHC) at CERN have pushed silicon tracking detectors to the limits of modern technology. The next upgrade of the LHC, to a tenfold increased luminosity of 7.5 × 10^34 cm^-2 s^-1, will require semiconductor detectors with substantially improved radiation hardness. The mandate of the CERN-RD50 collaboration is to provide detector technologies that are able to operate in such an environment. Within this context, this paper describes the approaches and first results of multi-threaded C++11 software, based on the Shockley-Ramo theorem, that simulates non-irradiated and irradiated silicon micro-strip and pad detectors of complex geometries in order to understand signal formation and charge collection efficiencies for arbitrary charge distributions.
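For orientation, the Shockley-Ramo theorem gives the current induced on an electrode by a moving charge as i = q v · E_w, where E_w is the weighting field. The sketch below covers only the textbook case of an ideal parallel-plate pad detector, where E_w = 1/d, with an assumed constant drift velocity. It is written in Python for brevity and does not represent the C++11 software described in the abstract; the function names and the numbers in the example are illustrative.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def induced_current(charge, velocity, thickness):
    """Ramo current i = q * v * E_w for an ideal pad detector, where E_w = 1 / d."""
    return charge * velocity / thickness


def collected_charge(charge, velocity, thickness, start, steps=1000):
    """Integrate the induced current while the carrier drifts from `start` to `thickness`.

    Assumes a constant drift velocity (a simplification; real drift
    velocities depend on the local electric field).
    """
    transit_time = (thickness - start) / velocity
    dt = transit_time / steps
    current = induced_current(charge, velocity, thickness)
    return sum(current * dt for _ in range(steps))


# Illustrative numbers: a 300 micrometre sensor and a 1e5 m/s drift velocity.
q_full = collected_charge(ELEMENTARY_CHARGE, 1e5, 300e-6, start=0.0)
q_half = collected_charge(ELEMENTARY_CHARGE, 1e5, 300e-6, start=150e-6)
```

In this idealized case a carrier that crosses the full thickness induces its whole charge, and one created at mid-depth induces half of it, which illustrates why the induced signal depends on where in the sensor the charge is deposited.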