GitHub - NeDS-Lab/TPDS-MJQM-Simulator

Abstract

Modern data centers feature an extensive array of cores that handle quite a diverse range of jobs. Recent traces, shared by leading cloud data center enterprises like Google and Alibaba, reveal that the constant increase in data center services and computational power is accompanied by a growing variability in service demand requirements. The number of cores needed for a job can vary widely, ranging from one to several thousand, and the number of seconds a core is held by a job can span more than five orders of magnitude. In this context of extreme variability, the policies governing the allocation of cores to jobs play a crucial role in the performance of data centers. It is widely acknowledged that the First-In First-Out (FIFO) policy tends to underutilize available computing capacity due to the varying magnitudes of core requests. However, the impact of the extreme variability in service demands on job waiting and response times, which has been deeply investigated in traditional queuing models, is not as well understood in the case of data centers, as we will show. To address this issue, we investigate the dynamics of a data center cluster through analytical models in simple cases, and discrete event simulations based on real data.

The figures in the paper are generated from simulation results obtained with different parameters and service time distributions. The results are then parsed and plotted using the Matplotlib library in Python.

Requirements

Software requirements

C++14/17/20
Python 3

Hardware requirements

We do not specify hardware requirements for our simulations. As a reference point, we obtained the results published in the paper on a cluster node with a 20-core Intel(R) Xeon(R) Gold 6148 CPU @ 2.40 GHz and 200 GB of ECC RAM. Storage is on a 30 TB NAS, and everything is hosted on a Nutanix hyperconverged architecture. For the simulations involving the bounded Pareto service time distribution, we run 100 million events in each of 60 independent runs. For the other service time distributions, we run 30 million events in each of 40 independent runs. The independent runs are used to compute a confidence interval for each metric we capture. Each independent run takes approximately 3 minutes. The number of events and/or independent runs can be changed by modifying the provided .sh scripts, but fewer events or runs may make the results less reliable.
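As an illustration of how independent runs can be turned into a confidence interval, here is a minimal Python sketch. It is not taken from the simulator's code: the function name `confidence_interval`, the 95% level, and the use of a normal approximation (reasonable for 40 or 60 runs, where a Student's t quantile would be only slightly wider) are all assumptions made for this example.

```python
import math
import statistics


def confidence_interval(run_means, confidence=0.95):
    """Return (mean, half_width) for a metric measured once per independent run.

    Hypothetical helper: uses a normal approximation for the quantile,
    which is an assumption and not necessarily what the simulator does.
    """
    n = len(run_means)
    if n < 2:
        raise ValueError("at least two independent runs are needed")
    mean = statistics.fmean(run_means)
    # Standard error of the mean across independent runs.
    std_err = statistics.stdev(run_means) / math.sqrt(n)
    # Two-sided quantile of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err


if __name__ == "__main__":
    # Made-up per-run response times, for illustration only.
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]
    mean, half_width = confidence_interval(sample)
    print(f"{mean:.3f} +/- {half_width:.3f}")
```

Because the half-width shrinks roughly with the square root of the number of runs, reducing the run count in the scripts widens the interval accordingly.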

Experiment workflow

Option 1: Only do the plotting from pre-saved simulation results

cd Only_Plotting
chmod +x script-figures.sh
./script-figures.sh

The plots will be generated inside the folder Figures.

Option 2: Do all the simulations first, then do the plotting

cd Sim_and_Plotting
chmod +x script-figures.sh res_Fig3a.sh res_others.sh generate_all.sh
./generate_all.sh

The simulation results will be generated inside the folder Results, while the plots will be generated inside the folder Figures.
