How do I generate a histogram for a given probability distribution (for functional testing of a server)?

I am trying to automate functional testing of a server using a realistic frequency distribution of requests. (sort of load testing, sort of simulation)

I've chosen the Weibull distribution as it "sort of" matches the distribution I've observed (ramps up quickly, drops off quickly but not instantly).

I use this distribution to generate the number of requests that should be sent each day between a given start and end date.

I've hacked together an algorithm in Python that sort of works but it feels kludgy:

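from collections import defaultdict
from datetime import timedelta
from random import weibullvariate

# The function name is illustrative; the original is a fragment of a larger function.
def make_timeline(start_date, end_date, how_many_responses):
    how_many_days = (end_date - start_date).days
    # Count how many responses fall on each day index
    freqs = defaultdict(int)
    for x in range(how_many_responses):
        freqs[int(how_many_days * weibullvariate(0.5, 2))] += 1
    # Pair the observed day indices, in order, with consecutive dates
    timeline = []
    day = start_date
    for i, freq in sorted(freqs.items()):
        timeline.append((day, freq))
        day += timedelta(days=1)
    return timeline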

What better ways are there to do this?


Asked by: John149 | Posted: 28-01-2022






Answer 1

This is quick and probably not that accurate, but if you calculate the PDF yourself, then at least you make it easier to lay several smaller/larger ones on a single timeline. dev is the standard deviation of the Gaussian noise, which controls the roughness. Note that this is not the 'right' way to generate what you want, but it's easy.

import math
from datetime import timedelta, date
from random import gauss

how_many_responses = 1000
start_date = date(2008, 5, 1)
end_date = date(2008, 6, 1)
num_days = (end_date - start_date).days + 1
timeline = [start_date + timedelta(i) for i in range(num_days)]

def weibull(x, k, l):
    # Weibull PDF with shape k and scale l
    return (k / l) * (x / l)**(k-1) * math.exp(-(x/l)**k)

dev = 0.1  # standard deviation of the Gaussian noise
# Map the date range onto [0, 1.25]
samples = [i * 1.25/(num_days-1) for i in range(num_days)]
probs = [weibull(i, 2, 0.5) for i in samples]
noise = [gauss(0, dev) for i in samples]
# Add noise to the PDF values and clip negatives to zero
simdata = [max(0., e + n) for (e, n) in zip(probs, noise)]
# Scale so that the total is roughly how_many_responses
events = [int(p * (how_many_responses / sum(probs))) for p in simdata]

histogram = zip(timeline, events)

print('\n'.join((d.strftime('%Y-%m-%d ') + "*" * c) for d,c in histogram))
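
If the noise is not needed, a possible variation is to skip sampling the PDF and take the expected count per day from the Weibull CDF, F(x) = 1 - exp(-(x/l)^k), so that each day gets exactly its share of the probability mass. This is only a sketch under assumptions: the helper names weibull_cdf and expected_counts are made up for illustration, and span=1.25 simply mirrors the 1.25 factor used above.

```python
import math
from datetime import date, timedelta

def weibull_cdf(x, k, lam):
    """CDF of a Weibull distribution with shape k and scale lam."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** k))

def expected_counts(start_date, end_date, total, k=2.0, lam=0.5, span=1.25):
    """Expected (noise-free) number of requests per day.

    The date range is mapped onto [0, span]; each day receives the share
    of probability mass inside its slice, rescaled so the sum is `total`.
    """
    num_days = (end_date - start_date).days + 1
    edges = [span * i / num_days for i in range(num_days + 1)]
    covered = weibull_cdf(span, k, lam)  # mass that lies inside [0, span]
    days = [start_date + timedelta(i) for i in range(num_days)]
    counts = [
        total * (weibull_cdf(b, k, lam) - weibull_cdf(a, k, lam)) / covered
        for a, b in zip(edges, edges[1:])
    ]
    return list(zip(days, counts))

timeline = expected_counts(date(2008, 5, 1), date(2008, 6, 1), 1000)
for day, count in timeline:
    print(day.strftime('%Y-%m-%d ') + '*' * round(count))
```

The counts are floats that sum to the requested total; round them (or add noise afterwards) depending on how exact the total needs to be.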

Answered by: Rafael855 | Posted: 01-03-2022



Answer 2

Why don't you try The Grinder 3 to load test your server? It comes with all this and more prebuilt, and it supports Python as a scripting language.

Answered by: Lucas408 | Posted: 01-03-2022



Answer 3

Slightly longer but probably more readable rework of your last four lines:

samples = [0] * (how_many_days + 1)
for s in range(how_many_responses):
    # Clamp any overshoot onto the last day
    samples[min(int(how_many_days * weibullvariate(0.5, 2)), how_many_days)] += 1
histogram = zip(timeline, samples)
print('\n'.join((d.strftime('%Y-%m-%d ') + "*" * c) for d,c in histogram))

This always places the samples within the date range, but you get a corresponding bump on the last day of the timeline from all of the samples that fall above the [0, 1] range.
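
To put a number on that bump: random.weibullvariate(alpha, beta) takes alpha as the scale and beta as the shape, so the chance of a draw above 1.0 is exp(-(1/0.5)**2) = exp(-4), a little under 2%. If the bump is unwanted, one alternative (an assumption on my part, not something the code above does) is to redraw out-of-range values instead of clamping them. The helper name draw_day and the date-range size are hypothetical.

```python
import math
from random import weibullvariate

SCALE, SHAPE = 0.5, 2  # same arguments as weibullvariate(0.5, 2)

# Probability that a single draw exceeds 1.0, i.e. would be clamped.
tail = math.exp(-((1.0 / SCALE) ** SHAPE))
print('fraction clamped to the last day: %.4f' % tail)

def draw_day(num_slots):
    """Return a day index in [0, num_slots - 1], redrawing values >= 1.0."""
    while True:
        x = weibullvariate(SCALE, SHAPE)
        if x < 1.0:
            return int(num_slots * x)

how_many_days = 31          # hypothetical date range
how_many_responses = 1000
samples = [0] * (how_many_days + 1)
for _ in range(how_many_responses):
    samples[draw_day(how_many_days + 1)] += 1
```

Redrawing keeps the total at how_many_responses but truncates the distribution at 1.0, so the tail is cut off rather than piled onto the final day.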

Answered by: Ted482 | Posted: 01-03-2022



Answer 4

Instead of giving the number of requests as a fixed value, why not use a scaling factor instead? At the moment, you're treating requests as a limited quantity, and randomising the days on which those requests fall. It would seem more reasonable to treat your requests-per-day as independent.

from datetime import date, timedelta
from random import weibullvariate

scaling = 10
start_date = date(2008, 5, 1)
end_date = date(2008, 6, 1)

num_days = (end_date - start_date).days + 1
days = [start_date + timedelta(i) for i in range(num_days)]
# Draw each day's request count independently
requests = [int(scaling * weibullvariate(0.5, 2)) for i in range(num_days)]
timeline = list(zip(days, requests))
print(timeline)
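
If the per-day counts should still follow the ramp-up and drop-off shape while staying independent of each other, one possible variation is to use the Weibull PDF as the expected rate for each day and draw each count independently around it. The sketch below assumes Poisson-distributed counts, which is my own choice and not part of the suggestion above; the helpers weibull_pdf and poisson, the scaling value, and the mapping of the date range onto [0, 1.25] (borrowed from Answer 1) are all illustrative.

```python
import math
import random
from datetime import date, timedelta

def weibull_pdf(x, k, lam):
    """Weibull density with shape k and scale lam (0 for x < 0)."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def poisson(mean):
    """Draw a Poisson-distributed integer (Knuth's method, for modest means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

scaling = 40  # hypothetical: expected requests on a day where the PDF equals 1
start_date = date(2008, 5, 1)
end_date = date(2008, 6, 1)
num_days = (end_date - start_date).days + 1
days = [start_date + timedelta(i) for i in range(num_days)]

# Expected requests per day follow the Weibull shape over [0, 1.25].
means = [scaling * weibull_pdf(1.25 * i / (num_days - 1), 2, 0.5)
         for i in range(num_days)]
# Each day's count is drawn independently around its own mean.
requests = [poisson(m) for m in means]
timeline = list(zip(days, requests))
```

The total number of requests is then random rather than fixed, which is the trade-off of treating the days as independent.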

Answered by: Kellan935 | Posted: 01-03-2022



Answer 5

I rewrote the code above to be shorter (but maybe it's too obfuscated now?)

from itertools import count, groupby

timeline = (start_date + timedelta(days=days) for days in count(0))
how_many_days = (end_date - start_date).days
pick_a_day = lambda _: int(how_many_days * weibullvariate(0.5, 2))
days = sorted(map(pick_a_day, range(how_many_responses)))
histogram = zip(timeline, (len(list(responses)) for day, responses in groupby(days)))
print('\n'.join((d.strftime('%Y-%m-%d ') + "*" * c) for d,c in histogram))
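
One thing to watch with the groupby approach: a day index that receives no responses never shows up in days, so every later count shifts onto an earlier date. A more readable alternative, sketched here with collections.Counter and with example dates and totals filled in as assumptions, walks every day in the range so that empty days are kept with a count of 0.

```python
from collections import Counter
from datetime import date, timedelta
from random import weibullvariate

start_date = date(2008, 5, 1)   # example values, not from the code above
end_date = date(2008, 6, 1)
how_many_responses = 1000
how_many_days = (end_date - start_date).days

# Count responses per day index; overshoot is clamped to the last day.
counts = Counter(
    min(int(how_many_days * weibullvariate(0.5, 2)), how_many_days)
    for _ in range(how_many_responses)
)

# Visit every day in the range; Counter returns 0 for missing indices.
histogram = [
    (start_date + timedelta(days=i), counts[i])
    for i in range(how_many_days + 1)
]
print('\n'.join(d.strftime('%Y-%m-%d ') + '*' * c for d, c in histogram))
```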

Answered by: Fiona719 | Posted: 01-03-2022



Answer 6

Another solution is to use Rpy, which brings all of the power of R (including lots of tools for distributions) into Python.

Answered by: Emily163 | Posted: 01-03-2022


