SSPA Blog: Random or Not? With Clients, Communication Is Key

  

As statistical programmers, we are sometimes asked to implement ideas formulated by someone else. Sometimes the person making the request is not a statistician. In those cases, communication is key.

Last week I was asked to "generate 20 points in a rectangle that are randomly distributed in a rectangle." No problem. I knew from previous conversations that my client wanted a random uniform distribution, so within a few minutes I sent her a SAS DATA step and a graph:

/* generate 20 points in the rectangle [0,2] x [0,1] */
data random;
call streaminit(12345);       /* sets the seed for the RAND function */
do i = 1 to 20;
   x = 2*rand("Uniform");     /* x is in interval [0,2] */
   y =   rand("Uniform");     /* y is in interval [0,1] */
   output;
end;
run;

proc sgplot data=random;
scatter x=x y=y;
run;

I thought I was done. However, a few minutes later I got an email about the spacing of the points. "The points aren't spread out enough," my client complained. "There's a big hole on the left side, and the points in the upper right corner are too close together. Can you spread them out a little?"

At first I wasn't sure how to respond. The plan called for random points, but now she was asking me to manually adjust some values? By definition, they would no longer be random! And why was she complaining about the "big hole"? Surely she knows that "uniformly distributed" does not mean "evenly spaced"?

Or does she? My client is not a statistician. Do I clearly understand what she wants, or have I only heard what she said? I remembered advice I had received from a professor in graduate school: sometimes clients ask for one thing, but they really want something else.

I went back to my client and asked her to describe again what she wanted. Through our discussion, I determined that what she really wanted was a set of points that are roughly equidistant from each other, but not on a grid. She wanted them "spaced out" but "random."

Since there is no "spaced-out-and-random" distribution in any of my textbooks, I needed to invent one. Based on her description, I started with points on a uniform 5x4 grid, but randomly perturbed those points off the grid. I left the size of the perturbation as a parameter in the problem, as shown in the following DATA step:

data random2;
drop dx dy              /* grid spacing parameters */
     delta;             /* parameter: size of perturbation */
dx = 2/5; dy = 1/4; delta = 0.1;
do i = 1 to 5;          /* five points in x direction */
   x0 = i*dx - dx/2;    /* evenly spaced in [0,2] */
   do j = 1 to 4;       /* four points in y direction */
      y0 = j*dy - dy/2; /* evenly spaced in [0,1] */
      x = x0 + delta*(2*ranuni(1)-1); /* perturb at most +/- delta */
      y = y0 + delta*(2*ranuni(1)-1); /* perturb up or down */
      output;
   end;
end;
run;

proc sgplot data=random2;
scatter x=x y=y;
yaxis min=0 max=1;
xaxis min=0 max=2;
run;
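
A quick back-of-the-envelope calculation suggests why the first plot had holes and the second does not. If the rectangle is divided into the same 5x4 grid of cells, 20 independent uniform points leave any given cell empty with probability (19/20)^20, or about 0.36, so roughly 7 of the 20 cells are expected to be empty. The perturbed grid, by contrast, keeps exactly one point in every cell as long as delta is no larger than half the smaller cell dimension. The following DATA step is a minimal sketch of that arithmetic. The variable names n, k, p_empty, expected_empty, and max_delta are illustrative and do not appear in the programs above:

```sas
/* Sketch: compare the two layouts on the same 5x4 grid of cells.   */
/* All variable names here are illustrative assumptions.            */
data _null_;
   n = 20;                       /* number of uniform random points       */
   k = 5*4;                      /* number of grid cells                  */
   p_empty = (1 - 1/k)**n;       /* P(a given cell receives no point)     */
   expected_empty = k*p_empty;   /* expected number of empty cells        */
   dx = 2/5; dy = 1/4;           /* cell width and height, as in random2  */
   max_delta = min(dx, dy)/2;    /* largest shift that stays in the cell  */
   put p_empty= expected_empty= max_delta=;   /* write values to the log  */
run;
```

With delta = 0.1 and a limit of 0.125, every perturbed point stays inside its own cell, so no cell can be empty.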

The result? The client loves the new arrangement: "Awesome! I like [this one] better because there aren't any big holes."

It may not be what she asked for, but it is what she wants.
