Here's an example with n = 800, k = 300. The times are not huge.
Original Message:
Sent: 09-02-2024 11:26
From: Virginia Rovnyak
Subject: A sampling question
The question comes from a physical situation in which k would be in the 100-300 range, out of n = 500-1,000. An asymptotic formula would be best. The sampling will be done sequentially.
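For n and k in that range, one heuristic large-n approximation can be sketched as follows. It is not something proposed in this thread, and its accuracy for the physical problem is untested. The idea: sequential sampling with probability proportional to weight is equivalent to giving each point i an independent exponential arrival time with rate w_i and keeping the first k arrivals. When n and k are large, the time of the k-th arrival is nearly deterministic, which suggests pi_i is approximately 1 - exp(-w_i t), with t chosen so that the approximate inclusion probabilities sum to k. The function name and the bisection solver below are illustrative choices, and the weights are assumed to be nonnegative.

```python
import math

def approx_inclusion_probabilities(weights, k):
    """Heuristic approximation pi_i ~ 1 - exp(-w_i * t) for sequential
    weighted sampling without replacement (an assumption, not an exact result).
    t is chosen by bisection so that the approximate pi_i sum to k."""
    total = sum(weights)
    w = [x / total for x in weights]          # normalize weights to sum to 1
    positive = sum(1 for x in w if x > 0)
    if not 0 < k < positive:
        raise ValueError("need 0 < k < number of points with positive weight")

    def expected_size(t):
        # Sum of approximate inclusion probabilities at "time" t
        return sum(-math.expm1(-wi * t) for wi in w)

    # Bracket the root, then bisect (expected_size is increasing in t)
    lo, hi = 0.0, 1.0
    while expected_size(hi) < k:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_size(mid) < k:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2.0
    return [-math.expm1(-wi * t) for wi in w]
```

On the six-point example used later in this thread (weights 1/12, 1/12, 2/12, 3/12, 3/12, 2/12 with k = 3), this gives roughly 0.31, 0.52 and 0.67 for the three distinct weights, against exact values of 0.296, 0.525 and 0.679. The agreement should be checked by simulation before relying on it at n = 500-1,000.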
------------------------------
Virginia Rovnyak
Original Message:
Sent: 09-01-2024 02:47
From: Jim Baldwin
Subject: A sampling question
Note that the exact inclusion probabilities for Anthony Warrack's R code when k = 2 can be obtained, using his notation, with
w*(1 - w/(1 - w) + sum(w/(1 - w)))
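As a short derivation of that expression, assuming the weights w_j sum to 1 and the two points are drawn one after the other with the remaining weights renormalized: point i is either drawn first, or some other point j is drawn first and i is drawn second.

```latex
\pi_i
  = w_i + \sum_{j \neq i} w_j \,\frac{w_i}{1 - w_j}
  = w_i \Bigl( 1 + \sum_{j \neq i} \frac{w_j}{1 - w_j} \Bigr)
  = w_i \Bigl( 1 - \frac{w_i}{1 - w_i} + \sum_{j} \frac{w_j}{1 - w_j} \Bigr)
```

The last form is the vectorized expression above, with sum(w/(1 - w)) running over all points.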
Here is some Mathematica code to obtain exact inclusion probabilities for other values of k. (Note that even with moderate values of k and n the calculations might take longer than the age of the Earth.)
(* Function to calculate probability of each sample *)
pr[indices_, p_] := Module[{remaining, prob},
  remaining = 1 - p[[indices[[1]]]];
  prob = p[[indices[[1]]]];
  Do[
    prob = prob*p[[indices[[j]]]]/remaining;
    remaining = remaining - p[[indices[[j]]]],
    {j, 2, Length[indices]}];
  prob
]
(* Function to calculate inclusion probabilities *)
inclusionProbabilities[p_, k_] := Module[{perms, allProb, data},
  If[k == 2, p (1 - p/(1 - p) + Total[p/(1 - p)]),
    (* Generate all possible samples *)
    perms = Permutations[Range[Length[p]], {k}];
    (* Get probabilities of every possible sample: these sum to 1 *)
    allProb = pr[#, p] & /@ perms;
    (* Combine samples with associated probability of selection *)
    data = Transpose[{perms, allProb}];
    (* Find the probabilities of inclusion for each element *)
    Table[Total[Select[data, MemberQ[#[[1]], i] &][[All, 2]]], {i, 1, Length[p]}]]
]
(* Samples of size 3 *)
p = {1/12, 1/12, 2/12, 3/12, 3/12, 2/12};
inclusionProbabilities[p, 3]
(* {8203/27720,8203/27720,1819/3465,1255/1848,1255/1848,1819/3465} *)
inclusionProbabilities[p, 3] // N
(* {0.295924, 0.295924, 0.524964, 0.679113, 0.679113, 0.524964} *)
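For readers without Mathematica, the same brute-force enumeration can be sketched in Python. This is an illustrative port, not the code that produced the numbers above; the function names are mine, and p is assumed to be a list of Fraction weights that sum to 1. Like the Mathematica version, it enumerates every ordered sample, so it is only practical for small n and k.

```python
from fractions import Fraction
from itertools import permutations

def sample_probability(order, p):
    """Probability of drawing the points in `order`, in that order,
    when each draw is proportional to the weights still remaining."""
    remaining = Fraction(1)
    prob = Fraction(1)
    for i in order:
        prob *= p[i] / remaining
        remaining -= p[i]
    return prob

def inclusion_probabilities(p, k):
    """Exact inclusion probabilities by summing over all ordered samples."""
    pi = [Fraction(0)] * len(p)
    for order in permutations(range(len(p)), k):
        pr = sample_probability(order, p)
        for i in order:
            pi[i] += pr
    return pi
```

Calling inclusion_probabilities with the six weights above as Fractions and k = 3 should reproduce the exact values shown, and the results always sum to k.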
Here is a slightly larger example:
p = RandomVariate[UniformDistribution[{0, 1}], 20];
p = p/Total[p]
(* {0.0423276, 0.0266268, 0.0823042, 0.0855377, 0.0168512, 0.0535499, 0.068562, 0.0471883, 0.0100306,
0.0546233, 0.0347716, 0.0926354, 0.00674148, 0.0390817, 0.0559113, 0.0806157, 0.0890967, 0.00176155,
0.0600508, 0.0517321} *)
inclusionProbabilities[p, 4]
(* {0.176228, 0.113839, 0.318929, 0.329457, 0.0732076, 0.218636, 0.272497, 0.194821, 0.044056, 0.222596,
0.146647, 0.352041, 0.0297645, 0.163622, 0.227326, 0.313371, 0.340871, 0.00783841, 0.242363, 0.211891} *)
------------------------------
Jim Baldwin
Retired
Original Message:
Sent: 08-31-2024 13:10
From: Anthony Warrack
Subject: A sampling question
Virginia, I think the following R program should provide reasonable estimates of the selection probability for each point (note: they do not add to one; they add to k). Also note that points with the same weight should have the same selection probability. This could be achieved by averaging the estimates for points with the same weight.
####################
n <- 6; k <- 2                               # choose n and k
x <- 1:n                                     # number the points from 1 to n
w <- c(1/12, 1/12, 2/12, 3/12, 3/12, 2/12)   # given weights, assumed to be sampling probabilities
nsim <- 10000                                # choose number of simulations
M <- matrix(nrow = nsim, ncol = k)           # M is nsim by k; row i lists the points selected in simulation i
for (i in 1:nsim) {
  # weighted draw of k points without replacement (arguments named for clarity)
  M[i, ] <- sample(x, k, replace = FALSE, prob = w)
}
MT <- table(factor(M, levels = x))           # count selections, keeping points that were never selected
MT / nsim                                    # estimated probability that each point is selected
###################
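The same simulation can be sketched in Python using only the standard library. This is an illustrative equivalent rather than a translation I have checked against R's output: each draw picks one remaining point with probability proportional to its weight, then removes it. The function names and the seed argument are my own choices.

```python
import random

def sequential_sample(weights, k, rng):
    """Draw k distinct indices one at a time, each draw proportional
    to the weights of the points not yet selected."""
    items = list(range(len(weights)))
    w = list(weights)
    chosen = []
    for _ in range(k):
        pos = rng.choices(range(len(items)), weights=w, k=1)[0]
        chosen.append(items.pop(pos))   # record the point and remove it
        w.pop(pos)                      # remove its weight as well
    return chosen

def estimate_inclusion(weights, k, nsim=10000, seed=None):
    """Monte Carlo estimate of each point's probability of being selected."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(nsim):
        for i in sequential_sample(weights, k, rng):
            counts[i] += 1
    return [c / nsim for c in counts]
```

For example, estimate_inclusion([1/12, 1/12, 2/12, 3/12, 3/12, 2/12], 2) returns six estimates that sum to 2. Each call to sequential_sample costs on the order of n*k operations, so n = 800 with k = 300 is feasible but nsim may need to be kept moderate.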
------------------------------
Giles Warrack
Retired
North Carolina A&T State University
Original Message:
Sent: 08-29-2024 10:26
From: Paul Auclair
Subject: A sampling question
Virginia, here's a reply from Perplexity.AI that describes the problem, outlines a solution, and provides some references.
https://www.perplexity.ai/search/a-colleague-is-interested-in-t-uckvx4wMSrKXKnhIyD3blw
------------------------------
Paul Auclair
Corporate Operations Research Analyst
LinQuest Corporation
Original Message:
Sent: 08-28-2024 22:12
From: Virginia Rovnyak
Subject: A sampling question
A colleague is interested in the following situation. There is a finite set of n points, each with a certain weight. A weighted random sample of k points is drawn without replacement. What is the probability that a given point x will be in this sample?
Is this a known problem? Any information or references would be appreciated.
------------------------------
Virginia Rovnyak
------------------------------