I invite you to compare those smoothers to the appended
R solution that implements the optimization ideas I have tried to express previously.
This solution finds a final floor (a planar surface) that minimizes the sum of three costs. I chose to find a planar surface because it constitutes a simple easily-assessed target to achieve in the field. This seems a far more practicable approach than attempting to achieve any curved or locally smoothed solution would be. Without extensive and painstaking additional measurements as the work progresses, it would be almost impossible to determine when a nonplanar solution has actually been attained. It's far easier to aim for a planar floor initially because deviations from planarity are easily detected and corrected. If one has a LIDAR scanner available to monitor the work, then this solution could be modified to include, say, quadratic or cubic spline terms to allow for some undulation--but that seems like overkill.
The three costs are
1. a cost of leveling any parts of the initial surface down to the floor, which is directly proportional (to first order, anyway) to the amount of material removed.
2. a cost of filling any parts of the initial surface up to the final floor, again directly proportional to the amount of material added; and
3. an optional "global" cost associated with the departure of the floor from being perfectly level. This cost in my implementation is proportional (again to first order) to the square of the dip angle.
They can be measured in time, materials, labor, quality of the final result, or anything else relevant, but of course they must be made commensurable.
Each cost can be controlled by specifying a multiplicative constant. The first two ("local") costs can further be adjusted to allow one to stop short of removing all the required material (or filling all the required material) by a prespecified amount, the "threshold" parameter of function
U().
It is amusing to rerun the solution by varying the relative amounts of the three costs as well as the"threshold" parameter. This can dramatically change which parts of the floor are removed, which parts are filled, and how level the final floor is. This shows--as proof of concept--that it can be important to pay attention to the relative costs of the different kinds of remedial operations available (cutting and filling).
Any solution that does not quantify these cost components could be a good one only by accident. (Even my solution might not be ideal if there are additional costs it does not capture, but that does not change the point that costs must be considered.)
I have included initial code to simulate floor data, enabling quick evaluation of many scenarios if actual data are not available. The executable code needed to perform the optimization comprises only a dozen lines: the rest is function wrappings and display of the results. That should make it fairly easy to modify and maintain.
This approach, incidentally, includes as special cases L1 fits, L2 fits, quantile regression, and Deming regression, thereby exhibiting some of the alternative recommendations as special (but limited) examples of this approach (such as basing the optimization on medians). It does not require regularly spaced data--but for irregular data it should be modified by including a weights vector, one weight per elevation measurement, giving the floor area represented by each measurement. These weights should be applied multiplicatively to the local cost terms.
(In the interests of time the code is quick and dirty, containing no validity or error checks, little post-processing of the solution, and only primitive out-of-the-box visualizations of what is going on. Because it is intended as a (working) illustration, it uses only base packages installed with
R. It is complete enough to show that this approach works well and to demonstrate its flexibility and adaptability to individual needs.)
-------------------------------------------
William Huber
Quantitative Decisions
-------------------------------------------
#
# Generate a nubbly slightly tilted floor covering [0,x.max] X [0,y.max].
# The (x,y,z) measurements will be stored in a dataframe xy.df.
#
dip <- 0.01 # Radians
strike <- 0.5 # Radians, positive relative to (1,0)
sd.floor <- 0.005 # Meters: waviness of floor
se.measurement <- 0.001 # Meters: iid measurement error
x.max <- 5 # Meters along
y.max <- 3 # Meters across
n.x <- 23 # Number of regularly spaced x measurements
n.y <- 16 # Number of regularly spaced y measurements
set.seed(17)
# The (x,y) grid.
d.x <- x.max / n.x; d.y <- y.max / n.y
x.start <- runif(1, 0, d.x); y.start <- runif(1, 0, d.y)
x <- seq(x.start, by=d.x, length.out=n.x)
y <- seq(y.start, by=d.y, length.out=n.y)
xy.df <- expand.grid(X=x, Y=y)
# The wavy component.
z <- matrix(rnorm(n.x*n.y), nrow=n.x, ncol=n.y)
k <- outer(x, y, function(j,i,s=0.05) { # s is the spatial scale of the "waves"
v <- pmin(j/x.max, 1-j/x.max)
u <- pmin(i/y.max, 1-i/y.max)
z2 <- 1/2 * (u^2 + v^2) / s^2
ifelse(z2 > 100, 0, exp(-z2))
})
k <- k/sum(k)
wave <- Re(fft(fft(z) * fft(k), inverse=TRUE))
wave <- wave / sd(wave) * sd.floor
# Add the trend and the measurement error.
plane <- outer(x, y, function(x, y) -(cos(strike)*x + sin(strike)*y)) * tan(dip)
z <- plane + wave + rnorm(n.x * n.y, 0, se.measurement)
z <- z - mean(z) # Original mean elevation is zero as reference.
xy.df$Z <- as.vector(z)
#------------------------------------------------------------------------------#
#
# Fit an optimal plane.
#
# The local penalty is cost[1] times the amount above the threshold
# or cost[2] times the amount below the threshold.
#
U <- function(delta, threshold=0.0, cost=c(1,1)) {
delta.minus <- delta + threshold; delta.plus <- delta - threshold
(delta.plus > 0) * delta.plus * cost[1] - (delta.minus < 0) * delta.minus * cost[2]
}
#
# The global penalty tries to level the floor as well as flatten it.
# Its argument is the coefficient vector (b1, b2, b3, b4) designating the
# plane b1*x + b2*y + b3*z + b4 = 0 (equivalently, z = (b4 - b1*x - b2*y)/b3).
#
V <- function(coeff, strength=100*n.x*n.y) {
nu <- sqrt(sum(coeff[1:3]^2))
strength * sum((coeff[1:2]/nu)^2)
}
#
# The objective is the sum of the local penalties at the residuals
# plus any global penalty for lack of levelness. The
# residuals are *distances* from the data points to the fitted plane.
#
objective <- function(coeff, data=xy.df, penalty.local=U,
penalty.global=function(beta){0}, ...) {
# (Optional args are passed to the local penalty function.)
coords <- as.matrix(subset(data, select=c("X", "Y", "Z")))
nu <- sqrt(sum(coeff[1:3]^2))
sum(penalty.local((coords %*% coeff[1:3] - coeff[4]) / nu, ...)) +
penalty.global(coeff)
}
#
# Use the OLS solution to construct a starting estimate.
#
fit.lm <- lm(Z ~ X + Y, data=xy.df)
beta <- coef(fit.lm)
coeff.start <- c(-beta["X"], -beta["Y"], Z=1, beta[1])
#
# Polish it with a nonlinear optimization.
# The local costs can be controlled here through the `threshold` and `cost`
# arguments. The global cost is controlled via the `penalty.global`
# argument, which is the global cost function (exemplified by V()). To remove
# the global penalty, use `penalty.global=function(b){0}`.
#
fit <- optim(coeff.start, objective, penalty.global=V, threshold=0.0025, cost=c(3,1))
#
# Compute the fitted values.
#
b.hat <- fit$par
xy.df$Pred <- with(xy.df, (b.hat[4] - b.hat[1]*X - b.hat[2]*Y) / b.hat[3])
#
# Display the results.
#
par(mfrow=c(2,2))
plot(xy.df$Z, xy.df$Pred, col="#00000040",
xlab="Current elevation", ylab="Final elevation")
abline(c(0,1), col="Red", lwd=2)
image(z, asp=y.max/x.max, main="Current elevation")
image(matrix(xy.df$Pred, nrow=n.x), asp=y.max/x.max, main="Final elevation")
image(matrix(xy.df$Z - xy.df$Pred, nrow=n.x), asp=y.max/x.max,
main="Residuals")
#
# Quick contour plots show the magnitudes and signs of the results.
#
par(mfrow=c(2,2))
plot(xy.df$Z, xy.df$Pred, col="#00000040",
xlab="Current elevation", ylab="Final elevation")
abline(c(0,1), col="Red", lwd=2)
contour(x, y, z, asp=y.max/x.max, main="Current elevation")
contour(x, y, matrix(xy.df$Pred, nrow=n.x), asp=y.max/x.max, main="Final elevation")
contour(x, y, matrix(xy.df$Z - xy.df$Pred, nrow=n.x), asp=y.max/x.max,
main="Residuals")
Original Message:
Sent: 07-09-2014 18:39
From: James Joseph
Subject: Applied Stat - Help, I need a flat floor!
I've found an R-package that can apply window-based smoothers to three-dimensional data. I will set it to my desired tolerance and realize the error I need to account for via grinding/filling.