D.E. Beaudette, P. Roudier, J.M. Skovlin
This document is based on aqp version 1.8-8 and soilDB version 1.5-5.
can we do better than selecting a “representative pedon” from a collection?
generalized horizon labels are expert-guided, “micro-correlation” decisions
\[ P[Y \geq j \mid X] = \frac{1}{1 + \exp[-(\alpha_{j} + X\beta)]} \] An extension of the logistic regression model to ordered categories; predictions are constrained by horizon designation and order. Restricted cubic spline (RCS) basis functions accommodate non-linearity.
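A minimal Python sketch of how the cumulative probabilities above become per-horizon probabilities, using \( P[Y = j] = P[Y \geq j] - P[Y \geq j+1] \) with \( P[Y \geq 1] = 1 \). The GHL labels, intercepts, and slope are hypothetical, and a single linear depth term stands in for the RCS basis functions, so this shows the form of the model rather than the fitted Loafercreek model.

```python
import math


def logistic(z):
    """Inverse logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def po_class_probs(alphas, x_beta):
    """Per-class probabilities from a proportional-odds model.

    alphas: intercepts alpha_2..alpha_K for P[Y >= j | X], j = 2..K; they must
            decrease with j so that the cumulative probabilities are ordered.
    x_beta: the linear predictor X * beta for one observation (one depth).
    Returns [P(Y=1), ..., P(Y=K)].
    """
    # P[Y >= 1] = 1 by definition; P[Y >= K+1] = 0 closes the sequence
    cum = [1.0] + [logistic(a + x_beta) for a in alphas] + [0.0]
    # adjacent differences of the cumulative probabilities give class probabilities
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]


if __name__ == "__main__":
    labels = ["A", "Bt", "Cr"]  # hypothetical GHL, ordered shallow to deep
    alphas = [-2.0, -6.0]       # hypothetical intercepts for P[Y >= Bt], P[Y >= Cr]
    beta = 0.1                  # hypothetical slope per cm of depth (linear, no RCS)
    for depth in (5, 30, 80):
        p = po_class_probs(alphas, beta * depth)
        print(depth, labels[p.index(max(p))], [round(v, 3) for v in p])
```

Because one slope is shared by every threshold, the cumulative curves never cross, which is what keeps the predicted horizons in order down the profile.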
\[ H = -\sum_{i=1}^{n}{p_{i} \log_{n}(p_{i})} \] \( H \) is an index of uncertainty associated with the predicted probabilities, \( \mathbf{p} \), of encountering horizons \( 1 \) through \( n \) at a given depth. Using base-\( n \) logarithms scales \( H \) to the range [0, 1]; larger values suggest more confusion.
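The entropy index can be computed directly from a probability vector; a minimal Python sketch follows. The function name `shannon_entropy` and the example probabilities are hypothetical. Zero probabilities are skipped because \( p \log p \to 0 \) as \( p \to 0 \).

```python
import math


def shannon_entropy(p, base=None):
    """Shannon entropy H = -sum(p_i * log_base(p_i)) of a probability vector.

    `base` defaults to len(p), as in the formula above, which scales H to [0, 1]:
    0 when one horizon is certain, 1 when all n horizons are equally likely.
    """
    if base is None:
        base = len(p)
    if base <= 1:
        raise ValueError("need at least two classes (or a base > 1)")
    # terms with p_i = 0 contribute nothing (limit of p * log(p) as p -> 0)
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


if __name__ == "__main__":
    # hypothetical probabilities of A, Bt, Cr at two depths
    print(shannon_entropy([0.98, 0.01, 0.01]))  # one horizon dominates: low H
    print(shannon_entropy([0.45, 0.50, 0.05]))  # A/Bt nearly tied: higher H
```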
\[ B = \frac{1}{n} \sum_{i=1}^{n}{(p_{i} - y_{i})^{2}} \] \( B \) is an index of agreement between the predicted probabilities, \( \mathbf{p} \), and the observed horizons, \( \mathbf{y} \) (1 where the observed label matches the horizon, 0 otherwise), over depth slices \( 1 \) through \( n \) associated with a specific horizon. Larger values suggest less agreement between probabilities and observed horizon labels.
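A minimal Python sketch of the Brier score for one GHL, assuming `p` holds the predicted probability of that GHL at each depth slice and `y` holds 1 where the observed label is that GHL and 0 elsewhere. The function name and values are hypothetical.

```python
def brier_score(p, y):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(p) != len(y) or not p:
        raise ValueError("p and y must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)


if __name__ == "__main__":
    # hypothetical: predicted P(Bt) at four slices vs. whether Bt was observed
    print(brier_score([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

A perfect, fully confident prediction gives \( B = 0 \); computing \( B \) separately for each GHL shows which labels the model reproduces least well.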
Examples using 54 profiles correlated to the Loafercreek soil series
colors represent generalized horizon labels (GHL)
empirical probabilities: no assumptions, simple interpretation, directly tied to the original data; but prone to over-fitting
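A minimal Python sketch of empirical GHL probabilities, assuming they are the proportion of profiles carrying each label within 1 cm depth slices. The function name `slice_probabilities` and the three profiles are hypothetical; this is not the aqp implementation. Because each proportion comes straight from the observed horizons, slices covered by only a few profiles produce jumpy, over-fit probabilities.

```python
from collections import Counter


def slice_probabilities(profiles, labels, max_depth):
    """Empirical GHL probabilities by 1 cm depth slice.

    profiles: list of profiles; each is a list of (top, bottom, label) horizons
              in cm, top inclusive and bottom exclusive.
    Returns one dict per slice 0..max_depth-1 mapping each label to the share of
    profiles (among those with data at that depth) assigned that label.
    """
    out = []
    for d in range(max_depth):
        counts = Counter()
        for horizons in profiles:
            for top, bottom, label in horizons:
                if top <= d < bottom:
                    counts[label] += 1
                    break
        total = sum(counts.values())
        # slices with no data get all-zero probabilities
        out.append({lab: (counts[lab] / total if total else 0.0) for lab in labels})
    return out


if __name__ == "__main__":
    profiles = [  # hypothetical generalized horizons
        [(0, 10, "A"), (10, 50, "Bt"), (50, 70, "Cr")],
        [(0, 15, "A"), (15, 40, "Bt"), (40, 60, "Cr")],
        [(0, 8, "A"), (8, 45, "Bt")],
    ]
    probs = slice_probabilities(profiles, ["A", "Bt", "Cr"], 70)
    print(probs[12], probs[45])
```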
proportional-odds logistic regression generalizes the process
empirical probabilities when data are sparse, PO-LR when sufficient data are available
Shannon entropy: continuous metric of confusion; Brier scores: agreement by GHL
model stability: iterative re-fitting (n = 25, reps = 250), mean R² = 0.89
most-likely horizon boundaries determined by probability depth-functions
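One simple reading of this step, sketched in Python: take the most probable GHL in each depth slice and place a boundary wherever that label changes, which falls roughly where the leading probability depth-functions cross. The function name, slice thickness, and probabilities are hypothetical; ties go to the first label in `labels`.

```python
def most_likely_horizons(probs, labels, top=0, step=1):
    """Collapse per-slice probabilities into most-likely horizons.

    probs: probability vectors, one per depth slice ordered top-down, each
           aligned with `labels`.
    Returns (top, bottom, label) tuples; a boundary falls wherever the
    most-likely label changes between adjacent slices.
    """
    horizons = []
    for i, p in enumerate(probs):
        label = labels[max(range(len(p)), key=p.__getitem__)]
        depth = top + i * step
        if horizons and horizons[-1][2] == label:
            # same label as the slice above: extend the current horizon downward
            t, _, lab = horizons[-1]
            horizons[-1] = (t, depth + step, lab)
        else:
            # label changed: start a new horizon at this slice
            horizons.append((depth, depth + step, label))
    return horizons


if __name__ == "__main__":
    probs = [  # hypothetical P(A), P(Bt), P(Cr) in 10 cm slices
        [0.8, 0.2, 0.0],
        [0.6, 0.4, 0.0],
        [0.3, 0.6, 0.1],
        [0.1, 0.7, 0.2],
        [0.0, 0.3, 0.7],
    ]
    print(most_likely_horizons(probs, ["A", "Bt", "Cr"], step=10))
```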