Category Archives: Research

PhD, with distinction

img_7400aWednesday, August 12th, 2009 exceeded my expectations in so many ways.  I was really touched by having so many people in attendance for my dissertation defense.  Friends and professors from math & stat, biology, and other departments, collaborators from the medical campus and from consulting outside the university, and my mom and brother.  About 25-30 in all. My advisor, Edward Bedrick, provided a wonderful introduction that reminded me of the many ways I’ve connected with the UNM community: teaching (and awards), research assistantships (and publications), statistics department consultant, collaborations, and unicycling around campus those first few years. img_6532I was generally happy with my presentation and was grateful that Ed reminded me occasionally of points that I wanted to make but had forgotten to mention.  He was also my champion whenever I tried to sell myself short, interjecting and saying that a particular point I was glossing over is nontrivial in theory and application.  I had some positive feedback and am learning how to give better talks. My committee’s questions really helped me feel more like a peer, asking questions with the expectation that I might provide insight on a topic.  They never asked a question with the goal to stump me or make me feel small.  I felt supported, respected, encouraged, and welcomed into PhD family. What next?  I’m now a postdoc at the Mind Research Network on the UNM campus in Albuquerque doing modeling for fMRI brain imaging data, see previous post.  I’m completing my dissertation to submit to the university, writing papers from the dissertation, and organizing work on my other collaborations.  These next couple years are going to challenging and exciting, frustrating and daunting, engaging and inspiring, and I’d better also say peaceful and tranquil so my mom thinks I’m getting some rest, too.  Carpe data!
... more

Dissertation defense, August 12th

Dissertation defense for Erik Barry Erhardt, PhD candidate in Statistics Wednesday, August 12th at 10am in Humanities 428 University of New Mexico Title: Stable Isotope Sourcing using Sampling Abstract: Stable isotope sourcing is used to estimate proportional contributions of sources to a mixture, such as in the analysis of animal diets, plant nutrient use, geochemistry, pollution, and forensics. We focus on animal ecology because of the particular complexities due to the process of digestion and assimilation. Parameter estimation has been a challenge because there are often many sources and few isotopes leading to an underconstrained linear system for the diet probability vector. This dissertation off ers three primary contributions to the mixing model community. (1) We detail and provide an R implementation of a better algorithm (SISUS) for representing possible solutions in the underconstrained case (many sources, few isotopes) when no variance is considered (Phillips and Gregg, 2003). (2) We provide general methods for performing frequentist estimation in the perfectly-constrained case using the delta method and the bootstrap, which extends previous work applying the delta method to two- and three-source problems (Phillips and Gregg, 2001). (3) We propose two Bayesian models, the implicit representation model estimating the population mean diet through the mean mixture isotope ratio, and the explicit representation model estimating the population mean diet through mixture-specific diets given individual isotope ratios. Secondary contributions include (4) estimation using summaries from the literature in lieu of observation-level data, (5) multiple methods for incorporating isotope ratio discrimination (fractionation) in the analysis, (6) the use of measurement error to account for and partition more uncertainty, (7) estimation improvements by pooling multiple estimates, and (8) detailing scenarios when one model is preferred over another. We show that the Bayesian explicit representation model provides more precise diet estimates than other models when measurement error is small and informed by the necessary calibration measurements.
... more

Wishart distribution in WinBUGS, nonstandard parameterization

The Wishart distribution and especially the inverse-Wishart distribution are the source of some confusion because they occasionally appear with alternative parameterizations. Also, the Wishart distribution can be used to model a covariance matrix or a precision matrix (the inverse of a covariance matrix) in different situations, and the inverse-Wishart the same, but the other way round. It’s already becoming complicated. Hal Stern, coauthor of Bayesian Data Analysis (BDA), helped to clarify many issues for me in an email conversation. In this post I hope to clarify the differences in Wishart parameterizations of BDA, the wikipedia pages, and the WinBUGS and OpenBUGS softwares, and show an example in OpenBUGS where the inverse parameterization has to be specified relative to the distribution’s definition for the correct posterior to result. The Wishart distribution commonly arises in the context of the Bayesian multivariate normal model, so first I detail that model. Let $$d$$-dimensional vector $$y_i$$, $$i=1,\ldots,n$$, be independent with $$y_i|\mu,\Sigma\sim\textrm{Normal}(\mu,\Sigma)$$, where $$\mu$$ is a (column) vector of length $$d$$ and $$\Sigma$$ is a $$d\times d$$ covariance matrix, which is symmetric and positive definite. The conjugate priors for the parameters are $$\mu|\mu_0,\Sigma,\kappa_0\sim\textrm{Normal}(\mu_0,\Sigma/\kappa_0)$$, where $$\mu_0$$ is the prior mean and $$\kappa_0$$ is the number of prior measurements on the $$\Sigma$$ scale, and $$\Sigma|\Sigma_0,\nu_0\sim\textrm{Inv-Wishart}(\Sigma_0,\nu_0)$$, where $$\Sigma_0$$ is the prior covariance matrix with $$\nu_0$$ degrees of freedom. I’ll follow the definition of the Wishart and inverse-Wishart distributions used in BDA and Wikipedia. $$W|A,\nu\sim\textrm{Wishart}(A,\nu)$$ if $$p(W|A,\nu)\propto|A|^{-\nu/2}|W|^{(\nu-d-1)/2}\exp\left(-\frac{1}{2}\textrm{tr}(A^{-1}W)\right)$$, with expected value $$\textrm{E}(W)=\nu A$$ (notation: $$\textrm{tr}(X^{T})$$ indicates the trace of the transpose of matrix $$X$$) . The integral is finite when $$\nu\ge d$$ and the density is finite if $$\nu\ge d+1$$. In this definition $$A$$ and $$W$$ are on the same scale. $$A$$ and $$W$$ are both covariance matrices when modeling a sample covariance matrix. Specifically, the scatter matrix $$S$$, where $$S=\sum_{i=1}^n(y_i-\bar{y})(y_i-\bar{y})^{T}$$, has $$S|V,n\sim\textrm{Wishart}(V,n)$$, where $$V$$ is the population covariance and $$n$$ is the sample size informing $$S$$. $$A$$ and $$W$$ are both precision matrices when modeling the inverse covariance (precision) matrix ($$\tau\equiv\Sigma^{-1}$$) of the normal model above. That is $$\tau|\tau_0,\nu_0\sim\textrm{Wishart}(\tau_0,\nu_0)$$, where $$\tau_0$$ is the inverse of the prior population covariance and $$\nu_0$$ can be thought of as the prior sample size. If $$B^{-1}|A,\nu\sim\textrm{Wishart}(A,\nu)$$ (where $$B$$ is a covariance matrix and $$B^{-1}$$ and $$A$$ are precision matrices) then $$B|A^{-1},\nu\sim\textrm{Inv-Wishart}(A^{-1},\nu)$$.  Now, with $$W$$ and $$A$$ as covariance matricies, $$W|A,\nu\sim\textrm{Inv-Wishart}(A,\nu)$$ if $$p(W|A,\nu)\propto|A|^{\nu/2}|W|^{-(\nu+d+1)/2}\exp\left(-\frac{1}{2}\textrm{tr}(AW^{-1})\right)$$, with expected value $$\textrm{E}(W)=A/(\nu-d-1)$$. One point of confusion was when using WinBUGS and OpenBUGS to fit the Bayesian multivariate normal model. I’ll call the WinBUGS parameterization of the Wishart the “BUGS-Wishart” with a covariance parameter ($$R$$) and a precision random matrix ($$W$$) (these are on inverse scales from one another, where our standard definition above had the parameter and random matrix on the same scale). $$W|R,\nu\sim\textrm{BUGS-Wishart}$$ if $$p(W|R,\nu)\propto|R|^{\nu/2}|W|^{(\nu-d-1)/2}\exp\left(-\frac{1}{2}\textrm{tr}(RW)\right)$$, with expected value $$\textrm{E}(W)=\nu R^{-1}$$. By itself, this was straightforward for the parameter to be on the covariance scale rather than the precision scale, however it becomes misleading when incorporated into the normal model. For example, I expected a non-informative prior to have covariance matrix $$R$$ with large diagonal elements. But the following example shows the need for a Bayesian to write down the full posterior to see how the prior and likelihoods combine in the posterior. The second point of confusion was in the prior distributional specification for precision matrix $$\tau$$ in the OpenBUGS Camel example. This example has replicated structured missing bivariate data at $$\pm 2$$, but four complete-data points at the corners of a square ($$\pm 1$$) centered at zero. The camel example is named so because the posterior for $$\rho$$ (rho) has two humps; two of the complete observations support correlation $$+1$$ and two support correlation $$-1$$. The data has a normal distribution parameterized with precision matrix, $$\tau$$, as $$y_i\sim\textrm{Normal}([0,0]^{T},\tau)$$, and precision matrix $$\tau\sim\textrm{Wishart}(R,2)$$, where $$R=\textrm{diag}(0.001,0.001)$$. The correlation $$\rho$$ is calculated from $$\tau$$ by inverting $$\tau$$ to get the covariance matrix $$\Sigma$$ and taking the off-diagonal covariance divided by the square root of the product of the on-diagonal variances. Because the BUGS-Wishart distribution has a covariance parameter $$R$$, I was surprised at their use of a noninformative prior parameter matrix with diagonal values of 0.001 (instead of variance values of 1000). Hal Stern helped clarify why the prior they use is the correct one by writing out the posterior distribution based on the BUGS-Wishart density and the normal likelihood (which is parameterized in terms of precision $$\tau$$). Looking at the exponential kernel of the BUGS-Wishart distribution, we have $$\exp\left(-\frac{1}{2}\textrm{tr}(R\tau)\right)$$. The kernel of the likelihood is $$\exp\left(-\frac{1}{2}\sum_{i=1}^n(y_i-\mu)^{T}\tau(y_i-\mu)\right)=\exp\left(-\frac{1}{2}\textrm{tr}(S_0\tau)\right)$$, where $$S_0=\sum_{i=1}^n(y_i-\mu)(y_i-\mu)^{T}$$ is the “sums of squares” relative to $$\mu$$. Thus, when the prior and likelihood are combined in the posterior, the posterior for $$\tau$$ is now $$\exp\left(-\frac{1}{2}\textrm{tr}((R+S_0)\tau)\right)$$, which is BUGS-Wishart with parameter $$R+S_0$$. This is the correct noninformative analysis since the near-zero matrix $$R$$ from the prior contributes little to the sum with $$S_0$$ informed by the data. However, it requires looking ahead to the form of the posterior to see that the parameter for the BUGS-Wishart needs to be specified as a precision matrix INSTEAD of a covariance matrix! Dave Lunn (first author of the reference for WinBUGS) has written to me that he will try to clarify matters in the next version. I would like to see a second version of the Wishart implemented in WinBUGS and OpenBUGS with the parameter and the random matrix on the same scale, as in BDA, to avoid this awkward inverse specification of the parameter for the BUGS-Wishart distribution.
... more

Paper accepted: Prenatal X-ray Exposure and RMS in Children

Seymour Grufferman, Frederick Ruymann, Simona Ognjanovic, Erik B. Erhardt, and Harold M. Maurer. Prenatal X-ray Exposure and Rhabdomyosarcoma in Children: A Report from the Children’s Oncology Group.  Cancer Epidemiology Biomarkers & Prevention. April 2009, 18(4), OF1–6. [pdf] This is the first in a series of papers that will be the result of my research assistantship Fall 2005 — Spring 2008 with Seymour Grufferman, M.D., Dr. P.H. and Deirdre A. Hill, Ph.D., M.P.H. at the Cancer Research and Treatment Center, University of New Mexico, Albuquerque, NM 87131, USA.  More results are likely to come from this rich dataset.
... more

RA at MIND Institute begins

On Jan 20th, 2009, I joined the Medical Image Analysis Laboratory (MIALab) as a research assistant (RA) at the MIND institute at UNM.  This position will transition to a 2-3 year postdoc upon the completion of my PhD this May. Vince Calhoun, Edward Bedrick (my stat advisor), Jeremy Bockholt, and I compose the Biostatistics & NeuroInformatics Core (STATNI) on UNM’s Center of Biomedical Research Excellence (COBRE) grant funded by the National Center for Research Resources (NCRR), a part of the National Institutes of Health (NIH).  STATNI serves as a centralized resource for biostatistical consulting for a number of scientific projects. My role will be in the development of statistical methods and programming numerical and statistical methods to address the aims of the projects.  Specifically, the development of a Bayesian ICA model for fMRI data. There are five aims to the project that will ultimately extend the ability to incorporate prior information to move beyond the semi-blind ICA approach. [from the project summary] First, we will extend our semi-blind ICA (sbICA) framework to provide a general framework for incorporating prior information from multiple spatial and temporal sources. In the second aim we will focus upon statistical inference and develop a framework for integrating the relevant functional components. In the third aim, we will validate the algorithms in aims 1 and 2, including using fMRI data collected on multiple days from a variety of paradigms. In this aim we develop a decision mechanism for selecting the best combination of methods given a particular problem. For the fourth aim, we will apply our methods to data collected during four well-studied paradigms in healthy controls and patients with schizophrenia. Our final aim involves the continuing development of our GIFT toolbox, and incorporation of the above algorithms, constraint selection mechanisms, and visual interfaces into the software. The successful completion of this research will provide a powerful set of tools for the research community to increase the sensitivity and specificity of BOLD analysis methods by drawing upon the strengths of both model-based and data-driven approaches. These tools will also provide a way to study the inter-relationship among functional networks in a flexible manner. This is an ideal position for me because the modeling is similar to work I have done in my dissertation, I continue to work closely with my advisor, Ed, who I continue to learn so much from, I get to learn and work with Vince who has many ideas and is very prolific, and all of this in the hot area of fMRI.  I also have family and friends in Albuquerque who I want to stay close with for a little longer and this position allows me to stay put.
... more

Paper accepted: Stable Isotope collaboration, Chris Bickford

Christopher P. Bickford, Nate G. Mcdowell, Erik Barry Erhardt, Heath H. Powers, David T. Hanson. (2009) “High frequency field measurements of diurnal carbon isotope discrimination and internal conductance in a semi-arid species, Juniperus monosperma“. Plant, Cell & Environment, online Volume 32, Issue 7, pages 796–810, July 2009. Chris Bickford, PhD candidate UNM Biology, and I met when we attended Iso-Camp at Jim Ehleringer’s lab at U Utah Summer 2008.  On the flight home we started discussing a challenge he was facing in his first of three dissertation papers. He studies details of plant photosynthesis.  He had complicated expressions for leaf carbon isotope discrimination $$\Delta$$ and internal conductance $$g_i$$ based on CO$$_2$$ concentrations of CO$$_2$$ isotopologues $$^{13}C^{16}O^{16}O$$ and $$^{12}C^{16}C^{16}O$$. He needed to propigate the variation of the CO$$_2$$ measurements into his variables of interest, $$\Delta$$ and $$g_i$$.  He also needed to compare his accurate and precise measurements using tunable diode laser spectroscopy (TDL) to predictions from three models. There were a number of statistical issues.  One was how to make model and observation comparisons.  I suggested using RMSE since it includes both variance and bias in the single measurement.  The main issue was the incorporation of variation from the CO$$_2$$ measurements into the quantities of interest.  The bootstrap allowed us to do this.  There were a number of programming sessions in R to write functions and scripts to do all the calculations, create plots, output spreadsheets of results, and so on.  Chris has become a convert from Excel to R over the course of this project.  These methods implemented on this paper will likely flow into later pubs for both Chris and Dave. Chris has taken a postdoc in New Zealand, where he and his wife, Karen, will spend the next two years with their dog.  He defends his dissertation on April 13th.
... more