Another look at New Mexico suicide statistics: conditional probability and data visualization

November 4th, 2011

This article was printed in the Daily Lobo on 11/10/2011.

Presenting information in a way that clearly answers interesting questions is challenging. Every plot has an implicit question (hypothesis) that it helps you answer. Therefore, it is important to align a visual display of information with the intended interesting question(s). Collaboration or consultation with a statistician can clarify interesting questions and lead to answers through appropriate data analysis (visit UNM’s free statistics consulting clinic, www.stat.unm.edu/~clinic).

Suicide was the topic of the front cover story in the Daily Lobo on Thurs, Nov 3rd. With the story, two pie charts displayed average annual proportions of “successful” and “unsuccessful” suicides by method in NM. The “successful” pie chart answers this statement of conditional probability (their implied question): “given a successful suicide, what percentage used certain methods?” A question I consider more interesting reverses the conditioning (my question): “given an attempted suicide with a certain method, what percentage were successful?” Furthermore, I want to know the overall frequency and percentage of each method attempted. How can we present the information in a way that simultaneously answers these questions?

The Suicide Prevention Resource Center (SPRC.org) maintains national and state suicide fact sheets, last updated September 2008, describing “deaths by suicide, estimated hospitalized attempts, and data on medical costs, work loss costs, gender, race/ethnicity, age, and method of suicide.” The pie charts in Thursday’s Daily Lobo were reproductions of those found on the NM fact sheet. From their NM summaries, below is the SPRC table for estimated mean frequencies by method for “successful” and “unsuccessful” suicides.

Method              Successful   Unsuccessful   Total
Cut/Pierce                   4            229     233
Firearms                   191             16     207
Poisoning                   60           1097    1157
Suffocation                 73             23      96
Other/Unspecified           13             91     104
Total                      341           1456    1797

Their question and pie charts (below) consider percentages down columns. When the data are reduced to column percentages for “successful” and “unsuccessful” attempts separately, you lose the relative frequency of attempts. The percentage of firearms “successes” (56%), for example, depends on all the other “successful” attempts. Because the proportions for “successful” and “unsuccessful” attempts are computed separately, you can’t learn how often firearm attempts are successful.
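To make the two conditionings concrete, here is a minimal Python sketch that computes both from the SPRC counts in the table above. The variable names are illustrative only, and this is not the R code used for the plots in this post.

```python
# Estimated mean annual counts from the SPRC table: (successful, unsuccessful).
counts = {
    "Cut/Pierce": (4, 229),
    "Firearms": (191, 16),
    "Poisoning": (60, 1097),
    "Suffocation": (73, 23),
    "Other/Unspecified": (13, 91),
}

total_successful = sum(s for s, _ in counts.values())
total_attempts = sum(s + u for s, u in counts.values())


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to a whole number."""
    return round(100 * part / whole)


# Column percentage: P(method | successful), what the "successful" pie chart shows.
method_given_success = {m: pct(s, total_successful) for m, (s, _) in counts.items()}

# Row percentage: P(successful | method), the reversed conditioning.
success_given_method = {m: pct(s, s + u) for m, (s, u) in counts.items()}

# Marginal share of all attempts made with each method.
method_share = {m: pct(s + u, total_attempts) for m, (s, u) in counts.items()}

for m in counts:
    print(f"{m:18s} {method_given_success[m]:3d}% of successes, "
          f"{success_given_method[m]:3d}% successful, "
          f"{method_share[m]:3d}% of attempts")
```

The same table gives very different pictures depending on which total sits in the denominator: firearms are 56% of “successes” but 92% of firearm attempts are “successful”.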

Original pie chart

Original pie charts of proportions of method conditional on attempt "success", which do not ask or answer the interesting/relevant question.

It is critical to consider the temporal process: a person first chooses a method, then makes an attempt, and is either “successful” or not. The data display and questions should follow these temporal steps. The pie chart displays ignore this process.

My question and plot (below) consider the temporal process of attempting suicide, using percentages across rows and including row total information. First, the relative use of various methods is clear; almost two-thirds of attempts are by poisoning, and firearm and cut/pierce are each just above one in ten. However, though attempts by firearms (12%) and cut/pierce (13%) are relatively rare, the “success” rates are extremely different (92% versus 2%)! The plot has been sorted by the numbers of “successes” to emphasize the relative risk of the methods in terms of lives, information which is lost in the pie charts. Also, the area of each box is proportional to the frequency in each box. The Agora Crisis Center (505-277-3013, 9am-midnight, every day) plays a critical role in our community, and our education as individuals around these issues can save someone. Using statistics and visualization to tell and understand the important story in the data can lead to improvements in strategies and resource allocation for treatment and prevention.

Improved visualization

Improved visualization has relative use of methods across the horizontal and proportion of successes along the vertical. Area is proportional to people.
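The geometry described in this caption can be sketched in a few lines of Python. This is only an illustration of the layout on a unit square, not the R code behind the figure; the `rects` structure and its field names are assumptions made for the example.

```python
# Estimated mean annual counts from the SPRC table: (successful, unsuccessful).
counts = {
    "Cut/Pierce": (4, 229),
    "Firearms": (191, 16),
    "Poisoning": (60, 1097),
    "Suffocation": (73, 23),
    "Other/Unspecified": (13, 91),
}

total = sum(s + u for s, u in counts.values())

# Sort methods by number of "successes", largest first, as in the plot.
rows = sorted(counts.items(), key=lambda kv: kv[1][0], reverse=True)

rects = []
x = 0.0
for method, (s, u) in rows:
    width = (s + u) / total   # horizontal: share of all attempts using this method
    height = s / (s + u)      # vertical: proportion of those attempts that "succeed"
    rects.append({
        "method": method,
        "x": x,
        "width": width,
        "success_height": height,
        # width * height = s / total, so area is proportional to people
        "success_area": width * height,
    })
    x += width

for r in rects:
    print(f"{r['method']:18s} x={r['x']:.3f} width={r['width']:.3f} "
          f"success height={r['success_height']:.3f}")
```

Because width times height equals the count divided by the grand total, multiplying any box's area by 1797 recovers the number of people in that box.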

R code follows to produce plot above (with modest post-production necessary).
Read more…

Research, Statistics

Paper published: Capturing inter-subject variability with group independent component analysis of fMRI data: a simulation study

October 14th, 2011

Our paper using our simulation toolbox (SimTB) detailing what can be learned about multi-subject ICA on fMRI data has been published in NeuroImage.

Capturing inter-subject variability with group independent component analysis of fMRI data: A simulation study
Elena A. Allen, Erik B. Erhardt, Yonghua Wei, Tom Eichele, Vince D. Calhoun
NeuroImage, Available online 14 October 2011, ISSN 1053-8119, 10.1016/j.neuroimage.2011.10.010.
Volume 59, Issue 4, 15 February 2012, Pages 4141–4159
(http://www.sciencedirect.com/science/article/pii/S1053811911011712)
Keywords: fMRI; Inter-subject variability; Group ICA; Multi-subject; Model order; Simulations

Abstract
A key challenge in functional neuroimaging is the meaningful combination of results across subjects. Even in a sample of healthy participants, brain morphology and functional organization exhibit considerable variability, such that no two individuals have the same neural activation at the same location in response to the same stimulus. This inter-subject variability limits inferences at the group-level as average activation patterns may fail to represent the patterns seen in individuals. A promising approach to multi-subject analysis is group independent component analysis (GICA), which identifies group components and reconstructs activations at the individual level. GICA has gained considerable popularity, particularly in studies where temporal response models cannot be specified. However, a comprehensive understanding of the performance of GICA under realistic conditions of inter-subject variability is lacking. In this study we use simulated functional magnetic resonance imaging (fMRI) data to determine the capabilities and limitations of GICA under conditions of spatial, temporal, and amplitude variability. Simulations, generated with the SimTB toolbox, address questions that commonly arise in GICA studies, such as: (1) How well can individual subject activations be estimated and when will spatial variability preclude estimation? (2) Why does component splitting occur and how is it affected by model order? (3) How should we analyze component features to maximize sensitivity to intersubject differences? Overall, our results indicate an excellent capability of GICA to capture between-subject differences and we make a number of recommendations regarding analytic choices for application to functional imaging data.

MIND, Research

Eri-Eri-Eri-Erik, the fastest, smoothest, dreamiest swinger

August 26th, 2011

A personalized song? for me?

Artist: Katherine Sanden
Music: Eri-Eri-Eri-Erik (lyrics), March 2010

Interview (June 2010) with Katherine Sanden about music and song writing.  Kat studied mathematics at Princeton University.  She now tutors in mathematics, teaches music and piano, and writes sublime rhymes and beats.

For contra dance weekend Stellar Days & Nights, in Buena Vista, Colorado, February 18-21, 2010, I drove up with Richard, Laurel, and Karina Wilson, Lauren Lamont, and Della O’Keefe. During the silent auction CDSS‘s Max Newman and I got into a fierce bidding war over a custom song written and performed by Katherine Sanden. I had every intention to win, and when the bell rang, I had. In the spirit of the “new sincerity” I requested a “monster ego explosion” (after all, how many chances will I have for someone to write a song about ME!?). What I got was much, much more! I still flush with embarrassment each time I hear it. Quality headphones are recommended for a dynamic experience of the full audial range. Everyone needs a steamy power jam — lucky me!

dance, Fun

Statistics job resources

June 19th, 2011

Applying for an academic job is serious work.  I ended up lucky (though, luck favors the prepared (Louis Pasteur)).  I received two job offers this season and took my first-choice job.  But I worked hard to get those offers.  I kept a detailed CV my entire student career (starting as a BA student, not waiting until job season to start), wrote an extensive teaching dossier for the 20 courses I’ve taught and ugrad tutoring experience, and developed a research statement as that vision became clearer to me.  Clearly, self-investment and personal excellence are the most important ingredients.  Next is to find people who want to hire you.

Two sites and one magazine basically cover the bases for statistics.

1.  If you’re a statistics student, you’re already a member of the ASA, right?  If so, the back of the AmStat News magazine has many jobs listed.

http://magazine.amstat.org/

2. Many jobs are posted at the American Statistical Association (ASA) jobs website.   Subscribe to their feed in your RSS reader:

http://jobs.amstat.org/search/results/index.cfm?SN=25&ss=1&display=rss

While I have had my CV posted on the site for years, I’ve never received any contact because of it.  I think the more direct approach of networking or replying to specific jobs is more effective.

3. The University of Florida statistics website lists many jobs, too.  My impression is that this site is even more comprehensive than jobs.amstat sometimes.

http://www.stat.ufl.edu/vlib/Index.html

I recommend being subscribed to the jobs.amstat.org in your RSS reader, because then most of the jobs will come to you.  You can follow-up at the UFlorida website to make sure you’re not missing anything.  Start looking in Sept/Oct and work on cover letters through Nov/Dec for the Dec/Jan/Feb deadlines.  Ask for letters of recommendation early (maybe even late summer while your professors are not busy with the semester).  Ask your advisor to look over your CV, cover letter, and other submission materials (scan a pdf of your unofficial transcript).  They’ve reviewed many applications hiring in their department before and will have good advice.  Send your application materials (all in pdf format — not doc!) as soon as you are ready to help yours be near the top of their review pile.  And while your application is in the hands of many hiring committees, try not to sweat — you’ve done all you can and it’s largely out of your control until they ask for an interview (or send you a form rejection letter, or never respond to you at all).  Feel free to send a follow-up email to request status if it’s a week or so after their self-predicted decision deadline, if it will help calm your nerves, but try not to hassle them.  It’s a very challenging market and positions regularly get 80-300 applications, so everything you can do to rise to the top of that deep stack can make the difference between getting a toe in the door and the alternative.

Interviewing is the next step.  Here are some pages with questions to prepare for.  Write your answers down just as you’d say them and practice saying them aloud, maybe to a friend who will listen.  You want to clarify your answers to yourself and get them to flow smoothly out of your mouth.
10 tough interview questions
General advice

The job talk is the last step.
You’re a grown up, use Mac’s iWork Keynote — it’s the best presentation software available.
BBP was a great resource, provided you can ignore all the MSPP BS.  First five slides. Template. Video.
Matt Might’s presentation tips and job hunt advice.
CS Berkeley

Negotiating for your salary, start-up, teaching reduction, and more — ask your advisor for advice.  If you have a second offer, all of this becomes much, much easier!

Research, Statistics

Paper published: On network derivation, classification, and visualization: a response to Habeck and Moeller

June 8th, 2011

For the second issue of Brain Connectivity, a new journal, we were invited to provide a response to a “controversial article” about issues of analysis and interpretation in fMRI.  In a fun paper, Elena, Eswar, Vince, and I provide our perspective and some better practices to continue the dialogue.

On network derivation, classification, and visualization: a response to Habeck and Moeller
Erik B. Erhardt, Elena A. Allen, Eswar Damaraju, Vince D. Calhoun.
Brain Connectivity 1(2), 2011.

Abstract
In the decade and a half since Biswal’s fortuitous discovery of spontaneous correlations in functional imaging data, the field of functional connectivity (FC) has seen exponential growth resulting in the identification of widely-replicated intrinsic networks and the innovation of novel analytic methods with the promise of diagnostic application.  As such a young field undergoing rapid change, we have yet to converge upon a desired and needed set of standards.  In this issue, Habeck and Moeller begin a dialogue for developing best practices by providing four criticisms with respect to FC estimation methods, interpretation of FC networks, assessment of FC network features in classifying sub-populations, and network visualization.  Here, we respond to Habeck and Moeller and provide our own perspective on the concerns raised in the hope that the neuroimaging field will benefit from this discussion.

MIND, Research

tdllicor: estimates discrimination and other parameters associated with leaf photosynthesis

June 8th, 2011

Together with David Hanson, I developed R package tdllicor which reads TDL and Licor files, aligns them, and calculates quantities of interest with bootstrap intervals.  It is currently private as it is specialized and not of general interest.  It has already been important for a number of conference publications and is used for active research:

Conference Publications

DT Pater, EB Erhardt, and DT Hanson. Photorespiratory and respiratory carbon
isotope fractionation in leaves. In Proceedings of the Biophysical Society 55th
Annual Meeting, Baltimore, MD, Mar 2010. Biophysical Society.

DT Pater, EB Erhardt, and DT Hanson. Isotopic signature of photorespiration.
In Joint Annual Meetings of the American Society of Plant Biologists and the
Canadian Society of Plant Physiologists, Montreal, CA, August 2010.

Research, Statistics

mortest: estimates the total number of carcasses at a windfarm

June 8th, 2011

Working with Aaftab Jain, we developed an estimator for the total number of bird and bat carcasses at a windfarm called “mortest” and implemented it as an R package.  We are interested in estimating c, the total number of carcasses (mortalities) in a period (year). The total number of carcasses is the sum of carcasses over size classes, $c = \sum_{s=1}^{S} c_s$. If carcasses are retained (that is, not scavenged) and searcher efficiency is perfect (every carcass is found) and every tower is searched, then each c_s would be counted perfectly. Yet, carcass scavenging by predators and searchers overlooking carcasses are a reality, making observed counts an underestimate. Furthermore, tower sampling rather than censusing is a cost-saving convenience. Our estimator of total mortality, c, weights the estimates from different search intervals and adjusts the observed counts for scavenging, search efficiency, searchable area of each tower, and proportion of towers searched, accounting for uncertainty in these estimates using a bootstrap.
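The idea of adjusting an observed count upward can be illustrated with a deliberately simplified Python sketch. It is not the mortest estimator: it assumes the four adjustment factors simply multiply, and it omits the weighting across search intervals and the bootstrap. The function name `adjusted_count` and every number below are hypothetical.

```python
def adjusted_count(observed, p_retained, p_found, area_fraction, tower_fraction):
    """Scale an observed carcass count by the chance a carcass is observable.

    Assumption for this sketch: a carcass is counted only if it is not
    scavenged, is found by the searcher, lies in the searchable area, and is
    at a searched tower, and these four factors multiply.
    """
    p_observed = p_retained * p_found * area_fraction * tower_fraction
    if not 0 < p_observed <= 1:
        raise ValueError("combined detection probability must be in (0, 1]")
    return observed / p_observed


# Hypothetical inputs for two size classes (not real survey data).
size_classes = {
    "small": dict(observed=12, p_retained=0.5, p_found=0.4,
                  area_fraction=0.8, tower_fraction=0.5),
    "large": dict(observed=6, p_retained=0.8, p_found=0.75,
                  area_fraction=0.8, tower_fraction=0.5),
}

# Per-class estimates c_s and their sum c, mirroring c = sum over s of c_s.
c_s = {name: adjusted_count(**inputs) for name, inputs in size_classes.items()}
c_hat = sum(c_s.values())

for name, value in c_s.items():
    print(f"{name}: estimated carcasses = {value:.1f}")
print(f"total: {c_hat:.1f}")
```

Even in this toy version, smaller carcasses, which are assumed here to be scavenged and overlooked more often, receive a much larger upward adjustment than larger ones.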

The software was written by Erik Erhardt and is currently private.  Contact Aaftab Jain <aaftabj+gmail.com> for more information for using the software.

Research, Statistics

A simulation toolbox for fMRI data: SimTB

May 11th, 2011

Update: both papers have been published, simulation toolbox and inter-subject variability.

Elena Allen and I recently submitted two papers that detail a simulation toolbox for fMRI data (SimTB) and capturing inter-subject variability with group independent component analysis (ICA) using simulations. It’s been an exciting and interesting project because we can at last generate interesting and complex datasets to use as a “ground truth” to compare estimation and processing techniques.  We’ve learned a lot about the limits of some methods, as well as their robustness.  The papers will be submitted next week.  For those with MATLAB, it’s available at http://mialab.mrn.org/software.

SimTB, a simulation toolbox for fMRI data under a model of spatiotemporal separability
EB Erhardt, EA Allen, Y Wei, T Eichele, VD Calhoun. (2011)

We introduce SimTB, a MATLAB toolbox designed to simulate functional magnetic resonance imaging (fMRI) datasets under a model of spatiotemporal separability. The toolbox meets the increasing need of the fMRI community to more comprehensively understand the effects of complex processing strategies by providing a ground truth that estimation methods may be compared against. SimTB captures the fundamental structure of real data, but data generation is fully parameterized and fully controlled by the user, allowing for accurate and precise comparisons. The toolbox offers a wealth of options regarding the number and configuration of spatial sources, implementation of experimental paradigms, inclusion of tissue-specific properties, addition of noise and head movement, and much more. A straightforward data generation method and short computation time (3-10 seconds for each dataset) allow a practitioner to simulate and analyze many datasets to potentially understand a problem from many angles. Beginning MATLAB users can use the SimTB graphical user interface (GUI) to design and execute simulations while experienced users can write batch scripts to automate and customize this process. The toolbox is freely available at http://mialab.mrn.org/software together with sample scripts and tutorials.
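As a rough illustration of what “spatiotemporal separability” means, the following Python sketch builds a small time-by-voxel data matrix as a sum of components, each the product of a time course and a spatial map, plus Gaussian noise. It is not SimTB's interface (SimTB is a MATLAB toolbox); the function `simulate`, the toy time courses, and the six-voxel maps are all assumptions made for this example.

```python
import math
import random


def simulate(time_courses, spatial_maps, noise_sd=0.0, seed=0):
    """Return a T x V matrix: sum over components of tc[t] * sm[v], plus noise."""
    rng = random.Random(seed)
    n_time = len(time_courses[0])
    n_vox = len(spatial_maps[0])
    data = []
    for t in range(n_time):
        row = []
        for v in range(n_vox):
            # Separable model: each component contributes time course x spatial map.
            signal = sum(tc[t] * sm[v] for tc, sm in zip(time_courses, spatial_maps))
            row.append(signal + rng.gauss(0.0, noise_sd))
        data.append(row)
    return data


n_time = 20
time_courses = [
    [math.sin(2 * math.pi * t / 10) for t in range(n_time)],   # oscillating source
    [1.0 if (t // 5) % 2 else 0.0 for t in range(n_time)],     # on/off block design
]
spatial_maps = [
    [1, 1, 0, 0, 0, 0],   # component 1 occupies voxels 0-1
    [0, 0, 0, 1, 1, 0],   # component 2 occupies voxels 3-4
]

clean = simulate(time_courses, spatial_maps)                      # known ground truth
noisy = simulate(time_courses, spatial_maps, noise_sd=0.1, seed=1)

print(len(clean), "time points x", len(clean[0]), "voxels")
```

Because the generating time courses and maps are known, any estimate recovered from `noisy` can be compared directly against them, which is the sense in which simulated data provide a ground truth.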

Capturing inter-subject variability with group independent component analysis of fMRI data: a simulation study
EA Allen, EB Erhardt, Y Wei, T Eichele, VD Calhoun. (2011)

A key challenge in functional neuroimaging is the meaningful combination of results across subjects. Even in a sample of healthy participants, brain morphology and functional organization exhibit considerable variability, such that no two individuals have the same neural activation at the same location in response to the same stimulus. This inter-subject variability limits inferences at the group-level as average activation patterns may fail to represent the patterns seen in individuals. A promising approach to multi-subject analysis is group independent component analysis (GICA), which identifies group components and reconstructs activations at the individual level. GICA has gained considerable popularity, particularly in studies where temporal response models cannot be specified. However, a comprehensive understanding of the performance of GICA under realistic conditions of inter-subject variability is lacking. In this study we use simulated functional magnetic resonance imaging (fMRI) data to determine the capabilities and limitations of GICA under conditions of spatial, temporal, and amplitude variability. Simulations, generated with the SimTB toolbox, address questions that commonly arise in GICA studies, such as: (1) How well can individual subject activations be estimated and when will spatial variability preclude estimation? (2) Why does component splitting occur and how is it affected by model order? (3) How should we analyze component features to maximize sensitivity to intersubject differences? Overall, our results indicate an excellent capability of GICA to capture between-subject differences and we make a number of recommendations regarding analytic choices for application to functional imaging data.

SimTB flowchart for simulation of fMRI data

MIND, Research

VeraLight, Inc. receives Health Canada license approval

April 26th, 2011

Over the last year I’ve provided statistical consulting for VeraLight, Inc., a medical device company based in Albuquerque, NM. The SCOUT DS is the first non-invasive diabetes screening system designed to provide an accurate and convenient method for screening type 2 diabetes and pre-diabetes based on the presence of advanced glycation endproducts (AGEs) biomarkers found in skin. I have been primarily responsible for demographic subgroup analysis of pre-clinical trial data and review of the analysis plan for the FDA clinical trial.  Today they announced that they have received Health Canada license approval:

ALBUQUERQUE, N.M., April 26, 2011 — VeraLight Inc., a privately held medical device company, based here, today announced its Scout DS® Device was granted a Health Canada Medical Device Licence for non-invasive diabetes screening. The easy to operate device needs no blood and does not require fasting. The patient simply places their forearm onto the portable table-top unit and a quantitative result is reported in about three minutes.

… Scout DS is slated for market introduction later this year in Canada and select countries outside of the United States.

I’m really excited for them.  I imagine this product making diabetes screening a 5-minute procedure at every pharmacy drug counter.  I’m really proud of the work John, Ries, Ed, Jeff, and the rest of the group are doing to make this a reality.

JULY 28, 2011 – UPDATE

• VeraLight announces CE mark approval of the SCOUT DS® for non-invasive diabetes screening.  So they’ve passed Canada and Europe!

AUGUST 25, 2011 – UPDATE

• VeraLight announces agreement with Pear Healthcare Solutions.
VeraLight and Pear Healthcare Solutions sign Canadian distribution agreement for SCOUT DS® Noninvasive Diabetes device.

Research

Talk: ACASA Annual meeting 2011

April 17th, 2011

I’ll be giving a shortened version of my Bayesian stable isotope mixing model talk (title and abstract below) at the Albuquerque Chapter of the American Statistical Association (ACASA) annual meeting on Friday, April 29, 2011. I gave two distinct longer versions of this talk recently as part of job interview talks at St. Louis University and the University of New Mexico.  I’m looking forward to the meeting to visit with people who I’ve worked with over the last several years, organizing judging events at science fairs, and other events.

Read more…

Research, Statistics