**The 2017 Nordic and Baltic Stata Users Group Meeting, Friday, September 1, 2017**

Venue: Karolinska Institutet, MF Salen, Nobels väg 10, Solna.

The 2017 Nordic and Baltic Stata Users Group meeting will be held at Karolinska Institutet in Stockholm on September 1, 2017. The meeting gives Stata users the opportunity to exchange ideas, experiences, and information on new applications of Stata. Anyone interested in Stata is welcome. Two representatives from StataCorp, David Drukker (Executive Director of Econometrics) and Jeff Pitblado (Director of Statistical Software), will attend, and there will be the usual "Wishes and grumbles" session at which you can air your thoughts to Stata developers.

**Program**

**08:45–09:00 Welcome and introduction**, Peter Hedström, Metrika Consulting

**09:00–09:30 stcrmix: a Stata command for estimating mixed competing risks proportional hazard models**, Christophe Kolodziejczyk, Danish Center for Applied Social Sciences, Denmark

I present a new Stata command, stcrmix, which estimates competing risks models with unobserved heterogeneity, that is, mixed competing risks proportional hazard models. In particular, I show how to use stcrmix to estimate the so-called Timing-of-Events model. The stcrmix command closely follows the implementation of the model by Gaure et al. (Journal of Econometrics 2007) and their crmph R package. The mixing distribution is approximated by a discrete distribution, and the model is estimated by nonparametric maximum likelihood (NPMLE). Given the current number of heterogeneity points, new points that improve the likelihood function are added; the likelihood function is then maximized with respect to the whole set of parameters. The procedure is repeated until the likelihood no longer improves. I present the model and the estimation method, covering the likelihood function and how new candidate heterogeneity points are found. I then present the syntax of the command, show how to set up the data for the Timing-of-Events model, and illustrate its estimation with simulated data. Finally, I present results from Monte Carlo simulations and discuss other uses of the command.
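To make the point-addition idea concrete, here is a minimal Python sketch under strong simplifying assumptions: two competing risks with constant (exponential) baseline hazards that are held fixed, hypothetical data, and a hypothetical grid of candidate heterogeneity points. Unlike the procedure described above, which re-maximizes over the whole parameter set, this sketch re-estimates only the mixing probabilities (by EM) after each addition. It is not the stcrmix or crmph code.

```python
import math

# Hypothetical spells: (duration, exit), exit 0 = censored, 1 or 2 = competing risk.
DATA = [(1.2, 1), (0.4, 2), (3.1, 0), (0.9, 1), (2.2, 2), (0.3, 1), (4.0, 0), (1.7, 2)]
# Hypothetical constant baseline hazards, held fixed in this sketch (assumption).
LAMBDA = (0.5, 0.3)


def spell_likelihood(t, d, v):
    """Likelihood of one spell given heterogeneity point v = (v1, v2)."""
    lik = 1.0
    for j in (0, 1):
        rate = LAMBDA[j] * v[j]      # proportional hazards: baseline times v_j
        if d == j + 1:
            lik *= rate              # hazard of the observed exit
        lik *= math.exp(-rate * t)   # survival with respect to risk j
    return lik


def log_likelihood(points, weights):
    """Log-likelihood when the mixing distribution is discrete."""
    return sum(
        math.log(sum(w * spell_likelihood(t, d, v) for v, w in zip(points, weights)))
        for t, d in DATA
    )


def em_weights(points, weights, iters=200):
    """Update the mixing probabilities by EM, keeping the support points fixed."""
    n = len(DATA)
    for _ in range(iters):
        new = [0.0] * len(points)
        for t, d in DATA:
            terms = [w * spell_likelihood(t, d, v) for v, w in zip(points, weights)]
            total = sum(terms)
            for k, term in enumerate(terms):
                new[k] += term / total / n   # average posterior probability of point k
        weights = new
    return weights


def add_points(candidates, max_points=5, tol=1e-8):
    """Greedily add the candidate point that most improves the likelihood."""
    points, weights = [(1.0, 1.0)], [1.0]
    history = [log_likelihood(points, weights)]
    while len(points) < max_points:
        best = None
        for c in candidates:
            if c in points:
                continue
            trial_points = points + [c]
            start = [0.9 * w for w in weights] + [0.1]  # give the new point 10% mass
            trial_weights = em_weights(trial_points, start)
            ll = log_likelihood(trial_points, trial_weights)
            if best is None or ll > best[0]:
                best = (ll, trial_points, trial_weights)
        if best is None or best[0] <= history[-1] + tol:
            break                    # no candidate improves the likelihood: stop
        history.append(best[0])
        points, weights = best[1], best[2]
    return points, weights, history
```

For example, `add_points([(a, b) for a in (0.25, 1.0, 4.0) for b in (0.25, 1.0, 4.0)])` returns the selected support points, their probabilities, and the log-likelihood after each accepted addition; the loop stops as soon as no candidate raises the likelihood.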

**09:30–10:00 Instantaneous geometric rates via generalized linear models**, Andrea Discacciati and Matteo Bottai, Unit of Biostatistics, IMM, Karolinska Institutet, Sweden

The instantaneous geometric rate represents the instantaneous probability of an event of interest per unit of time. We propose two models for the effect of covariates on the instantaneous geometric rate: the proportional instantaneous geometric rate model and the proportional instantaneous geometric odds model. Both models can be fit within the generalized linear model framework by using two nonstandard link functions, which we implemented in the user-defined link programs log_igr and logit_igr. We illustrate their use with a real-data example.
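The sketch below shows how the two link functions relate the instantaneous geometric rate g to a linear predictor. It assumes the relation g = 1 - exp(-h) between g and the hazard h, which the abstract does not state, and all function names are hypothetical; these are not the log_igr or logit_igr programs.

```python
import math


def igr_from_hazard(h):
    """Assumed relation: instantaneous geometric rate g = 1 - exp(-h)."""
    return 1.0 - math.exp(-h)


def hazard_from_igr(g):
    """Inverse of the assumed relation: h = -log(1 - g)."""
    return -math.log(1.0 - g)


def log_link(g):
    """Proportional IGR model: log(g) = x'b, so exp(b) is a ratio of rates."""
    return math.log(g)


def logit_link(g):
    """Proportional geometric odds model: log(g / (1 - g)) = x'b."""
    return math.log(g / (1.0 - g))


def inv_log(eta):
    # Note: nothing stops exp(eta) from exceeding 1, so fits must keep g < 1.
    return math.exp(eta)


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))
```

With a baseline rate g0 = 0.2 and a coefficient b = log(1.5), the log link gives g = 0.3 (a rate ratio of 1.5), whereas the logit link multiplies the odds 0.25 by 1.5, giving g = 0.375 / 1.375. The logit link keeps g inside (0, 1) automatically; the log link does not, which is one reason to offer both.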

**10:00–10:30 Gompertz regression parameterized as accelerated failure time model**, Filip Andersson, Nicola Orsini, Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet

The exponential and Weibull distributions are the only two parametric survival models that the -streg- command currently implements in both the accelerated failure-time (time) metric and the proportional hazards metric. The Gompertz survival model is parameterized only as a proportional hazards model. An accelerated failure-time parameterization of the Gompertz distribution is available in the R package “eha” (Broström 2014) but not in Stata. The aim of this presentation is therefore to present an accelerated failure-time parameterization of the Gompertz survival model. Parameters are estimated by maximum likelihood, and applications of the model are illustrated using demographic mortality data.
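A generic way to obtain an accelerated failure-time version of any baseline distribution is to let covariates rescale time, S(t | x) = S0(t exp(-xb)). The sketch below applies this to a Gompertz baseline with hazard lam * exp(gamma * t); the parameter names are hypothetical, and the presenters' parameterization (or eha's) may use a different scale and shape convention.

```python
import math


def gompertz_survival(t, lam, gamma):
    """Baseline Gompertz survival S0(t) = exp(-(lam / gamma) * (exp(gamma * t) - 1))."""
    return math.exp(-(lam / gamma) * math.expm1(gamma * t))


def gompertz_hazard(t, lam, gamma):
    """Baseline Gompertz hazard h0(t) = lam * exp(gamma * t)."""
    return lam * math.exp(gamma * t)


def aft_survival(t, xb, lam, gamma):
    """AFT: covariates rescale time, S(t | x) = S0(t * exp(-xb))."""
    return gompertz_survival(t * math.exp(-xb), lam, gamma)


def aft_hazard(t, xb, lam, gamma):
    """Hazard implied by the AFT survival: exp(-xb) * h0(t * exp(-xb))."""
    a = math.exp(-xb)
    return a * gompertz_hazard(t * a, lam, gamma)


def median_time(xb, lam, gamma):
    """Baseline median (solve S0(t) = 0.5, gamma > 0) times the acceleration factor exp(xb)."""
    t0 = math.log1p(gamma * math.log(2.0) / lam) / gamma
    return t0 * math.exp(xb)
```

Because h(t | x) / h0(t) = exp(-xb) * exp(gamma * t * (exp(-xb) - 1)) changes with t, this model is not a proportional hazards model, which is why it is distinct from the existing -streg- Gompertz specification; in exchange, exp(xb) has the direct interpretation of a multiplicative effect on survival times such as the median.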

10:30–11:00 Coffee break

**11:00–11:30 Modelling multiple timescales using flexible parametric survival models**, Hannah Bower and Therese M-L Andersson, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet; Michael J. Crowther and Paul C. Lambert, Department of Health Sciences, University of Leicester, and Department of Medical Epidemiology and Biostatistics, Karolinska Institutet

Time-to-event data are frequently modelled using only one main timescale, which may not be optimal for many research questions. When two timescales are considered, modelling is often limited to one main timescale plus a version of the second timescale created by splitting follow-up time; unfortunately, this can be computationally intensive. A less satisfactory alternative is to include a time-fixed version of the second timescale, which does not adequately capture the trend of interest. Because all timescales advance at the same rate, each can be written as a function of the others; for example, attained age equals time since diagnosis of a disease plus age at diagnosis. When a model includes multiple timescales defined as functions of each other, the likelihood functions of standard time-to-event models cannot be written analytically. We therefore develop an approach that models the log hazard using flexible parametric survival models and uses numerical integration to obtain the likelihood function under an arbitrary number of timescales. We present a new Stata command that makes it possible to model multiple timescales simultaneously using flexible parametric survival models on the log hazard scale.
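The role of numerical integration can be illustrated with a small sketch. Here the log hazard depends on time since diagnosis t and on attained age (age at diagnosis plus t) through hypothetical terms, a square-root term and a linear age term rather than the splines a flexible parametric model would use, and the cumulative hazard is obtained by composite Simpson's rule. The quadrature rule and all names are assumptions, not the presenters' command.

```python
import math


def log_hazard(t, age, beta):
    """Hypothetical log hazard on two timescales: time since diagnosis t and attained age."""
    b0, b1, b2 = beta
    return b0 + b1 * math.sqrt(t) + b2 * age


def cumulative_hazard(t_exit, age_at_dx, beta, n=200):
    """H(t) = integral over [0, t] of h(u, age_at_dx + u) du, by composite Simpson's rule."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    if t_exit == 0:
        return 0.0
    step = t_exit / n
    total = 0.0
    for k in range(n + 1):
        u = k * step
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        # attained age is a function of the other timescale: age at diagnosis + u
        total += weight * math.exp(log_hazard(u, age_at_dx + u, beta))
    return total * step / 3.0


def log_lik(data, beta):
    """Sum of d * log h(t) - H(t) over (time, event, age_at_dx) records, no delayed entry."""
    ll = 0.0
    for t, d, a0 in data:
        if d:
            ll += log_hazard(t, a0 + t, beta)
        ll -= cumulative_hazard(t, a0, beta)
    return ll
```

Setting b1 = 0 gives a case with a closed-form cumulative hazard, exp(b0 + b2 * a0) * (exp(b2 * t) - 1) / b2, which is a convenient check on the quadrature; with nonlinear terms on both timescales no such formula is generally available, which is why the likelihood is integrated numerically.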

**11:30–12:00 Causal inference with sample selection**, David Drukker, StataCorp

I discuss how to use the new extended regression model (ERM) commands to estimate average causal effects when the outcome is censored or when the sample is endogenously selected. I also discuss how to use these commands to estimate causal effects in the presence of endogenous explanatory variables, which the commands also accommodate.

12:00–13:00 Lunch break

**13:00–13:30 A journey to latent class analysis (LCA)**, Jeff Pitblado, StataCorp

Stata's estimation commands have evolved in how they account for groups in the sample. Since the early days of Stata, fitting models with group-specific parameters has simply been a matter of using the -if- qualifier to condition on group membership. Inference across group-specific parameters became possible with the introduction of -suest- in Stata 8. In Stata 12, we introduced -sem- and group analysis for structural equation models (SEMs). Stata 15 introduces two kinds of group analysis for generalized SEMs: for observed groups, -gsem- has the new -group()- option; for latent groups, -gsem- has the -lclass()- option and the ability to perform LCA.

**13:30–14:00 Intervention time-series models using transfer functions**, XingWu Zhou, Nicola Orsini, Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet

Evaluating the impact of policies on population health has become a major commitment for states and communities. The intervention (or interrupted) time-series design is the strongest and most commonly used quasi-experimental design for assessing the impact of health interventions when standard randomized trials are not feasible. The recent user-written command itsa and its postestimation commands (SJ 15-2, SJ 17-1) greatly facilitate testing for shifts in level and slope after an intervention, using linear regression models with standard errors adjusted for the correlation of repeated measures over time (-newey-, -prais-). A more advanced approach uses ARIMA models with transfer functions, as proposed by Box and Tiao (JASA, 1975). Although transfer-function models have been used successfully in several research areas, Stata has no command specifically designed for them. The aim of this talk is therefore to explain how to estimate these models in Stata. Applications of the method will be discussed.
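In the Box and Tiao approach, the intervention enters the model through a transfer function applied to an indicator series. A minimal sketch of the common first-order form, omega / (1 - delta B) applied to a step or pulse indicator (B being the backshift operator), is below; the ARIMA noise model that would be fitted alongside it is omitted, and the function names are hypothetical.

```python
def step(n, start):
    """Step indicator: 0 before the intervention, 1 from period `start` onward."""
    return [1.0 if t >= start else 0.0 for t in range(n)]


def pulse(n, start):
    """Pulse indicator: 1 only in period `start`."""
    return [1.0 if t == start else 0.0 for t in range(n)]


def first_order_response(indicator, omega, delta):
    """Apply omega / (1 - delta B): m_t = delta * m_{t-1} + omega * I_t."""
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1) for a stable response")
    m, out = 0.0, []
    for x in indicator:
        m = delta * m + omega * x
        out.append(m)
    return out
```

For example, `first_order_response(step(60, 10), 2.0, 0.5)` is 0 before period 10 and then 2, 3, 3.5, ..., approaching omega / (1 - delta) = 4, while the same filter applied to a pulse gives an effect that decays geometrically (2, 1, 0.5, ...). With delta = 0 the step produces an abrupt, permanent level shift of omega; with 0 < delta < 1 the effect builds up gradually, a pattern that a linear level-and-slope specification can only approximate.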

**14:00–14:30 The new -qcm- command for nonlinear quantile coefficient models**, Matteo Bottai, IMM, Karolinska Institutet; Nicola Orsini, Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet

We present -qcm-, a new command for estimating nonlinear quantile coefficient models. These are parametric models for the conditional quantile function of an outcome variable given covariates, in which the parameters are defined as functions of the quantile order. We briefly introduce the method and illustrate the command with an example on estimating percentiles of respiratory function in healthy children.
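Quantile models are commonly estimated by minimizing the check (pinball) loss, and models with parametric coefficient functions can be fitted by averaging that loss over a grid of quantile orders. The sketch below assumes this integrated-loss objective and a hypothetical coefficient function b(p) = theta0 + theta1 * Phi^-1(p), where Phi^-1 is the standard normal quantile function; the abstract does not state the exact objective or functional forms that -qcm- uses.

```python
from statistics import NormalDist


def check_loss(u, p):
    """Pinball loss rho_p(u) = u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))


def coef_fun(p, theta):
    """Hypothetical coefficient function b(p) = theta0 + theta1 * Phi^-1(p)."""
    return theta[0] + theta[1] * NormalDist().inv_cdf(p)


def conditional_quantile(p, x, theta_int, theta_slope):
    """Q(p | x) = b0(p) + b1(p) * x, both coefficients depending on the order p."""
    return coef_fun(p, theta_int) + coef_fun(p, theta_slope) * x


def integrated_loss(y, x, theta_int, theta_slope, grid):
    """Average check loss over observations and a grid of quantile orders p."""
    total = 0.0
    for p in grid:
        for yi, xi in zip(y, x):
            total += check_loss(yi - conditional_quantile(p, xi, theta_int, theta_slope), p)
    return total / (len(grid) * len(y))
```

The single-p case shows why the check loss targets quantiles: for the values 1, 2, 3, 4, and 100, the candidate that minimizes the summed loss at p = 0.5 is 3, the median. Modelling the coefficients as smooth functions of p (positive theta1 terms here) keeps the fitted quantile curves ordered across p for the covariate values of interest, instead of fitting each percentile separately.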

**14:30–15:00 One-stage dose-response meta-analysis**, Nicola Orsini, Alessio Crippa, Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet

Synthesis of linear and nonlinear exposure-disease associations based on summarized data is often limited to epidemiological studies reporting more than two non-referent categories. Specifying a model on the combined data, rather than within each study, would allow all available information to be included regardless of how the exposure was originally categorized. Within the general framework of linear mixed-effects models, this presentation shows how to specify a one-stage dose-response model suitable for this type of data. Estimation by maximum likelihood and restricted maximum likelihood is implemented in a new command. Simulated data and real examples are used to illustrate the advantages of the proposed approach.
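Why a one-stage fit can use studies with few exposure categories can be seen in a simplified sketch: pool all log relative risks and fit a quadratic dose-response curve by inverse-variance weighted least squares. A study with a single non-referent category cannot support a study-specific two-parameter curve, but it still contributes to the pooled fit. This sketch deliberately omits the random effects of the mixed-effects model and the correlation among log relative risks that share a referent group, which a full analysis would need to address; the quadratic form and all names are hypothetical.

```python
def pooled_quadratic(rows):
    """Weighted least squares fit of logRR = b1 * dose + b2 * dose^2 on combined data.

    rows: (study, dose minus referent dose, log relative risk, standard error).
    There is no intercept because the referent category has logRR = 0 by definition.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for _study, x, y, se in rows:
        w = 1.0 / se ** 2               # inverse-variance weight
        z1, z2 = x, x * x
        s11 += w * z1 * z1
        s12 += w * z1 * z2
        s22 += w * z2 * z2
        r1 += w * z1 * y
        r2 += w * z2 * y
    det = s11 * s22 - s12 * s12         # 2x2 normal equations, solved by Cramer's rule
    b1 = (s22 * r1 - s12 * r2) / det
    b2 = (s11 * r2 - s12 * r1) / det
    return b1, b2
```

Every row enters the same normal equations whatever its study, so a study reporting one non-referent category adds information about the shared curve even though its own two coefficients would be unidentified in a study-by-study (two-stage) analysis.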

15:00–15:30 Coffee break

**15:30–16:00 Text analytics using WordStat 7 within Stata**, Normand Peladeau, Provalis Research

WordStat for Stata offers advanced text analytics, allowing Stata 13, 14, and 15 users to analyze text stored in both short- and long-string variables using numerous text-mining features, such as topic modeling, document clustering, automatic classification, GIS mapping, and state-of-the-art dictionary-based content analysis. Extracted themes can then be related to structured data using various statistics and graphical displays. WordStat also offers a tool to create a Stata project from lists of documents (including .DOC, HTML, and PDF files) and to automatically extract numerical data, categorical data, and dates from them.

**16:00–16:30 The use of Stata in Medical Statistics and Epidemiology: A Long Journey**, Rino Bellocco, Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milano, and Department of Medical Epidemiology and Biostatistics, Karolinska Institutet

In my talk, I will review how Stata has facilitated teaching epidemiology and biostatistics in many master's and PhD programs. Many procedures, such as those available in epitab, elegantly handle simple and adjusted estimation and testing in both cohort and case-control studies. The lexis macro has evolved into the powerful stsplit command. The close correspondence between the underlying methods and their simple application in Stata has been a unique feature of the software, and users' contributions and interaction have been valuable to its development.

**16:30–17:00 Wishes and grumbles**, StataCorp

**Registration**

To register for the meeting, please send an email to info@metrika.se containing your name, affiliation, and contact details.

**Organizers**

*Scientific committee:*

*Matteo Bottai, Unit of Biostatistics, National Institute of Environmental Medicine, Karolinska Institutet*

*Paul Lambert, Department of Health Sciences at the University of Leicester and Department of Medical Epidemiology and Biostatistics, Karolinska Institutet*

*Nicola Orsini, Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet*

*Logistics organizers:*

The meeting is jointly organized by the Karolinska Biostatistics Team at the Department of Public Health Sciences and Metrika Consulting.