


Preliminary Program for the 2018 Nordic and Baltic Stata User Group Meeting, Oslo, Friday, September 12, 2018

- Registration opens at 08:30

- Conference opens at 09:00

- Literate programming - Using log2markup, basetable and matrixtools, Niels Henrik Bruun

- Exploring Marginal Treatment Effects - Flexible estimation using Stata, Martin Eckhoff Andresen

- Calculating polarisation indices for population subgroups using Stata, Jan Zwierzchowski

- Calibrating Survey Weights in Stata, Jeff Pitblado

- Introduction to Bayesian Analysis Using Stata, Chuck Huber

- Analysing time-to-event data in the presence of competing risks within the flexible parametric modelling framework. What tools are available in Stata, which one to use and when? Sarwar Islam Mozumder

- Standardized survival curves and related measures from flexible survival parametric models, Paul C. Lambert 

- MERLIN: Mixed effects regression for linear and non-linear models, Michael J. Crowther

- Wishes and grumbles, Chuck Huber and Jeff Pitblado, StataCorp.

- Conference closes at 17:00




Niels Henrik Bruun, Department of Public Health - Institute of General Medical Practice, Aarhus University: Literate programming - Using log2markup, basetable and matrixtools

During the last decade, there have been several attempts to integrate comments and statistical output in Stata, indicating the importance of this for, e.g., literate programming. I present a recent development based on three integrated packages: log2markup, basetable and matrixtools. The log2markup command transforms a commented log file into a document in a markup language of the user's choice, such as LaTeX, HTML or Markdown. One of the features of log2markup is that it reads output from Stata commands as part of the markup language itself. One command where this is beneficial is basetable, one of several commands for building, interactively and easily, the typical first table or base table of data summaries in, e.g., articles. The output can be set to match the style of the markup language used in the comments. I briefly demonstrate its usability. Another set of Stata commands I will present is in the Stata package matrixtools. Here, the basic command matprint makes it easy to print matrix contents in the desired markup style. Several other matrixtools commands use matprint. One such command is sumat, an extension of the Stata command summarize that offers summary statistics, including new ones such as "unique values". Sumat returns all results in a matrix (also for string variables), and statistics can be grouped by a categorical variable. Another such command is crossmat, a wrapper for the Stata command tabulate that returns all output in matrices. Further, there is the command metadata, which collects metadata from the current dataset, a non-current dataset, or all datasets in a folder (optionally including subfolders).


Martin Eckhoff Andresen, Statistics Norway: Exploring marginal treatment effects - Flexible estimation using Stata

Well-known instrumental variables (IV) estimators identify treatment effects in settings with selection on levels. In settings that also exhibit selection on gains, the treatment effects for the compliers identified by IV might be very different from other populations of interest. Under stronger separability assumptions, the marginal treatment effects (MTE) framework allows us to estimate the whole distribution of treatment effects. I introduce the framework and theory behind MTE and the new package mtefe, which uses several estimation methods to fit MTE models in Stata. This package provides important improvements and flexibility over existing packages such as margte (Brave and Walstrum, 2014) and calculates various treatment-effect parameters based on the results.


Jan Zwierzchowski, Institute of Statistics and Demography, SGH Warsaw School of Economics: Calculating polarisation indices for population subgroups using Stata

In recent years, more and more attention has been focused on the effects of economic growth and changes in inequality on income polarization, as well as on changes in the size of the middle income class. Most of the literature that deals with this issue is focused on polarization indices. However, the polarization indices proposed by researchers only allow for an assessment of polarization in the whole population and do not actually explain the reasons for the decline of the middle class in certain countries. This paper proposes a class of median relative polarization (MRP) partial indices, which allow for a comprehensive assessment of changes in the income distribution (its polarization or convergence) in any given sub-population, in particular in the lower, middle and upper income class groups. Moreover, the proposed class of indices is further generalized to allow for the assessment of polarization in certain cohort groups when working with panel data sources. A new Stata program has been written that operationalizes the proposed polarization indices. Polarization indices for the lower, middle and upper income groups in the 2005-2015 period have been calculated using panel data for Poland (Social Diagnosis Panel Survey Dataset). It is shown that, despite the lack of polarization in the whole population, there was a slight convergence of incomes in the lower and middle income groups and a significant polarization of incomes in the upper income group. This means that, on average, the incomes of the lowest and middle earners tend to converge toward the median, while, at the same time, the incomes of the richest part of the population are growing even higher.


Jeff Pitblado, StataCorp: Calibrating survey weights in Stata

Calibration is a method for adjusting the sampling weights, often to account for nonresponse and underrepresented groups in the population. Another benefit of calibration is smaller variance estimates compared to estimates using unadjusted weights. Stata implements two methods for calibration: the raking-ratio method, and the generalized regression method. Stata supports calibration for the estimation of totals, ratios, and regression models. Calibration is also supported by each survey variance estimation method implemented in Stata. In this presentation, I will show how to use calibration in survey data analysis using Stata.
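To make the raking-ratio idea concrete, here is a minimal sketch in plain Python, not Stata's implementation: the weights are repeatedly rescaled, one categorical variable at a time, until the weighted totals match known population totals for every variable. The function name `rake`, the respondents and the population totals are hypothetical, and the sketch assumes every category has positive weight in the sample. The generalized regression method is not shown.

```python
from collections import defaultdict


def rake(weights, groups, targets, max_iter=100, tol=1e-10):
    """Raking-ratio calibration (iterative proportional fitting), a sketch.

    weights: initial sampling weights, one per respondent
    groups:  one dict per respondent, mapping variable name -> category
    targets: variable name -> {category: known population total}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total in each category of this variable
            totals = defaultdict(float)
            for wi, g in zip(w, groups):
                totals[g[var]] += wi
            # Rescale weights so this variable's margin matches its target
            for i, g in enumerate(groups):
                new_w = w[i] * margin[g[var]] / totals[g[var]]
                max_change = max(max_change, abs(new_w - w[i]))
                w[i] = new_w
        # Stop once a full pass over all variables barely changes the weights
        if max_change < tol:
            break
    return w


# Hypothetical sample of four respondents with equal design weights
groups = [
    {"sex": "f", "age": "young"},
    {"sex": "f", "age": "old"},
    {"sex": "m", "age": "young"},
    {"sex": "m", "age": "old"},
]
targets = {
    "sex": {"f": 30.0, "m": 20.0},
    "age": {"young": 30.0, "old": 20.0},
}
w = rake([10.0, 10.0, 10.0, 10.0], groups, targets)
print(w)
```

Cycling over the variables in turn is what distinguishes raking from post-stratification on the full cross-classification: only the marginal totals need to be known.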


Chuck Huber, StataCorp: Introduction to Bayesian analysis using Stata

Bayesian analysis has become a popular tool for many statistical applications. Yet many data analysts have little training in the theory of Bayesian analysis or in the software used to fit Bayesian models. This talk will provide an intuitive introduction to the concepts of Bayesian analysis and demonstrate how to fit Bayesian models using Stata. No prior knowledge of Bayesian analysis is necessary. Specific topics will include the relationship between likelihood functions, prior and posterior distributions, Markov chain Monte Carlo (MCMC) using the Metropolis-Hastings algorithm, and how to use Stata's bayes prefix to fit Bayesian models.
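For readers new to MCMC, the following is a minimal random-walk Metropolis-Hastings sketch in plain Python. It is not how Stata's bayes prefix works internally; the model (normal data with a known standard deviation of 1 and a normal prior with mean 0 and standard deviation 10 on the mean), the simulated data, the step size and the burn-in length are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(mu, data, prior_mean=0.0, prior_sd=10.0, sigma=1.0):
    """Unnormalized log posterior for a normal mean with known sigma (assumed model)."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik


def metropolis_hastings(data, n_draws=20000, step=0.5, start=0.0, seed=1):
    rng = random.Random(seed)
    mu = start
    current = log_posterior(mu, data)
    draws = []
    for _ in range(n_draws):
        # Symmetric random-walk proposal, so the proposal densities cancel
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        draws.append(mu)  # a rejected proposal repeats the current value
    return draws


data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(20)]  # simulated data
draws = metropolis_hastings(data)
kept = draws[2000:]  # discard burn-in draws
print("posterior mean of mu:", sum(kept) / len(kept))
```

Because this prior is conjugate, the exact posterior mean is available in closed form, which makes the sampler easy to check; for models without closed-form posteriors, the same accept/reject step is what makes MCMC useful.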


Sarwar Islam Mozumder, Biostatistics Research Group, Department of Health Sciences, University of Leicester: Analysing time-to-event data in the presence of competing risks within the flexible parametric modelling framework. What tools are available in Stata, which one to use and when?

In a typical survival analysis, the time to an event of interest is studied. For example, in cancer studies, researchers often wish to analyse a patient's time from diagnosis to death. Similar applications also exist in economics and engineering. In such analyses, the event of interest is often not distinguished by cause. Although this may sometimes be adequate, in many situations it does not paint the entire picture and restricts the analysis. More often, the event can occur due to different causes, which better reflects real-world scenarios. For instance, if the event of interest is death due to cancer, it is also possible for the patient to die from other causes. For patients who die from other causes, the time at which they would have died from cancer is never observed. These are known as competing causes of death, or competing risks.

In a competing risks analysis, interest lies in the cause-specific cumulative incidence function (CIF). This can be calculated by either:

(1) a transformation of all the cause-specific hazards, or
(2) its direct relationship with the subdistribution hazard.
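For reference, the two routes can be written out as follows. The notation is introduced here and does not come from the abstract: $h_k(t)$ is the cause-specific hazard for cause $k$ out of $K$ causes, $S(t)$ is the all-cause survival function, $h_k^{sd}(t)$ is the subdistribution hazard for cause $k$, and $F_k(t)$ is the cause-specific CIF.

```latex
% (1) CIF from all the cause-specific hazards
F_k(t) = \int_0^t h_k(u)\, S(u)\, du,
\qquad
S(u) = \exp\!\left( -\sum_{j=1}^{K} \int_0^u h_j(v)\, dv \right)

% (2) CIF from the subdistribution hazard for cause k
F_k(t) = 1 - \exp\!\left( -\int_0^t h_k^{sd}(u)\, du \right)
```

Route (1) needs a model for every cause, because $S(u)$ depends on all the cause-specific hazards, whereas route (2) links a single hazard directly to $F_k(t)$.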

Obtaining cause-specific CIFs within the flexible parametric modelling framework by adopting approach (1) is possible by using the stpm2 post-estimation command, stpm2cif. Alternatively, since competing risks is a special case of a multi-state model, an equivalent model can be fitted using the multistate package. To estimate cause-specific CIFs using approach (2), stpm2 can be used by applying time-dependent censoring weights, which are calculated on restructured data using stcrprep.

The above methods involve some form of data augmentation. Instead, estimation on individual-level data may be preferred due to its computational advantages. This is possible using either approach (1) or (2) with stpm2cr. In this talk, an overview of these various tools is provided, followed by some discussion on which of them to use and when.


Paul C. Lambert, Biostatistics Research Group, Department of Health Sciences, University of Leicester, UK and Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden: Standardized survival curves and related measures from flexible survival parametric models

In observational studies with time-to-event outcomes, we expect that there will be confounding and would usually adjust for these confounders in a survival model. From such models an adjusted hazard ratio comparing exposed and unexposed subjects is often reported. This is fine, but hazard ratios can be difficult to interpret, are not collapsible, and there are further problems when trying to interpret hazard ratios as causal effects. Risks are much easier to interpret than rates and so quantifying the difference on the survival scale can be desirable.

In Stata, stcurve gives survival curves after fitting a model where certain covariates can be given specific values, but those not specified are given mean values. Thus, it gives a prediction for an individual who happens to have the mean value of each covariate, which may not reflect the average survival in the population. An alternative is to use standardization to estimate marginal effects, where the regression model is used to predict the survival curve for unexposed and exposed subjects at all combinations of other covariates included in the model. These predictions are then averaged to give marginal effects.
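The contrast between the two approaches can be sketched in a few lines of plain Python. This is not stpm2_standsurv and does not use a flexible parametric model; it assumes a hypothetical Weibull proportional-hazards model with made-up parameter values (`LAM`, `GAM`, `B_EXP`, `B_AGE`) and a single confounder, age, observed for a hypothetical sample of nine subjects.

```python
import math

# Hypothetical fitted Weibull proportional-hazards model (assumed values, not estimates):
# S(t | exposed, age) = exp(-LAM * t**GAM * exp(B_EXP * exposed + B_AGE * age))
LAM, GAM = 0.01, 1.2
B_EXP, B_AGE = 0.5, 0.05


def surv(t, exposed, age):
    """Model-based survival probability for one covariate pattern."""
    lin = B_EXP * exposed + B_AGE * age
    return math.exp(-LAM * t ** GAM * math.exp(lin))


def standardized_surv(t, exposed, ages):
    """Set exposure for everyone, keep each subject's own age, then average."""
    return sum(surv(t, exposed, a) for a in ages) / len(ages)


ages = [40, 45, 50, 55, 60, 65, 70, 75, 80]  # hypothetical observed ages
mean_age = sum(ages) / len(ages)
t = 5.0

s0 = standardized_surv(t, 0, ages)
s1 = standardized_surv(t, 1, ages)
print("standardized S(5), unexposed:", round(s0, 3))
print("standardized S(5), exposed:  ", round(s1, 3))
print("difference in standardized S(5):", round(s1 - s0, 3))
# Prediction for a single individual with the mean age, as with covariates set to means
print("S(5) at mean age, unexposed: ", round(surv(t, 0, mean_age), 3))
```

Because survival is a nonlinear function of the covariates, the average of the individual predictions generally differs from the prediction at the mean covariate value, which is the point made above about stcurve.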

I will describe a command, stpm2_standsurv, to obtain various standardized measures after fitting a flexible parametric survival model. As well as estimating standardized survival curves, the command can estimate the marginal hazard function, the standardized restricted mean survival time and centiles of the standardized survival curve. Contrasts (differences or ratios) can be made between any of these measures, and a user-defined function can be given for more complex contrasts.


Michael J. Crowther, Biostatistics Research Group, Department of Health Sciences, University of Leicester: MERLIN: Mixed effects regression for linear and non-linear models

MERLIN can do a lot of things. From linear regression to a Weibull survival model, from a three-level logistic model, to a multivariate joint model of multiple longitudinal outcomes, a recurrent event and survival. merlin can do things I haven’t even thought of yet. I’ll take a single dataset, and attempt to show you the full range of capabilities of merlin, and talk about some of the new features following its rise from the ashes of megenreg. There’ll even be some surprises.



To register for the meeting, please send an email to containing your name, affiliation, and contact details. 


Scientific committee

Coordinator: Yngvar Nilssen, Statistician, The Cancer Registry of Norway – Institute of Population-based Cancer Research.


Arne Risa Hole, Reader in Economics, University of Sheffield.

Morten Wang Fagerland, Biostatistician, Oslo Centre for Biostatistics and Epidemiology (OCBE), Oslo University Hospital.

Øyvind Wiborg, Associate Professor, Department of Sociology and Human Geography, University of Oslo.

Peter Hedström, Professor, Institute of Analytical Sociology, Linköping University.

Committee email:



Logistics organizers

The meeting is jointly organized by The Cancer Registry of Norway – Institute of Population-based Cancer Research and Metrika Consulting AB. Metrika is the distributor of Stata in the Nordic and Baltic regions. For further information, please visit or contact them at


Meeting coordinators:

Bjarte Aagnes, Cancer Registry of Norway.

Ronnie Babigumira, Cancer Registry of Norway.




The meeting is free, but there will be a small optional fee to cover the cost of lunch. As there are few alternatives at the conference venue, we recommend ordering the optional lunch.


Getting to the venue

From Oslo Airport, the Airport Express Train leaves every 15 minutes and takes 25 minutes to Nationaltheatret Station, next to the metro station on the main street, Karl Johans gate.

From Nationaltheatret Station, westbound metro line 3 (Kolsås) takes 14 minutes to Montebello metro station, an easy ten-minute downhill walk from the meeting venue next to Oslo Cancer Cluster Innovation Park.

The meeting will be held at the new research building (Building K), Auditorium FOBY (Forskningsbygget), Radiumhospitalet, Oslo University Hospital, Ullernchausseen 70, 0310 Oslo, GPS: 59.9307, 10.662.


Guide to Oslo

The official travel guide to Oslo



In the city center around the main street Karl Johans gate you can find a range of hotels. Early booking is recommended. Some you might consider:

(4-star) Thon Hotel Rosenkrantz.

(4-star) Hotel Bristol.

(4-star) Hotel Christiania Teater.

(2-star) Cochs Pensjonat. A guesthouse on the corner of the Royal Palace Park, offering a popular combination of central location, solid accommodation and reasonable prices.


At the venue:

(3-star) Norlandia Hotell Montebello, a patient hotel with limited places during the week for non-patients.


Further information on accommodation etc.