**Feature highlights**

Stata 15 is the biggest release ever, and it has something for everyone. We will walk you through some of the new features.

**Stata in Swedish**

In Stata 15, you can choose Swedish menus and dialog boxes. This will be particularly useful in introductory classes for Swedish-speaking students.

**Extended regression models (ERMs)**

ERMs is our name for regression models that can account for endogenous covariates, nonrandom treatment assignment, and Heckman-style endogenous sample selection. Stata already had commands such as heckman and ivregress that address these problems individually; ERMs can account for them in any combination. And ERMs don’t just address these problems in linear models.

There are four ERM commands:

- eregress fits linear regression models for continuous outcomes.
- eintreg fits interval regression, including tobit, for interval-measured and censored outcomes.
- eprobit fits probit regression models for binary outcomes.
- eoprobit fits ordered probit regression for ordinal outcomes.

Even if you need only one of the new features, you can now fit models that were previously unavailable, such as interval regression with endogenous covariates, probit regression with a binary endogenous covariate, probit regression with endogenous ordinal treatment, ordered probit regression with endogenous treatment, and linear regression with tobit endogenous sample selection.
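As a rough sketch of the syntax, the example below simulates data with one endogenous covariate and fits it with eregress. The variable names (y, x1, x2, z), the coefficients, and the seed are invented for illustration and do not come from any Stata dataset.

```stata
* Simulated data: x2 is endogenous because it shares the unobserved u with y
clear
set seed 12345
set obs 2000
generate x1 = rnormal()
generate z  = rnormal()          // instrument: affects x2 but not y directly
generate u  = rnormal()          // unobserved confounder
generate x2 = 0.5*x1 + 0.8*z + u + rnormal()
generate y  = 1 + 2*x1 - x2 + u + rnormal()

* Linear regression of y on x1 and x2, treating x2 as endogenous
eregress y x1, endogenous(x2 = x1 z)
```

The other complications are requested the same way, through options such as select() for endogenous sample selection and entreat() for endogenous treatment, and the options can be combined in one command.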

**Latent class analysis**

Stata's gsem command now supports latent class analysis (LCA). Latent class models use categorical latent variables. Categorical means group. Latent means unobserved.

Categorical latent variables can be used, for instance, in marketing or management to represent consumers with different buying preferences, in health to represent patients in different risk groups, and in education or psychology to represent students with different patterns of behavior. The buying preferences, risk groups, and behavior patterns are not observed. These unobserved categories are the latent classes, and LCA is used to identify and understand them.

If we have observed variables that are indicators of unobserved groups of consumers, we could fit a latent class model and then

- estimate the proportion of consumers belonging to each class,
- estimate the probability of a positive response to observed variables in each consumer group,
- evaluate the goodness of fit, and
- predict the probability of belonging to each consumer group for individuals with a specific pattern of observed responses.

Stata’s LCA features also allow you to fit latent profile models (with continuous observed outcomes), path models with latent categorical variables, and finite mixture models (FMMs).
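The workflow described above might look like the following sketch. It assumes the gsem_lca1 example dataset from the Stata 15 documentation, which has four binary indicators; the choice of two classes and the new variable stub cpost are illustrative.

```stata
* Example dataset from the Stata documentation (four binary indicators)
webuse gsem_lca1, clear

* Two-class latent class model; C is the categorical latent variable
gsem (accident play insurance stock <- ), logit lclass(C 2)

estat lcprob                        // estimated proportion in each class
estat lcmean                        // probability of a positive response within each class
estat lcgof                         // goodness-of-fit statistics
predict cpost*, classposteriorpr    // posterior class probabilities for each observation
```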

**Bayes prefix**

The new bayes: prefix command lets you fit Bayesian regression models more easily and fit more models. You could already fit a Bayesian linear regression using bayesmh. But now you can fit it by typing

. bayes: regress y x1 x2

That is convenient. What you could not previously do was fit a Bayesian survival model. Now you can with bayes: streg. You can also fit multilevel models with, for instance, bayes: mixed and bayes: melogit. The new bayes: prefix can be used with 45 Stata maximum-likelihood commands.

All of Stata's Bayesian features are supported by the new bayes: prefix command. You can select from many prior distributions for model parameters or use default priors. You can use the default adaptive Metropolis-Hastings sampling, or Gibbs sampling, or a combination of the two sampling methods, when available. After estimation, you can use Stata's standard Bayesian postestimation tools such as bayesgraph to check convergence, bayesstats summary to estimate functions of model parameters, bayesstats ic and bayestest model to compute Bayes factors and compare Bayesian models, and bayestest interval to perform interval hypothesis testing.
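A minimal sketch of the prefix and a few of the postestimation tools follows. It uses the auto dataset shipped with Stata; the model, the seed, and the hypothesis being tested are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Bayesian linear regression with default priors; rseed() makes the MCMC run reproducible
bayes, rseed(15): regress mpg weight

bayesgraph diagnostics {mpg:weight}          // trace, autocorrelation, and density plots
bayesstats summary {mpg:weight}              // posterior summary for the slope
bayestest interval {mpg:weight}, upper(0)    // posterior probability that the slope is negative
```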

**Produce Word® and PDF documents embedding Stata results and graphs**

It is now just as easy to produce Word® and PDF documents in Stata as it is to produce Excel® worksheets. Everybody loved putexcel in Stata 14. They will also love putdocx and putpdf. The new commands work just like putexcel. That means you can write do-files to create entire Word or PDF reports containing the latest results, tables, and graphs. You can automate reproducible reports.

The new putdocx command writes paragraphs, images, and tables to a Word file or, to be precise about it, to Office Open XML (.docx) files. The new putpdf command writes paragraphs, images, and tables to a PDF file. With both commands, images include Stata graphs and other images such as your organization's logo. You can format the objects, too: boldface, italics, size, custom tables, etc.
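A do-file that builds a small Word report could look like the sketch below. The file names report.docx and scatter.png, the report text, and the model are hypothetical; the auto dataset ships with Stata.

```stata
sysuse auto, clear
regress mpg weight foreign

* Save a graph as an image file that the report can embed
scatter mpg weight
graph export scatter.png, replace

* Build the Word document piece by piece
putdocx begin
putdocx paragraph, style(Heading1)
putdocx text ("Fuel economy report")
putdocx paragraph
putdocx text ("Regression of mpg on weight and foreign:"), bold
putdocx table results = etable       // table of the latest estimation results
putdocx paragraph
putdocx image scatter.png
putdocx save report.docx, replace
```

A PDF version follows the same pattern with putpdf begin, putpdf paragraph, putpdf text, putpdf table, putpdf image, and putpdf save.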

**Markdown and dynamic documents**

Markdown is a standard markup language that provides text formatting from plain text input. It was designed to be easily converted into HTML, the language of the web. Stata now supports it. You can create HTML files from your Stata output, including graphs. You start with a plain text file containing Markdown-formatted text and dynamic tags specifying instructions to Stata, such as run this regression or produce that graph. You then use the new dyndoc command to convert the file to HTML. Want to produce TeX documents? With the new dyntext command, you can produce any text-based document!
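To keep the example in one runnable piece, the sketch below writes a tiny Markdown source file from a do-file and then converts it. In practice you would write the source in a text editor. The file name example.txt and its contents are made up for illustration.

```stata
* Write a small Markdown source file containing dynamic tags
tempname fh
file open `fh' using example.txt, write text replace
file write `fh' "<<dd_version: 1>>" _n
file write `fh' "Fuel economy" _n "============" _n _n
file write `fh' "~~~~" _n
file write `fh' "<<dd_do>>" _n
file write `fh' "sysuse auto, clear" _n
file write `fh' "regress mpg weight" _n
file write `fh' "<</dd_do>>" _n
file write `fh' "~~~~" _n _n
file write `fh' "The sample has <<dd_display: %4.0f e(N)>> cars." _n
file close `fh'

* Convert the source to example.html
dyndoc example.txt, replace
```

The dd_do tags run the enclosed commands and insert their output, and the dd_display tag inserts a single value into the running text.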

**Linearized dynamic stochastic general equilibrium (DSGE) models**

Stata now fits linearized DSGE models, which are time-series models used in economics and finance. These models are an alternative to traditional forecasting models. Both attempt to explain aggregate economic phenomena, but DSGE models do this on the basis of models derived from microeconomic theory. Being based on microeconomic theory means lots of equations. The key feature of these equations is that expectations of future variables affect variables today. This is one feature that distinguishes DSGE models from a vector autoregression or a state-space model. The other feature is that, because the model is derived from theory, its parameters can usually be interpreted in terms of that theory. After fitting a DSGE model, estat policy and estat transition can report the policy and transition matrices. You can produce forecasts using Stata's existing forecast command, and you can graph impulse-response functions using Stata's existing irf command.
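To make "expectations of future variables affect variables today" concrete, here is a small textbook-style New Keynesian system, given only as an illustration and not taken from the text above. The symbols are assumptions: $p_t$ is inflation, $x_t$ the output gap, $r_t$ the interest rate, $u_t$ and $g_t$ are unobserved state variables, and $\epsilon_t$ and $\xi_t$ are shocks.

$$
\begin{aligned}
p_t &= \beta\,E_t(p_{t+1}) + \kappa\,x_t \\
x_t &= E_t(x_{t+1}) - \{\,r_t - E_t(p_{t+1}) - g_t\,\} \\
r_t &= \tfrac{1}{\beta}\,p_t + u_t \\
u_{t+1} &= \rho_u\,u_t + \epsilon_{t+1} \\
g_{t+1} &= \rho_g\,g_t + \xi_{t+1}
\end{aligned}
$$

Today's inflation and output gap depend on $E_t(p_{t+1})$ and $E_t(x_{t+1})$, and parameters such as the discount factor $\beta$ have a direct theoretical interpretation.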

**Finite mixture models**

The new fmm: prefix command can be used with 17 Stata estimation commands to fit finite mixture models (FMMs). This means that with the fmm: prefix, we can now fit finite mixtures of regression models for continuous, binary, ordinal, count, categorical, and even survival-time outcomes.

The most typical use of fmm: is to fit one model and allow the parameters (coefficients, location, variance, scale, etc.) to vary across unobserved subpopulations. As with LCA, we call these unobserved subpopulations classes. Say we are interested in a linear regression and we believe there are three classes across which the parameters of the model might vary. Even though we have no variable recording the class membership, we can fit

. fmm 3: regress y x1 x2

Stata reports separate regression coefficients and intercepts for each class and a model for predicting membership in those classes. In the same way, we can use fmm: logit for a binary outcome or fmm: streg for a survival-time outcome.

fmm: can even be used with multiple estimation commands simultaneously when the classes might follow different models.

. fmm: (regress y x1 x2) (poisson y x1 x2 x3)

fits a linear regression in one class and a Poisson regression in another.

Postestimation commands are available to (1) estimate each class's proportion in the overall population; (2) report marginal means of the outcome variables within class; and (3) predict probabilities of class membership and predicted outcomes.
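The sketch below simulates a two-class mixture of linear regressions, discards the class indicator, and lets fmm: recover the classes. The variable names, class shares, and coefficients are invented for illustration.

```stata
* Simulate a two-class mixture of linear regressions
clear
set seed 2017
set obs 1000
generate x = rnormal()
generate byte class = runiform() < 0.4
generate y = cond(class, 5 + 2*x, -1 - x) + rnormal()
drop class                          // class membership is not available to the model

fmm 2: regress y x

estat lcprob                        // estimated class proportions
estat lcmean                        // marginal mean of y within each class
predict pr*, classposteriorpr       // posterior probability of membership in each class
```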

**Spatial autoregressive (SAR) models**

Stata now fits SAR models. SAR may stand for either spatial autoregressive or simultaneous autoregressive. Regardless of terminology, SAR models allow spatial lags of the dependent variable, spatial lags of the independent variables, and spatial autoregressive errors. Spatial lags are the spatial analog of time-series lags. Time-series lags are values of variables from recent times. Spatial lags are values from nearby areas.

SAR models are fit with the new commands spregress, spivregress (for endogenous covariates), and spxtregress (for panel data). The models are appropriate for area (also known as areal) data. Observations are called spatial units and might be countries, states, districts, counties, cities, postal codes, or city blocks. Or they might not be geographically based at all. They could be nodes of a social network. Spatial models estimate direct effects (the effects of areas on themselves) and indirect or spillover effects (effects from nearby areas).

Stata provides a suite of commands for working with spatial data and a new [SP] manual to accompany them. When spatial units are geographically based, you can download standard-format shapefiles from the web that define the map. With a single command, you can make spillover effects proportional to the inverse distance between areas or restrict them to be just from neighboring areas. And you can create your own custom definitions of proximity.
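A sketch of a typical session follows. It assumes the homicide1990 example data and its companion shapefile dataset from the Stata 15 [SP] manual are still available at the URLs shown; the matrix names W and M and the choice of covariates are illustrative.

```stata
* Example data and translated shapefile from the [SP] manual
copy https://www.stata-press.com/data/r15/homicide1990.dta ., replace
copy https://www.stata-press.com/data/r15/homicide1990_shp.dta ., replace
use homicide1990, clear
spset                                   // show how the data are declared as spatial

spmatrix create contiguity W            // spillovers only from bordering areas
spmatrix create idistance  M            // spillovers proportional to inverse distance

* Spatial lag of the outcome plus spatially autoregressive errors
spregress hrate ln_population ln_pdensity gini, gs2sls dvarlag(W) errorlag(W)
estat impact                            // direct, indirect (spillover), and total effects
```

Swapping W for M in dvarlag() and errorlag() would use the inverse-distance definition of proximity instead.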

**Interval-censored parametric survival-time models**

Stata's new stintreg command joins streg for fitting parametric survival models. stintreg fits models to interval-censored data. In interval-censored data, the time of failure is not exactly known. What is known, subject by subject, is a time when the subject had not yet failed and a later time when the subject already had failed. stintreg can fit exponential, Weibull, Gompertz, lognormal, loglogistic, and generalized gamma survival-time models. Both proportional-hazards and accelerated failure-time metrics are supported. After fitting a model with stintreg, you can plot survivor, hazard, and cumulative hazard functions, predict mean and median times, obtain Cox-Snell and martingale-like residuals, and more.
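The following sketch simulates Weibull failure times that are observed only at yearly visits and then fits them with stintreg. The variable names, parameter values, and visit schedule are invented for illustration.

```stata
* Simulate Weibull failure times observed only at yearly visits
clear
set seed 4321
set obs 500
generate x = rnormal()
generate t = (-ln(runiform()) / exp(-1 + 0.5*x))^(1/1.5)   // true failure time, not observed
generate ltime = floor(t)          // last visit at which the subject had not failed
generate rtime = ltime + 1         // first visit at which failure had occurred
replace ltime = . if ltime == 0    // failed before the first visit: left-censored
drop t

stintreg x, interval(ltime rtime) distribution(weibull)
stcurve, survival                  // fitted survivor function
predict medtime, median time       // predicted median failure time
```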

**Nonlinear mixed-effects models**

Stata’s new menl command fits nonlinear mixed-effects models, also known as nonlinear multilevel models and nonlinear hierarchical models. These models can be thought of in two ways. You can think of them as nonlinear models containing random effects. Or you can think of them as linear mixed-effects models in which some or all fixed and random effects enter nonlinearly. However you think of them, the overall error distribution is assumed to be Gaussian.

These models are popular because, in some problems, the underlying science says that the relationship is not linear in the parameters. They are widely used in population pharmacokinetics, bioassays, and studies of biological and agricultural growth processes. For example, nonlinear mixed-effects models have been used to model

- drug absorption in the body,
- intensity of earthquakes, and
- growth of plants.
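As an illustration of plant growth, the sketch below fits a logistic growth curve to the orange tree example dataset from the Stata documentation. The parameter names b1, b2, b3 and the random effect U1 are labels chosen here, not required names.

```stata
webuse orange, clear

* Logistic growth of trunk circumference with age;
* the asymptote b1 varies across trees through the random effect U1
menl circumf = ({b1} + {U1[tree]}) / (1 + exp(-(age - {b2})/{b3}))
```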

**Mixed logit models**

Stata fits discrete choice models. Stata 15 will fit them with random coefficients. Discrete choice is another way of saying multinomial or conditional logistic regression. The word "mixed" is used by statisticians whenever some coefficients are random and others are fixed. Therefore, Stata 15 fits mixed logit models. These models are fit with the new asmixlogit command.

Random coefficients arise for many reasons, but there is a special reason researchers analyzing discrete choices might be interested in them. Random coefficients are a way around the independence of irrelevant alternatives (IIA) assumption. Under IIA, if you have a choice among walking, public transportation, or a car and you choose walking, the other two alternatives are irrelevant. Take one of them away, and you would still choose walking. Human beings sometimes violate this assumption, at least judged by their behavior. Mathematically speaking, IIA makes alternatives independent after conditioning on covariates. If IIA is violated, then the alternatives would be correlated. Random coefficients allow that.
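A minimal sketch of the command follows, assuming the inschoice example dataset from the Stata 15 documentation, in which each case (id) chooses among several insurance plans. Which coefficient is treated as random is an illustrative choice.

```stata
webuse inschoice, clear

* premium has a fixed coefficient; the coefficient on deductible is random
asmixlogit choice premium, case(id) alternatives(insurance) random(deductible)
```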

**Nonparametric regression, kernel methods**

Stata now fits nonparametric regressions. In these models, you do not specify a functional form. Instead of typing

. regress y x1 x2 x3

you can type

. npregress kernel y x1 x2 x3

to fit a nonparametric regression model where you make no assumptions about the model being linear in the variables or linear in the parameters.

Instead of the coefficients that regress would report, npregress reports the effects of covariates, which are average derivatives of y with respect to the covariates. You can also obtain bootstrap standard errors for the effects. You might want to know the effects at average values of covariates or at specific points. You can use margins to obtain that. And you can even use marginsplot to graph slices of the function.
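The sketch below fits a kernel regression on the auto dataset shipped with Stata and then evaluates the fitted function at a few points. The number of bootstrap replications, the seed, and the evaluation points are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Kernel regression with bootstrap standard errors for the average effect of weight
npregress kernel mpg weight, vce(bootstrap, reps(50) seed(123))

* Predicted mean of mpg at three chosen weights
margins, at(weight = (2000 3000 4000))
```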

**Browse and import data from FRED**

The Federal Reserve Bank of St. Louis makes available over 480,000 U.S. and international economic and financial time series to registered users. Registering is free and easy to do. The service is called Federal Reserve Economic Data (FRED). FRED includes data from 85 sources, including the Federal Reserve, the Penn World Table, Eurostat, and the World Bank.

In Stata 15, you can use Stata's GUI to access and download FRED data. You search or browse by category or release or source. You click to select series of interest. Select 1 or select 100. When you click on Import, Stata will download them and combine them into a single, custom dataset in memory.

These same features are also available from Stata's command line interface. The command is import fred. The command is convenient when you want to automate updating the 27 different series that you are tracking for a monthly report.

Stata can access FRED, and it can access ALFRED. ALFRED is FRED's archive of historical data.
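From the command line, a session might look like the sketch below. The key shown is a placeholder that you must replace with your own, so the commands will not run as written. UNRATE and CPIAUCSL are FRED series identifiers chosen for illustration, as is the date range.

```stata
* Register your own API key once; the value below is only a placeholder
set fredkey your_api_key_here, permanently

fredsearch unemployment rate       // search series by keyword
import fred UNRATE CPIAUCSL, daterange(2000-01-01 2017-01-01) clear
```

Placing the import fred line in a do-file is one way to refresh the same set of series each month.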

**Threshold regression**

The new estimation command threshold fits threshold regressions. These are linear regressions in which the coefficients change at estimated thresholds. You have one set of coefficients before the first threshold, another set after the first and before the second, and so on. You can specify or estimate the number of threshold points.

The thresholds can be determined on the basis of time; thus threshold regressions are often fit to time-series data. Or thresholds can be determined on the basis of another variable, x. In that case, you would have one set of coefficients when x is below the first threshold, another set when x is between the first and second thresholds, and so on. The threshold variable can also be a lagged value of the dependent variable. In that case, you would have one set of coefficients when l.y is below the first threshold, another set when l.y is between the first and second thresholds, and so on. This last case is known as the self-exciting threshold model.
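The two cases can be sketched as follows, assuming the usmacro example dataset from the Stata documentation. The choice of regressors and threshold variables is illustrative.

```stata
webuse usmacro, clear

* Coefficients switch when the second lag of the output gap crosses an estimated threshold
threshold fedfunds, regionvars(l.fedfunds inflation ogap) threshvar(l2.ogap)

* Self-exciting variant: the threshold variable is a lag of the dependent variable
threshold fedfunds, regionvars(l.fedfunds) threshvar(l.fedfunds)
```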

And more!