About Stata - one of the most sophisticated statistical programs

Fast. Accurate. Easy to use. Stata is a complete, integrated software package that provides all your data science needs—data manipulation, visualization, statistics, and automated reporting.

Fast, accurate, and easy to use

With both a point-and-click interface and a powerful, intuitive command syntax, Stata is fast, accurate, and easy to use. All analyses can be reproduced and documented for publication and review. Version control ensures statistical programs will continue to produce the same results no matter when you wrote them.

Broad suite of statistical features

Stata puts hundreds of statistical tools at your fingertips:

Basic statistics
summaries • cross-tabulations • correlations • z and t tests • equality-of-variance tests • tests of proportions • confidence intervals • factor variables • more

Linear models
regression • censored outcomes • endogenous regressors • bootstrap, jackknife, and robust and cluster–robust variance • instrumental variables • three-stage least squares • constraints • quantile regression • GLS • more

Panel/longitudinal data
random and fixed effects with robust standard errors • linear mixed models • random-effects probit • GEE • random- and fixed-effects Poisson • dynamic panel-data models • instrumental variables • panel unit-root tests • more

Multilevel mixed-effects models
continuous, binary, count, and survival outcomes • two-, three-, and higher-level models • generalized linear models • nonlinear models • random intercepts • random slopes • crossed random effects • BLUPs of effects and fitted values • hierarchical models • residual error structures • DDF adjustments • support for survey data • more

Binary, count, and limited outcomes
logistic, probit, tobit • Poisson and negative binomial • conditional, multinomial, nested, ordered, rank-ordered, and stereotype logistic • multinomial probit • zero-inflated and left-truncated count models • selection models • marginal effects • more

Extended regression models (ERMs)
combine endogenous covariates, sample selection, and nonrandom treatment in models for continuous, interval-censored, binary, and ordinal outcomes • more

Generalized linear models (GLMs)
ten link functions • user-defined links • seven distributions • ML and IRLS estimation • nine variance estimators • seven residuals • more

Finite mixture models (FMMs)
fmm: prefix for 17 estimators • mixtures of a single estimator • mixtures combining multiple estimators or distributions • continuous, binary, count, ordinal, categorical, censored, truncated, and survival outcomes • more

Spatial autoregressive models
spatial lags of dependent variable, independent variables, and autoregressive errors • fixed and random effects in panel data • endogenous covariates • analyze spillover effects • more

ANOVA/MANOVA
balanced and unbalanced designs • factorial, nested, and mixed designs • repeated measures • marginal means • contrasts • more

Exact statistics
exact logistic and Poisson regression • exact case–control statistics • binomial tests • Fisher’s exact test for r × c tables • more

Linearized DSGE models
specify models algebraically • solve models • estimate parameters • identification diagnostics • policy and transition matrices • IRFs • dynamic forecasts • more

Tests, predictions, and effects
Wald tests • LR tests • linear and nonlinear combinations • predictions and generalized predictions • marginal means • least-squares means • adjusted means • marginal and partial effects • forecast models • Hausman tests • more

Contrasts, pairwise comparisons, and margins
compare means, intercepts, or slopes • compare with reference category, adjacent category, grand mean, etc. • orthogonal polynomials • multiple-comparison adjustments • graph estimated means and contrasts • interaction plots • more

Simple maximum likelihood
specify likelihood using simple expressions • no programming required • survey data • standard, robust, bootstrap, and jackknife SEs • matrix estimators • more

Programmable maximum likelihood
user-specified functions • NR, DFP, BFGS, BHHH • OIM, OPG, robust, bootstrap, and jackknife SEs • Wald tests • survey data • numeric or analytic derivatives • more

Resampling and simulation methods
bootstrap • jackknife • Monte Carlo simulation • permutation tests • more

Time series
ARIMA • ARFIMA • ARCH/GARCH • VAR • VECM • multivariate GARCH • unobserved-components model • dynamic factors • state-space models • Markov-switching models • business calendars • tests for structural breaks • threshold regression • forecasts • impulse–response functions • unit-root tests • filters and smoothers • rolling and recursive estimation • more

Survival analysis
Kaplan–Meier and Nelson–Aalen estimators, • Cox regression (frailty) • parametric models (frailty, random effects) • competing risks • hazards • time-varying covariates • left-, right-, and interval-censoring • Weibull, exponential, and Gompertz models • more

Bayesian analysis
thousands of built-in models • univariate and multivariate models • linear and nonlinear models • multilevel models • continuous, binary, ordinal, and count outcomes • bayes: prefix for 45 estimation commands • continuous univariate, multivariate, and discrete priors • add your own models • convergence diagnostics • posterior summaries • hypothesis testing • model comparison • more

Power and sample size
power • sample size • effect size • minimum detectable effect • means • proportions • variances • correlations • ANOVA • regression • cluster randomized designs • case–control studies • cohort studies • contingency tables • survival analysis • balanced or unbalanced designs • results in tables or graphs • more

Treatment effects/Causal inference
inverse probability weight (IPW) • doubly robust methods • propensity-score matching • regression adjustment • covariate matching • multilevel treatments • endogenous treatments • average treatment effects (ATEs) • ATEs on the treated (ATETs) • potential-outcome means (POMs) • continuous, binary, count, fractional, and survival outcomes • more

SEM (structural equation modeling)
graphical path diagram builder • standardized and unstandardized estimates • modification indices • direct and indirect effects • continuous, binary, count, ordinal, and survival outcomes • multilevel models • random slopes and intercepts • factor scores, empirical Bayes, and other predictions • groups and tests of invariance • goodness of fit • handles MAR data by FIML • correlated data • survey data • more

Latent class analysis
binary, ordinal, continuous, count, categorical, fractional, and survival items • add covariates to model class membership • combine with SEM path models • expected class proportions • goodness of fit • predictions of class membership • more

Multiple imputation
nine univariate imputation methods • multivariate normal imputation • chained equations • explore pattern of missingness • manage imputed datasets • fit model and pool results • transform parameters • joint tests of parameter estimates • predictions • more

Survey methods
multistage designs • bootstrap, BRR, jackknife, linearized, and SDR variance estimation • poststratification • DEFF • predictive margins • means, proportions, ratios, totals • summary tables • almost all estimators supported • more

Cluster analysis
hierarchical clustering • kmeans and kmedian nonhierarchical clustering • dendrograms • stopping rules • user-extensible analyses • more

IRT (item response theory)
binary (1PL, 2PL, 3PL), ordinal, and categorical response models • item characteristic curves • test characteristic curves • item information functions • test information functions • differential item functioning (DIF) • more

Multivariate methods
factor analysis • principal components • discriminant analysis • rotation • multidimensional scaling • Procrustean analysis • correspondence analysis • biplots • dendrograms • user-extensible analyses • more

Data management
data transformations • match-merge • import/export data • ODBC • SQL • Unicode • by-group processing • append files • sort • row–column transposition • labeling • save results • more

Graphics
lines • bars • areas • ranges • contours • confidence intervals • interaction plots • survival plots • publication quality • customize anything • Graph Editor • more

Graphical user interface
menus and dialogs for all features • Data Editor • Variables Manager • Graph Editor • Project Manager • Do-file Editor • Clipboard Preview Tool • multiple preference sets • more

Basic statistics
summaries • cross-tabulations • correlations • z and t tests • equality-of-variance tests • tests of proportions • confidence intervals • factor variables • more

Nonparametric methods
nonparametric regression • Wilcoxon–Mann–Whitney, Wilcoxon signed ranks, and Kruskal–Wallis tests • Spearman and Kendall correlations • Kolmogorov–Smirnov tests • exact binomial CIs • survival data • ROC analysis • smoothing • bootstrapping • more

Epidemiology
standardization of rates • case–control • cohort • matched case–control • Mantel–Haenszel • pharmacokinetics • ROC analysis • ICD-10 • more

GMM and nonlinear regression
generalized method of moments (GMM) • nonlinear regression • more

Other statistical methods
kappa measure of interrater agreement • Cronbach's alpha • stepwise regression • tests of normality • more

Functions
statistical • random-number • mathematical • string • date and time • more

Internet capabilities
ability to install new commands • web updating • web file sharing • latest Stata news • more

Community-contributed commands
community-written commands for meta-analysis, data management, survival, econometrics, more

Programming features
adding new commands • command scripting • object-oriented programming • menu and dialog-box programming • dynamic documents • Markdown • Project Manager • plugins • more

Matrix programming—Mata
interactive sessions • large-scale development projects • optimization • matrix inversions • decompositions • eigenvalues and eigenvectors • LAPACK engine • real and complex numbers • string matrices • interface to Stata datasets and matrices • numerical derivatives • object-oriented programming • more

Documentation
36 manuals • 19,000+ pages • seamless navigation • thousands of worked examples • quick starts • methods and formulas • references • more

Complete data-management facilities

You can combine and reshape datasets, manage variables, and collect statistics across groups or replicates. You can work with byte, integer, long, float, double, and string variables (including BLOBs and strings up to 2 billion characters). Stata also has advanced tools for managing specialized data such as survival/duration data, time-series data, panel/longitudinal data, categorical data, multiple-imputation data, and survey data.

Publication-quality graphics

Stata makes it easy to generate publication-quality, distinctly styled graphs.

You can write scripts to produce hundreds or thousands of graphs in a reproducible manner and export them to EPS or TIF for publication, to PNG or SVG for the web, or to PDF for viewing. With the integrated Graph Editor you click to change anything about your graph or to add titles, notes, lines, arrows, and text.

Extensible

Stata is so programmable that developers and users add new features every day to respond to the growing demands of today's researchers.

With Stata's Internet capabilities, new features and official updates can be installed over the Internet with a single click.

Matrix programming with Mata

Mata is a full-blown programming language that compiles what you type into bytecode, optimizes it, and executes it fast.

Though you don't need to program to use Stata, it is comforting to know that a fast and complete matrix programming language is an integral part of Stata. Mata is both an interactive environment for manipulating matrices and a full development environment that can produce compiled and optimized code. It includes special features for processing panel data, performs operations on real or complex matrices, provides complete support for object-oriented programming, and is fully integrated with every aspect of Stata.

Cross-platform compatible

Stata will run on Windows, Mac, and Linux/Unix computers; however, our licenses are not platform specific.

That means if you have a Mac laptop and a Windows desktop, you don't need two separate licenses to run Stata. You can install your Stata license on any of the supported platforms. Stata datasets, programs, and other data can be shared across platforms without translation. You can also quickly and easily import datasets from other statistical packages, spreadsheets, and databases.

Trusted

We don't just program statistical methods, we validate them.

The results you see from a Stata estimator rest on comparisons with other estimators, Monte Carlo simulations of consistency and coverage, and extensive testing by our statisticians. Every Stata we ship has passed a certification suite that includes 4.1 million lines of testing code that produces 6 million lines of output. We certify every number and piece of text from those 6 million lines of output.

Stata/MP, Stata/SE or Stata/BE?

Which Stata is right for me? Find the answer here.