
## The Gaussian Correlation Inequality in One Picture


## 6 Types of Programmers in One Picture


Which type are you? Can you recognize the programming language used in this illustration?

Originally posted here

DSC Resources

Popular Articles


## Free Book: Probability and Statistics Cookbook


The format is very similar to a BIG cheat sheet. This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature and in-class material from courses of the statistics department at the University of California, Berkeley, but is also influenced by other sources. If you find errors or have suggestions for further topics, I would appreciate it if you sent me an email.

Author: Matthias Vallentin

Contents

1 Distribution Overview

• 1.1 Discrete Distributions
• 1.2 Continuous Distributions

2 Probability Theory

3 Random Variables

• 3.1 Transformations

4 Expectation

5 Variance

6 Inequalities

7 Distribution Relationships

8 Probability and Moment Generating Functions

9 Multivariate Distributions

• 9.1 Standard Bivariate Normal
• 9.2 Bivariate Normal
• 9.3 Multivariate Normal

10 Convergence

• 10.1 Law of Large Numbers (LLN)
• 10.2 Central Limit Theorem (CLT)

11 Statistical Inference

• 11.1 Point Estimation
• 11.2 Normal-Based Confidence Interval
• 11.3 Empirical Distribution
• 11.4 Statistical Functionals

12 Parametric Inference

• 12.1 Method of Moments
• 12.2 Maximum Likelihood
• 12.2.1 Delta Method
• 12.3 Multiparameter Models
• 12.3.1 Multiparameter Delta Method
• 12.4 Parametric Bootstrap

13 Hypothesis Testing

14 Bayesian Inference

• 14.1 Credible Intervals
• 14.2 Function of Parameters
• 14.3 Priors
• 14.3.1 Conjugate Priors
• 14.4 Bayesian Testing

15 Exponential Family

16 Sampling Methods

• 16.1 The Bootstrap
• 16.1.1 Bootstrap Confidence Intervals
• 16.2 Rejection Sampling
• 16.3 Importance Sampling

17 Decision Theory

• 17.1 Risk
• 17.2 Admissibility
• 17.3 Bayes Rule
• 17.4 Minimax Rules

18 Linear Regression

• 18.1 Simple Linear Regression
• 18.2 Prediction
• 18.3 Multiple Regression
• 18.4 Model Selection

19 Non-parametric Function Estimation

• 19.1 Density Estimation
• 19.1.1 Histograms
• 19.1.2 Kernel Density Estimator (KDE)
• 19.2 Non-parametric Regression
• 19.3 Smoothing Using Orthogonal Functions

20 Stochastic Processes

• 20.1 Markov Chains
• 20.2 Poisson Processes

21 Time Series

• 21.1 Stationary Time Series
• 21.2 Estimation of Correlation
• 21.3 Non-Stationary Time Series
• 21.3.1 Detrending
• 21.4 ARIMA Models
• 21.4.1 Causality and Invertibility
• 21.5 Spectral Analysis

22 Math

• 22.1 Gamma Function
• 22.2 Beta Function
• 22.3 Series
• 22.4 Combinatorics


## Book: R for Data Science


Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.

Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.

You’ll learn how to:

• Wrangle—transform your datasets into a form convenient for analysis
• Program—learn powerful R tools for solving data problems with greater clarity and ease
• Explore—examine your data, generate hypotheses, and quickly test them
• Model—provide a low-dimensional summary that captures true “signals” in your dataset
• Communicate—learn R Markdown for integrating prose, code, and results

### About the Authors

Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualization. His research focuses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualization to better understand data and models.

Garrett Grolemund is a statistician, teacher and R developer who currently works for RStudio. He sees data analysis as a largely untapped fountain of value for both industry and science. Garrett received his Ph.D. at Rice University in Hadley Wickham’s lab, where his research traced the origins of data analysis as a cognitive process and identified how attentional and epistemological concerns guide every data analysis.

Garrett is passionate about helping people avoid the frustration and unnecessary learning he went through while mastering data analysis. Even before he finished his dissertation, he started teaching corporate training in R and data analysis for Revolution Analytics. He’s taught at Google, eBay, Acxiom and many other companies, and is currently developing a training curriculum for RStudio that will make useful know-how even more accessible.

Outside of teaching, Garrett spends time doing clinical trials research, legal research, and financial analysis. He also develops R software: he co-authored the lubridate R package, which provides methods to parse, manipulate, and do arithmetic with date-times, and wrote the ggsubplot package, which extends the ggplot2 package.


## New Book: Data Science: Mindset, Methodologies, and Misconceptions


From the author of the bestsellers Data Scientist and Julia for Data Science, this book covers four foundational areas of data science. The first is the data science pipeline, including methodologies and the data scientist’s toolbox. The second is the set of essential practices needed to understand the data, including questions and hypotheses. The third is the pitfalls to avoid in the data science process. The fourth is an awareness of future trends and how modern technologies like Artificial Intelligence (AI) fit into the data science framework.

The following chapters cover these four foundational areas:

• Chapter 1 – What Is Data Science?
• Chapter 2 – The Data Science Pipeline
• Chapter 3 – Data Science Methodologies
• Chapter 4 – The Data Scientist’s Toolbox
• Chapter 5 – Questions to Ask and the Hypotheses They Are Based On
• Chapter 6 – Data Science Experiments and Evaluation of Their Results
• Chapter 7 – Sensitivity Analysis of Experiment Conclusions
• Chapter 8 – Programming Bugs
• Chapter 9 – Mistakes Through the Data Science Process
• Chapter 10 – Dealing with Bugs and Mistakes Effectively and Efficiently
• Chapter 11 – The Role of Heuristics in Data Science
• Chapter 12 – The Role of AI in Data Science
• Chapter 13 – Data Science Ethics
• Chapter 14 – Future Trends and How to Remain Relevant

Targeted towards data science learners of all levels, this book aims to help the reader go beyond data science techniques and obtain a more holistic and deeper understanding of what data science entails. With a focus on the problems data science tries to solve, this book challenges the reader to become a self-sufficient player in the field.

### About the Author

Dr. Zacharias Voulgaris was born in Athens, Greece. He studied Production Engineering and Management at the Technical University of Crete, shifted to Computer Science through a Master’s in Information Systems & Technology (City University of London), and then to Data Science through a PhD on Machine Learning (University of London). He has worked at Georgia Tech as a Research Fellow, at an e-marketing startup in Cyprus as an SEO manager, and as a Data Scientist at both Elavon (GA) and G2 (WA). He was also a Program Manager at Microsoft, on a data analytics pipeline for Bing. Currently he is the CTO of a data science startup in London, UK. Zacharias has authored two other books on data science: Data Scientist – The Definitive Guide to Becoming a Data Scientist, and Julia for Data Science.


## Quick Guide to R and Statistical Programming


Guest blog by Rob Kabacoff. Rob is Professor of Quantitative Analytics at Wesleyan University.

R is an elegant and comprehensive statistical and graphical programming language. Unfortunately, it can also have a steep learning curve. I created this website for both current R users and experienced users of other statistical packages (e.g., SAS, SPSS, Stata) who would like to transition to R. My goal is to help you quickly access this language in your work.

I assume that you are already familiar with the statistical methods covered and instead provide you with a roadmap and the code necessary to get started quickly and orient yourself for future learning. I designed this website to be an easily accessible reference. Look at the sitemap to get an overview.

If you prefer an online interactive environment to learn R, this free R tutorial by DataCamp is a great way to get started. If you’re already somewhat advanced and interested in machine learning, try this Kaggle tutorial on who survived the Titanic.

### What’s New

A link to the new resource The R Graph Gallery has been added.
A number of new sections have been added. These include:

• A new section on time series analysis.
• A new section on ggplot2 graphics.
• For old friends, please note that I’ve renamed the section on trellis graphs to lattice graphs. Since both the lattice and ggplot2 packages can be used to create trellis graphs, changing the name makes the distinction between these two sections clearer.

### Why Use R?

If you currently use another statistical package, why learn R?

1. It’s free! If you are a teacher or a student, the benefits are obvious.
2. It runs on a variety of platforms including Windows, Unix and MacOS.
3. It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.
4. It contains advanced statistical routines not yet available in other packages.
5. It has state-of-the-art graphics capabilities.

### Obtaining R

R is available for Linux, MacOS, and Windows (95 or later) platforms. Software can be downloaded from one of the Comprehensive R Archive Network (CRAN) mirror sites.

Originally posted here
