STATISTICAL LABORATORY
Academic Year 2024/2025 - Lecturer: ANTONIO PUNZO
Expected Learning Outcomes
1. Knowledge and understanding. The course introduces the R language for statistical data analysis, with a special focus on descriptive statistics, probability distributions, statistical inference, and statistical modeling.
2. Applying knowledge and understanding. After finishing the course, the student will be able to use the R language to: i) carry out basic statistical analyses of data; ii) simulate data from given probability distributions; and iii) apply the main methods of statistical inference.
3. Making judgements. After finishing the course, the student will be able to extract insights from data using statistical analyses in R.
4. Communication skills. After finishing the course, the student will be able to communicate effectively the outcomes of statistical analyses carried out in R.
5. Learning skills. After finishing the course, the student will be able to use R independently to conduct basic data analyses and statistical modeling.
Course Structure
The course will include lectures delivered through slides and R code demonstrations. We will use the freely available R statistical software extensively. Practical activities and data analysis sessions in R will also be organized.
Required Prerequisites
Attendance of Lessons
Detailed Course Content
Getting started with R and RStudio
Descriptive Statistics. Simple Statistical Distributions. Data tables. Frequency distributions. Main summary statistics: arithmetic mean, geometric mean, harmonic mean. Median and percentiles. Variance, standard deviation, relative variation. Graphical representations. Multiple Statistical Distributions. Contingency Tables. Joint distributions, marginal and conditional distributions. Covariance and correlation.
Probability. Random number generation and data modeling according to different probability distributions: uniform, binomial, Poisson, and Gaussian.
Statistical inference. Sampling distributions: Student-t, chi-square. Confidence estimation. Confidence level. Confidence bounds for means, variances, and proportions. Hypothesis testing. Null and alternative hypotheses. P-values. Statistical tests for means, variances, proportions, comparison of means, and comparison of proportions.
Statistical models. The simple regression model. Goodness of fit. Residual analysis. Inference on the parameters of a linear regression model.
Textbook Information
Course Planning
No. | Subjects | Text References |
---|---|---|
1 | Syllabus: illustration and explanation. Getting started with R and RStudio. Why use R? How to install R. | Slide |
2 | RStudio. RStudio orientation. Console. R script. Source. Run button. Environment/History/Connections. Files/Plots/Packages/Help/Viewer. | Slide |
3 | R packages (CRAN packages and GitHub packages). Using packages. | Slide |
4 | Projects in RStudio. Directory structure. File names. R style guide. Citing R. | Slide |
5 | Some R basics. Objects in R. Errors and warnings. Naming objects. | Slide |
6 | The use of the directory. Getting help. Set the number of digits to display. | Slide |
7 | Operators in R. Using functions in R. Assignment of objects. | Slide |
8 | Vectors. Different ways to create vectors. Extracting elements from a vector. Replacing elements. Searching for elements within a vector. | Slide |
9 | Workspace content and manipulation. Saving in R. Data types. Missing data. | Slide |
10 | Matrices and algebraic operations. Reserved words. Arrays. | Slide |
11 | Lists. Data frames. Attach and detach. | Slide |
12 | Frequency distributions. Contingency tables. Box-plot. | Slide |
13 | Graphical representations. Empirical distribution function. Basic statistics. Concentration index and Lorenz curve. | Slide |
14 | Sampling and ad hoc generators of discrete random variables. Q-Q plot. | Slide |
15 | Univariate constrained optimization with optimize(). Multivariate unconstrained optimization with optim(). Maximum likelihood estimation method. | Slide |
16 | Chi-square test of goodness of fit. Kolmogorov-Smirnov test (goodness-of-fit and distributional comparison between 2 samples). Chi-square test of independence. | Slide |
17 | Univariate and multivariate linear regression model. Nonparametric regression. Changes in scale. | Slide |
18 | Generalized linear models. Logistic regression. Poisson regression. Regression models with qualitative covariates. 1-way ANOVA. | Slide |
Learning Assessment
Learning Assessment Procedures
The exam evaluates the achievement of the learning objectives. It consists of a practical test in which the student writes suitable R code to solve a statistical problem and interprets the output produced by well-known R functions.
Examples of frequently asked questions and/or exercises
· Writing R code to find the maximum likelihood estimates of the parameters of the log-normal distribution
· Writing R code to find the maximum likelihood estimates of the parameters of a linear model for which both the mean and the variance of the normally distributed error depend on covariates
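As an illustration of the first exercise listed above, the following is a minimal sketch of one possible approach, not an official solution. It assumes simulated data: the seed, the sample size, and the true parameter values are illustrative choices that do not come from the course material. The names negloglik, mle_numeric, and mle_closed are likewise hypothetical. The negative log-likelihood is minimized numerically with optim(), and the result is compared with the closed-form estimates.

```r
# Simulated sample (illustrative assumption: meanlog = 1, sdlog = 0.5)
set.seed(123)
x <- rlnorm(500, meanlog = 1, sdlog = 0.5)

# Negative log-likelihood of the log-normal distribution.
# The second parameter is log(sdlog), so optim() can search the whole
# real line without constraints.
negloglik <- function(par, data) {
  meanlog <- par[1]
  sdlog <- exp(par[2])
  -sum(dlnorm(data, meanlog = meanlog, sdlog = sdlog, log = TRUE))
}

# Numerical minimization of the negative log-likelihood
fit <- optim(par = c(0, 0), fn = negloglik, data = x, method = "BFGS")
mle_numeric <- c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))

# Closed-form maximum likelihood estimates, for comparison:
# the mean and the (divide-by-n) standard deviation of log(x)
logx <- log(x)
mle_closed <- c(meanlog = mean(logx),
                sdlog = sqrt(mean((logx - mean(logx))^2)))

print(rbind(numeric = mle_numeric, closed_form = mle_closed))
```

The two rows of the printed table should agree closely. Working with log(sdlog) is one design choice among several; a box-constrained method such as "L-BFGS-B" with a positive lower bound on sdlog would be an alternative.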