# Machine-Learning-in-R **Repository Path**: econometric/Machine-Learning-in-R ## Basic Information - **Project Name**: Machine-Learning-in-R - **Description**: Workshop (6 hours): preprocessing, cross-validation, lasso, decision trees, random forest, xgboost, superlearner ensembles - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 2 - **Created**: 2021-12-23 - **Last Updated**: 2023-09-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # See the Fall 2020 tidymodels update! https://github.com/dlab-berkeley/Machine-Learning-with-tidymodels # Machine Learning in R This is the repository for D-Lab’s Introduction to Machine Learning in R workshop. [View the associated slides here](https://dlab-berkeley.github.io/Machine-Learning-in-R/slides.html#1). RStudio Binder: [![Binder](http://mybinder.org/badge.svg)](http://beta.mybinder.org/v2/gh/dlab-berkeley/Machine-Learning-in-R/master?urlpath=rstudio) ## Content outline - Background on machine learning - Classification vs regression - Performance metrics - Data preprocessing - Missing data - Train/test splits - Algorithm walkthroughs - Lasso - Decision trees - Random forests - Gradient boosted machines - SuperLearner ensembling - Principal component analysis - Hierarchical agglomerative clustering - Challenge questions ## Getting started Please follow the notes in [participant-instructions.md](participant-instructions.md). #### HAVE FUN! :^) The seven algorithm R Markdown files (lasso, decision tree, random forest, xgboost, SuperLearner, PCA, and clustering) are designed to function in a standalone manner. After installing and librarying the packages in 01-overview.Rmd, run all the code in 02-preprocessing.Rmd to preprocess the data. Then, open any one of the seven algorithm R Markdown files and "Run All" code to see the results and visualizations! ## Assumed participant background We assume that participants have familiarity with: * Basic R syntax * Statistical concepts such as mean and standard deviation ## Technology requirements Please bring a laptop with the following: * [R version](https://cloud.r-project.org/) 3.5 or greater * [RStudio integrated development environment (IDE)](https://www.rstudio.com/products/rstudio/download/#download) is highly recommended but not required. ## Resources Browse resources listed on the [D-Lab Machine Learning Working Group repository](https://github.com/dlab-berkeley/MachineLearningWG). Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey! ## Slideshow The slides were made using [xaringan](https://github.com/yihui/xaringan), which is a wrapper for [remark.js](https://remarkjs.com/#1). Check out Chapter 7 if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on [Decision Trees, Bagging, and Random Forests - with an example implementation in R](https://bradleyboehmke.github.io/random-forest-training/slides-source.html#1).