General Overview

This workshop will provide an overview of key concepts for creating an effective data-driven project, and will introduce tools and techniques for data wrangling, visualisation and reproducible reporting using R, a free, open-source language for data analysis. The R language provides a rich and flexible environment for working with data, especially data to be used for statistical modelling or graphics.

The R system has an extensive library of packages that offer state-of-the-art capabilities, many of which are not available in standard commercial statistical software. R enables you to escape from the restrictive environments and sterile analyses offered by commonly used statistical software packages. It makes experimentation and exploration easy, which improves data analysis. The findings of a data analysis only become useful once they are shared, and R is a tool that enables modern data analyses to be reported in a reproducible manner. This makes an analysis more useful to others, because the data and the code that actually conducted the analysis can be made available and easily shared. As a result, R has become a lingua franca of quantitative research. Accordingly, this workshop will emphasise packages that help you with data analysis, visualisation and communication with a wider audience.

The workshop will start by introducing the fundamental concepts of R: basic use of the R console through the RStudio IDE, inputting and importing data, record keeping and general good practice for an R project workflow. It will then progress to basic data pre-processing and wrangling techniques. Complex information contained in data can often be communicated more effectively through visualisation. This is why the workshop focuses on building appropriate visualisations for a given data analysis problem, and on producing intelligent, reproducible data analysis reports using R Markdown. Using real data and real examples, we will introduce you to fundamental statistical concepts to set the stage for key statistical modelling techniques that you might wish to pursue later.

Version control has become an essential tool for keeping track of changes when working on data science projects, as well as for collaborating on them. RStudio supports working with Git, an open-source distributed version control system, which is easy to use when combined with GitHub, a web-based Git repository hosting service. Throughout the course you will be introduced to GitHub and will become acquainted with good practice for incorporating Git into your R project workflow.

Objectives:

  • To become familiar with R/RStudio’s data handling facilities, which will expand the range of data science problems that can be effectively analysed
  • To provide a framework for developing analytical skills for handling a range of data sets
  • To introduce the basic principles behind effective data visualisation
  • To enable intelligent, reproducible reporting of the results of data analysis to target audiences with diverse levels of numeracy
  • To provide a sufficient foundation for pursuing fundamental statistical analysis

Who can enrol

If you would like to learn to explore, visualise, and analyse data in a reproducible and shareable manner then this workshop is for you! 😀

This course is for people from varying backgrounds and diverse profiles. It is designed for people who recognise the paramount importance of data and its application through the use of free, open-source technologies such as R and Git/GitHub.

This course will benefit anyone who has the curiosity and desire to enter the realm of data science through the use of R. We will make sense of the world of data and learn effective and attractive ways to visually analyse and communicate related information using R. With the knowledge gained on this course, you will be ready to undertake your very own data analysis in a reproducible manner through the use of R.


© 2020 Tatjana Kecojevic