
Data Science Bootcamp

Discover software and techniques to support your data work, including Python, R, Stata, machine learning, and qualitative analysis!


August 15-26

Are you starting (or thinking about starting) research or coursework involving data science? Get a head start before the semester by learning fundamental data science skills at the CCSS Data Science Bootcamp! Over two weeks (August 15-26), the bootcamp will cover fundamental skills that will be useful in both classes and research. The first week introduces software commonly used for data analysis in the social sciences: Python, R, and Stata. The second week covers techniques for machine learning and qualitative analysis. For more details, see the per-topic descriptions below. Registration will begin in July 2022.

Lunch will be provided each day of the Bootcamp!

Co-sponsored by the Center for Data Science for Enterprise and Society

Schedule - List View

Week 1: Quantitative Software Tools - Python, R, and Stata

August 15-19 with lunch provided each day!
 

8/15-16 | Python Series

Register for Python Series

Mon. Aug. 15

  • 10:00-11:00 am | Python Basics Part 1 
  • 11:15 am-12:15 pm | Python Basics Part 2
  • 1:15-2:00 pm | Python Basics Part 3 
  • 2:15-3:00 pm | Working with Python Packages

Tues. Aug. 16

  • 10:00-11:00 am | Pandas Part 1 
  • 11:15 am-12:15 pm | Pandas Part 2
  • 1:15-3:00 pm | Advanced Topics: Common Pitfalls and Dealing with Errors

8/17-18 | R Series

Register for R Series

Wed. Aug. 17 

  • 1:00-2:00 pm | Introduction to R 
  • 2:15-3:15 pm | Working with R Packages 

Thurs. Aug. 18

  • 10:00-11:00 am | Data Cleaning in Tidyverse 
  • 11:15 am-12:15 pm | Data Visualizations
  • 1:15-3:00 pm | RMarkdown 

8/19 | Stata Series

Register for Stata Series

Fri. Aug. 19

  • 10:00-11:00 am | Crash Course 
  • 11:15 am-12:15 pm | Manipulating and Cleaning Data
  • 1:15-2:30 pm | Results and Reporting 

Week 2: Qualitative Analysis & Machine Learning

August 22-26 with lunch provided each day!
 

8/22-8/23 | Machine Learning Series

Register for Machine Learning Series

Mon. Aug. 22

  • 10:00-11:00 am | Introduction to Machine Learning 
  • 11:15 am-12:15 pm | Machine Learning - Understanding and Visualizing Data
  • 1:15-2:15 pm | Machine Learning - Supervised Learning 

Tues. Aug. 23

  • 10:00-11:00 am | Machine Learning - Unsupervised Learning 
  • 11:15 am-12:15 pm | Natural Language Processing (NLP)

8/24-8/26 | Qualitative Analysis Series

Register for Qualitative Analysis Series

Wed. Aug. 24

  • 10:00 am-12:15 pm | Introduction to Atlas.ti 
  • 1:15-3:15 pm | Work With Your Own Data Using Atlas.ti 

Thurs. Aug. 25

  • 10:00 am-12:15 pm | Introduction to NVivo 
  • 1:15-3:15 pm | Work With Your Own Data Using NVivo 

Fri. Aug. 26

  • 10:00 am-12:15 pm | Introduction to MaxQDA 
  • 1:15-3:15 pm | Work With Your Own Data Using MaxQDA


Series Descriptions


  • Python Series

    Python is a programming language that is increasingly popular in data science because of its beginner-friendly syntax and rich ecosystem of ready-made packages. This workshop series will help you get started using Python to manage and analyze your data. Specifically, we will cover the following topics:

    Python Basics: The first three parts of the series offer a beginner-friendly introduction to Python, covering basic syntax and language features.

    Working with Python Packages: The fourth part of the series introduces Python's rich ecosystem of ready-made tools for data management and analysis and walks through how to decide which tools are right for your specific research needs.

    Pandas: The fifth and sixth parts build on the Python Packages part by going in depth into one specific package, Pandas, which offers powerful options for managing and storing data.

    Advanced Topics - Common Pitfalls and Dealing with Errors: The last part of our series will cover more advanced Python skills such as decorators, generators, and context managers.
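    To give a rough idea of the three advanced topics, here is a minimal standard-library sketch. It is an illustration, not the workshop's own material: the names `timed`, `read_records`, `step`, and `count_rows` are hypothetical, and the input is a made-up list of CSV lines. The decorator records how long a call took, the generator yields cleaned records lazily, and the context manager uses `try`/`finally` so its cleanup step runs even if the code inside the `with` block raises an error.

```python
import time
from contextlib import contextmanager
from functools import wraps


def timed(func):
    """Decorator: record how long the most recent call to func took."""
    @wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


def read_records(lines):
    """Generator: lazily yield non-blank lines split into fields."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines instead of yielding empty records
            yield line.split(",")


@contextmanager
def step(log, name):
    """Context manager: log entry and exit, even if the body raises."""
    log.append(f"start {name}")
    try:
        yield
    finally:
        log.append(f"end {name}")


@timed
def count_rows(lines):
    # Consumes the generator one record at a time.
    return sum(1 for _ in read_records(lines))


raw = ["id,score", "1,90", "", "2,85"]  # hypothetical CSV lines
log = []
with step(log, "count"):
    n = count_rows(raw)
print(n, log)  # 3 ['start count', 'end count']
```

    Because `read_records` is a generator, `count_rows` never holds the whole file in memory, which matters once datasets grow large.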

    Instructors: Jonathan Chang, Samantha De Leon Sautu

  • R Series

    The R workshop series is designed for new and beginning R users who want to become better acquainted with the basics of the language. The first part of the series gives a general introduction to R and RStudio. The second part explores the R package ecosystem and shows how to find, understand, and use packages in R. The third part focuses on the tidyverse, the collection of packages most popular for data cleaning in R, and covers essential functions and tools for cleaning data. The fourth part covers data visualization in R, communicating insights from data through various plots and graphs. The final part introduces RMarkdown, a versatile tool for creating reports in RStudio.

    Instructors: Sabrina Porcelli, Jacob Grippin

    Courses:

    • Introduction to R: This first part of the series covers the basics of using R and RStudio, including basic syntax and language features. Topics:
      • Set up an R environment with R and RStudio installed
      • Basic R syntax and language features
      • Load a pre-existing dataset into R
    • Working with R packages: Because R has so many packages, this second part of the series explores some popular packages for data analysis. Topics:
      • Understand how to search for R packages on CRAN
      • Understand how to interpret the help documentation for R packages
      • Experiment with multiple packages
    • RMarkdown: This part of the series provides an introduction to RMarkdown, which lets you keep your code, output, text, formatting, and personal notes in one document. An RMarkdown document is written in easy-to-write Markdown text and contains chunks of embedded R code that generate output. Topics:
      • R code chunks for output
      • Inline code for output mid-sentence
      • Document formatting
      • Output formatting
    • Data Cleaning in Tidyverse: Data cleaning is a large part of the data analysis workflow, and the Tidyverse packages are the most popular R tools for it. This workshop will cover the following Tidyverse packages: dplyr, tidyr, readr, purrr, tibble, stringr, and forcats. Using these packages, we will explore data cleaning operations such as changing variable formats, creating new variables, summarizing data, joining datasets, and running basic regressions. Topics:
      • Differentiate between Tidyverse packages to understand which packages to use for which data cleaning functions
      • Apply multiple Tidyverse packages to an R dataset
      • Design a data cleaning workflow that culminates in a regression
    • Data Visualization: Data visualization is a crucial component of data analysis. This workshop will cover the basics of visualizing data in R, including base R plotting functions, ggplot2, plotly, and Shiny. Topics:
      • Design visualizations to communicate data insights
      • Understand the differences between types of plotting functions
      • Use dashboard packages to display multiple visualizations

  • Stata Series

    The Stata workshop series is designed for new and beginning Stata users. Stata is a statistical software application that supports data manipulation, visualization, statistics, and reporting. This workshop series will give you access to Stata and help you get started by teaching fundamental Stata programming skills, along with resources you can use to develop your skills further.

    Instructor: Jacob Grippin

    Courses:

    • Crash Course: In the first session you will learn to navigate Stata's windows and menus, open data in Stata, generate output, and understand the results. Topics:
      • Analyzing/understanding Stata output
      • Understanding Stata menu and windows environment
      • Basic Stata coding
      • Summarize data using Stata
    • Manipulating and Cleaning Data: In the second session you will learn to organize your data and prepare it for analysis. Topics:
      • Manipulating/reorganizing data
      • Merging and combining data
      • Removing data and targeting specific groups
    • Results and Reporting: In the third session you will learn to write Stata programs that generate output, compute common statistical measures, and organize your results into clear, effective reports. Topics:
      • Batch coding with do-files
      • Creating effective reports within Stata
      • Common social science statistical measures and how to analyze them

  • Machine Learning Series

    This series introduces basic concepts of machine learning (ML) and shows examples of how ML can be applied within social science research. This series offers an overview of supervised learning, unsupervised learning, and natural language processing methods through hands-on workshops in Python. It is best suited for social scientists with working Python proficiency and quantitative research experience. No prior familiarity with machine learning is required.

    Instructors: Remy Stewart, Angel Hsing-Chi Hwang

    Courses:

    • Introduction to Machine Learning: you will gain a broad overview of the fundamental methodological approaches in ML and learn how ML methods are applied in social science research.
    • Understanding and Visualizing your Data: you will learn the common workflow of data preprocessing for ML within the broader context of exploratory data analysis.
    • Supervised Learning: we will review core concepts of supervised learning and introduce you to common supervised learning models. You will also work step by step through building a supervised machine learning pipeline, including training and testing data splits, model fitting and comparison, and generating predictions.
    • Unsupervised Learning: we will introduce core concepts and common applications of unsupervised learning. You will acquire hands-on experience with unsupervised ML methods through performing principal component analysis (PCA) and building a k-means clustering model through training, fitting, and application to real data.
    • Natural Language Processing (NLP): you will build on the foundations of ML by applying them to language-based data, implementing supervised text classifiers and unsupervised topic models.
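    The supervised pipeline steps listed above (split the data, fit models, compare them, generate predictions) can be sketched in plain Python. This is an illustrative stand-in, not the series' code: the workshops may well rely on dedicated ML libraries, whereas this sketch hand-writes a nearest-centroid classifier and compares it against a majority-class baseline on hypothetical two-cluster toy data. All function and class names here are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def make_data(n=200, seed=0):
    """Hypothetical toy data: two Gaussian blobs labeled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        cx, cy = (0.0, 0.0) if label == 0 else (3.0, 3.0)
        data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def train_test_split(data, test_frac=0.25, seed=0):
    """Shuffle a copy of the data and hold out the last test_frac for testing."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]


class MajorityBaseline:
    """Always predicts the most common label seen during training."""
    def fit(self, rows):
        self.label = Counter(y for _, y in rows).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Predicts the label whose mean training point is closest to x."""
    def fit(self, rows):
        self.centroids = {}
        for label in {y for _, y in rows}:
            pts = [x for x, y in rows if y == label]
            self.centroids[label] = tuple(mean(dim) for dim in zip(*pts))
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def accuracy(model, rows):
    """Share of held-out rows whose predicted label matches the true label."""
    return mean(1.0 if model.predict(x) == y else 0.0 for x, y in rows)


train, test = train_test_split(make_data())
for model in (MajorityBaseline(), NearestCentroid()):
    model.fit(train)  # fit on training data only
    print(type(model).__name__, round(accuracy(model, test), 2))
```

    Comparing against a trivial baseline is a cheap sanity check: a model that cannot beat "always guess the most common label" on held-out data has not learned anything useful.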

  • Qualitative Series

    This series will introduce participants to the three leading qualitative data analysis software packages: Atlas.ti, NVivo, and MaxQDA. These introductory workshops aim to teach participants the skills to manage, process, and analyze qualitative data so they can begin working on their own data. In the morning sessions, participants will learn how to manage and prepare qualitative data and attribute files for processing; create, save, and export a project; code transcripts, audio, video, pictures, and PDF files; search text, automatically code (autocode) search results, and query projects using conditions and attributes; and create memos, relationships, and reports. In the afternoon sessions, participants will apply what they learned in the morning to their own data under the guidance of the workshop instructor. This series is offered in a hybrid (virtual and in-person) format. For in-person participation, priority will be given to those who have already collected, or are in the process of collecting, their qualitative data.

    Instructors: Florio Arguillas, Jacob Grippin

Instructor Bios

The Fall Data Science Bootcamp is taught by our staff consultants, Senior Data Science Fellows, and Data Science Fellows.
