• Online, Self-Paced
Course Description

To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first step in the data science pipeline. In this course, you'll explore practical tools for data gathering.

Learning Objectives

Data Extraction

  • describe problems and software tools associated with data gathering
  • use curl to gather data from the Web
  • use in2csv to convert spreadsheet data to CSV format
  • use agate to extract data from spreadsheets
  • use agate to extract tabular data from DBF files
  • extract data from particular tags in an HTML document

Metadata

  • distinguish between metadata and data
  • work with metadata in HTTP headers
  • work with Linux log files
  • work with metadata in email headers

Remote Data

  • perform a secure shell connection to a remote server
  • copy remote data using secure copy
  • synchronize data from a remote server

Practice: Curl and HTML

  • download an HTML file and explore table data

Framework Connections

The materials within this course focus on the NICE Framework Task, Knowledge, and Skill statements identified within the indicated NICE Framework component(s):

Specialty Areas

  • Data Administration
  • Systems Analysis

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.