• Online, Self-Paced
Course Description

Data gathered for data science is often in an unstructured or raw format, and it must be filtered for content and validity before it can be used. In this course, you'll explore examples of practical tools and techniques for data filtering.

Learning Objectives

Introduction to Data Filtering

  • start the course
  • identify common filtering techniques and tools
  • extract date elements from common date formats
  • parse content types in HTTP headers
  • use csvcut to filter CSV data
  • use sed to replace values in a text data stream
  • drop duplicate records from data
  • extract headers from a JPEG image
  • use pdfgrep to extract data from searchable PDF files
  • detect invalid or impossible data combinations
  • parse robots.txt from a website to decide what should and shouldn't be crawled or indexed
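
As a rough illustration of two of the objectives above (parsing content types in HTTP headers and reading robots.txt rules), here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the helper names `parse_content_type` and `allowed_to_crawl`, the sample header value, and the example.com URLs are all assumptions made for this example.

```python
from email.message import EmailMessage
from urllib.robotparser import RobotFileParser


def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, charset).

    Hypothetical helper: it borrows the standard library's email header
    parsing, since HTTP uses the same "type/subtype; param=value" syntax.
    """
    msg = EmailMessage()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def allowed_to_crawl(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as a string.

    Hypothetical helper: the rules are passed in as text so that the
    example runs without any network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Sample values invented for this sketch.
    print(parse_content_type("text/html; charset=UTF-8"))

    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_to_crawl(rules, "https://example.com/private/report.html"))
    print(allowed_to_crawl(rules, "https://example.com/index.html"))
```

In practice the header value would come from an HTTP response and the rules from the site's own /robots.txt; both are hard-coded here to keep the sketch self-contained.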

Practice: Filtering Dates

  • drop records from a CSV file based on date range
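
One possible shape for this practice exercise is sketched below, assuming the dates are stored in ISO format (YYYY-MM-DD) and that records outside the range, or with unreadable dates, are the ones to drop. The function name `filter_by_date`, the `created` column, and the sample rows are hypothetical and not part of the course.

```python
import csv
import io
from datetime import date


def filter_by_date(rows, column, start, end):
    """Yield rows whose ISO date in `column` lies within [start, end].

    Rows with a missing or unparseable date are dropped as invalid.
    """
    for row in rows:
        try:
            value = date.fromisoformat(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, empty field, or malformed date
        if start <= value <= end:
            yield row


if __name__ == "__main__":
    # Sample data invented for this sketch.
    sample = (
        "id,created\n"
        "1,2023-01-15\n"
        "2,2023-06-30\n"
        "3,not-a-date\n"
        "4,2024-02-01\n"
    )
    reader = csv.DictReader(io.StringIO(sample))
    kept = filter_by_date(reader, "created", date(2023, 1, 1), date(2023, 12, 31))
    for row in kept:
        print(row["id"], row["created"])
```

Comparing parsed `date` objects rather than raw strings means the same function also rejects impossible dates such as 2023-02-30, which ties in with the objective on detecting invalid data.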

Framework Connections

The materials within this course focus on the NICE Framework Task, Knowledge, and Skill statements identified within the indicated NICE Framework component(s):

Specialty Areas

  • Data Administration
  • Systems Analysis

Feedback

If you would like to provide feedback for this course, please e-mail the NICCS SO at NICCS@hq.dhs.gov.