The ARCHER Service is now closed and has been superseded by ARCHER2.

Data Analytics with HPC

Dates: 29-30 June 2017

Location: University of Portsmouth

Please note: these materials are still in draft form and may be subject to change before the course begins, but they will give you an idea of the content to be covered.

Lecture Slides

Unless otherwise indicated all material is Copyright © EPCC, The University of Edinburgh, and is only made available for private study.

Day 1

  • 09:00 – 09:30 Arrival/set-up/Welcome
  • 09:30 – 10:30 What are data analytics, big data and data science?
  • 10:30 – 11:00 COFFEE
  • 11:00 – 12:00 Data Cleaning
  • 12:00 – 13:00 Practical: Data Cleaning
  • 13:00 – 14:00 LUNCH
  • 14:00 – 14:45 Supervised Learning, feature selection, trees, forests
  • 14:45 – 15:30 Naïve Bayes
  • 15:30 – 16:00 COFFEE
  • 16:00 – 17:00 Naïve Bayes Practical
  • 17:00 CLOSE OF DAY

Day 2

  • 09:00 – 10:30 MapReduce / Hadoop
  • 10:30 – 11:00 COFFEE
  • 11:00 – 11:30 Hadoop walkthrough
  • 11:30 – 12:30 Unsupervised learning
  • 12:30 – 13:30 LUNCH
  • 13:30 – 14:15 Spark
  • 14:15 – 15:00 Data streaming
  • 15:00 – 15:30 COFFEE
  • 15:30 – 16:00 Spark, Data streaming demonstrations
  • 16:00 CLOSE OF COURSE

Exercise Material

Data cleaning materials
  • assign_fields.py
  • daltons.txt
  • DataCleaningInPython.ipynb
  • read_daltons_file.py
  • unnamed.txt
Naïve Bayes materials
  • Naïve Bayes Practical materials - .tar.gz
  • Naïve Bayes Practical materials - .zip
Hadoop materials
  • Hadoop walkthrough page
  • src.zip
  • data.zip
  • answers.zip
Spark demo materials
  • Spark k-means walkthrough (pdf)
  • Spark k-means walkthrough (Jupyter notebook)
  • Spark k-means walkthrough (Jupyter notebook hosted on github)

Copyright © Design and Content 2013-2019 EPCC. All rights reserved.