MY572 Half Unit
Data for Data Scientists
This information is for the 2018/19 session.
Dr Pablo Barbera Aranguena COL7.10
This course is available on the MPhil/PhD in Social Research Methods. This course is available with permission as an outside option to students on other programmes where regulations permit.
This course is available to all research students where regulations permit.
This course will cover the principles of digital methods for storing and structuring data, including data types, relational and non-relational database design, and query languages. Students will learn to build, populate, manipulate and query databases based on datasets relevant to their fields of interest. The course will also cover workflow management for typical data transformation and cleaning projects, which are frequently the starting point and the most time-consuming part of any data science project. The course uses a project-based learning approach to the study of online publishing and group-based collaboration, both essential ingredients of modern data science projects. The coverage of data sharing will include key skills in online publishing: the elements of web design, the technical elements of web technologies and web programming, and the use of revision-control and group collaboration tools such as GitHub. Each student will build one or more interactive websites based on content relevant to their domain-related interests, and will use GitHub to access and submit course materials and assignments.
20 hours of lectures and 15 hours of computer workshops in the MT.
In this course, we introduce the principles and applications of the electronic storage, structuring, manipulation, transformation, extraction, and dissemination of data. This includes data types, database design, database implementation, and data analysis through structured queries. Through joining operations, we will also cover the challenges of data linkage and how to combine datasets from different sources. We begin by discussing fundamental data types and how data is stored and recorded electronically. We then cover database design, especially for relational databases, using substantive examples from a variety of fields. Students are introduced to SQL through MySQL, and the programming assignments in this part of the course are designed to ensure that students learn to create, populate and query an SQL database. For comparison, we then introduce NoSQL using MongoDB and the JSON data format. For both types of database, students will be encouraged to work with data relevant to their own interests as they learn to create, populate and query data. In the final part of the data section of the course, we step through a complete workflow, including data cleaning and transformation, illustrating many of the practical challenges faced at the outset of any data analysis or data science project.
Students will be expected to produce 10 problem sets in the MT.
Type: Weekly structured problem sets, each begun in the staff-led lab sessions and completed by the student outside class. Answers should be formatted and submitted for assessment.
Chodorow, Kristina. MongoDB: The Definitive Guide, 2nd Edition. O'Reilly, 2013.
Churcher, Clare. Beginning Database Design: From Novice to Professional. Apress, 2007.
Tahaghoghi, Seyed M. and Hugh E. Williams. Learning MySQL. O'Reilly, 2006.
Karumanchi, Narasimha. Data Structures and Algorithms Made Easy: Data Structure and Algorithmic Puzzles, Second Edition. CreateSpace Independent Publishing Platform, 2011.
Lee, Kent. Data Structures and Algorithms with Python. Springer, 2015.
Lake, Peter. Concise Guide to Databases: A Practical Introduction. Springer, 2013.
Nield, Thomas. Getting Started with SQL: A Hands-On Approach for Beginners. O'Reilly, 2016.
Byron, Angela, Addison Berry, Nathan Haug, Jeff Eaton, James Walker and Jeff Robbins. Using Drupal: Choosing and Configuring Modules to Build Dynamic Websites. O'Reilly Media, 2008.
Duckett, Jon. HTML and CSS: Design and Build Websites. New York: Wiley, 2011.
Rice, Dylan. Twitter Bootstrap In Your Pocket. CreateSpace Independent Publishing Platform, 2016.
Sklar, David. Learning PHP 5. O'Reilly, 2004.
GitHub Guides at https://guides.github.com, including “Understanding the GitHub Flow”, “Hello World” and “Getting Started with GitHub Pages”.
Jacobson, Daniel. APIs: A Strategy Guide. O'Reilly, 2012.
Loudon, Kyle. Developing Large Web Applications: Producing Code That Can Grow and Thrive. O'Reilly, 2010.
Take-home exam (50%) and in-class assessment (50%) in the MT.
Student problem sets will be marked each week and will provide 50% of the mark.
Marking of these assessments will be at a level appropriate for PhD students.
Total students 2017/18: Unavailable
Average class size 2017/18: Unavailable
Value: Half Unit
Personal development skills
- Team working
- Problem solving
- Application of information skills
- Application of numeracy skills
- Specialist skills