Basic Probability, Computing and Statistics 2015

People

Responsible: Khalil Sima'an
Course Design and Script: Philip Schulz
Lecturers: Christian Schaffner and Philip Schulz
Programming Labs: Philip Schulz
Teaching assistant: Thomas Brochhagen

Mini Seminar

Presentation of MoL courses using basic probability (in G2.02):

9:00-9:05: Introduction
9:05-9:20: Kolmogorov Complexity by Leen Torenvliet
9:20-9:35: Natural Language Processing 1 by Ivan Titov
9:35-9:50: Natural Language Processing 2 by Khalil Sima'an and Wilker Aziz
9:50-10:05: BREAK
10:05-10:20: Combinatorics with Computer Science Applications by Ronald de Wolf
10:20-10:35: Quantum Computing by Ronald de Wolf
10:35-10:50: Foundations of Neuro-Cognitive Modeling by Jelle Zuidema
10:50-11:05: BREAK (and change of room to A1.04)
11:05-11:20: Information Theory by Christian Schaffner
11:20-11:35: Introduction to Modern Cryptography by Christian Schaffner

Intended Learning Outcomes

This course is designed to provide students with the background in discrete probability theory and programming that is necessary to follow more advanced master-level courses in areas such as linguistics, natural language processing, machine learning, complexity theory, cryptography, information theory, quantum computing, and combinatorics. The goal is to make students who have had no prior exposure to probability theory and/or programming feel comfortable in these areas. To achieve this goal, we will try to illustrate the theoretical concepts with real-life examples drawn from areas such as computer science and gambling. Moreover, we will make sure that the theoretical and practical parts of the course are closely tied together, thus enabling students to apply their newly acquired theoretical knowledge to real problems.

The course is designed to equip the students with an understanding of probability theory and programming skills that will be necessary for more advanced courses. At the end of the course, students will be able to:

Theory:

apply basic combinatorics in simple scenarios,
do calculations with discrete probabilities,
understand and use the main results in basic probability theory,
use random variables to group outcomes in a way that is adequate for solving probabilistic problems,
formulate generative probability models and derive the model parameters under independence assumptions,
build simple probabilistic models to explain real-world data,
use simple estimation techniques to infer the parameters of distributions from data,
understand and use concepts in Information Theory like entropy and perplexity as means for quantifying uncertainty in predicting data relative to a given model,
get acquainted with the principles of empirical research including goals and methods of experimentation.

Programming:

use an IDE to structure their programs in understandable hierarchies,
use the Python programming language,
use certain Python libraries designed specifically for mathematical operations (e.g., numpy),
implement some of the concepts acquired in the theory part of the course.
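
As a rough illustration of what implementing a concept from the theory part might look like, the following minimal sketch computes entropy and perplexity for a coin-flip model. It is not taken from the course material: the function names, the dictionary representation of a model, and the example data are all assumptions made for this illustration, and only the Python standard library is used.

```python
import math


def entropy(model):
    """Shannon entropy (in bits) of a discrete distribution.

    `model` is a hypothetical representation: a dict mapping each
    outcome to its probability.
    """
    return -sum(p * math.log2(p) for p in model.values() if p > 0)


def perplexity(model, data):
    """Perplexity of observed data under a model.

    Computed as 2 raised to the average negative log2 probability that
    the model assigns to each observation. Raises an error if the model
    gives probability zero (or no probability) to an observed outcome.
    """
    avg_neg_log_prob = -sum(math.log2(model[x]) for x in data) / len(data)
    return 2 ** avg_neg_log_prob


# Illustrative models and data (assumed, not from the course).
fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.75, "tails": 0.25}
observations = ["heads", "tails", "heads", "heads"]

print(entropy(fair_coin))
print(perplexity(fair_coin, observations))
print(perplexity(biased_coin, observations))
```

On this data the biased model yields a lower perplexity than the fair one, which reflects that it is less "surprised" by a sample containing mostly heads.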

Content Schedule

The course will largely focus on discrete probability theory. It will roughly follow this structure (with the possibility that the contents of certain weeks may be exchanged). Depending on the progress of the course, we may choose to only cover the topics from weeks 1 to 6 and spread them out more:

Week 1: Theory: Introduction to combinatorics slides #1, Theory homework #1
Programming: Introduction to an IDE for programming and simple Python statements
Programming homework #1, Guidelines to programming exercises
Code from first lab. Note that we added explanations of len(), list concatenation, and how to obtain system information.

Week 2: Theory: Probability spaces, joint probability, conditional probability, inclusion-exclusion principle. Theory homework #2
Programming: Git and GitHub, basic Python data structures, iterations and loops. Programming homework #2
Code from second lab with a short intro to list sorting added at the beginning
Reference solution for assignment
List comparison tool

Week 3: Theory: Random variables, marginal probability, expectation and variance, probability distributions, probability mass functions, cumulative distribution functions, binomial and multinomial distributions. Theory homework #3
Programming: Functions, classes and inheritance, the main method. Programming homework #3
Code from third lab

Week 4: Theory: Chain rule, Bayes' rule, Naive Bayes model. Theory homework #4
Programming: Using the debugger, unit tests, exception handling, tail recursion, writing to files
Programming homework #4
Code from the lab is here and here

Week 5: Theory: Basics of statistics: motivation, sample means, weak law of large numbers, maximum likelihood estimation, maximum a posteriori estimation. Theory homework #5
Programming: Implementing Naive Bayes, anonymous functions. Programming homework #5

Week 6: Theory: Basics of Information Theory, expectation maximization. Theory homework #6
Programming: Familiarization with NumPy, use of provided probability distributions, implementing EM. Programming homework #6

Week 7, 12 Oct 2015: Presentation of MoL courses using basic probability (in G2.02)
Programming: Assistance with final project, Q/A

Lectures and Exercise sessions

Please check Datanose for the definitive times and locations.

Prerequisites

We presuppose very basic prior exposure to set theory (essentially at the level of basic set operations like union, intersection and set difference). Other than that, the course will be entirely self-contained. In particular, no prior programming knowledge is required. We expect a high level of interest and engagement from the students.

Course Design

This course has a theoretical and a practical component, both of which are of equal importance. The course will consist of lectures and programming labs. The goal is to work interactively with the students. We therefore expect a high level of engagement from the students. In order to involve the students, the lectures will be interspersed with small in-class exercises.

Material

A lecture script that is closely aligned with the content of the lectures will be provided. The code produced during the programming labs will be cleaned up, commented, and made available online. In addition, students may use Think Python, which is freely available online, as a reference.

All material for this course is provided on GitHub (github.com). This platform is commonly used for open-source software development and you are encouraged to get familiar with it. In particular, you are encouraged to help us improve the lecture notes and websites. You can earn up to one extra point per week (for your weekly programming assignment) by getting a pull request accepted by us. To do that, you will have to create a free GitHub account, fork one of our repositories, clone the fork onto your machine, correct the typo or error you have found, commit the change, push it to your fork, and submit a pull request. Once we have checked and accepted your changes, you will get the extra point.

Number of participants

Examination

There will be weekly theoretical and programming homework, which will be weighted equally. In addition, there may be a small final programming project. While the theoretical homework will be graded by the lecturers, the programming homework will be graded through peer assessment.

Cooperation among students on both theory and programming exercises is strongly encouraged. However, after this discussion phase, every student must write down and submit their own individual solution.

Guidelines to programming exercises