Events Type: Computer Science seminar
January 21, Tuesday
12:00 – 14:00
Unsupervised and Supervised Learning in Natural Language Processing
Computer Science seminar
Lecturer : Ido Dagan
Affiliation : Bar-Ilan University
Location : -101/58
The research field of natural language processing (NLP) has been receiving growing attention in recent years. In particular, the major focus on empirical methods, which learn vast knowledge and inferences from available text collections (corpora), has boosted the feasibility and robustness of language processing techniques and facilitated real-world applications. This talk will first briefly review major foundational and applied tasks within NLP and how they are approached by empirical methods. Certain issues will be illustrated by references to my personal work, stressing the need to minimize reliance on human supervision and resources. I will then describe in more detail an ongoing line of research on unsupervised semantic learning, addressing several disambiguation and inference tasks. Further, a novel approach to the (essentially supervised) task of text categorization will be presented, which establishes a different category specification scheme based on unsupervised learning and offers several practical advantages. I will conclude with some new directions in corpus-based semantic modeling and the roles they open for unsupervised and supervised learning.
January 14, Tuesday
12:00 – 14:00
Data synopses for data streams and massive data sets
Computer Science seminar
Lecturer : Yossi Matias
Affiliation : Tel Aviv University
Location : -101/58
Host : Mayer Goldberg
The emerging area of data synopses and streaming data analysis has seen tremendous progress over the past few years, involving deep theoretical issues on the one hand and vast applicability on the other. Massive data sets with hundreds of gigabytes or more of raw data are becoming commonplace, and traditional algorithms and data structures fail to process such data sets effectively. Hence, there is a growing need for algorithms and data structures that enable fast response times for various classes of queries on such data, and for algorithms that can efficiently handle the data as it streams by. We discuss synopsis data structures that use very limited space to capture the demographics of massive data sets; these are designed to support fast and typically approximate answers to queries. We will point out several techniques that have proven useful, including adaptive sampling, random projection, and wavelets. Time permitting, we will discuss recent results on spectral Bloom filters and list-traversal synopses.
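As a rough illustration of what a small-space synopsis with approximate answers can look like, here is a minimal sketch of a plain Bloom filter in Python. It is not the spectral Bloom filter mentioned in the abstract, and it is not taken from the talk. The class name, the parameters `num_bits` and `num_hashes`, and the salted SHA-256 hashing scheme are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Plain Bloom filter (illustrative sketch): approximate set membership in fixed space."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Hypothetical hashing scheme: salt SHA-256 with the hash index
        # to derive num_hashes bit positions for the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position associated with the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for word in ["alpha", "beta", "gamma"]:
        bf.add(word)
    print(bf.might_contain("alpha"))
```

The filter never reports an added item as absent, but it may report an absent item as present. That one-sided error is the price of answering membership queries from a fixed-size bit array rather than from the full data set.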
09:00 – 11:00
TBA
Computer Science seminar
Lecturer : Irit Dinur
Location : -101/58
Host : Eitan Bachmat
January 13, Monday
12:00 – 14:00
TBA
Computer Science seminar
Lecturer : Irit Dinur
Location : -101/58
Host : Eitan Bachmat
January 9, Thursday
12:00 – 14:00
New Lattice Based Cryptographic Constructions
Computer Science seminar
Lecturer : Oded Regev
Location : -101/58
We introduce the use of methods from harmonic analysis as an integral part of a lattice-based construction. The tools we develop provide an elegant description of certain Gaussian distributions around lattice points. Our results include two cryptographic constructions which are based on the worst-case hardness of the unique shortest vector problem. The main result is a new public key cryptosystem whose security guarantee is considerably stronger than previous results ($O(n^{1.5})$ instead of $O(n^7)$). This provides the first alternative to Ajtai and Dwork's original 1996 cryptosystem. Our second result is a collision-resistant hash function which, apart from improving the security in terms of the unique shortest vector problem, is also the first example of an analysis which is not based on Ajtai's iterative step. Surprisingly, the two results are derived from the same tool, which presents two indistinguishable distributions on the segment $[0,1)$. It seems that this tool can have further applications, and as an example we mention how it can be used to solve an open problem related to quantum computation.
January 7, Tuesday
12:00 – 14:00
Building a Digital Library of Formal Mathematics
Computer Science seminar
Lecturer : Robert Constable
Affiliation : Cornell University
Location : -101/58
January 2, Thursday
12:00 – 14:00
Bayesian Information Criterion (BIC) and Clustering
Computer Science seminar
Lecturer : Itshak Lapidot
Affiliation : IDIAP, Martigny, Switzerland
Location : -101/58
The Bayesian Information Criterion (BIC) is a probabilistic model-selection criterion which many works have applied to the clustering validity problem. After an overview of BIC, it will first be shown that the criterion needs to be extended in order to properly cluster short series. Second, it will be shown how BIC can be used in Vector Quantization (VQ) instead of a probabilistic model; this will be followed by its application in speaker clustering.
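For orientation, the criterion in its standard (Schwarz) form is given below. The abstract itself states no formula, so this form and the symbols $\hat{L}$, $k$, and $n$ are assumptions about the starting point of the talk, not the speaker's extended version.

$$\mathrm{BIC} = -2 \ln \hat{L} + k \ln n$$

Here $\hat{L}$ is the maximized likelihood of the model, $k$ is its number of free parameters, and $n$ is the number of observations. Under this sign convention a lower value is preferred. In a clustering setting, one would typically compare candidate models with different numbers of clusters and keep the one with the best score.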