Research
-
Long Document Summarization in a Low Resource Setting using Pretrained Language Models
Abstractive summarization is the task of compressing a long document into a coherent short document while retaining salient information. Modern abstractive summarization methods are based on deep neural networks which often require large training datasets. Since collecting summarization datasets is an expensive and time-consuming task, practical industrial settings are usually low-resource. In this paper, we study a challenging low-resource setting of summarizing long legal briefs with an average source document length of 4268 words and only 120 available (document, summary) pairs. To account for data scarcity, we used a modern pretrained abstractive summarizer BART (Lewis et al., 2020), which only achieves 17.9 ROUGE-L as it struggles with long documents. We thus attempt to compress these long documents by identifying salient sentences in the source which best ground the summary, using a novel algorithm based on GPT-2 (Radford et al., 2019) language model perplexity scores, that operates within the low resource regime. On feeding the compressed documents to BART, we observe a 6.0 ROUGE-L improvement. Our method also beats several competitive salience detection baselines. Furthermore, the identified salient sentences tend to agree with an independent human labeling by domain experts.
-
Empirical Differential Privacy
We show how to achieve differential privacy with no or reduced added noise, based on the empirical noise in the data itself. Unlike previous works on noiseless privacy, the empirical viewpoint avoids making any explicit assumptions about the random process generating the data.
-
Quantitative null-cobordism
For a given null-cobordant Riemannian n-manifold, how does the minimal geometric complexity of a null-cobordism depend on the geometric complexity of the manifold? In [Gro99], Gromov conjectured that this dependence should be linear. We show that it is at most a polynomial whose degree depends on n. This construction relies on another of independent interest. Take X and Y to be sufficiently nice compact metric spaces, such as Riemannian manifolds or simplicial complexes. Suppose Y is simply connected and rationally homotopy equivalent to a product of Eilenberg-MacLane spaces: for example, any simply connected Lie group. Then two homotopic L-Lipschitz maps f, g : X → Y are homotopic via a CL-Lipschitz homotopy. We present a counterexample to show that this is not true for larger classes of spaces Y.
-
2-complexes with large 2-girth
The 2-girth of a 2-dimensional simplicial complex X is the minimum size of a non-zero 2-cycle in H₂(X, ℤ/2). We consider the maximum possible girth of a complex with n vertices and m 2-faces. If m = n^{2+α} for α < 1/2, then we show that the 2-girth is at most 4n^{2−2α} and we prove the existence of complexes with 2-girth at least c_{α,ε} n^{2−2α−ε}. On the other hand, if α > 1/2, the 2-girth is at most C_α. So there is a phase transition as α passes 1/2. Our results depend on a new upper bound for the number of combinatorial types of triangulated surfaces with v vertices and f faces.
-
On Expansion and Topological Overlap
We give a detailed and easily accessible proof of Gromov's Topological Overlap Theorem. Let X be a finite simplicial complex or, more generally, a finite polyhedral cell complex of dimension d. Informally, the theorem states that if X has sufficiently strong higher-dimensional expansion properties (which generalize edge expansion of graphs and are defined in terms of cellular cochains of X) then X has the following topological overlap property: for every continuous map X → ℝᵈ there exists a point p ∈ ℝᵈ that is contained in the images of a positive fraction μ > 0 of the d-cells of X. More generally, the conclusion holds if ℝᵈ is replaced by any d-dimensional piecewise-linear (PL) manifold M, with a constant μ that depends only on d and on the expansion properties of X, but not on M.
-
Higher dimensional distortion of random complexes
Using the random complexes of Linial and Meshulam, we exhibit a large family of simplicial complexes for which, whenever affinely embedded into Euclidean space, the filling areas of simplicial cycles is greatly distorted. This phenomenon can be regarded as a higher order analogue of the metric distortion of embeddings of random graphs.
-
The filling problem in the cube
We prove an isoperimetric inequality for filling cellular cycles in a high dimensional cube with cellular chains. In addition, we provide a family of cubical cellular cycles for which the exponent in the inequality is optimal.
-
Coboundary expanders
We describe a natural topological generalization of edge expansion for graphs to regular CW complexes and prove that this property holds with high probability for certain random complexes.
-
Properties of the Generalized Zig-Zag Product of Graphs
The operation of zig-zag products of graphs is the analogue of the semidirect product of groups. Using this observation, we present a categorical description of zig-zag products in order to generalize the construction for the category of simple graphs. Also, we examine the covering properties of zig-zag products and we utilize these results to estimate their spectral invariants in general. In addition, we provide specific spectral analysis for some such products.