Sameer Singh
4204 Donald Bren Hall
University of California
Irvine, CA 92697-3435
sameer@uci.edu

Dr. Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine. His research focuses on large-scale and interpretable machine learning applied to information extraction and natural language processing. Before UCI, Sameer was a Postdoctoral Research Associate at the University of Washington, working primarily with Carlos Guestrin. He received his PhD from the University of Massachusetts, Amherst in 2014 under the supervision of Andrew McCallum, during which he also interned at Microsoft Research, Google Research, and Yahoo! Labs. He was awarded the Adobe Research Data Science Faculty Award, was selected as a DARPA Riser, won the grand prize in the Yelp Dataset Challenge, and received the Yahoo! Key Scientific Challenges award.

Curriculum Vitae

Appointments

University of California, Irvine
Irvine, CA
Assistant Professor
2016 - present

University of Washington
Seattle, WA
Postdoctoral Researcher
2013 - 2016

Education

PhD (CS)
University of Massachusetts
Amherst, MA
2014

MS (CS)
Vanderbilt University
Nashville, TN
2007

BEng (EE)
University of Delhi
New Delhi
2004

High School
Sardar Patel Vidyalaya
New Delhi
2000

Industry

Microsoft Research
Cambridge, UK
Research Intern
Summer 2012

Google Research
Mountain View, CA
Research Intern
Summer 2010

Yahoo! Labs
Sunnyvale, CA
Research Intern
Summer 2009

Google
Pittsburgh, PA
Research Intern
Summer and Fall 2007

Selected Recent Publications

  • D. Dua, Y. Wang, P. Dasigi, G. Stanovsky, S. Singh, M. Gardner. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
    Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 55k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs, as they remove the paraphrase-and-entity-typing shortcuts available in prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literatures on this dataset and show that the best systems only achieve 38.4% F1 on our generalized accuracy metric, while expert human performance is 96%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
    @inproceedings{drop:naacl19,
      author = {Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
      title = {DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
      booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
      year = {2019}
    }
  • M. Tulio Ribeiro, S. Singh, C. Guestrin. Semantically Equivalent Adversarial Rules for Debugging NLP Models. Association for Computational Linguistics (ACL), 2018.
    ACL 2018 Honorable Mention for Best Paper.
    Complex machine learning models for NLP are often brittle, making different predictions for input instances that are extremely similar semantically. To automatically detect this behavior for individual instances, we present semantically equivalent adversaries (SEAs) – semantic-preserving perturbations that induce changes in the model’s predictions. We generalize these adversaries into semantically equivalent adversarial rules (SEARs) – simple, universal replacement rules that induce adversaries on many instances. We demonstrate the usefulness and flexibility of SEAs and SEARs by detecting bugs in black-box state-of-the-art models for three domains: machine comprehension, visual question-answering, and sentiment analysis. Via user studies, we demonstrate that we generate high-quality local adversaries for more instances than humans, and that SEARs induce four times as many mistakes as the bugs discovered by human experts. SEARs are also actionable: retraining models using data augmentation significantly reduces bugs, while maintaining accuracy.
    @inproceedings{sears:acl18,
      author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
      title = {Semantically Equivalent Adversarial Rules for Debugging NLP Models},
      booktitle = {Association for Computational Linguistics (ACL)},
      year = {2018}
    }
  • N. Gupta, S. Singh, D. Roth. Entity Linking via Joint Encoding of Types, Descriptions, and Context. Empirical Methods in Natural Language Processing (EMNLP), 2017.
    For accurate entity linking, we need to capture various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Additionally, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features.
    In this work, we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types. We show that the resulting entity linking system is effective at combining these sources and performs competitively, sometimes out-performing current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively "embed" entities that are new to the KB and link their mentions accurately.
    @inproceedings{neuralel:emnlp17,
      author = {Nitish Gupta and Sameer Singh and Dan Roth},
      title = {Entity Linking via Joint Encoding of Types, Descriptions, and Context},
      booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
      year = {2017}
    }
  • M. Tulio Ribeiro, S. Singh, C. Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Knowledge Discovery and Data Mining (KDD), 2016.
    KDD 2016 Audience Appreciation Award.
    Also presented at the CHI 2016 Workshop on Human-Centred Machine Learning (HCML).
    Coming Soon!
    @inproceedings{lime:kdd16,
      author = {Marco Tulio Ribeiro and Sameer Singh and Carlos Guestrin},
      title = {"Why Should I Trust You?": Explaining the Predictions of Any Classifier},
      booktitle = {Knowledge Discovery and Data Mining (KDD)},
      year = {2016}
    }