My research focuses on machine learning for extracting, modeling, and analyzing large-scale structured data. Answers to many important questions, from social science to bioinformatics, can be found in structured data such as networks and relational databases, but most often these data sets are not explicitly represented anywhere. Instead, they must be extracted from web pages, scientific publications, and even images and video. Only once we have extracted the structured data from its unstructured representation can we begin to find the answers. I work on machine learning tools for performing this extraction accurately, modeling the resulting structured data, and analyzing it to answer questions.

I have developed hinge-loss Markov random fields (HL-MRFs), a new class of probabilistic graphical models for scalable modeling of big, structured data. HL-MRFs can easily be constructed from templates defined with probabilistic soft logic (PSL), making them general-purpose tools.
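
As a rough illustration of how a logical rule template can give rise to hinge-loss potentials, the sketch below grounds one weighted rule over a toy data set and sums the resulting potentials. It assumes the standard Lukasiewicz relaxation of a rule of the form body1 AND body2 -> head, whose distance to satisfaction is a hinge over a linear function of the soft truth values. The rule (friends tend to vote alike), the names `friends` and `votes`, the weight, and all numeric values are hypothetical and chosen only for illustration; this is not the PSL software's API.

```python
def hinge_potential(body_values, head_value, squared=False):
    """Distance to satisfaction of (b1 AND ... AND bn) -> head.

    Truth values lie in [0, 1]. Under the Lukasiewicz relaxation the
    distance is max(0, sum(body) - (n - 1) - head), a hinge over a
    linear function of the variables; optionally squared.
    """
    n = len(body_values)
    distance = max(0.0, sum(body_values) - (n - 1) - head_value)
    return distance * distance if squared else distance


def energy(groundings, weight, squared=False):
    """Weighted sum of hinge potentials over all groundings of one rule."""
    return weight * sum(hinge_potential(b, h, squared) for b, h in groundings)


# Hypothetical soft truth values for the template
#   Friends(A, B) AND Votes(A) -> Votes(B)
friends = {("alice", "bob"): 1.0, ("bob", "carol"): 0.5}
votes = {"alice": 0.9, "bob": 0.4, "carol": 0.7}

# Ground the template: one (body values, head value) pair per friendship.
groundings = [([f, votes[a]], votes[b]) for (a, b), f in friends.items()]

# Hypothetical rule weight; lower energy means higher (unnormalized) density.
total = energy(groundings, weight=2.0)
print(total)
```

In this toy setting only the first grounding is violated, so it alone contributes to the energy. Because each potential is a hinge over a linear function, the total energy is convex in the soft truth values, which is the property that makes inference in such models amenable to scalable convex optimization.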