Research

Data integration and quality

Getting comprehensive information from heterogeneous data sources is crucial for many data-informed applications. Our research focuses on algorithms and systems that combine such data effectively, yielding high data quality while operating in a resource-conscious way.

  • Entity resolution
  • Semi-structured data
  • Dynamic and streaming data
  • Adaptive data integration
  • Data lakes
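As a toy illustration of entity resolution (not one of our actual systems), the sketch below greedily clusters records from different sources whose names are sufficiently similar; the records, threshold, and similarity measure are all assumptions for the example:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity of two case-normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def resolve(records, threshold=0.85):
    """Greedy entity resolution: put a record into the first cluster whose
    representative name exceeds the similarity threshold, else start a new one."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec["name"], cluster[0]["name"]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

# Hypothetical records from two sources describing overlapping entities.
records = [
    {"name": "Jane Smith", "src": "crm"},
    {"name": "jane  smith", "src": "web"},
    {"name": "John Doe", "src": "crm"},
]
clusters = resolve(records)  # two clusters: the Jane Smith duplicates merge
```

Real entity-resolution systems replace the quadratic pairwise comparison with blocking and learned similarity functions, but the clustering decision has the same shape.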

Accountable and fair data processing

As decisions increasingly rely on data analysis and machine learning, we develop technologies to understand, explain, document, monitor, and improve the underlying data processing. We thereby facilitate responsible data use.

  • Data provenance
  • Fairness and bias
  • Metadata modeling
  • Data capture, management, and querying
  • Trust in data engineering
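To make the idea of data provenance concrete, here is a minimal sketch (an illustration, not our production tooling) that records, for each transformation step, what went in and what came out, so a pipeline's behavior can later be explained and audited; all names and the example data are assumptions:

```python
import time

class Provenance:
    """Minimal provenance log: one entry per transformation step."""
    def __init__(self):
        self.log = []

    def step(self, name, func, data):
        """Apply func to data and record input/output row counts."""
        result = func(data)
        self.log.append({
            "step": name,
            "input_rows": len(data),
            "output_rows": len(result),
            "timestamp": time.time(),
        })
        return result

prov = Provenance()
rows = [{"age": 34}, {"age": -1}, {"age": 58}]
clean = prov.step("drop_invalid_age",
                  lambda rs: [r for r in rs if r["age"] >= 0],
                  rows)
# prov.log now documents that this step filtered out one row
```

Fine-grained provenance additionally tracks which individual input records contributed to each output, which is what enables explanations of single results.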

Data exploration and analytics

Investigating what information lies in one’s data and applying appropriate analysis techniques to derive insights is typically an interactive process. In our research, we devise algorithms and systems that guide users such as data scientists and domain experts through this process.

  • Human-in-the-loop exploration
  • Iterative pipeline refinement
  • Recommendations
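As a simple illustration of recommendation-guided exploration (an assumed example, not one of our published methods), the sketch below ranks not-yet-explored numeric columns by variance as a crude proxy for "interesting to look at next"; in a human-in-the-loop setting, the user's choice would feed back into the next round of suggestions:

```python
from statistics import pvariance

def recommend_columns(table, explored):
    """Rank unexplored numeric columns by population variance, highest first."""
    scores = {}
    for col in table[0]:
        if col in explored:
            continue
        values = [row[col] for row in table]
        if all(isinstance(v, (int, float)) for v in values):
            scores[col] = pvariance(values)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical table: 'price' varies far more than 'qty', so it is suggested first.
table = [
    {"price": 10.0, "qty": 1},
    {"price": 250.0, "qty": 2},
    {"price": 12.5, "qty": 1},
]
suggestions = recommend_columns(table, explored=set())
```

Actual exploration systems use richer interestingness measures and learn from user feedback, but the loop of "score, suggest, observe, refine" is the same.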

Complex data processing pipelines

As the amount and variety of data constantly increase, their processing and management require novel data management technologies. We investigate solutions that cater to different types of data, domain requirements, deployment environments, and users with varying levels of digital and data management literacy.

  • Data pipeline debugging
  • Performance optimization
  • Domain-specific pipelines
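One common pipeline-debugging technique is to instrument each stage and trace row counts, so an unexpected drop can be attributed to the responsible stage. The sketch below (an illustration under assumed stage names and data, not our actual tooling) shows the idea:

```python
def run_pipeline(data, stages):
    """Run named stages in order, recording the row count after each one
    so sudden drops can be traced to the responsible stage."""
    trace = [("input", len(data))]
    for name, stage in stages:
        data = stage(data)
        trace.append((name, len(data)))
    return data, trace

# Hypothetical two-stage cleaning pipeline.
stages = [
    ("parse_amount", lambda rs: [dict(r, amount=float(r["amount"])) for r in rs]),
    ("drop_negative", lambda rs: [r for r in rs if r["amount"] >= 0]),
]
rows = [{"amount": "3.5"}, {"amount": "-1"}]
result, trace = run_pipeline(rows, stages)
# trace reveals that drop_negative removed one of the two rows
```

Debuggers for real pipelines extend this with data sampling, schema checks, and provenance links back to the offending input records.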