Keynote talk 1
Modeling, Exploring and Analyzing Change: The Janus Project
Gold Coast Time: Mon 16 Dec 2024 11:00 AEST (UTC+10)
Tokyo Time: Mon 16 Dec 2024 10:00 JST (UTC+9)
▶ Abstract
Data change, all the time. The Janus project seeks to model, explore, and analyze such change, providing valuable insights into the evolving real world and the ways in which data about it are collected and used. We start by identifying technical challenges that need to be addressed to realize the Janus vision. Based on an analysis of the history of 3.5M tables on the English Wikipedia for a total of 53.8M table versions, we then illustrate the rich history of structured Wikipedia data: their creation, evolution, and deletion; indeed, each table has a life of its own. To help automatically interpret the useful knowledge harbored in the history of Wikipedia tables, we present recent results on two technical problems that help infer the identity of entities and tables across changes over time: (i) matching tables, infoboxes and lists within a Wikipedia page across page revisions, and (ii) identifying Natural Keys, which serve as primary keys in tables over time and consist of attributes inherent to an entity. Finally, we show how to accurately recommend schema changes to Wikipedia tables, based on rules derived from the history of past schema changes. We solve these problems at scale and make the resulting curated datasets available to the community to facilitate future research.
▶ Speaker
Divesh Srivastava
AT&T Labs-Research
Divesh Srivastava is the Head of Database Research at AT&T. He is an AT&T Fellow, a Fellow of the ACM, the President of the VLDB Endowment, co-chair of the ACM Publications Board, and on the Board of Directors of the Computing Research Association. He has served as PC co-chair of many international conferences including VLDB 2024 (Industrial), SIGMOD 2021, VLDB 2020 (Industrial), SIGMOD 2020 (Industrial), and ICDE 2019. He has presented keynote talks at several international conferences, and his research interests and publications span a variety of topics in data management. He received his Ph.D. from the University of Wisconsin, Madison, USA, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India.
Keynote talk 2
LLM and Database: Opportunities and Challenges
Gold Coast Time: Mon 16 Dec 2024 12:00 AEST (UTC+10)
Tokyo Time: Mon 16 Dec 2024 11:00 JST (UTC+9)
▶ Abstract
Large language models (LLMs) have shown superior performance in various areas, and they have the potential to revolutionize data management by serving as the "brain" of next-generation database systems. However, utilizing LLMs to optimize databases poses several challenges, including hallucinations, high overhead, and complex reasoning. In this talk, I will present the challenges and opportunities of designing LLM-powered data management systems.
▶ Speaker
Guoliang Li
Tsinghua University
Guoliang Li is a full professor at the Department of Computer Science, Tsinghua University, Beijing, China. His research interests include database systems, machine learning for databases, human-in-the-loop data management, and large-scale data cleaning and integration. He has received the VLDB 2017 Early Research Contribution Award, TCDE 2014 Early Career Award, SIGMOD 2024 Research Highlight Award, SIGMOD 2023 Best Papers, VLDB 2023 Best Industry Paper Runner-up, VLDB 2020 Best Papers, CIKM 2017 Best Paper Award, KDD 2018 Best Papers, ICDE 2018 Best Papers, DASFAA 2023 Best Paper Award, DASFAA 2014 Best Paper Runner-up, APWeb 2014 Best Paper Award, and EDBT 2013 Similarity Join and Search Champion. He was the SIGMOD 2021 General Co-Chair. He has regularly served as a PC member of SIGMOD, VLDB, ICDE, KDD, and WWW. He is serving as an associate editor for IEEE TKDE and VLDB Journal.
Keynote talk 3
Data Valuation in Data Systems
Gold Coast Time: Mon 16 Dec 2024 14:00 AEST (UTC+10)
Tokyo Time: Mon 16 Dec 2024 13:00 JST (UTC+9)
▶ Abstract
In the AI era, data is the fuel driving transformative applications across every industry. Data systems today must manage data not only as a resource but as a strategic asset with measurable value. Understanding the impact of data on downstream applications has become essential, shaping how data systems are designed and optimized. In this talk, I will explore the critical task of data valuation within data systems and the technical challenges we face in unlocking this potential. I will also share some of the latest advancements in this exciting frontier, offering a glimpse into the future of intelligent data management.
▶ Speaker
Jian Pei
Duke University
Jian Pei is the Arthur S. Pearse Distinguished Professor at Duke University, where he conducts pioneering research in data science, applied machine learning, big data, data mining, and database systems. His work centers on creating powerful, efficient data analysis techniques tailored for today’s AI-driven, data-intensive applications, making a profound impact both in academia and in real-world practice. Dr. Pei is a Fellow of the Royal Society of Canada (RSC), the Canadian Academy of Engineering, ACM, and IEEE, reflecting his influential contributions to the field. A prolific author, Dr. Pei has published a textbook, two monographs, and over 300 research papers in leading journals and conferences since 2000, with his work cited more than 130,000 times. His algorithms are integrated into production systems and adopted in popular open-source software suites, and he has overseen the development of several commercial systems of unprecedented scale. Dr. Pei’s achievements have earned him numerous prestigious awards, including the 2017 ACM SIGKDD Innovation Award, the 2015 ACM SIGKDD Service Award, and multiple best paper and test-of-time accolades.
Keynote talk 4
Efficient Query Processing in Vector Databases
Gold Coast Time: Tue 17 Dec 2024 11:00 AEST (UTC+10)
Tokyo Time: Tue 17 Dec 2024 10:00 JST (UTC+9)
▶ Abstract
A significant challenge in current vector databases is handling approximate similarity queries in high-dimensional space. With the rise of large language models, vector databases have become a focal point of research. The "curse of dimensionality" problem raises questions about the feasibility of effectively indexing high-dimensional data and whether nearest-neighbor queries remain relevant in these spaces. Additionally, the efficient processing of such queries is another crucial concern. In this talk, I will provide an overview of the research advancements in this domain and present our recent findings on managing and processing high-dimensional data in the context of generative AI.
▶ Speaker
Xiaofang Zhou
Hong Kong University of Science and Technology
Xiaofang Zhou holds the Otto Poon Professorship in Engineering and is a Chair Professor of Computer Science and Engineering at HKUST, where he leads the department. His work spans database systems, data quality management, big data analytics, machine learning, and AI. He chaired the IEEE ICDE 2013, ACM CIKM 2016, and PVLDB 2020 conferences, and was General Chair for ICDE 2025 and ACM MM 2015. Prior to HKUST, he was a Computer Science Professor at The University of Queensland, heading its Data Science discipline. He is a Global STEM Scholar and an IEEE Fellow.
Keynote talk 5
Bipartite Graph Analytics: Applications, Models and Future Trends
Gold Coast Time: Tue 17 Dec 2024 12:00 AEST (UTC+10)
Tokyo Time: Tue 17 Dec 2024 11:00 JST (UTC+9)
▶ Abstract
Bipartite graphs connect two distinct sets of vertices and are widely used in diverse fields, especially in e-commerce and biological networks. Analytics of bipartite graphs has gained significant attention in both industry and academia. This talk aims to shed light on analysis methods for bipartite graphs, categorizing them into three directions: querying-based models, learning-based models, and application-driven models. I will start by outlining the importance of bipartite graph analytics and the unique challenges that need to be addressed. Then, I will highlight some of our recent work on this topic. Finally, I will share my insights on new research directions.
▶ Speaker
Wenjie Zhang
The University of New South Wales
Wenjie Zhang is a Professor and Head of the Data and Knowledge Research Group in the School of Computer Science and Engineering, University of New South Wales, Australia. Her research interests lie in developing efficient and scalable techniques for data-intensive applications. She has published over 180 research papers in leading international journals and conferences. Wenjie serves as an Associate Editor for IEEE Transactions on Knowledge and Data Engineering and VLDB Journal, PC chair for ICDE 2025, a senior PC or track chair for VLDB 2023/2022, CIKM 2019-2024, and ICDE 2023/2019, and an organizing committee member for more than 30 international conferences. Wenjie is the recipient of the Australasian CORE Chris Wallace Research Award in 2019. Her work has received the ACM SIGMOD Research Highlight Award 2021, was named among the Best Papers of SIGMOD 2020 and ICDE 2013/2012/2010, and has earned several Best (Student) Paper Awards from international conferences.
Keynote talk 6
Data Profiling for Data Integration
Gold Coast Time: Tue 17 Dec 2024 14:00 AEST (UTC+10)
Tokyo Time: Tue 17 Dec 2024 13:00 JST (UTC+9)
▶ Abstract
Data profiling comprises a broad range of methods to efficiently analyze a given dataset. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and various data dependencies. The talk highlights the key insights behind recent state-of-the-art methods and presents various use cases in the areas of data cleaning and data integration: violations of dependencies point to errors in the data; key discovery identifies the core entities of a data source; inclusion dependencies are candidates for joining multiple sources; and, in general, data profiling results can be used to organize data lakes.
▶ Speaker
Felix Naumann
Hasso Plattner Institute (HPI), University of Potsdam
Felix Naumann studied mathematics, economics, and computer science at the University of Technology in Berlin. After receiving his diploma (MA) in 1997, he completed his PhD thesis in the area of data quality at Humboldt University of Berlin in 2000. In 2001 and 2002 he worked at the IBM Almaden Research Center on data integration topics. From 2003 to 2006 he was assistant professor for information integration, again at Humboldt University of Berlin. Since 2006 he has held the chair for information systems at the Hasso Plattner Institute (HPI) at the University of Potsdam in Germany. He has been a visiting researcher at QCRI, AT&T Research, IBM Research, and SAP, and he is currently a visiting researcher at CIRES/UQ. His research interests include data profiling, data cleansing, and data integration, and he has over 200 scientific publications. In addition to numerous PC memberships for international conferences, he has organized several conferences in various roles, including VLDB 2021 as PC co-chair, and he was a trustee of the VLDB Endowment.