This project aims to design a join sampling framework that enables very fast approximate queries in open-source database systems.
This project has reached full capacity for the current term. Please check back next semester for updates.
Join sampling draws random samples from a complex database join query without computing the full join result. It can be used to quickly approximate aggregates over the join result. Existing algorithms are often implemented outside the DBMS kernel and have rigid designs that do not weigh the accuracy and cost trade-offs of sampling against computing the full join. In this project, we aim to design a new join sampling framework integrated into open-source database systems, so that the query optimizer can evaluate the cost/accuracy trade-offs of different algorithms and, potentially, use hybrid algorithms that combine full join computation with join sampling.
Students will first be introduced to our existing systems based on PostgreSQL, a widely used open-source database system, and to background on random sampling in database systems. They are then expected to design and implement a join sampling framework in the iterator model that can express existing join sampling algorithms, and to explore query optimization strategies within the framework.
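To make the idea concrete, here is a minimal sketch of join sampling for a two-table equi-join, written in Python for readability rather than in the C used inside PostgreSQL. It is not the project's framework or any pgAQP API: the function names (`build_index`, `join_size`, `join_sampler`, `estimate_sum`) and the in-memory list-of-dicts tables are assumptions made purely for illustration. Each left row is picked with probability proportional to the number of right rows it matches, and then one of its matches is picked uniformly, so every joined row is equally likely even though the join is never materialized. The sampler is a generator, loosely mirroring the "one row per call" shape of the iterator model.

```python
import random
from collections import defaultdict
from itertools import islice


def build_index(rows, key):
    """Group the rows of one table by their join-key value."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def join_size(left, right, key):
    """Count the rows of the equi-join without producing them."""
    index = build_index(right, key)
    return sum(len(index.get(row[key], ())) for row in left)


def join_sampler(left, right, key, rng):
    """Yield independent uniform samples (with replacement) from the join.

    Hypothetical illustration only. Columns other than `key` are assumed
    to have distinct names in the two tables.
    """
    index = build_index(right, key)
    # Weight of a left row = number of right rows it joins with.
    weights = [len(index.get(row[key], ())) for row in left]
    if sum(weights) == 0:
        return  # empty join: nothing to sample
    while True:
        l_row = rng.choices(left, weights=weights, k=1)[0]
        r_row = rng.choice(index[l_row[key]])
        yield {**l_row, **r_row}


def estimate_sum(left, right, key, value, n_samples, seed=0):
    """Estimate SUM(value) over the join as join size times the sample mean."""
    rng = random.Random(seed)
    sampler = join_sampler(left, right, key, rng)
    total = sum(row[value] for row in islice(sampler, n_samples))
    return join_size(left, right, key) * total / n_samples


if __name__ == "__main__":
    # Toy data, invented for this example.
    orders = [{"cust": 1, "amount": 10}, {"cust": 2, "amount": 30}]
    visits = [{"cust": 1, "page": "a"}, {"cust": 1, "page": "b"},
              {"cust": 2, "page": "c"}]
    print(estimate_sum(orders, visits, "cust", "amount", n_samples=1000))
```

The sketch precomputes exact match counts, which is itself a cost. Deciding when that cost is worth paying, compared with running the full join or using a cheaper but less accurate sampler, is the kind of trade-off the project wants the query optimizer to reason about.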
The specific outcomes of this project will be identified by the faculty mentor at the beginning of your collaboration.
| Length of commitment | Year-long |
| Start time | Spring |
| In-person, remote, or hybrid? | Hybrid |
| Level of collaboration | Small group project (2-3 students) |
| Benefits | Stipend, potential academic credit |
| Who is eligible | Sophomores, juniors, and seniors who have taken CSE 220, CSE 250, and CSE 331. Students should be proficient in the C programming language and comfortable with data structures and algorithms. |
Zhuoyue Zhao
Assistant Professor
Computer Science and Engineering
Phone: (716) 645-4735
Email: zzhao35@buffalo.edu
Once you begin the digital badge series, you will have access to all the necessary activities and instructions. Your mentor has indicated they would like you to also complete the specific preparation activities below. After you’re approved to begin the project, your mentor will send the relevant materials. Please reference this when you get to Step 2 of the Preparation Phase.
The student should be able to complete the onboarding training for developing code in the public code base: https://github.com/UB-ADBLAB/aqp_demo_public/.
By the time they start, students should have set up the development environment locally or on our research server, and be able to build and set up the PostgreSQL server with the pgAQP extension, execute an approximate single-table query, and use GDB to debug the code. Please contact the project mentor for access to the servers if needed.
computer science, engineering, C programming, database management
