An optimal skew-insensitive join and multi-join algorithm for distributed architecture

The development of scalable parallel database systems requires the design of efficient algorithms for the join operation which is the most frequent and expensive operation in relational database systems. The join is also the most vulnerable operation to data skew and to the high cost of communication in distributed architectures. In this paper, we present a new parallel algorithm for join and multi-join operations on distributed architectures based on an efficient semi-join computation technique. This algorithm is proved to have optimal complexity and deterministic perfect load balancing. Its tradeoff between balancing overhead and speedup is analyzed using the BSP cost model which predicts a negligible join product skew and a linear speed-up. This algorithm improves our fa_join and sfa_join algorithms by reducing their communication and synchronization cost to a minimum while offering the same load balancing properties even for highly skewed data.

Data and Resources

Additional Info

Field Value
Source Proceedings of the International Conference on Database and Expert Systems Applications (DEXA'2005).
Author Bamha, Mostafa
Maintainer CCSD
Last Updated May 11, 2026, 11:58 (UTC)
Created May 11, 2026, 11:58 (UTC)
Identifier hal-00081350
Language en
contributor Laboratoire d'Informatique Fondamentale d'Orléans (LIFO) ; Université d'Orléans (UO)-Ecole Nationale Supérieure d'Ingénieurs de Bourges
creator Bamha, Mostafa
date 2005-05-11T00:00:00
harvest_object_id a01a418c-f1c0-452f-9458-b30ada0753ec
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-08-12T00:00:00
set_spec type:COMM