Discussion:
Finding missing rows -- JOIN with multiple conditions
(too old to reply)
Bill Moseley
2013-06-01 23:22:17 UTC
Permalink
After stumbling around for a while I realize I need some help with building
a query with DBIC.

After writing what follows I found that I want to use a JOIN with multiple
join conditions.

What I'm after is how to write a query in DBIC with a JOIN like what is
bold below. I can't find anything in the docs that supports this usage
unless I'm not understanding the sub {} syntax for add_relationship()

(the schema that follows will help this make sense)

SELECT
musician_band.musician, band.id as band, album.id as album

FROM
musician_band
LEFT JOIN band ON musician.band = band.id
LEFT JOIN album ON album.band = band.id
LEFT JOIN track ON track.album = album.id* AND track.songwriter =
musician_band.musician*
LEFT JOIN track_detail ON track_detail.id = track.id

WHERE
band.id in ( @bands )
-- NULL thus means there's no track written by the musician on
that album.
-- or the track was not included on the final album.
AND ( track.id IS NULL OR track_detail.is_on_album IS FALSE )




I have the typical music database. I need to find musicians that played on
albums but where they were NOT the song writer on ANY track on that album.

For example, if a drummer (musician.id = 123) never writes songs and played
in two bands, and each band had three albums I'd want:

musician | band | album
----------+---------+---------
123 | 22 | 45
123 | 22 | 46
123 | 22 | 47
123 | 54 | 78
123 | 54 | 92
123 | 54 | 103

That musician played on 6 albums yet never was the songwriter for a track
on any of the albums.

My question is what is the best query to use, and how to represent that in
DBIC?



The tables are as you might expect:

A band has many albums, and albums have many tracks. Each track has just
one songwriter which is a musician.

table musician ( id integer );

table band ( id integer );

table album (
id integer,
band integer REFERENCES band
);

table track (
id integer,
album integer REFERENCES album,
songwriter integer REFERENCES musician
);

This can't be too easy so there's a table that manages additional details
about a track -- like if it is included on the final album. Sorry, it's
just that way.

table track_detail (
id integer REFERENCES track (id),
is_on_album boolean default false
);


Of course, musicians are in many bands. So, there's this many-to-many
table to associate musicians with bands:

table musician_band(
musician integer REFERENCES musician,
band integer REFERENCE band
);

Is it possible with just joins? I thought I read DBIC could do custom
joins now. This turns out to be *much faster* than the correlated
sub-query below.

That is, LEFT JOIN with tracks *specific to the songwriter* and check for
NULL

SELECT
musician_band.musician, musician.band, album.id as album

FROM
musician_band
LEFT JOIN band ON musician.band = band.id
LEFT JOIN album ON album.band = band.id
LEFT JOIN track ON track.album = album.id* AND track.songwriter =
musician_band.musician*
LEFT JOIN track_detail ON track_detail.id = track.id
WHERE
band in ( @bands )
AND (
track.id IS NULL -- NULL thus means there's no track written
by the musician on that album.
OR track_detail.is_on_album IS FALSE
)




Here, I think, is 'NOT EXISTS ($sub_query)' correlated sub-query. But, it
seems to be much slower due to the join on track_detail.

# List of all tracks authored by the musician
my $sub_query = $schema->resultset( 'Track' )->search(
{
songwriter => { -ident => 'me.musician' },
album => { -ident => 'album.id' },
'track_detail.is_on_album' => 1,
},
{
columns => ['id'],
alias => 'tracks_authored',
join => 'track_detail',
},
);


# List all musicians where they do not have an associated track

my $rs = $schema->resultset( 'MusicianBand' )->search(
{
'me.band' => { -in => \@bands },
'NOT EXISTS => $sub_query->as_query,
},
{
select => [qw/ me.musician me.band album.id /],
as => [qw/ musician band album /],
join => {
band => 'album',
},
},
);

Wow, Using the above correlated sub-query I see "Total runtime: 2396.274
ms". If I remove the join track_detail JOIN in the subquery I get: "Total
runtime: 0.229 ms"

If instead of the correlated sub-query approach I instead use the complex
JOIN it's then: "Total runtime: 0.357 ms". Time to look at the query plan.
--
Bill Moseley
***@hank.org
Peter Rabbitson
2013-06-03 12:04:44 UTC
Permalink
Post by Bill Moseley
After stumbling around for a while I realize I need some help with building
a query with DBIC.
That was a rather long email. It isn't immediately clear *which* part
you are still having a problem with. Can you rephrase the question...?

Cheers
Bill Moseley
2013-06-03 17:06:05 UTC
Permalink
Post by Bill Moseley
Post by Bill Moseley
After stumbling around for a while I realize I need some help with
building
Post by Bill Moseley
a query with DBIC.
That was a rather long email. It isn't immediately clear *which* part
you are still having a problem with. Can you rephrase the question...?
Did I see that more complex join conditions are possible now with DBIC w/o
a virtual view?


This is one of those "find where joined row do not exist" problems that is
often solved with a correlated sub-query. But, I think I can solve (and
much faster) with just a join. But I need to have extra join condition
(in *bold* below).


This is essentially the query I'm after where I join with the "track" table
but only if the track.songwriter is associated with the musician (because
there's other musicians that might have track rows).

SELECT
musician_band.musician, musician.band, album.id as album

FROM
musician_band
LEFT JOIN band ON musician.band = band.id
LEFT JOIN album ON album.band = band.id
LEFT JOIN track ON track.album = album.id* AND track.songwriter =
musician_band.musician*
LEFT JOIN track_detail ON track_detail.id = track.id

WHERE
band in ( @bands )
AND (
track.id IS NULL -- NULL thus means there's no track written
by the musician on that album.
OR track_detail.is_on_album IS FALSE
)


So, with the normal join it would be something like this:


schema->resultset( 'MusicianBand' )->search(
{
'me.band' => { -in => \@band_ids },
-or => [
'tracks.id' => undef,
'track_detail.is_on_album' => 0,
],
},
{
select => [qw/ me.musician me.band albums.id /],
as => [qw/ musician band album / ],
join => {
bands => {
albums => {
tracks => 'track_detail',
},
},
},
},
);
--
Bill Moseley
***@hank.org
Bill Moseley
2013-06-06 14:26:33 UTC
Permalink
Peter, we discussed multiple join conditions in 2009 (
http://dbix-class.35028.n2.nabble.com/Left-join-with-an-extra-condition-td3880896.html
).

Is that still the case that I need to use a virtual view?
Post by Bill Moseley
Post by Bill Moseley
Post by Bill Moseley
After stumbling around for a while I realize I need some help with
building
Post by Bill Moseley
a query with DBIC.
That was a rather long email. It isn't immediately clear *which* part
you are still having a problem with. Can you rephrase the question...?
Did I see that more complex join conditions are possible now with DBIC w/o
a virtual view?
This is one of those "find where joined row do not exist" problems that is
often solved with a correlated sub-query. But, I think I can solve (and
much faster) with just a join. But I need to have extra join condition
(in *bold* below).
This is essentially the query I'm after where I join with the "track"
table but only if the track.songwriter is associated with the musician
(because there's other musicians that might have track rows).
SELECT
musician_band.musician, musician.band, album.id as album
FROM
musician_band
LEFT JOIN band ON musician.band = band.id
LEFT JOIN album ON album.band = band.id
LEFT JOIN track ON track.album = album.id* AND track.songwriter
= musician_band.musician*
LEFT JOIN track_detail ON track_detail.id = track.id
WHERE
AND (
track.id IS NULL -- NULL thus means there's no track
written by the musician on that album.
OR track_detail.is_on_album IS FALSE
)
schema->resultset( 'MusicianBand' )->search(
{
-or => [
'tracks.id' => undef,
'track_detail.is_on_album' => 0,
],
},
{
select => [qw/ me.musician me.band albums.id /],
as => [qw/ musician band album / ],
join => {
bands => {
albums => {
tracks => 'track_detail',
},
},
},
},
);
--
Bill Moseley
--
Bill Moseley
***@hank.org
xli
2013-06-06 20:51:25 UTC
Permalink
Post by Bill Moseley
much faster) with just a join. But I need to have extra join condition
(in *bold* below).
Is this what you're after?

http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Relationship/Base.pm#condition

"will result in the JOIN clause:

... FROM book me
LEFT JOIN edition editions ON
editions.publisher_id = me.publisher_id
AND editions.type_id = me.type_id ...
"
Bill Moseley
2013-06-07 06:54:54 UTC
Permalink
Post by xli
Post by Bill Moseley
much faster) with just a join. But I need to have extra join condition
(in *bold* below).
Is this what you're after?
http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Relationship/Base.pm#condition
Yes, took me a few tries to understand it. I was confused about how to
specify a possibly unknown alias for the join that referenced a table that
was not directly involved in the relation between albums and tracks. I.e.
not "foreign" or "self". But, it's just a custom relationship so I can
make it custom. :)

This seems to work:


DB::Albums->has_many( musician_tracks => 'DB::Tracks',
sub {
my $args = shift;
return {
"$args->{foreign_alias}.album" => { -ident =>
"$args->{self_alias}.id" },
"$args->{foreign_alias}.songwriter" => { -ident =>
'me.musician' },
};
},
);


Thanks for making me look at it again.
--
Bill Moseley
***@hank.org
Loading...