Wednesday, March 28, 2012

outer-join results to cartesian product .... help!

All,

A very happy New Year to you all!!!

I have two tables f and sm

structure for f:

uid int not null,
cbuid int not null,
pid int not null,
gid int not null,
sid int not null,
mnth tinyint not null

structure for sm:

sid int not null,
mnth tinyint not null

contents/rows in table f are:

uid cbid pid gid sid mnth
-- -- -- -- -- --
8 92 10057 4 40 2
8 92 10057 4 40 3
8 92 10057 4 40 4
18 125 10057 4 40 2

contents/rows in table sm are:

sid mnth
-- --
40 2
40 3
40 4
40 5

The requirement is compare (f, sm) and return matching and non-matching
rows.

now the sql:

1)

select f.uid, f.cbid, f.pid, f.gid, f.sid, f.mnth, sm.mnth
from f left join sm on f.sid = sm.sid
where f.Pid = 10057 AND
f.gid = 4 AND f.cbid = '125'

output:

uid cbid pid gid sid
mnth mnth
---- ------- ---- ---- ----
-- --
18 125 10057 4 40
2 5 -> Row retrieved
18 125 10057 4 40
2 4
18 125 10057 4 40
2 3
18 125 10057 4 40
2 2

The above output returns as expected until I change the predicate....
See below:

select f.uid, f.cbid, f.pid, f.gid, f.sid, f.mnth, sm.mnth
from f left join sm on f.sid = sm.sid
where f.Pid = 10057 AND
f.gid = 4 AND f.cbid = '92'

output:

uid cbid pid gid sid
mnth mnth
---- ------- ---- ---- ----
-- --
8 92 10057 4 40
2 5
8 92 10057 4 40
3 5
8 92 10057 4 40
4 5
8 92 10057 4 40
2 4
8 92 10057 4 40
3 4
8 92 10057 4 40
4 4
8 92 10057 4 40
2 3
8 92 10057 4 40
3 3
8 92 10057 4 40
4 3
8 92 10057 4 40
2 2
8 92 10057 4 40
3 2
8 92 10057 4 40
4 2

The above output seems to be cartesian ?

Please help on how to resolve ...

2)

Is there a way where I could have non-matching rows like MINUS in
Oracle... I even tried NOT EXISTS but that did not work...

Any thoughts would be highly appreciated...
Thanks a bunch in advance,
AnuOn 12 Jan 2005 19:24:20 -0800, anuu_radhaa@.yahoo.com wrote:

(snip)
>The above output seems to be cartesian ?
>Please help on how to resolve ...

Hi Anu,

The output appears to be correct. Three rows in table f match the filter
condition in the WHERE clause. Each of these three rows matches the join
condition in the ON clause for all 4 rows in table sm, so you'll get a
result set of (3 x 4 =) 12 rows.

You seem to expect different results, but you didn't specify what the
desired results are and why.

>2)
>Is there a way where I could have non-matching rows like MINUS in
>Oracle... I even tried NOT EXISTS but that did not work...

I don't know Oracle, nor the MINUS operator. Is MINUS the Orcale
implementation of the ANSI-standard EXCEPT operation? Or does it something
else?

Both of your questions can be answered lots better if you provide
a) A SQL script to create your tables (including constraints and indexes,
but excluding irrelevant columns) and fill them with some sample data, and
b) The expected output, along with aan explanation.

Also, read http://www.aspfaq.com/5006.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||2)

You can do it with an outer join. Example:

CREATE TABLE A (x INTEGER PRIMARY KEY)
CREATE TABLE B (x INTEGER PRIMARY KEY)

INSERT INTO A VALUES (1)

SELECT A.x
FROM A
LEFT JOIN B
ON A.x = B.x
WHERE B.x IS NULL

--
David Portas
SQL Server MVP
--|||Thanks all for the info.

Here are the details

CREATE TABLE [FCast] (
[BusinessUnitId] [int] NOT NULL ,
[UserId] [int] NOT NULL ,
[SeasonId] [int] NOT NULL ,
[FMonth] [tinyint] NULL ,
[DivisionId] [int] NOT NULL ,
[ProductId] [int] NOT NULL
)
GO

INSERT INTO [FCast] VALUES ( 92, 8, 40, 2, 4, 10057 )
GO

INSERT INTO [FCast] VALUES ( 92, 8, 40, 3, 4, 10057 )
GO

INSERT INTO [FCast] VALUES ( 92, 8, 40, 4, 4, 10057 )
GO

INSERT INTO [FCast] VALUES ( 125, 18, 40, 2, 4, 10057 )
GO

CREATE TABLE [SMonths] (
[SeasonId] [int] NOT NULL ,
[SMonth] [tinyint] NOT NULL
)

GO

INSERT INTO [SMONTHS] VALUES (40, 2)
GO

INSERT INTO [SMONTHS] VALUES (40, 3)
GO

INSERT INTO [SMONTHS] VALUES (40, 4)
GO

INSERT INTO [SMONTHS] VALUES (40, 5)
GO

one of my colleage happened to delete all those values having 'null'
which caused the problem.

for every month in smonths there would be a row in fcast for a
productid. earlier, the application,
would insert a row into fcast table with month value as 'null'. This
was actually a application bug.
to resolve this, my colleage did took up a hasty decision and wrote a
SQL which really blew up
all the rows in production environment...

the funniest part is, it is almost 3 months after this SQL is executed.

so, database restore is not possible...

hence, thought of writing a SQL which populates the missing rows in
fcast table.

so, now the requirement is to insert the missing rows in fcast table.

the query which i framed works fine for businessunitid = 125 and fails
for businessunitid = 92

the output should be:

userid businessunitid productid divisionid seasonid smonth
-- ----- --- ---- --- --
18 125 10057 4 40 3
18 125 10057 4 40 4
18 125 10057 4 40 5
8 92 10057 4 40 5

this output would then be inserted into fcast table...

any ideas or thoughts would really help...
thanks in advance,

Anu|||On 13 Jan 2005 18:06:42 -0800, anuu wrote:

>Thanks all for the info.
>Here are the details
(snip)

Hi Anu,

Thanks. Unfortunately, therre still are some questions to ask.

1. What are the keys for your tables? For SMonths, either SMonth or
(SeasonId, SMonth) are logical possibilities. For FCast, I can't even
begin to guess.

2. In your example, the input for business unit 125 consists of one row;
the output has three rows, with the "missing" months and the remaining
columns taken from the one row that is present. Fine. For business unit
92, the situation gets muddy: you start withh three rows and want to
create one extra row for the "missing" month, again with the remainig
columns taken from the rows already presen. But which one? In your
example, the three rows for BU 92 all have user 8, division 4 and product
10057. What would be the expected output if the input changes to
INSERT INTO [FCast] VALUES ( 92, 8, 40, 2, 4, 10057 )
INSERT INTO [FCast] VALUES ( 92, 7, 40, 3, 3, 10056 )
INSERT INTO [FCast] VALUES ( 92, 6, 40, 4, 2, 10055 )

3. From your examples, it appears that there always is a row for the
"first" month of the season (month 2), but rows for subsequent months
might be missing. Is this a correct assumption or is your example
incomplete?

Here's some code that will produce the requested output from your sample
data, but relies very heavy on several assumptions. If my assumptions are
wrong, the code will produce incorrect results. I didn't try to optimize
it, as this is probably (hopefully!) a one-time operation.

SELECT f.BusinessUnitId, f.UserId, f.SeasonId,
s.SMonth, f.DivisionId, f.ProductId
FROM FCast AS f
INNER JOIN SMonths AS s
ON s.SeasonId = f.SeasonId
WHERE f.FMonth = (SELECT MIN(s2.SMonth)
FROM SMonths AS s2
WHERE s2.SeasonId = f.SeasonId)
AND NOT EXISTS (SELECT *
FROM FCast AS f2
WHERE f2.BusinessUnitId = f.BusinessUnitId
AND f2.FMonth = s.SMonth)

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo Kornelis wrote:
> On 13 Jan 2005 18:06:42 -0800, anuu wrote:
> >Thanks all for the info.
> >Here are the details
> (snip)
> Hi Anu,
> Thanks. Unfortunately, therre still are some questions to ask.
[Anu]: No issues, Hugo. Ready to answer the questions. Find below
embedded

> 1. What are the keys for your tables? For SMonths, either SMonth or
> (SeasonId, SMonth) are logical possibilities. For FCast, I can't even
> begin to guess.
[Anu]: For SMonths (SeasonId, SMonth) and for FCast (SeasonId, FMonth)
which relates to SMonths

> 2. In your example, the input for business unit 125 consists of one
row;
> the output has three rows, with the "missing" months and the
remaining
> columns taken from the one row that is present. Fine. For business
unit
> 92, the situation gets muddy: you start withh three rows and want to
> create one extra row for the "missing" month, again with the remainig
> columns taken from the rows already presen. But which one? In your
> example, the three rows for BU 92 all have user 8, division 4 and
product
> 10057. What would be the expected output if the input changes to
> INSERT INTO [FCast] VALUES ( 92, 8, 40, 2, 4, 10057 )
> INSERT INTO [FCast] VALUES ( 92, 7, 40, 3, 3, 10056 )
> INSERT INTO [FCast] VALUES ( 92, 6, 40, 4, 2, 10055 )
[Anu]: Fine. If SMonths has these values

SI Mo SI = SeasonId, Mo = Month
---
40, 2
40, 3
40, 4
40, 5

then for the above input below would be output
BU = Business Unit, UI = User Id, SI = Season ID, DI =
Division ID, Mo = Month, PI = Product ID

BU UI SI DI Mo PI
--------
92, 8, 40, 2, 2, 10057
92, 8, 40, 2, 3, 10057
92, 8, 40, 2, 5, 10057

92, 7, 40, 3, 2, 10056
92, 7, 40, 3, 4, 10056
92, 7, 40, 3, 5, 10056

92, 6, 40, 4, 3, 10055
92, 6, 40, 4, 4, 10055
92, 6, 40, 4, 5, 10055

In short, FCast table would have per ProductId, per BU, all the months
available for a season.

> 3. From your examples, it appears that there always is a row for the
> "first" month of the season (month 2), but rows for subsequent months
> might be missing. Is this a correct assumption or is your example
> incomplete?

[Anu]: Nope, the assumption is not correct. For a season, the months
spread would be defined in SMonths
table. So, for example, the SeasonId 40 has 12,1,2,3,4 defined then the
output for the above input (in point 2) would differ. The available
rows in FCast would _be_ the ones defined in SMonths.

> Here's some code that will produce the requested output from your
sample
> data, but relies very heavy on several assumptions. If my assumptions
are
> wrong, the code will produce incorrect results. I didn't try to
optimize
> it, as this is probably (hopefully!) a one-time operation.
> SELECT f.BusinessUnitId, f.UserId, f.SeasonId,
> s.SMonth, f.DivisionId, f.ProductId
> FROM FCast AS f
> INNER JOIN SMonths AS s
> ON s.SeasonId = f.SeasonId
> WHERE f.FMonth = (SELECT MIN(s2.SMonth)
> FROM SMonths AS s2
> WHERE s2.SeasonId = f.SeasonId)
> AND NOT EXISTS (SELECT *
> FROM FCast AS f2
> WHERE f2.BusinessUnitId = f.BusinessUnitId
> AND f2.FMonth = s.SMonth)
[Anu]: Thanks, Hugo. I would start working on this and see if I could
accomplish. Meanwhile, let me know if you need more info....

> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||On 14 Jan 2005 17:10:05 -0800, anuu wrote:

Hi Anu,

(snip)
>> 1. What are the keys for your tables? For SMonths, either SMonth or
>> (SeasonId, SMonth) are logical possibilities. For FCast, I can't even
>> begin to guess.
>>
>[Anu]: For SMonths (SeasonId, SMonth) and for FCast (SeasonId, FMonth)
>which relates to SMonths

Huh? The sample data you posted in your original post in this thread
violates the key you state for FCast - it has two rows for SeasonId 40,
FMonth 2, which would not be possible with the key you state above!

>> 2. In your example, the input for business unit 125 consists of one
>row;
>> the output has three rows, with the "missing" months and the
>remaining
>> columns taken from the one row that is present. Fine. For business
>unit
>> 92, the situation gets muddy: you start withh three rows and want to
>> create one extra row for the "missing" month, again with the remainig
>> columns taken from the rows already presen. But which one? In your
>> example, the three rows for BU 92 all have user 8, division 4 and
>product
>> 10057. What would be the expected output if the input changes to
>> INSERT INTO [FCast] VALUES ( 92, 8, 40, 2, 4, 10057 )
>> INSERT INTO [FCast] VALUES ( 92, 7, 40, 3, 3, 10056 )
>> INSERT INTO [FCast] VALUES ( 92, 6, 40, 4, 2, 10055 )
>>
>[Anu]: Fine. If SMonths has these values
>SI Mo SI = SeasonId, Mo = Month
>---
>40, 2
>40, 3
>40, 4
>40, 5
>
>then for the above input below would be output
>BU = Business Unit, UI = User Id, SI = Season ID, DI =
>Division ID, Mo = Month, PI = Product ID
>BU UI SI DI Mo PI
>--------
>92, 8, 40, 2, 2, 10057
>92, 8, 40, 2, 3, 10057
>92, 8, 40, 2, 5, 10057
>92, 7, 40, 3, 2, 10056
>92, 7, 40, 3, 4, 10056
>92, 7, 40, 3, 5, 10056
>92, 6, 40, 4, 3, 10055
>92, 6, 40, 4, 4, 10055
>92, 6, 40, 4, 5, 10055
>In short, FCast table would have per ProductId, per BU, all the months
>available for a season.

Again: huh? This data would never be accepted in the table if the primary
key for FCast is (SeasonID, FMonth), as you state above. So I guess that's
not the primary key after all.

Also, in a previous post you wrote "for every month in smonths there would
be a row in fcast for a productid". Now, you write that you need to have a
row for every month "per ProductId, per BU". Not exactly the same, right?

I guess I could now make a new guess at the primary key in FCast, then
change the code I posted before to reflect my new guess. But there would
still be a lot of uncertainty. So instead of wasting time on writing a new
query on insufficient specs, I'll now refer you to www.aspfaq.com/5006,
where you will find instructions on how to assemble the details you should
post here to get help, in the best format for this group: SQL.

Also, please tell me the expected output if the input looks like this:

BU UI SI DI Mo PI
--------
92, 8, 40, 2, 2, 10057
92, 7, 40, 2, 3, 10057
92, 7, 40, 3, 5, 10057

From your description above, I guess there should be one extra row, for BU
92, PPI 10057, SI 40 and Mo 4 - but what should be the values for UI and
DI?

>> 3. From your examples, it appears that there always is a row for the
>> "first" month of the season (month 2), but rows for subsequent months
>> might be missing. Is this a correct assumption or is your example
>> incomplete?
>>
>[Anu]: Nope, the assumption is not correct. For a season, the months
>spread would be defined in SMonths
>table. So, for example, the SeasonId 40 has 12,1,2,3,4 defined then the
>output for the above input (in point 2) would differ. The available
>rows in FCast would _be_ the ones defined in SMonths.

And if the SeasonId 40 has months 12, 1, 2, 3, and 4, would there than be
any months that is "complete", such as month 2 was "complete" in your
original sample data?
Please post better sample data (as INSERT statements - see the link I
supplied above), indicating all possible situations. The "garbage in,
garbage out" principle applies in this group as much as anywhere else!

>[Anu]: Thanks, Hugo. I would start working on this and see if I could
>accomplish. Meanwhile, let me know if you need more info....

I don't "need" more info. But if could probably help you better if you
provided more info...

If you need more help, then please provide table structure (as CREATE
TABLE statements, including constrainst but excluding irrelevant columns),
sample data (as INSERT statements) and expected output. In case you missed
the link above: see www.aspfaq.com/5006.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hi Anu,

(snip)

>> 1. What are the keys for your tables? For SMonths, either SMonth or
>> (SeasonId, SMonth) are logical possibilities. For FCast, I can't
even
>> begin to guess.

>[Anu]: For SMonths (SeasonId, SMonth) and for FCast (SeasonId, FMonth)
>which relates to SMonths

Huh? The sample data you posted in your original post in this thread
violates the key you state for FCast - it has two rows for SeasonId 40,
FMonth 2, which would not be possible with the key you state above!

[Anu_Again] Hugo, the key mentioned is Foreign keys and not Primary.

- Hide quoted text -
- Show quoted text -

>> 2. In your example, the input for business unit 125 consists of one
>row;
>> the output has three rows, with the "missing" months and the
>remaining
>> columns taken from the one row that is present. Fine. For business
>unit
>> 92, the situation gets muddy: you start withh three rows and want to
>> create one extra row for the "missing" month, again with the
remainig
>> columns taken from the rows already presen. But which one? In your
>> example, the three rows for BU 92 all have user 8, division 4 and
>product
>> 10057. What would be the expected output if the input changes to

>> INSERT INTO [FCast] VALUES ( 92, 8, 40, 2, 4, 10057 )
>> INSERT INTO [FCast] VALUES ( 92, 7, 40, 3, 3, 10056 )
>> INSERT INTO [FCast] VALUES ( 92, 6, 40, 4, 2, 10055 )
>[Anu]: Fine. If SMonths has these values

>SI Mo SI = SeasonId, Mo = Month

>---
>40, 2
>40, 3
>40, 4
>40, 5

>then for the above input below would be output

>BU = Business Unit, UI = User Id, SI = Season ID, DI =
>Division ID, Mo = Month, PI = Product ID

>BU UI SI DI Mo PI

>--------
>92, 8, 40, 2, 2, 10057
>92, 8, 40, 2, 3, 10057
>92, 8, 40, 2, 5, 10057

>92, 7, 40, 3, 2, 10056
>92, 7, 40, 3, 4, 10056
>92, 7, 40, 3, 5, 10056

>92, 6, 40, 4, 3, 10055
>92, 6, 40, 4, 4, 10055
>92, 6, 40, 4, 5, 10055

>In short, FCast table would have per ProductId, per BU, all the months
>available for a season.

Again: huh? This data would never be accepted in the table if the
primary
key for FCast is (SeasonID, FMonth), as you state above. So I guess
that's
not the primary key after all.

Also, in a previous post you wrote "

for every month in smonths there would
be a row in fcast for a productid

". Now, you write that you need to have a
row for every month "per ProductId, per BU". Not exactly the same,
right?

[Anu_again]: Hugo, it is foreign key and not primary key. primary key
is an identity column which I did not incude
in the structure as I thought that will not make any difference.
Nope, it is same. all these ProductID, BU are all foreign keys in FCast
table. the primary key
is only an identity column.

I guess I could now make a new guess at the primary key in FCast, then
change the code I posted before to reflect my new guess. But there
would
still be a lot of uncertainty. So instead of wasting time on writing a
new
query on insufficient specs, I'll now refer you to www.aspfaq.com/5006,
where you will find instructions on how to assemble the details you
should
post here to get help, in the best format for this group: SQL.

[Anu_again]: No guesses.....

Also, please tell me the expected output if the input looks like this:

BU UI SI DI Mo PI

--------
92, 8, 40, 2, 2, 10057
92, 7, 40, 2, 3, 10057
92, 7, 40, 3, 5, 10057

>From your description above, I guess there should be one extra row, for
BU
92, PPI 10057, SI 40 and Mo 4 - but what should be the values for UI
and
DI?

[Anu_again] : should be 92, 7, 40, 3, 4, 10057. You are right

>> 3. From your examples, it appears that there always is a row for the
>> "first" month of the season (month 2), but rows for subsequent
months
>> might be missing. Is this a correct assumption or is your example
>> incomplete?

>[Anu]: Nope, the assumption is not correct. For a season, the months
>spread would be defined in SMonths
>table. So, for example, the SeasonId 40 has 12,1,2,3,4 defined then
the
>output for the above input (in point 2) would differ. The available
>rows in FCast would _be_ the ones defined in SMonths.

And if the SeasonId 40 has months 12, 1, 2, 3, and 4, would there than
be
any months that is "complete", such as month 2 was "complete" in your
original sample data?
[Anu_again]: Complete ? The available months in FCast table are
considered as complete and the ones
not are to be INSERTed

Please post better sample data (as INSERT statements - see the link I
supplied above), indicating all possible situations. The "garbage in,
garbage out" principle applies in this group as much as anywhere else!

[Anu_again]: I feel, I did not communicate properly and this caused the
confusion otherwise you are in
the right track.

>[Anu]: Thanks, Hugo. I would start working on this and see if I could
>accomplish. Meanwhile, let me know if you need more info....

I don't "need" more info. But if could probably help you better if you
provided more info...

If you need more help, then please provide table structure (as CREATE
TABLE statements, including constrainst but excluding irrelevant
columns),
sample data (as INSERT statements) and expected output. In case you
missed
the link above: see

www.aspfaq.com/5006.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address|||On 16 Jan 2005 16:25:23 -0800, anuu wrote:

(snip)
>[Anu_again]: Hugo, it is foreign key and not primary key. primary key
>is an identity column which I did not incude
>in the structure as I thought that will not make any difference.

Hi Anu,

You're right, knowing that the primary key is an identity column doesn't
help me to help you. But knowing the natural key of your data would have
helped. I assume that you do know the difference between an artificial key
(identity) and a natural key? I also assume that you are aware than even
if you use an identity column as primary key, the natural key should still
be declared (using a UNIQEU constraint)?

If you had posted your table structure and illustrative sample data, as I
requested in my previous message, then I would now be able to see the
PRIMARY KEY constraint on the identity column, as well as the UNIQUE
constraint on whatever combination of columns makes up the natural key for
this table. This is information I really *need* in order to write a query
that returns the rows you need.

(snip)
>>Also, please tell me the expected output if the input looks like this:
>>
>>
>>BU UI SI DI Mo PI
>>
>>
>>--------
>>92, 8, 40, 2, 2, 10057
>>92, 7, 40, 2, 3, 10057
>>92, 7, 40, 3, 5, 10057
>>
>>>From your description above, I guess there should be one extra row, for
>>BU
>>92, PPI 10057, SI 40 and Mo 4 - but what should be the values for UI
>>and
>>DI?
>[Anu_again] : should be 92, 7, 40, 3, 4, 10057. You are right

While I still don't know the natural key of your table, your answer
supports my hunch that the natural key is the combination of (business
unit, productid, seasonid, month).
On the other hand, your answer also raises some questions. WHY should the
user id in the extra row be 7 (as in the rows for march and may), not 8
(as in the row for february)? And why should the division in the extra row
be 3 (as in the row for may), not 2 (as in the rows for february and
march)? This part of the specifications is still unclear!

(snip)
>>Please post better sample data (as INSERT statements - see the link I
>>supplied above), indicating all possible situations. The "garbage in,
>>garbage out" principle applies in this group as much as anywhere else!
>[Anu_again]: I feel, I did not communicate properly and this caused the
>confusion otherwise you are in
>the right track.

You're right. The proper way to communicate in this group, is to post your
table structure as CREATE TABLE statements, including all constraints and
properties, some illustrative sample data as INSERT statements and the
output expected from that sample data.
You did post a partial table structure in an earlier post, but you didn't
include the constraints. You also posted some sample data, but it was not
illustrative of your problem, so the query I wrote and tested against that
set of sample data will probably not be of much use.

If you still need assistance, I strongly urge you (again!) to read the
information at http://www.aspfaq.com/etiquette.asp?id=5006 and follow
those instructions to post the information and specifications that are
required to get a good working solution to your problem.
Without clear specifications, table structure and good sample data, I
really don't think I can help you.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment