Gaurav Ashara February 2016

Datawharehouse Fact Table Geneate

Currently, I am building fact table base my ODS tables and dimension tables. My ODS Table like

1.dateWiseData

sid_date

total_impressions

2.devices(Mobile PHONE,PC,etc)

sid_date

device_id

device_name

total_impressions

3.AdvertiserData

sid_date

advertiser_id

advertiser_name

total_impressions

Fact Table

id

sid_date

devicesID

advertiserID

total_impressions

Here, We facing issue to join all three table data in single fact table. In this case we have three different total impression on each table. But in fact table we just add only single total impression. How we can calculate it and join then. We tried different join technicians but didn't find perfect solution.

Please help us handle this case

Answers


Ron Dunn February 2016

Are you trying to build a dimensional (Kimball) data warehouse? If so, there are some changes you can make.

(1) You don't need dateWiseData.

(2) You don't need total_impressions on devices.

(3) You don't need total_impressions on advertiserData.

Might I suggest that your model should look like more like this:

calendar (id, cal_date, cal_year, cal_month, ... )
device (id, name)
advertiser (id, name)
impression (calendar_id, device_id, advertiser_id, impression_count)

This assumes that a fact can have more than one impression. If this is not correct, then you have what is known as a "factless fact", and the table should look like this:

impression (calendar_id, device_id, advertiser_id)

Now you can run queries that, for example, calculate the total impressions by device on a given date:

select device.name,count(*)
from   impression
       inner join device on device.id = device_id
       inner join calendar on calendar.id = calendar_id
where  calendar.cal_date = '2016-02-08'
group  by device.name;

Does that meet your requirement?

EDIT: Respond to question, below.

In that case you want the first form of the fact table:

impression (calendar_id, device_id, advertiser_id, impression_count)

and your query to find impressions by advertiser looks like this:

select advertiser.name,sum(impression_count)
from   impression
       inner join advertiser on advertiser.id = advertiser_id
       inner join calendar on calendar.id = calendar_id
where  calendar.cal_date = '2016-02-08'
group  by advertiser.name;

If you attempt to carry totals anywhere else you will end up in a world of pain.

Post Status

Asked in February 2016
Viewed 2,177 times
Voted 6
Answered 1 times

Search




Leave an answer