Showing posts with label warehouse. Show all posts

Friday, March 9, 2012

Fail to call child package

I have a parent package that calls many child packages (over 30) for a daily data warehouse update. On any given day, it randomly fails to call one of the child packages with the following error:

Error 0x800706BE while preparing to load the package. The remote procedure call failed.

It doesn't fail on the same package each day. I need a better explanation of this error message.

Environment: SQL Server 2005 Enterprise (Dec 2005 RTM) on Itanium 64 with Windows Datacenter Edition.

Are all the Execute Package tasks in the parent package configured identically? Where are the child packages stored?|||Yes, all Exec Package tasks are set up the same. All packages are stored in MSDB on our data warehouse server. Because we have such a large volume of packages (200), we are using folders to separate them. The parent package calls children in the Dimensions and various other folders. The call failures span many of the child folders, so there doesn't seem to be a pattern.
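For reference, a hedged T-SQL sketch for listing what's stored where (it assumes the standard SQL Server 2005 msdb catalog tables for SSIS package storage; verify the table names on your build):

    -- List every SSIS package stored in msdb together with its folder
    SELECT f.foldername,
           p.[name] AS packagename
    FROM msdb.dbo.sysdtspackages90 AS p
    JOIN msdb.dbo.sysdtspackagefolders90 AS f
        ON p.folderid = f.folderid
    ORDER BY f.foldername, p.[name];

This at least makes it easy to confirm that every Execute Package Task points at a package that actually exists in the folder it references.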

Wednesday, March 7, 2012

Fact Table Design Question

We're putting together our data warehouse, and I had a question regarding
the design of fact tables for our situation. We have invoices and payments
against those invoices... would I include all the information in one fact
table, or would I separate them into two tables? If I do two tables, can I
include two fact tables in an OLAP cube?
Thanks in advance.
You can use a view...
it's even better to use a partitioned view...
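For example, a minimal sketch of the view approach (the table and column names here are illustrative assumptions, not taken from this thread):

    -- Present invoice and payment facts as one logical fact table.
    -- FactInvoice and FactPayment are assumed to share conformed dimension keys.
    CREATE VIEW dbo.FactInvoiceActivity
    AS
    SELECT InvoiceKey, CustomerKey, DateKey, Amount, 'Invoice' AS ActivityType
    FROM dbo.FactInvoice
    UNION ALL
    SELECT InvoiceKey, CustomerKey, DateKey, Amount, 'Payment' AS ActivityType
    FROM dbo.FactPayment;

A partitioned view is the same UNION ALL idea with CHECK constraints on the member tables, so the optimizer can skip tables that cannot satisfy a given query.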
|||Hi,
It really depends on how often you will be analyzing invoices and payments
together.
Sometimes - you can build two cubes and join them into a virtual cube.
Very often - you should include all the information in your fact table design.
Tomasz B.
"T." wrote:

> We're putting together our data warehouse, and I had a questions regarding
> design of fact tables for our situation. We have invoices and payments to
> those invoices...would I include all information in one fact table, or would
> I separate them into two tables? If I do two tables, can I include two fact
> tables in an OLAP cube?
> Thanks in advance.
>
>


Fact table design

Hello.

I am working on an educational data warehouse. I have a semi-additive fact table that deals with student grades and averages. It looks like this:

School, Acad Year, Student, Course, Term, Week, Week Grade, Term Average

where Week Grade and Term Average are the measures.

The term average is calculated from the week grades using a specific formula. It's supposed to appear when the user is on the All Weeks member of the Week dimension.

I have a problem calculating the term average... say I want to compare students' term averages.
I don't know whether it's correct to write an MDX formula to get it, or to create a separate table for the Term Average with granularity by term rather than by week.

Thanks in advance for your advice

Christina

I think your design has some problems. The biggest one is that you are mixing grain levels in the fact table (terms and weeks), and both are dates. My advice is to first focus on creating the fact table at the most detailed grain level, which I think is probably days (a complete day). So:

FactGrades: SchoolKey, DateKey, StudentKey, CourseKey, Grade.

The date key should have some week attributes so you know which week the current day belongs to, and the same for the term. Your Date dimension should look something like this:

DimDate: DateKey, FullDate, CalendarYear, CalendarSemester, CalendarMonth, CalendarDayInMonth, TermYear, TermSemester, TermMonth?, TermWeek, TermDayNumber, etc, etc.

From this you could create an aggregate fact table for the term averages or better yet, derive the term average with an MDX expression when you create your Olap Cube in Analysis Services.
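To make that concrete, a hedged DDL sketch of the two tables (data types, and any column not listed above, are illustrative assumptions):

    -- Date dimension carrying both calendar and academic-term attributes
    CREATE TABLE dbo.DimDate (
        DateKey            int      NOT NULL PRIMARY KEY,
        FullDate           datetime NOT NULL,
        CalendarYear       smallint NOT NULL,
        CalendarSemester   tinyint  NOT NULL,
        CalendarMonth      tinyint  NOT NULL,
        CalendarDayInMonth tinyint  NOT NULL,
        TermYear           smallint NOT NULL,
        TermSemester       tinyint  NOT NULL,
        TermWeek           tinyint  NOT NULL,
        TermDayNumber      tinyint  NOT NULL
    );

    -- Grade facts at the most detailed grain: one row per student, course, day
    CREATE TABLE dbo.FactGrades (
        SchoolKey  int           NOT NULL,
        DateKey    int           NOT NULL REFERENCES dbo.DimDate (DateKey),
        StudentKey int           NOT NULL,
        CourseKey  int           NOT NULL,
        Grade      decimal(5, 2) NOT NULL
    );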

|||

Ok, let's say I fix the date issue. I still have a concern: I want to precalculate the term average, because the way to get it is based on many factors and I am trying to avoid that complexity. I would prefer to add it to this fact table or to create a fact table just for it.

But this measure would be at a different granularity.. is it correct to add it to this fact table, or should I create a fact table just for it? I did the second option and want to make sure it's correct.

And honestly, that's why I separated the week from the term: I assumed that if I put it in a different fact table, I would need to use only the term key, and not the week.

In AS, I am using the Scope function in the Calculations to change the aggregated value of the Grade measure to TermAverage when the current member is at the Term level.

That's one of my concerns.

I have another issue resulting from this one... I would like to share it with you in case you can give me some advice.

1 - Assume I'm standing at the class section level, which is a level in the student dimension (school, class, section, student). For a given course, I want to compare the average between sections of the same class to see which section is performing better during this week.

2 - I want to do the same, but for a term instead of a week.

In case 1, the measure used should be the Grade.
In case 2, it should be the TermAverage.

When I added these measures to the cube, I set the aggregation property to Average.. however, Avg is semi-additive and will only work on the time dimension. I therefore created two calculated measures, one for the Grade and another for the TermAverage, as such:

Grade Average = Grade / count of records in the fact table
TermAverage Average = TermAverage / count of records in the other fact table containing only the TermAverage.

I then modified the Scope calculation to put these measures in the aggregation instead of the original Grade and TermAverage, because the latter would give me a Sum when standing at Class or Class Section, while what I want is the average.

Am I doing it right by creating both fact tables, defining the measures like that, and then modifying the Scope? I would like to mention that the aggregation now takes much more time to return results because of the calculated measures...

Thanks again for your help

Christina

|||

I really think you should try to compute the term average from the daily grades using MDX. How complex can it be? Keep it as simple as possible. If you still want to store it separately, you should create a different fact table for the term average. The only thing is that this fact table should no longer have a DateKey but a TermKey - or carry the term as a plain column in the fact table so you don't have to maintain a term dimension. That value is the same as the one defined for that term in the Date dimension. This is for the data warehouse. E.g.:

DimDate.Term: 200509

FactTermAverage: SchoolKey, StudentKey, CourseKey, Term (200509), Average (19.45)

In this way you can link the term with your Date dimension in Analysis Services. Just make sure to set the attribute relationships properly for the Date dimension, and the scope for each measure (checking the term average per calendar day, calendar month, etc. doesn't make sense, so the term average should be displayed as "NA" in those cases).
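A hedged DDL sketch of that term-grain fact table (data types are illustrative assumptions):

    -- Term-grain averages; Term carries the same value as DimDate's term attribute
    CREATE TABLE dbo.FactTermAverage (
        SchoolKey  int           NOT NULL,
        StudentKey int           NOT NULL,
        CourseKey  int           NOT NULL,
        Term       int           NOT NULL, -- e.g. 200509
        Average    decimal(5, 2) NOT NULL  -- e.g. 19.45
    );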

In regards to your second question, you should have a single cube in AS for a DW (best practice). This means you would have a count measure in the cube for each fact table. However you should try to design a single calculated measure called "Section Average" which is calculated depending on where the user is in the cube:

A hedged MDX sketch of that calculation (the hierarchy and measure names here are assumptions - adjust them to your cube):

    CASE
        -- Week comparisons: average the daily grades
        WHEN [Date].[Academic].CurrentMember.Level IS [Date].[Academic].[Week]
        THEN [Measures].[Grade] / [Measures].[Fact Grades Count]
        -- Term comparisons: average the precalculated term averages
        WHEN [Date].[Academic].CurrentMember.Level IS [Date].[Academic].[Term]
        THEN [Measures].[Term Average] / [Measures].[Fact Term Average Count]
        ELSE "NA"
    END

|||

Thank you very much for your insight.

First, I never knew I could link the Term to the Week dimension like that.. it saved me a lot.

I will also try to build the measure. The problem is that I'm not that good at MDX, and I try to avoid using it..

You wondered how complex the MDX computation could be.. well, to me it's complex. I will explain it, and maybe you could help me again with this one.
In the time dimension, as I said, I have Academic Year, Term, and Week.

Term Average = avg(Weeks 1-8) * 0.60 + avg(Week 9) * 0.40

Academic Year Average = Term 1 * 0.25 + Term 2 * 0.25 + Term 3 (Weeks 1-8) * 0.20 + Term 3 (Week 9) * 0.30

Thanks much

Christina
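For what it's worth, once the grades are stored at daily grain (the FactGrades/DimDate sketch earlier in the thread), this weighting is mechanical in T-SQL; a hedged sketch, reusing the assumed column names, with the term filter purely illustrative:

    -- Weighted term average per student and course:
    -- avg(weeks 1-8) * 0.60 + avg(week 9) * 0.40
    SELECT f.StudentKey,
           f.CourseKey,
           AVG(CASE WHEN d.TermWeek BETWEEN 1 AND 8 THEN f.Grade END) * 0.60
         + AVG(CASE WHEN d.TermWeek = 9 THEN f.Grade END) * 0.40 AS TermAverage
    FROM dbo.FactGrades AS f
    JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
    WHERE d.TermYear = 2005 AND d.TermSemester = 1 -- one term at a time
    GROUP BY f.StudentKey, f.CourseKey;

The same weighting can equally be written as an MDX calculated member over the Week level in the cube.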

|||

hi .. I am also working on a university data warehouse, but I am only covering two subjects, i.e. Student Record and Admission .. if someone could help me with the design of the model ..

thanks

Zeeshan Ali


Sunday, February 19, 2012

Extracting from the Warehouse

Hi,
I'm working on a warehouse project that loads data from a number of source
systems. There's a new project starting up in my company that will deliver
data to customers via the web. This data is more or less the same that is
being loaded into the warehouse. In fact, the intention is to use the data
model (with some modifications) from the warehouse for the web database.
My question is: Should the web database source its data from the warehouse
or directly from the source systems?
Given the complexity of the transformations, I would have thought it would
be more practical to extract from the warehouse, rather than do the same work
twice from the source systems. The concern is that by adding another system
(ie the warehouse) into the process, there is a greater risk of something
going wrong, whereas extracting directly from the source, there's only one
place where there could be a problem.
If anyone has any references to white papers or case studies on either
option, or any thoughts themselves, I'd appreciate it.
Thanks,
Wreck.
Hello Wreck,
I agree with you that using the warehouse is more reasonable in your
situation, because the data model in the warehouse is more stable and you can
perform the ETL and focus on the data on the warehouse side. Also, for
aggregated information, it is not practical to calculate it on the fly when
processing queries from clients via the Web site, for performance reasons. The
following articles might be helpful:
908618 MSDN Support WebCast: How to use ADOMD.NET in .NET Framework
applications
http://support.microsoft.com/?id=908618
How To: Traverse an ADO MD Cellset Object by Using Visual C# .NET
http://support.microsoft.com/default...b;en-us;828279
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
|||It depends on the requirements for supplying the web data. What must the
availability of the web data be? You mentioned your company would like to
use the same model; hopefully it's just a small portion of the model/data.
Most data warehouses are built with tools like DTS or Informatica, and building
the DW with this type of tool usually requires large staging areas of data.
You could perhaps push the data you need from the staging area to the Web
database.
"Wreck" <Wreck@.community.nospam> wrote in message
news:FA7A38B4-C640-408E-BE3A-0DD79CBDF669@.microsoft.com...
> Hi,
> I'm working on a warehouse project that loads data from a number of source
> systems. There's a new project starting up in my company that will deliver
> data to customers via the web. This data is more or less the same that is
> being loaded into the warehouse. In fact, the intention is to use the data
> model (with some modifications) from the warehouse for the web database.
> My question is: Should the web database source it's data from the
> warehouse
> or directly from the source systems?
> Given the complexity of the transformations, I would have thought it would
> be more practical to extract from the warehouse, rather than do the same
> work
> twice from the source systems. The concern is that by adding another
> system
> (ie the warehouse) into the process, there is a greater risk of something
> going wrong, whereas extracting directly from the source, there's only one
> place where there could be a problem.
> If anyone has any references to white papers or case studies on either
> option, or any thoughts themselves, I'd appreciate it.
> Thanks,
> Wreck.
>
|||Hi,
I believe the data needs to be available on a daily basis, arriving the next
day, so overnight batch processing should be sufficient. There are some feeds
that they need more frequently, which would make more sense to load directly,
or via our staging area.
I'm assuming the model will be a cut-down version of the warehouse, probably
not quite the same detail or granularity.
We are using DTS to load data into a staging area and then populate the
warehouse. I agree using a common staging area is also an option, something
like a federated warehouse.
The main issue I see is that the ETL process of getting this data in the
first place from multiple systems is fairly involved. To try and do this
twice seems like unnecessary effort, and there is a risk of the two ETL
processes diverging and creating different results. There should be a "single
source of truth".
On the other hand, adding another step between the source systems and the
web database means it's one more thing that can go wrong, which is the concern
being expressed. I would have thought something like mirroring would address
this issue.
Using a common staging area to do the scrubbing/transforms is a good idea,
but I haven't had much luck putting it forward. That's why I was asking for
any white papers on the subject.
Thanks,
Wreck.
"Danny" wrote:

|||Hi Peter,
The web application is sitting out in a DMZ, and won't have access to the
warehouse. The idea is to push data into the web application database, but
nothing will be allowed back through the firewall.
Thanks for the links, I haven't looked at ADOMD.Net yet, but I'm more after
articles on the architecture / topology side of things.
Thanks,
Wreck.
"Peter Yang [MSFT]" wrote:

|||Hello Wreck,
I believe this kind of issue may fall under the umbrella of Advisory
Services. Microsoft now offers short-term and proactive assistance for
specific planning, design, development or assistance with installing,
deploying, and general "how to" advice via telephone. For more information:
http://support.microsoft.com/default...dvisoryservice
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
|||Thanks Peter, but this is not really a technology-specific problem.
It's more a question about the approach/design options for moving data
through the enterprise.
As I see it, the options are:
1 - Do multiple extracts, and multiple transformations from single source
systems to different databases;
2 - Have a single ETL process into the warehouse, which then feeds other
databases further downstream;
3 - Something in between where we have a common staging area, but not
everything is loaded into the warehouse or web database, similar to a
federated data warehouse.
The issues lie around latency, redundant/diverging code, and the risk of an
extra system (i.e. the warehouse) failing and thus impacting the web
application's data.
I see options 2 or 3 as the better way to go, but the people responsible for
the web application prefer option 1. I'm looking for info on any of the
options so that I can either back my point of view, or understand the problem
better and see why option 1 would be better.
Sorry if I haven't been clear. Maybe I'm posting to the wrong forum.
Thanks,
Wreck.
"Peter Yang [MSFT]" wrote:

|||Hello Wreck,
Since the Web application looks like a reporting/analysis type program, I
think option 2 or 3 is more appropriate. However, if performance/availability
is the first priority, using a different database might be an option, because
OLAP processing/updating might cause performance problems and bring more
complexity into consideration.
If the Web application wants to use an OLAP cube directly, option 2 is better,
because you could build a separate cube for this specific Web application
while using the same data source in the warehouse.
Using a common staging area could also mean less performance impact on the
OLTP databases, as it is not necessary to transfer data twice from them.
As you know, this choice is a balancing act that depends on how the Web
application is built and which factors have higher priority. Thus, there is
no general white paper/case study on such a specific design/planning problem.
As I mentioned, Advisory Services might be more appropriate for this kind of
consulting-type issue. Of course, you might receive other suggestions from
the community. :-)
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
|||Hi Wreck,
the most basic advice is that OLTP happens in OLTP systems and BI happens on
ODS/EDW systems.... there can be some update in the ODS/EDW, but it is a
mistake to do too much..
So, if your web app is mostly query with little update, having one set
of ETL would be a good idea... if it is mostly update, you should write
it like an OLTP system, pass data to/from the OLTP systems, and then
forward it on to the ODS/EDW.
You are correct in thinking that writing two lots of ETL is a bad
idea... ETL is easily the most people-expensive portion of a BI
project...
Best Regards
Peter
www.peternolan.com
|||Thanks Peter.
As far as I know, there won't be much transactional activity on the web. If
there is, I'm sure it would be modelled to handle that. Having said that, the
historical reporting they wish to do would still feed from the
warehouse/source system the same way.
Thanks for your comments, I'm glad to see I'm not off the mark with this.
The warehouse has been sidelined to an extent, which is what I'm trying to
address before the web app gets too far into development.
Do you know of any good articles to back me up? So far it's been a losing
battle trying to get people to use the warehouse more effectively.
Thanks,
Wreck.
"Peter Nolan" wrote:

