Hi,
I'm working on a warehouse project that loads data from a number of source
systems. There's a new project starting up in my company that will deliver
data to customers via the web. This data is more or less the same as what is
being loaded into the warehouse. In fact, the intention is to use the
warehouse's data model (with some modifications) for the web database.
My question is: should the web database source its data from the warehouse
or directly from the source systems?
Given the complexity of the transformations, I would have thought it would
be more practical to extract from the warehouse, rather than do the same work
twice from the source systems. The concern is that by adding another system
(i.e. the warehouse) into the process, there is a greater risk of something
going wrong, whereas if we extract directly from the source, there is only
one place where a problem can occur.
If anyone has any references to white papers or case studies on either
option, or any thoughts themselves, I'd appreciate it.
Thanks,
Wreck.
Hello Wreck,
I agree that sourcing from the warehouse is more reasonable in your
situation: the data model in the warehouse is more stable, and you can
perform the ETL once and keep the data work on the warehouse side. Also, for
aggregated information, it is not practical, for performance reasons, to
calculate it on the fly while processing queries from clients via the web
site. The following articles might be helpful:
908618 MSDN Support WebCast: How to use ADOMD.NET in .NET Framework
applications
http://support.microsoft.com/?id=908618
How To: Traverse an ADO MD Cellset Object by Using Visual C# .NET
http://support.microsoft.com/default...b;en-us;828279
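As a rough illustration of the pre-aggregation point (sqlite3 stands in for
the warehouse and web databases here, and all table and column names are
invented for illustration):

```python
# A minimal sketch of the pre-aggregation idea: roll detail rows up once
# during the nightly load, so web queries read a small summary table
# instead of aggregating on the fly. sqlite3 stands in for the databases;
# all table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_detail (sale_date TEXT, region TEXT, amount REAL);
    CREATE TABLE sales_daily_summary (sale_date TEXT, region TEXT, total REAL);
""")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [("2005-10-26", "East", 100.0),
     ("2005-10-26", "East", 50.0),
     ("2005-10-26", "West", 75.0)],
)

# Nightly batch step: aggregate once, then publish the summary table.
conn.execute("""
    INSERT INTO sales_daily_summary
    SELECT sale_date, region, SUM(amount)
    FROM sales_detail
    GROUP BY sale_date, region
""")

# A web-side query now hits only the small summary table.
rows = conn.execute(
    "SELECT region, total FROM sales_daily_summary ORDER BY region"
).fetchall()
print(rows)  # [('East', 150.0), ('West', 75.0)]
```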
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
|||Depends on the requirements for supplying the web data. What must the
availability of the web data be? You mentioned your company would like to
use the same model; hopefully it's just a small portion of the model/data.
Most data warehouses are built with tools like DTS or Informatica, and
building the DW with this type of tool usually requires large staging areas
of data. You could perhaps push the data you need from the staging area to
the web database.
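That staging-area push can be sketched as a simple projection step (a
hedged illustration only; every field name here is invented):

```python
# A hypothetical sketch of the staging-area push: after staging is loaded,
# project the rows down to just the columns the web database's model needs
# before sending them on. Every field name here is invented.
def push_to_web(staging_rows, wanted_columns):
    """Keep only the customer-facing columns for the web database."""
    return [{col: row[col] for col in wanted_columns} for row in staging_rows]

staging_rows = [
    {"customer_id": 1, "name": "Acme", "internal_cost": 12.5, "balance": 100.0},
    {"customer_id": 2, "name": "Bolt", "internal_cost": 9.0, "balance": 40.0},
]

# The web database receives a cut-down view; internal fields never leave.
web_rows = push_to_web(staging_rows, ["customer_id", "name", "balance"])
print(web_rows[0])  # {'customer_id': 1, 'name': 'Acme', 'balance': 100.0}
```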
"Wreck" <Wreck@.community.nospam> wrote in message
news:FA7A38B4-C640-408E-BE3A-0DD79CBDF669@.microsoft.com...
|||Hi,
I believe the data needs to be available on a daily basis, arriving the next
day. So overnight batch processing should be sufficient. There are some feeds
that they need more frequently, which would make more sense to load directly,
or via our staging area.
I'm assuming the model will be a cut-down version of the warehouse, probably
not with quite the same detail or granularity.
We are using DTS to load data into a staging area and then populate the
warehouse. I agree using a common staging area is also an option, something
like a federated warehouse.
The main issue I see is that the ETL process of getting this data in the
first place from multiple systems is fairly involved. Doing it twice seems
like unnecessary effort, and there is a risk of the two ETL processes
diverging and producing different results. There should be a "single source
of truth".
On the other hand, adding another step between the source systems and the
web database means it's one more thing that can go wrong, which is the
concern being expressed. I would have thought something like mirroring would
address this issue.
Using a common staging area to do the scrubbing/transforms is a good idea
but I haven't had much luck putting it forward. That's why I was asking for
any white papers on the subject.
Thanks,
Wreck.
|||Hi Peter,
The web application is sitting out in a DMZ and won't have access to the
warehouse. The idea is to push data into the web application database, but
nothing will be allowed back through the firewall.
Thanks for the links. I haven't looked at ADOMD.NET yet, but I'm more after
articles on the architecture/topology side of things.
Thanks,
Wreck.
|||Hello Wreck,
I believe this kind of issue may fall under the umbrella of Advisory
Services. Microsoft now offers short-term, proactive assistance by telephone
for specific planning, design, and development questions, and for help with
installation, deployment, and general "how to" advice. For more information:
http://support.microsoft.com/default...dvisoryservice
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
|||Thanks Peter, but this is not really a technology specific problem.
It's more a question about the approach/design options for moving data
through the enterprise.
As I see it, the options are:
1 - Do multiple extracts and multiple transformations from each source
system to the different databases;
2 - Have a single ETL process into the warehouse, which then feeds other
databases further downstream;
3 - Something in between, where we have a common staging area but not
everything is loaded into the warehouse or web database, similar to a
federated data warehouse.
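The duplication argument can be made concrete by treating the options as
data-flow graphs (node names invented; each edge is a feed, and every edge
leaving a raw source is a separate set of extract/transform code to write
and maintain):

```python
# A rough sketch of the three options as data-flow graphs. Node names are
# invented; each (from, to) pair is a data feed, and every feed leaving a
# raw source needs its own extract/transform code.
OPTIONS = {
    1: [("source", "warehouse"), ("source", "web_db")],      # two full ETLs
    2: [("source", "warehouse"), ("warehouse", "web_db")],   # chained feed
    3: [("source", "staging"), ("staging", "warehouse"),
        ("staging", "web_db")],                              # common staging
}

def source_transforms(edges):
    """Count the feeds that extract and transform directly from a source."""
    return sum(1 for origin, _ in edges if origin == "source")

for option, edges in sorted(OPTIONS.items()):
    print(option, source_transforms(edges))
# Option 1 maintains two copies of the source-side transform logic (the
# divergence risk); options 2 and 3 keep it to one.
```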
The issues lie around latency, redundant/diverging code, and the risk of an
extra system (i.e. the warehouse) failing and thus impacting the web
application's data.
I see option 2 or 3 as the better way to go, but the people responsible for
the web application prefer option 1. I'm looking for information on any of
the options so that I can either back my point of view, or understand the
problem better and see why option 1 would be better.
Sorry if I haven't been clear. Maybe I'm posting to the wrong forum.
Thanks,
Wreck.
|||Hello Wreck,
Since the web application looks like a reporting/analysis type of program, I
think option 2 or 3 is more appropriate. However, if performance/availability
is the first priority, using a different database might be the better
option, because OLAP processing/updating might cause performance problems
and bring more complexity into consideration.
If the web application wants to use an OLAP cube directly, option 2 is
better, because you could build a separate cube for this specific web
application while using the same data source in the warehouse.
Using a common staging area would also put less load on the OLTP databases,
as it is not necessary to transfer data from them twice.
As you know, this selection is a balancing act that depends on how the web
application is built and which factors have higher priority. Thus, there is
no general white paper or case study on such a specific design/planning
problem.
As I mentioned, Advisory Services might be more appropriate for this kind of
consulting-type issue. Of course, you might receive other suggestions from
the community. :-)
Best Regards,
Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support
|||Hi Wreck,
The most basic advice is: OLTP happens in OLTP systems and BI happens in
ODS/EDW systems. There can be some updating in the ODS/EDW, but it is a
mistake to do too much.
So, if your web app is mostly query with little update, having one set of
ETL is a good idea. If it is mostly update, you should write it like an OLTP
system, pass data to/from the OLTP systems, and then forward it on to the
ODS/EDW.
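That read/write split can be sketched roughly as a request router (an
illustration only; the system names are invented):

```python
# A minimal sketch of the read/write split: queries go to the warehouse-fed
# web copy, while updates go through the OLTP system (and reach the ODS/EDW
# via the normal ETL). All names are invented for illustration.
def route(request_kind):
    if request_kind == "query":
        return "web_db"       # read-only copy fed by the single ETL stream
    if request_kind == "update":
        return "oltp_system"  # writes belong in the OLTP system of record
    raise ValueError(f"unknown request kind: {request_kind!r}")

print(route("query"))   # web_db
print(route("update"))  # oltp_system
```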
You are correct in thinking that writing two lots of ETL is a bad idea; ETL
is easily the most labour-intensive portion of a BI project.
Best Regards
Peter
www.peternolan.com
|||Thanks Peter.
As far as I know, there won't be much transactional activity on the web; if
there is, I'm sure it would be modelled to handle that. Having said that,
the historical reporting they wish to do would still feed from the
warehouse/source systems the same way.
Thanks for your comments, I'm glad to see I'm not off the mark with this.
The warehouse has been sidelined to an extent, which is what I'm trying to
address before the web app goes a long way into development.
Do you know of any good articles to back me up? So far it's been a losing
battle trying to get people to use the warehouse more effectively.
Thanks,
Wreck.