Extracting Sql: dimension

Showing posts with label dimension. Show all posts

Wednesday, March 21, 2012

Failed to add Time dimension to the cube

HI, all experts here,

Thank you very much for your kind attention.

I am trying to add a time intelligence to the cube within my SQL Server 2005 Analysis Services. But first of all, I need to add the Time dimension as required. When I tried to add the time dimension, it failed and got the error message as :"Object referenced not set to an instance of an object (Microsoft.Analysis Services.Wizard)."

Why is that? And what probably is the cause of it? Please any experts here give me some help on it. I have strgguled ages for it.

Thank you very much and I am looking forward to hearing from you shortly.

With best regards,

Yours sincerely,

Hello Helen. Please tell the newsgroup of each step before this error appears!

Where, exactly did this error appear? Where you finished with creating the dimension?

Regards

Thomas

|||

Hi, Thomas,

Thank you very much for your help.

I was not able to create the time dimension since I was trying to add the time dimension after the creation of the cube. I could only create the time dimension while during the process of the cube creation. Why is that? Is this the way SQL Server 2005 works? Or I have missed out something?

I am looking forward to hearing from you further shortly and thank you very much again.

With best regards,

Yours sincerely,

|||

To answer my own question

The reason I failed to create the time dimension for my cube is that-I did not have a time table in the data source view. I added the time table there along with the relationships creation between the time table and other fact and dimension tables in the data source view, and then being able to create the time dimension.

Hope it is clear.

With best regards,

Yours sincerely,

Wednesday, March 7, 2012

Fact Tables and Clustered Indexes

A design question:
Say I have a medium-sized fact table w/ a handful of dimension key
columns (including a date) and a handful of measures.
I necessarily need a primary key on the composite dimension key
columns, and I don't know ahead of time which of my dimension key
column(s) will be the best candidate for a clustered index. I do plan
on putting non-clustered indexes on all my dimension key columns, and
the related dimension tables' key columns.
For the sake of argument, let's say we're not partitioning the fact
table.
Assume that new facts occur in time, the fact table grows with time,
and (nearly) all changes to the fact table occur as INSERTs.
Now, all things being equal, is there a benefit of adding a clustered
index to the fact table? Two options:
- Add an IDENTITY column, make it the primary key, and add the
clustered index to it.
- Add the clustered index on the date column, since it has a natural
order.
Basically, I'm after two answers in this scenario:
- Is there a benefit to having a clustered index on a table when the
application doesn't 'really' call for one?
- If so, is it better to add an IDENTITY column (adding size to the
table) or to pick an naturally ordered dimension key? A random key?
The fact's composite key?
Thanks much.
Steven D. Clark
stevec@.clarkdev.com
"Steven Clark" <stevec@.clarkdev.com> wrote in message
news:d7740507.0407100710.2b1644b5@.posting.google.c om...
> Basically, I'm after two answers in this scenario:
> - Is there a benefit to having a clustered index on a table when the
> application doesn't 'really' call for one?
In my opinion, there is never a reason NOT to have one; it's a freebie,
basically, unlike clustered indexes. It doesn't take up any extra disc
space, and you might as well order the data on the disc somehow, rather than
letting the server take care of it... So I always make sure that every table
has one.

> - If so, is it better to add an IDENTITY column (adding size to the
> table) or to pick an naturally ordered dimension key? A random key?
> The fact's composite key?
Some tips for clustered indexes:
- They assist with range queries and grouping. So try to use them for
columns that will be used for those kinds of operations (>, <, BETWEEN, etc,
or make it composite in the same order that you'll be grouping. If you do
composite, order the columns by selectivity, least selective first. This
will create a wider tree, which will result in somewhat quicker grouping.)
- Clustering on a random key is a very bad idea, because it will cause a
lot of page splits, leading to fragmentation. This will slow your data
loads. It will also give you no query benefits at all. So you'll actually
lose on this option.
- Clustering on an IDENTITY key or a DATETIME column that's
automatically set to the date the row is inserted will actually speed up
inserts as it will create a hotspot at the end of the table. So you'll
never have page splits when inserting new data. This can definitely help
speed your data load! Clustering on an IDENTITY will usually not help too
much with queries as, in my experience, most grouping and range operations
don't consider surrogates. Depending on your app, clustering on a DATETIME
as I described can help a lot, as a lot of queries will request data between
two dates, greater than a date, etc.
- Finally, clustering on the composite of your dimensions may be helpful
if you're grouping on them or requesting ranges. However, in the latter
case, remember that a composite index will only be used for searching if the
first column is part of the search criteria, so try to choose one of your
dimensions that will always be searched on (if that exists in your
warehouse).
I hope that answered your questions? Post back if you need some
clarification or further assistance.
|||> It doesn't take up any extra disc space,
That's not true. The clustered index uses the datapages as the leaves, so
that is disk space you use already anyway, but the nodes of the index still
take up extra space on disk. That doesn't mean btw that it is not a good
idea to have a clustered index on every table. In almost all cases the
performance improvement that a clustered index provides more than offsets
the extra disk space used.
Jacco Schalkwijk
SQL Server MVP
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23rDxElvZEHA.2388@.TK2MSFTNGP11.phx.gbl...
> "Steven Clark" <stevec@.clarkdev.com> wrote in message
> news:d7740507.0407100710.2b1644b5@.posting.google.c om...
> In my opinion, there is never a reason NOT to have one; it's a
freebie,
> basically, unlike clustered indexes. It doesn't take up any extra disc
> space, and you might as well order the data on the disc somehow, rather
than
> letting the server take care of it... So I always make sure that every
table
> has one.
>
> Some tips for clustered indexes:
> - They assist with range queries and grouping. So try to use them for
> columns that will be used for those kinds of operations (>, <, BETWEEN,
etc,
> or make it composite in the same order that you'll be grouping. If you do
> composite, order the columns by selectivity, least selective first. This
> will create a wider tree, which will result in somewhat quicker grouping.)
> - Clustering on a random key is a very bad idea, because it will cause
a
> lot of page splits, leading to fragmentation. This will slow your data
> loads. It will also give you no query benefits at all. So you'll
actually
> lose on this option.
> - Clustering on an IDENTITY key or a DATETIME column that's
> automatically set to the date the row is inserted will actually speed up
> inserts as it will create a hotspot at the end of the table. So you'll
> never have page splits when inserting new data. This can definitely help
> speed your data load! Clustering on an IDENTITY will usually not help too
> much with queries as, in my experience, most grouping and range operations
> don't consider surrogates. Depending on your app, clustering on a
DATETIME
> as I described can help a lot, as a lot of queries will request data
between
> two dates, greater than a date, etc.
> - Finally, clustering on the composite of your dimensions may be
helpful
> if you're grouping on them or requesting ranges. However, in the latter
> case, remember that a composite index will only be used for searching if
the
> first column is part of the search criteria, so try to choose one of your
> dimensions that will always be searched on (if that exists in your
> warehouse).
> I hope that answered your questions? Post back if you need some
> clarification or further assistance.
>
>
|||"Jacco Schalkwijk" <jacco.please.reply@.to.newsgroups.mvps.org.invalid > wrote
in message news:%23cIz3GDaEHA.2544@.TK2MSFTNGP10.phx.gbl...
> That's not true. The clustered index uses the datapages as the leaves, so
> that is disk space you use already anyway, but the nodes of the index
still
> take up extra space on disk. That doesn't mean btw that it is not a good
> idea to have a clustered index on every table. In almost all cases the
> performance improvement that a clustered index provides more than offsets
> the extra disk space used.
Thanks for the clarification on that... I also didn't think about fill
factor, which could also create the impression of more disc space being
used.

Fact Tables and Clustered Indexes

A design question:
Say I have a medium-sized fact table w/ a handful of dimension key
columns (including a date) and a handful of measures.
I necessarily need a primary key on the composite dimension key
columns, and I don't know ahead of time which of my dimension key
column(s) will be the best candidate for a clustered index. I do plan
on putting non-clustered indexes on all my dimension key columns, and
the related dimension tables' key columns.
For the sake of argument, let's say we're not partitioning the fact
table.
Assume that new facts occur in time, the fact table grows with time,
and (nearly) all changes to the fact table occur as INSERTs.
Now, all things being equal, is there a benefit of adding a clustered
index to the fact table? Two options:
- Add an IDENTITY column, make it the primary key, and add the
clustered index to it.
- Add the clustered index on the date column, since it has a natural
order.
Basically, I'm after two answers in this scenario:
- Is there a benefit to having a clustered index on a table when the
application doesn't 'really' call for one?
- If so, is it better to add an IDENTITY column (adding size to the
table) or to pick an naturally ordered dimension key? A random key?
The fact's composite key?
Thanks much.
Steven D. Clark
stevec@.clarkdev.com"Steven Clark" <stevec@.clarkdev.com> wrote in message
news:d7740507.0407100710.2b1644b5@.posting.google.com...
> Basically, I'm after two answers in this scenario:
> - Is there a benefit to having a clustered index on a table when the
> application doesn't 'really' call for one?
In my opinion, there is never a reason NOT to have one; it's a freebie,
basically, unlike clustered indexes. It doesn't take up any extra disc
space, and you might as well order the data on the disc somehow, rather than
letting the server take care of it... So I always make sure that every table
has one.

> - If so, is it better to add an IDENTITY column (adding size to the
> table) or to pick an naturally ordered dimension key? A random key?
> The fact's composite key?
Some tips for clustered indexes:
- They assist with range queries and grouping. So try to use them for
columns that will be used for those kinds of operations (>, <, BETWEEN, etc,
or make it composite in the same order that you'll be grouping. If you do
composite, order the columns by selectivity, least selective first. This
will create a wider tree, which will result in somewhat quicker grouping.)
- Clustering on a random key is a very bad idea, because it will cause a
lot of page splits, leading to fragmentation. This will slow your data
loads. It will also give you no query benefits at all. So you'll actually
lose on this option.
- Clustering on an IDENTITY key or a DATETIME column that's
automatically set to the date the row is inserted will actually speed up
inserts as it will create a hotspot at the end of the table. So you'll
never have page splits when inserting new data. This can definitely help
speed your data load! Clustering on an IDENTITY will usually not help too
much with queries as, in my experience, most grouping and range operations
don't consider surrogates. Depending on your app, clustering on a DATETIME
as I described can help a lot, as a lot of queries will request data between
two dates, greater than a date, etc.
- Finally, clustering on the composite of your dimensions may be helpful
if you're grouping on them or requesting ranges. However, in the latter
case, remember that a composite index will only be used for searching if the
first column is part of the search criteria, so try to choose one of your
dimensions that will always be searched on (if that exists in your
warehouse).
I hope that answered your questions? Post back if you need some
clarification or further assistance.|||> It doesn't take up any extra disc space,
That's not true. The clustered index uses the datapages as the leaves, so
that is disk space you use already anyway, but the nodes of the index still
take up extra space on disk. That doesn't mean btw that it is not a good
idea to have a clustered index on every table. In almost all cases the
performance improvement that a clustered index provides more than offsets
the extra disk space used.
Jacco Schalkwijk
SQL Server MVP
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23rDxElvZEHA.2388@.TK2MSFTNGP11.phx.gbl...
> "Steven Clark" <stevec@.clarkdev.com> wrote in message
> news:d7740507.0407100710.2b1644b5@.posting.google.com...
> In my opinion, there is never a reason NOT to have one; it's a
freebie,
> basically, unlike clustered indexes. It doesn't take up any extra disc
> space, and you might as well order the data on the disc somehow, rather
than
> letting the server take care of it... So I always make sure that every
table
> has one.
>
> Some tips for clustered indexes:
> - They assist with range queries and grouping. So try to use them for
> columns that will be used for those kinds of operations (>, <, BETWEEN,
etc,
> or make it composite in the same order that you'll be grouping. If you do
> composite, order the columns by selectivity, least selective first. This
> will create a wider tree, which will result in somewhat quicker grouping.)
> - Clustering on a random key is a very bad idea, because it will cause
a
> lot of page splits, leading to fragmentation. This will slow your data
> loads. It will also give you no query benefits at all. So you'll
actually
> lose on this option.
> - Clustering on an IDENTITY key or a DATETIME column that's
> automatically set to the date the row is inserted will actually speed up
> inserts as it will create a hotspot at the end of the table. So you'll
> never have page splits when inserting new data. This can definitely help
> speed your data load! Clustering on an IDENTITY will usually not help too
> much with queries as, in my experience, most grouping and range operations
> don't consider surrogates. Depending on your app, clustering on a
DATETIME
> as I described can help a lot, as a lot of queries will request data
between
> two dates, greater than a date, etc.
> - Finally, clustering on the composite of your dimensions may be
helpful
> if you're grouping on them or requesting ranges. However, in the latter
> case, remember that a composite index will only be used for searching if
the
> first column is part of the search criteria, so try to choose one of your
> dimensions that will always be searched on (if that exists in your
> warehouse).
> I hope that answered your questions? Post back if you need some
> clarification or further assistance.
>
>|||"Jacco Schalkwijk" <jacco.please.reply@.to.newsgroups.mvps.org.invalid> wrote
in message news:%23cIz3GDaEHA.2544@.TK2MSFTNGP10.phx.gbl...
> That's not true. The clustered index uses the datapages as the leaves, so
> that is disk space you use already anyway, but the nodes of the index
still
> take up extra space on disk. That doesn't mean btw that it is not a good
> idea to have a clustered index on every table. In almost all cases the
> performance improvement that a clustered index provides more than offsets
> the extra disk space used.
Thanks for the clarification on that... I also didn't think about fill
factor, which could also create the impression of more disc space being
used.

Fact Table/Dimension and Multiple Level Dimension

Hi Guys,

I have two questions on Analysis Service for SQL SERVER 2005:-

a) Is it possible for me to use two tables from my database without Primary Key. As this use to work on my Analysis Service on SQL SERVER 2000.

b) I had a multiple Level Dimension in SQL SERVER 2000 with Key Column as one column in my table and name column as another in my table. I am not able to do the same in SQL SERVER 2005.

Urgent help is requried.

Regards,

Kaushal

a)You do not need primary keys in the source data base but I think you need to set logical primary keys in the data source view.

b)Key and name columns are still separated in Analysis Services 2005.

Regards

Thomas Ivarsson

|||

Hi Thomas,

Thanks for Reply.

a) Yes i do understand that, but then what happens if i want to use the columns used in Primary key as levels in a Dimension?

b) I am still not understanding how to take kare of Key and Name Columns?

Regards,

Kaushal

fact or dimension table

Hi
How does one decide if data should go into a fact or dimension table?
I have 1 primary key and lots of textual information (5 fields) which
suggests dimension.
However, the data is extremely unlikely to change, which suggets fact'
AntNeither of these suggests "fact" vs. "dimension".
A fact is a record of something that occurred -- a sales transaction, for
instance, is a fact. A dimension is information that can be used to
identify the fact -- when it occurred, who was involved, etc.
Facts and dimensions are decided based on the logical context of the data,
not on the basis of when or how the data operates.
Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--
"-ants-" <ants@.inurpants.com> wrote in message
news:T9udnVc4pNyZl8DZRVny1g@.pipex.net...
> Hi
> How does one decide if data should go into a fact or dimension table?
> I have 1 primary key and lots of textual information (5 fields) which
> suggests dimension.
> However, the data is extremely unlikely to change, which suggets fact'
>
> Ant
>|||"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:us8kK6UcGHA.3344@.TK2MSFTNGP03.phx.gbl...
> Neither of these suggests "fact" vs. "dimension".
Hi
could we talk about this via private email please? Is your email address
valid?
Ant|||I would be happy to dicuss this with you privately, but please keep in mind
that I consider private help to be a type of consulting service, so we'll
have to work out contractual terms first. If you'd like to proceed with
that, you can e-mail me at the e-mail address I've used here, minus
"_removetoemail_".
Otherwise, we can keep discussing here in the public group.
Also, you might want to read this book, which someone recently posted a link
to here:
http://www.redbooks.ibm.com/abstrac...47138.html?Open
It is a free book from IBM on dimensional modeling. I flipped through it
and it seems like a great intro text (especially given its price!

Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--
"-ants-" <ants@.inurpants.com> wrote in message
news:jt2dnek5tKugk8DZRVnyvQ@.pipex.net...
> "Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
> news:us8kK6UcGHA.3344@.TK2MSFTNGP03.phx.gbl...
> Hi
> could we talk about this via private email please? Is your email address
> valid?
> Ant
>

Fact Dimension Relationships on Partitioned Cube

Hi,

I have Fact Dimension Relationship on my cube, my concern is that the source of the dimension (Invoice Number) is just the 1st Partition of the cube. Should I only use a single partition on my cube to use the Fact Dimension Relationship?

What is the best practice to implement a Fact Dimension Relationship on a partitioned cube?

Thanks,

Renold

If you decided to build your fact dimension based on the same table as your partitions are, you should create a view joining all paritition tables and base your fact dimension off that view.

Otherwise the fact dimension is only going to show the invoice numbers that appear only in the first partition table.

Edward Melomed.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

Fact Dimension

I have what I think is probably not a large fact table (approx 20 mil rows) of invoice line items. There are approx 4 mil uniquie invoiceID's listed in the fact table. I seem to have trouble getting the invoiceID to return as a dimension. I've tried creating a dimension based on the fact table and using this as a dimension. I've also tried pulling the invoiceID out in to a separate table in the dw, and creating a dimension off of it.

What happens is, I'll use some filter (typically based on date and ship location) to try and limit the number of invoices in play. Then, when I try to add the InvoiceID dimension the query usually fails due to lack of memory.

I should note, I use the cube browser for testing as well as some client tools like Tableau, Excel, and Proclarity. My users won't know MDX, so I hope a solution can be done in AS.

Sam

Hello! I think the way to solve this problem is to build a separate invoice number dimension and to create artifical levels in this dimension.

The first level can be the first character/number in the invoice number. On the second level you can use the two first characters/positions and so on.

With this technique you will avoid the problem of having ten thousand invoice number members(or more) being send down to the client.

It is the same as a cascading parameter in Reporting Services. Forze the user to only select a small number of the dimension members.

I ProClarity Professional there is a good search capacity that is seldom used. You will see it in the dimension tool.

HTH

Thomas Ivarsson

Fact and Dimension linking

Hi,

Am a bit of a beginner in this area, so this may sound a little dumb.

I am trying to develop my first cube based on consolidated sales data from 10 different companies, and have a problem wrt linking a dimension table to this fact table.

I have a dimension table which contains the products which can be manufactured by these 10 companies. There are scenarios where one product can be made in several of these companies.

I therefore have a table with all the sales line items (my fact table) which are uniquely identified by invoice no, invoice line num, and originating site.

The product table (dimension table) is uniquely identified by item code and site.

When I create my data source view I create a join between the two tables with sales.site = products.site and sales.itemcode = products.itemcode

If I then create a dimension from my products table and browse them in the cube, the sales data just gets repeated for each product and is not correctly split.

I think that the link I have between the dimension and the fact table is wrong, as when I set the relationship I can only choose one column to link on.

I'm kinda getting lost on this one, and could really do with some pointers. Restructuring the tables are not too much of a problem (if they are wrong) as the project has not been officially released yet.

Any help would be greatly appreciated.

Hello! In the cube editor you have a dimension usage tab where you configure the relation between each dimension and measure group(fact table). If you have created som dimension after the you have created the cube this relation will not be configured automatically.

It seems to me as you relations are correct in the data source view. You can have a problem with that the product key is defined without the site key in the dimension editor. You will have to make a collection of product key and site key. Check the dimension key for this property.

HTH

Thomas Ivarsson

|||

You might check the KeyColumns property for the key attribute of your Product dimension - a composite key {item code, site} should be defined. Then you should be able to join the fact table using this composite key.

http://msdn2.microsoft.com/en-us/library/ms175461.aspx

SQL Server 2005 Books Online

How to: Modify the KeyColumn Property of an Attribute

This procedure describes how to modify the KeyColumns property of an attribute. For example, you may want to specify a composite key rather than a single key as the key for the attribute.

...

|||The item code and site also have to be present in the fact table for this to work. Your message says that the invoice line num is present in the fact table - you have to be able to derive the product item code from this.|||

Thx everyone... after posting I sat down and read the book again to make sure I hadn't missed anything out. And you are right, the tables are OK in my DSV, but the dimension had not been configured correctly w.r.t. its attributes.

As soon as I had that done correctly, of course it all fell into place..

Thx again.

Sunday, February 26, 2012

Extremely simple query (I hope) but I can't solve it

Hi, I'm new in MDX and I have a request that I think is extremely simple, but I can't make it.

I have a dimension with a hierarchy, like this

[Dim1]

[Hier_Dim1]

All

Level1

SLevel12

SLevel13

Level 2

SLevel21

SSlevel211

SSLevel212

etc.

I want the values of a measure [Measures].[Total] on the childrens of SLevel21, I write this MDX

SELECT NON EMPTY { [Measures].[Total] } ON COLUMNS

, NON EMPTY { [Dim1].[Hier_Dim1].&[SLevel21].Children} ON ROWS

FROM [My Cube]

Result Set I expect is

SSLevel211 Total1

SSLevel212 Total2

But I have this

Level1 SLevel2 SSLevel211 Total1

Level1 SLevel2 SSLevel212 Total2

I want to use result set on a report that receive as a parameter the member of hierarchy selected, so the number of columns returned should't depend on parameter selected.

Regards

Julio Diaz

What you are seeing here is a "flattened" cellset. I don't know of anyway of suppressing these extra columns that are generated when the results are "flattened" (converted from a multi-dimensional to a 2 dimensional result), but what you can do is to create a calculated member which will give you a consistant column name that you can use in your report. So in the example below you would use the [Dim1_MbrName] column in your report to get the names of the children of the selected member.

WITH MEMBER [Measures].[Dim1_MbrName] AS [Dim1].[Hier_Dim1].CurrentMember.Name

SELECT NON EMPTY { [Measures].[Dim1_MbrName],[Measures].[Total] } ON COLUMNS

, NON EMPTY { [Dim1].[Hier_Dim1].&[SLevel21].Children} ON ROWS

FROM [My Cube]

|||

Thanks Darren, It works good!

Now I′m trying to pass [Dim1].[Hier_Dim1].&[SLevel21] as a parameter, but it is another topics my question was answered :)

Regards

Julio Diaz

Wednesday, March 21, 2012

Failed to add Time dimension to the cube

Wednesday, March 7, 2012

Fact Tables and Clustered Indexes

Fact Tables and Clustered Indexes

Fact Table/Dimension and Multiple Level Dimension

fact or dimension table

Fact Dimension Relationships on Partitioned Cube

Fact Dimension

Fact and Dimension linking

Sunday, February 26, 2012

Extremely simple query (I hope) but I can't solve it

Extracting Sql

Blog Archive

About Me