Nathan J Pledger

Program.X musings from the Isle of Man concerning ASP.NET, in particular accessibility, web standards and neat ideas.

Sitecore: Accessing Hierarchical Node structures from SQL

I'm often presented with the need to process nodes in Sitecore, which as they are hierarchical, can prove difficult to do other than via the Sitecore API. While the Sitecore API is excellent, it is a little cumbersome to create the environment when you only want the API for a single purpose. (eg. configuration files, required resources in the VS project, etc.). It's also quite slow when traversing down your node structure.

I've been working on a neat marketing channel to output the best selling products in a particular department within an e-Commerce site. So, in the "Bikes" department, I want to see the best sellers in that department, not any others and to do this I need to work from a particular node in Sitecore (representing the "Bikes" department) and work down on a JOIN from previously purchased items.

SQL is intrinsically not recursive, but we found the following SQL which might be of use:

WITH nodes (id, parentID, name, masterID) AS
(
SELECT id, parentID, name, masterID FROM database_sc53_Web..items WHERE id = '110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9' -- root node
UNION ALL
SELECT r1.ID, r1.parentID, convert(nvarchar(256), r2.Name + '/' + R1.Name), r1.masterID
FROM database_sc53_Web..items r1
INNER JOIN nodes r2 ON R1.ParentID = R2.id
WHERE r1.masterID IN
('E461EFA2-CAC8-4A14-93C2-6966E89AAB5B',
'34E5E086-6BE9-480A-9EE0-A9CD64F52A76'))
select * from Nodes

Basically, you create a SELECT to retrieve the root node (indicated by the comment on the fourth line), and then use the SQL 2005 WITH construct to recurse via a JOIN into itself. In the WHERE, we have also added a limit on what nodes are retrieved, by their Master ID, but this could be their TemplateID or some other field available within the Items table.

This produces output similar to:

This is from the Home node, and runs through the entire site, which is over 10,000 pages. With the filter on the Master IDs, we return 9,057 rows within 1 second. (I know the above says 10 seconds, but the database server is a remote server so this screenshot is a little inaccurate). This runs fast enough and is efficient enough on SQL to allow us to run these queries "live" without a caching layer.

Obviously, I wouldn't recommend putting this in the Sitecore databases themselves, rather a database alongside it and do a cross-database query.

Technorati Tags: ,

Comments

Alex de Groot said:

Hello Nathan,

Please be aware that querying against the database directly is unsupported. We're quite unique in the world to say so, but this approach is highly unmaintainable.

Beside of that, the most searched/bought/used products... that's all about conversion.  By default conversion results are not saved in Sitecore but in an external system. Can't you use this system to retrieve the data?

Kind regards,

Alex de Groot

Sitecore Solution Architect

# July 31, 2008 11:15 AM

Nathan Pledger said:

Hi again Alex,

100% agree that Sitecore should not support this tactic - I would advise against it in usual circumstances.

The reason why we're using Sitecore to drive the extraction of sales/interest is due to the site hierarchy. As far as the user is concerned, the fact that they are in Bikes means that anything underneath is of interest to them. This cracks the one aspect of relevance. We have a number of web sites, with similar products, but structured according to the client/sales outlet, so Sitecore is the logical place to look for product hierarchy in the context of that site. eg. Bikes > MotoGp > James Toseland may be in Motorsport > Bikes > Manx Heros in another site.

We then use external systems (sales data, tagging of products, User-generated-content and user browsing history) to drive the impportance of products beneath the Bikes node and decide whether it should appear as a product of interest, ie. in the "top 10" based on the selected algorithm.

We do have a "Tagging" concept that will ultimately drive positioning of products within Sitecore, but we still feel the structure of Sitecore is the best way to identify relationships between product topics. I guess we could mirror the Sitecore node structure in a separate system, but this feels clunky.

Also, we see any minor revisions to v5 with v6 well on the roadmap is unlikely to pose breaking changes that we can't work around. And we would *never* write to the database!

Thanks for reading/commenting.

# July 31, 2008 11:33 AM
Leave a Comment

(required) 

(required) 

(optional)

(required) 


Please add 8 and 5 and type the answer here: