March 2008 - Posts

I've been interested in the semantic and accessible web for over five years now, and the benefits on the various web sites I have developed are clear to see. Semantic content is where web pages are 'marked up' according to their structural significance, as opposed to their presentational significance. So, a heading is marked up as a heading, not as a bit of text with bold and a larger font. The formatting of the heading is achieved using Cascading Style Sheets. Of course, I'm sure I'm preaching to the converted. Semantic content not only makes for cleaner code, but also increases the accessibility of a page to disabled users, the availability of the page to users of alternative platforms such as mobile phones and, most importantly, its 'readability' by search engines.

Search Engine Optimisation is currently the sexy business of the web, and companies are making thousands of pounds by making out that they understand how the major search engines aggregate content for users' searches. How they can claim this, when many of the internal procedures and practices of the big search engines are closely guarded secrets, is beyond me. I tell people who ask that, unless you have money to burn, it really isn't worth the investment. It is better to make sure you have a well structured, semantic and accessible web site from the outset than to waste money on a service which, ultimately, cannot be guaranteed.

In my experience of developing well structured, semantic sites, it can take a little longer to achieve "Page 1" or even "Top 5" search result status, but this is organic growth, which proves the quality of the content and therefore endorses the work and practices put behind the site development. Arriving at that prestigious position on the search result page in an organic manner can also help cement these development practices in business leaders' minds as being worth the extra effort or thought, as opposed to settling for randomly pushing the latest marketing drive or topic they feel users would or should be interested in. Thought at every point in the process of developing web content is essential.

Whatever "Web 2.0" actually means, some of it certainly involves user-generated content. User-generated content, despite what the seminars will badge as new "Web 2.0", is not a new development, though the accessible, semantic web is relatively new. More and more sites are being redeveloped to be platform portable and highly accessible, which is often an expensive project involving work from the ground up. As soon as you open up a site to user-generated content, however, you lose control of the high-quality markup that makes your web site so accessible and understandable to search engines, etc. Fair enough, comments on blogs are quite simple and are well aggregated by search engines. But this is because there is often very little control handed over to the user in adding their content.

In my experience of working with sites that involve user-generated content, it is very difficult to hand over the power of the web - including the ability to format text, insert images and even objects such as Flash or video content - without compromising the quality of mark-up in a page. Two reasons for this are clear: the user is not aware (and should not need to be) of the need to structure their content in a correct and optimised manner, and the tools available for user-generated content are often poor.

A case in point is a major site I have recently been involved in. The site, based on the Isle of Man but clearly with a global reach, allows users to post their own content in the form of advertisements or requests for services, along with the ability to create their own profile. In order to implement this rich editing capability for users, I needed to find a good editor that works like a word processor, but creates very high quality mark-up for the web. Unfortunately, the options are very limited. Most rich-text editors for the web rely on the rich text controls within each browser. Therefore, a rich text editor running on Internet Explorer exposes parts of the rich text functionality of the browser itself. As you can imagine, this doesn't produce XHTML, but a bizarre hybrid of XHTML+HTML+MS-HTML. This results in very poor markup. Paste your work from a word processor that claims to support web content, such as Microsoft Word, and you get even worse content. (Indeed, many editors have special parsers dedicated to stripping out the rubbish these packages insert into the content.)

For example, take the following content of an advertisement:

  <p style="TEXT-ALIGN: justify">
    <b>
      <span style="FONT-SIZE: 11pt">
        <span style="TEXT-DECORATION: none">
          <u>
          </u>
        </span>
       

....

 

        <p style="TEXT-ALIGN: justify">
          <span style="FONT-SIZE: 11pt">
          </span>
        </p>
       

....

 

<p style="TEXT-ALIGN: justify">
          <u>
            <span style="FONT-SIZE: 11pt">
              <span style="FONT-FAMILY: Times New Roman">The Role</span>
            </span>
          </u>
        </p>

.... 

 

<p style="TEXT-ALIGN: justify">
          <span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol">
            <span>· </span>
            <span style="FONT-SIZE: 11pt">
              <span style="FONT-FAMILY: Times New Roman">…have at least 6 months to 1 year plus proven sales experience (ideally within recruitment, yet field sales, business development and account management are also desirable). </span>
            </span>
            <p style="TEXT-ALIGN: justify">
              <span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol">
                <span>· </span>
                <span style="FONT-SIZE: 11pt">
                  <span style="FONT-FAMILY: Times New Roman">…will be professional and organised to manage the workload and the needs and expectations of the clients. </span>
                </span>
                <p style="TEXT-ALIGN: justify">
                  <span style="FONT-SIZE: 11pt; FONT-FAMILY: Symbol">
                    <span>· </span>
                    <span style="FONT-SIZE: 11pt">
                      <span style="FONT-FAMILY: Times New Roman">…will have excellent communication skills and show proven negotiation experience and be enthusiastic to drive their team.</span>
                    </span>
                  </span>
                </p>
              </span>
            </p>
          </span>
        </p>
      </span>
    </b>
  </p> 

Anybody with a basic knowledge of XHTML and semantic content can see that the mark-up there is quite poor. A real shame, as the content was added in a logical manner, as bullet points. This example shows a number of violations of semantic XHTML: presentational tags that carry no meaning (U is deprecated, and STRONG should be used instead of B), nested P tags (disallowed), and manual bullet points made from special characters of a particular font rather than the XHTML UL/LI tags that should be used.
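Problems of this kind can be spotted mechanically. What follows is only a minimal sketch, assuming Python and its standard-library html.parser module; the names MarkupAudit and audit, and the choice of which tags count as presentational, are illustrative assumptions and not part of any editor or validator mentioned here.

```python
from html.parser import HTMLParser


class MarkupAudit(HTMLParser):
    """Collects a few semantic-markup problems from an HTML fragment."""

    # Assumption for this sketch: tags treated as purely presentational.
    PRESENTATIONAL = {"b", "i", "u", "font"}
    # Middle dot and bullet, the characters typically typed as fake bullets.
    BULLET_CHARS = ("\u00b7", "\u2022")

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags opened but not yet closed
        self.problems = []   # human-readable findings, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "p" and "p" in self.open_tags:
            self.problems.append("nested <p>")
        if tag in self.PRESENTATIONAL:
            self.problems.append(f"presentational <{tag}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Text that starts with a bullet character is a hand-made list item.
        if data.strip().startswith(self.BULLET_CHARS):
            self.problems.append("manual bullet character")


def audit(fragment):
    """Return the list of problems found in an HTML fragment."""
    parser = MarkupAudit()
    parser.feed(fragment)
    parser.close()
    return parser.problems
```

Calling audit() on a fragment shaped like the advertisement reports the B tag, the manual bullet and the nested P, while a plain UL/LI list comes back with no findings. Detecting problems is the easy part; repairing them automatically without changing what the user meant is much harder.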

The fault doesn't lie with the user, or the site. The fault lies chiefly in the editor used to create this content. The editor is the radEditor from Telerik, itself a very functional and advanced editor. It claims XHTML support (though I dispute this) and is one of the better editors on the market. It has an attractive licensing package, and other than the poor XHTML output I have no complaints. This editor is one of the series of editors that rely on the built-in rich text capabilities of the browser, and any cleansing of the markup is performed in a mixture of server-side and JavaScript code. To be fair, it is very difficult to apply automatic cleansing to human-generated content. Therefore, with the best will in the world, these editors set themselves up to fail to achieve the most logical and semantic content structure. Another editor does exist, XStandard, which provides much higher quality content as it is an ActiveX control written from the ground up and deals in pure XML. This editor, however, is expensive to license in some scenarios and doesn't offer the cross-browser support that other editors provide.

So, here, the user has created their own content and posted it to the site, believing that their content will be effectively handled by search engines. While the site itself enjoys good search rankings and employs W3C compliant code in the surrounding mark-up, the weak nature of the generated markup lets down the content. It can be very frustrating as a developer to develop a site with high quality W3C compliant code, and then have dirty code inserted by users via the available editors. So, it was with a heavy heart that I recommended that the owner of the site should withdraw any claims that the site is W3C compliant. This weakens its marketing position, particularly when sold to a technical audience.

We have discovered a number of problems in user generated content:

  • Quality tools are scarce and poorly supported
  • Knowledge of how to effectively present web content is poor within the user-base
  • Effectively merging static surrounding web site code with user generated code can be difficult
  • Controlling user content in order to minimise errors and accessibility concerns, and to maintain quality, is difficult

In order to effectively publish user-generated content in a Web 2.0 environment, and have it effectively aggregated and understood by various platforms and search engines, doesn't it mean that we have to exert a greater degree of control over the content - thereby detracting from the point of user-generated content in the first place? If submissions are moderated, and then cleaned up, users will feel as if their contributions are being edited, which is clearly not an impression a site wants to give.

Maybe a way of structuring users' submissions is needed. Provide the user with a template, into which they can add images, paragraphs, headings and other content in a semantic and controlled manner. The user can therefore create their content how they want, but the content will be inserted in a well managed manner by virtue of the web site editing software. Maybe an idea for my next project ....
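As a rough sketch of that idea, assume the submission is stored as a list of typed blocks rather than as free-form markup, and the site alone decides which tags each block becomes. The example uses Python and its standard-library html.escape; the function name render_blocks and the block kinds "heading", "paragraph" and "list" are hypothetical names chosen for illustration.

```python
from html import escape


def render_blocks(blocks):
    """Render (kind, value) blocks as semantic XHTML.

    The block kinds used here are hypothetical names for this sketch.
    The user supplies only text; the tags are chosen by the site.
    """
    parts = []
    for kind, value in blocks:
        if kind == "heading":
            parts.append(f"<h2>{escape(value)}</h2>")
        elif kind == "paragraph":
            parts.append(f"<p>{escape(value)}</p>")
        elif kind == "list":
            # A list block carries a sequence of strings, one per item.
            items = "".join(f"<li>{escape(item)}</li>" for item in value)
            parts.append(f"<ul>{items}</ul>")
        else:
            # Anything outside the template is rejected, not passed through.
            raise ValueError(f"unsupported block kind: {kind!r}")
    return "\n".join(parts)


advert = [
    ("heading", "The Role"),
    ("list", ["Proven sales experience", "Excellent communication skills"]),
]
print(render_blocks(advert))
```

Because every piece of user text is escaped and every tag comes from the template, the output cannot contain nested paragraphs, font tags or hand-made bullets. The trade-off is the one discussed above: the user gives up free formatting in exchange for markup the site can vouch for.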

 

Got an eBCS Newsletter this morning that interested me, pointing me to www.datamigrationpro.com. It's a web-site dedicated to professionals (within and without the IT industry) who work with Data Migration. Seems a really good resource for best practice, experience and opportunities.

We all have to deal with data migration issues at some point. At the moment, I am working with some data that is 25 years old and is now in its third IT system, so the standard of data both in terms of input-quality and post-processing quality can be quite poor. My job is to develop a system that allows users to manage this data, but which, either silently or otherwise, corrects the data.

Of immediate interest to me was this article: 6 Misconceptions about Data Migration by Norma L. Davis http://www.datamigrationpro.com/?page=articles_norma_six

And she's right.

Posted by Nathan Pledger | with no comments

What other industry can give such power of functionality and access to data as IT? There are few professions that require such implicit trust and reliance on professionalism - and yet anyone can do it. If you asked the bloke on the street (who is asked everything, so must be quite knowledgeable) which professional could have access to their credit cards, personal data, medical information and finance details within a single week, they would be hard pushed to think of an IT professional. Sometimes the responsibility conferred upon us scares me. We have nothing but trust to keep ourselves honest, or maybe a contract or non-disclosure agreement, but what are they really worth?

If you think of what happens to a doctor who makes a misdiagnosis (often amidst much more stressful conditions than many of us are likely to experience), where they often lose their license to practice, which inevitably ends their career, you have to wonder what IT professionals have that not only makes sure they operate within professional boundaries but also protects them.

I have always found the idea that computers are becoming easier to use a bit frustrating. Not because it widens the use of IT and access to the Internet, etc. (this is a great thing), but because it starts to become too easy for people to dabble. My view is that to work in IT, you need at least a recognised IT-related degree or similar qualification to even be recognised as being capable. If you come in to IT from another sector as part of your career progression, then years served is also a great qualifier. But qualification doesn't necessarily imply suitability.

What we need is a License to Practice. Doctors have them, Accountants have them, Social Workers have them. These people all have access to sensitive data and can cause significant change in people's circumstances for better or worse. So can IT. If I worked in an e-commerce environment, I would be trusted to maintain confidentiality and not to misuse data when dealing with credit cards. If I worked as a contractor for the government, it would be easy for me to stumble across patient data, or other sensitive data, such as tax records. (With the British Government's performance, I'd probably be just as likely to stumble upon such sensitive data on a public bus, the rate they keep losing CDs with our data on.) Other than an employment contract and maybe a non-disclosure agreement, there is nothing to prevent me from surreptitiously misusing such data.

So what if I get caught? The gamble of potentially accessing a lot of money, either directly or indirectly by selling personal data to criminals or competitors, could pay off enough to offset potentially losing my job or being sued either by the state or my employer. It might be difficult to find a job, supposing that prospective employers can access your history (which would require some significant research on their part), but the hard times would pass. You'd be able to get back into the industry eventually, maybe even in a different position with similar benefits - but a job nonetheless.

A License to Practice would work in two ways. First, it would work to prevent IT professionals from leaking or misusing data or functions of the data by threatening real and serious penalties. Such penalties could be set according to the indiscretion, but could range from fines through to revocation of the license, meaning that the individual could no longer apply for any position in IT, as it would become easy for employers to look up the individual with the licensing body. Secondly, the License would protect us by the fact that it would confer professional trust on the IT worker. If data is lost, and three people have had access to the data, one of whom is a Licensed IT professional, it would be logical to think that the other two people would potentially have less to lose. In a court setting, a legally accepted License would also protect us in terms of acting as a Witness, and also in the unfortunate event of us being tried for any charges brought against us.

A License to Practice would not be an impenetrable shield, behind which we can hide and act with impunity. It would come with certain obligations, which may be a requirement to renew the license, to pay a subscription fee (although this would certainly not be the only requirement), to keep references current or to keep one's qualifications current either by academic study, or through professional qualifications.

This idea is not new, and is not my own. The British Computer Society, of which I am a member, has a long-standing objective to provide a similar benchmark of professionalism which could be used as a requirement to operate either within a particular job or even the industry as a whole. I pay an annual subscription to the BCS to be a member, but this confers nothing on me other than showing that managers or colleagues I have worked with have recommended me for membership. As far as I can tell, it is quite easy to become a member. The fact I am a member will no doubt attract prospective employers and clients. It at least shows that I take my position seriously. I am aware of the responsibility and risks associated with my work, and I pay £x hundred a year to act as a guarantee that I am serious about it. There is a structure of memberships with the BCS. The next one I aim to go for is CITP, Chartered IT Professional. This requires a higher annual payment but, more importantly, requires a serious amount of work to establish qualification to operate (both in terms of academic qualification and years served) and facing an interview panel to establish suitability for the position. This is possibly a little higher than where I would see a License to Practice operating, to be honest, as it is quite challenging to obtain - and with reason.

In my work up to now I have seen many examples of the need for a License to Practice. Most recently, for example, I was entrusted with the private membership details of users on a web-site. I was responsible for the data, as the holder of the technical role for the web-site, and as such I was required to treat the data with utmost security. This I did, to the best of my ability, using security mechanisms like encryption and firewalls as only part of the solution, coupled with business procedures and limited access to ensure that not only was the data safe, but also that my professionalism was safe, even though it made it quite difficult to do my own job. Other examples are quite frustrating: people "tinkering" with computers, resulting in me "finishing the job". If you weren't an electrician, you wouldn't tinker with the ring-main, so why tinker with your PC or web-site?

Whether we will ever see this I don't know. Without a body with more teeth than the BCS has, I don't see it happening. While the BCS is held in high regard by government and companies as a means of establishing a professional benchmark, without a serious IT event whereby data is lost or misused on a massive scale, I can't see a License to Practice being seen as a necessity. That said, with the Government's ability to manage IT projects and data currently under serious question - and their ill-thought-out ID card plans - it may only be a matter of time before they fall over their IT obligations big-time.