I just read Dennis' post about his adventures in isolation-level land. He says he does not know a good reason to read uncommitted data, because of dirty reads.
Well, he should rephrase that: using the READ UNCOMMITTED isolation level could result in a dirty read.
Reading uncommitted data can be very useful, not least because you can read data that is not modified at all but is inaccessible because of a lock: for example an event-log table, or a table that contains statistics about page requests.
A side benefit is that a SELECT normally also takes locks for itself, because even a single query adheres to the ACID rules. This means that a SELECT will lock data while it is running. Sometimes you want non-blocking reads, as for example Sahil Malik writes. Let's say you have a log table in your database and you know you only do inserts. Each insert takes locks. Now suppose you have some SELECT queries that result in a table scan. You really don't want a table lock while such a query runs, because otherwise the application cannot add new rows to the table. And you know in advance that you will never read modified (dirty) data, because rows are only inserted, never updated. That reminds me of a post of my own, Change mssql isolation levels to read uncommitted data, from some months ago.
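As a sketch of that log-table scenario (table and column names are hypothetical), the non-blocking read could look like this in T-SQL:

```sql
-- Hypothetical insert-only log table
CREATE TABLE EventLog (
    Id       INT IDENTITY PRIMARY KEY,
    LoggedAt DATETIME NOT NULL DEFAULT GETDATE(),
    Message  VARCHAR(500) NOT NULL
);

-- Option 1: switch the whole session to READ UNCOMMITTED
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM EventLog WHERE LoggedAt >= '20070101';

-- Option 2: hint a single table reference; NOLOCK is equivalent
-- to READ UNCOMMITTED for that one table only
SELECT COUNT(*) FROM EventLog WITH (NOLOCK) WHERE LoggedAt >= '20070101';
```

Either way the scan no longer waits on (or blocks) the shared/exclusive locks held by concurrent inserts.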
So when to use this? Well, when locking a table would stall other operations and an occasional dirty read does not really matter. You use this sort of query mostly for reporting functionality, or even just plain read operations. As in reading only! Not reading data and then updating it. For that you need optimistic concurrency control, which requires either a timestamp column to validate or a compare of the whole record. With a timestamp you MUST be sure that the record you read in the first place is not dirty, so you can't use READ UNCOMMITTED there. But the nice thing about the full-record compare is that you can read uncommitted data, because that OCC variant relies on the data itself and not on a timestamp. So it IS possible, but in most environments you see timestamps, because comparing one timestamp column is cheap.
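A minimal sketch of the two optimistic-concurrency variants (table and column names are hypothetical; SQL Server's rowversion/timestamp column plays the validation role):

```sql
-- Variant 1: timestamp/rowversion check.
-- Requires that @OldVersion came from a clean (committed) read.
UPDATE Customers
SET    Name = @NewName
WHERE  Id = @Id
  AND  RowVer = @OldVersion;   -- @@ROWCOUNT = 0 means someone else changed the row

-- Variant 2: full-record compare.
-- Re-checks the actual data, so the original read may even have been
-- done under READ UNCOMMITTED.
UPDATE Customers
SET    Name = @NewName
WHERE  Id = @Id
  AND  Name = @OldName
  AND  City = @OldCity;        -- compare every column that was read
```

In both cases the application checks the affected-row count and retries or reports a conflict when it is zero.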
I just read Paul Winson's post about the fact that MySpace uses ASP.NET. I have known this for months, but it just creates more questions about how MySpace is built.
- Do they use C# or VB?
- Do they make use of the standard System.Web.* controls?
- What kind of application layers do they have?
- How do they manage state? ViewState, cookies, SQL, in memory?
- What kind of data caching do they use, and in which parts?
- How do they manage data persistency?
- How big is their web + SQL farm?
I've heard rumours that the total processing power for MySpace exceeds that of Google.
What I read about persistency is that they have chosen iBatis.NET. They support three environments: Java, .NET and Ruby. I have not yet tested this persistency O/R framework, but I am wondering whether it also works with Mono.
I just found the following information about the hardware in use at MySpace:
MySpace's extensive IT architecture currently features 2,682 Web servers, 90 Cache servers with 16GB RAM, 450 Dart Servers, 60 database servers, 150 media processing servers, 1,000 disks in a SAN (storage area network) deployment, three data centers and 17,000MB per second of bandwidth throughput.
MySpace currently sets aside about 100 terabytes for MP3s and videos, and another 200TB for dynamic content.
MySpace is deploying Isilon Systems' software for MP3 and video streaming, clustering systems together in order to spread files and data across multiple storage nodes. The technology also reduces storage capacity constraints, since new nodes can be added as necessary.
Originally starting off with a two-node 3PAR frame, MySpace has since upgraded to an eight-node cluster. Each storage node delivers 600 megahertz per second, while each cluster spits out 10G bits per second.
So this seems like an impressive hardware setup :-)
Well, that makes you wonder what will happen to this setup when MySpace expands to China, as has been mentioned in the media over the last couple of months.