Thursday, July 03, 2008

KLCI Trading suspended due to system failure

Bursa trading still suspended at 2.30pm

KUALA LUMPUR, July 3 — Trading on Bursa Malaysia will not resume at 2.30pm as notified earlier, Bursa Malaysia announced today.

It said it would notify the market as soon as it had any updates. The morning session was suspended following a multi-hardware failure in its core trading system.

Earlier, during the lunch break, Bursa Malaysia said trading would resume at 2.30pm from its back-up system at its alternate trading site.

43 comments:

Anonymous said...

SING -40, HANGSENG -450...

WHAT COMPUTER GLITCH? MORE LIKE P. BALA GLITCH.

EXPECTED TO DIP BELOW 1100 TODAY!

Jefus said...

http://www.theedgedaily.com/cms/content.jsp?id=com.tms.cms.article.Article_e69a7de1-cb73c03a-c8c7d600-f86fa74d

There is rising tension among investors due to escalating tension on the domestic political scene. Former deputy prime minister Datuk Seri Anwar Ibrahim is reportedly kicking off a nationwide rally in Petaling Jaya, Selangor, this Sunday to clear his name from fresh sodomy charges.

Anonymous said...

Tell me it ain't so, after spending hundreds of millions on the system & another 70m+ on upgrading, the blardy system can go down for the whole trading day?

Could it be just an excuse to justify another spending spree for a new upgrading program?

Or was that an interference to put a brake on the CI from skidding to 900?

---MadMonk

Anonymous said...

So many coincidences.

Anonymous said...

Oh my, it is 2008, yet they dare to use the old-days "multiple computer hardware failure" excuse.

Enterprise computers are not gravy-train road systems and buildings. There is great redundancy built in that Joe Public cannot imagine for his home computer.

KLSE trades are 100 times those of 1990, but computer systems have improved millions of times compared with 1990 specifications.

I am afraid there is a bigger meltdown, perhaps a drop of more than 10%. If that is true, it will be the 2nd time the KLSE has fallen 10% in a day during the AAB administration.

Anonymous said...

So many ? mark.

Anonymous said...

Hello mave,

From what I know, KLCI trading was suspended due to a hard drive fault. There is no need to speculate about links to other cases (the sodomy case, Altantuya's case, etc.).

Maverick SM said...

Jefus,

I think the market is just volatile and it is also affected by the Dow Jones.

MadMonk,

We can't speculate but I was informed by my remiser that it was truly a computer glitch.

Hasilox,

Ya, coincidental...

Moo_t,

I am afraid you are right; that will be a sad tale.

Maverick SM said...

Wisdomthinker,

Did I relate the KLCI glitch to Altantuya or the sodomy case? Tell me which part of my writing did so. I'm perplexed!!! It seems you are relating it, not me.

Jefus said...

Mav,

'No' to soldiers on street duty
Chan Kok Leong | Jul 3, 08 2:47pm
Unprecedented and dangerous was how human rights group Suaram executive director Yap Swee Seng described yesterday’s suggestion that the armed forces be roped in to maintain public order.

taken from Malaysiakini,....

what signal / vibes are they sending to the general public?

Moo-T is right, even with the 9/11, affected insurance companies had mirror servers in other continents as back up. It does not add up, the explanation is leaky.

Meltdown seems more probable, unfortunately, tomorrow is 04th July, we will have to wait till Monday to feel the full impact.

Maverick SM said...

Jefus,

I tend to share the same observation, but I talked to some remisers and they are of the view that the server was down and the backup was having problems too. Let's wait for tomorrow. FYI, the futures index was down some 30 points; it is predicted that the KLCI will be down by 20-30 points tomorrow.

Anonymous said...

wisdomthinker,

Hard drive failure is a pathetic excuse and a big joke. An entity like KLSE will use telco-grade HA (high availability) servers.

And HA servers are built redundant. If your cheapo office server comes with 10 hard disks using mirroring, an HA server will have the same config, but doubled up. Now tell me, how likely is it that 2 sets of redundant hard disks fail? And the HA provider will sign an SLA (service level agreement) to replace parts within 2 hours! And they should do this as soon as the customer reports ANY part starting to fail.
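
As a rough illustration of the "how likely" question above, here is a minimal Python sketch of the arithmetic. It assumes disk failures are independent, and the per-disk failure probability, the pair count and the function names are made up for illustration; none of them are Bursa figures.

```python
# Hypothetical numbers and helper names. Assumes disks fail independently,
# which same-batch disks or a shared corrupted file would violate.

def p_pair_lost(p_disk: float) -> float:
    """A mirrored pair is lost only if both of its disks fail."""
    return p_disk ** 2


def p_set_lost(p_disk: float, pairs: int) -> float:
    """A set of mirrored pairs is lost if at least one pair is lost."""
    return 1 - (1 - p_pair_lost(p_disk)) ** pairs


def p_both_sets_lost(p_disk: float, pairs: int) -> float:
    """A doubled-up (two-set) configuration is lost only if both sets are lost."""
    return p_set_lost(p_disk, pairs) ** 2


if __name__ == "__main__":
    p = 0.01   # assumed chance that one disk fails within the repair window
    pairs = 5  # 10 disks arranged as 5 mirrored pairs
    print(f"one mirrored set lost: {p_set_lost(p, pairs):.6f}")
    print(f"both sets lost:        {p_both_sets_lost(p, pairs):.10f}")
```

The independence assumption is the weak point of this sketch: anything the disks have in common (same batch, same corrupted data, same procedure) makes a joint failure far more likely than the multiplication suggests.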

BTW, an HA server will NOT use the same brand or the same batch number of disks; this cuts the "serialised death of disks".

Bear in mind there is also a hot standby system.

BTW, if someone were paying me a salary of RM1 million per year, I could make up better bullshit than a stupid hard disk failure.

Anonymous said...

Doc Mave,

Spread enough lies to the typical Joe and the public will think they are the truth. Remisers are not information technology (IT) people.

Even in IT, a small-company IT fella has little idea why telcos pay 8-15 times more to deploy systems that run 24x7x365 and allow parts to be replaced without shutting down. Most of the world's biggest ATM servers have run on this kind of redundant server since 1980.

And Malaysia's big banks have been using such servers for ATMs since the 80's.
Since 2005, the price of this kind of HA (high availability) or telco-grade server has fallen rapidly.

The computing world took 30+ years to fine-tune and improve the high availability server. Under the SLA (service level agreement), the hardware vendor pays penalties (in the tens to hundreds) if the server causes more than 4 hours of downtime.

And don't forget that Bursa Saham must comply with the requirement to have a HOT-STANDBY server of EXACTLY the same specification ready to take over. The live server and the standby server can be connected through high-speed broadband to synchronise the data.

Contrary to what many people think, a so-called "backup server" is actually a hot standby, and is supposed to run 24x7x365 without shutting down. So there is no question of "bringing up" the server, because the standby server never shuts down.

The issue is switching the connection trunk to the standby server. Again, this is ridiculous, because the technical difficulties of smooth switching were solved 10 years ago.

Perhaps if someone brought C4 and blew up both the KLSE main data center and the offsite standby server, it would be statistically justified to say the KLSE server came down because of hardware failure.

Maverick SM said...

Moo_t,

Thanks; and I now learn a lot more about a server system. I hope you can expound more about this subject here.

bayi said...

It was convenient to the authorities to have a technical glitch. Looks like it's getting convenient to live on falsehoods whenever it's convenient to do so, from Cabinet members, politicians, bourse administrators and what have you.

What a foundation after 50 years of independence!

Maverick SM said...

Bayi,

I believe it was the concern that the market will over-react towards such a glitch that Bursa could have decided to trade only by tomorrow. I think it wasn't a bad idea because the glitch could stop the foreign fund managers from a sell-down which would hurt local retail players. Somehow, we hope that Dow Jones will be positive tonight to provide some confidence to the bourse.

Anonymous said...

Speculate la all of you. We sold the system to Bursa in the mid 90s. It was a hard disk failure. It happened before market was opened today. Even a hard disk failure could not stop the system as it is an FT architecture (and not HA as some smart alec assumed.)

A spare hard disk was brought in and the faulty one replaced. But alas, the data from the other redundant disk could not be copied over. A second spare hard disk was put in and same shit.

By then, too late, market opening was nearing. So Bursa decided to go with one hard disk instead. Power up and ouch, all systems go except for the middle-ware. By then it was too late, Bursa would not risk bringing up the system without the middle-ware and announced that the market could not open. But not to worry, Bursa has a whole back-up system (a 50% replica off-site.) Decision was made to bring up the DR site and personnel were sent there to power-up.

Err.. Still won't work, I believe it was a network issue or something like that. Whatever it was, the DR system as a whole was not functioning as it was supposed to.

After lunch, the manufacturers (HP) responded that the copy process was aborted by a corrupted file on the redundant disk. Deleted the corrupted file and the data from one disk was copied over to the other, no problem. Redundancy feature working again.

So tomorrow, Bursa will be up. And that is the inside story. (Maverickysm, you can brag about this being on your site.)

Cause of the market suspension: Stupidity and complacency. Hard disk failure only triggered a series of chain reaction that proved embarrassing for Bursa. Why stupidity, if I am not wrong, I think it happened before many years ago. Of course, our engineers now were not the same as then maybe that's why they diden pick it up. But Bursa's people also forgot?

Btw, cost of DR site - between RM 20-40 million. DR site was around since 1997. How I know? Anwar Ibrahim pushed for it then, of course to our benefit.

After more than 10 years, we now know it dun work! Shame shame. (But to give Bursa some credit, they do conduct regular DR exercise simulating system shut down or failure.)

Maverick SM said...

Anon 11:12,

Thank you for the insider information; I need to have views from other I.T. experts to know more about the system in Bursa.

Anonymous said...

Based on the comments of anon 11:12pm,

a) Bursa's systems failed despite spending millions of RM
b) Hewlett Packard should be penalised

This cannot be brushed off as just "one of those things". Millions of RM are at risk and dependent on the system (and backup systems) and Disaster Recovery operations. What is the use of the DR process if it doesn't work?

Nonetheless, I appreciate the insights by anon 11:12pm, even though I think a 2 hour shutdown would be tolerable but nothing more than that.

Maverick SM said...

Purple Haze,

Ya, I do agree with your points; somehow, I just felt that it could be a blessing in disguise for the retail players, who might have suffered greater losses if trading had recommenced after 3pm, as a sell-down by foreign fund managers was expected; observe the futures index and you can see a 30-point drop. Let's hope Dow Jones will recover tonight and we may be able to see a more stable market tomorrow.

Anonymous said...

Computer glitch/harddisk kaput? my foot, looks more like they shutdown their bloody production server!

~bornyesterday

Unknown said...

Maverick

Suspension of KLSE Trading???? Much investment done on their IT Infrastructure though.

Something fishy. NOT if the IT Engineers having BUTTOCK Fest!

Only in Kuala Lumpur......

Anonymous said...

Hello Mave,

Very sorry for the misunderstanding. I misinterpreted your statement. I now understand very clearly why KLSE trading was suspended.

Maverick SM said...

Bornyesterday,

I won't know; but I don't see any disadvantage when the market was in a volatile period.

Pui-key-Mack,

Don't be naughty eh!!!

Maverick SM said...

Wisdomthinker,

Thank you for your humility. I hope I wasn't too harsh.

Anonymous said...

Anon 11:12's 'inside story' brings out more questions;

1) Knowing that the computerized trading system for BursaKL is mission critical, why is the system still “ … an FT architecture (and not HA as some smart alec assumed.)”
Any IT person worth his/her salt will strongly recommend a TRUE fail-safe system, of which HA is one of the current trends.

Unless there is something else…

2) Why could the remote DR not come up in time when the NEED arose? What sort of DR is that? Is it a case of the fire hydrant NOT being connected to the water main? More money down the drain, just for superficial works so that some cronies/rent-seekers can profiteer?

3)“Of course, our engineers now were not the same as then maybe that's why they diden pick it up. But Bursa's people also forgot?” Whose engineers? HP? BursaKL?
First world infrastructure, third world maintenance, coupled with ‘know-who’ & nincompoop engineers? Typical Bolihland’s same-shit-different-day symptom!

4) All the above points to glaring misconduct & mismanagement of the highest order! For what? So that some fat cats can continue to enjoy windfalls from rent-seeking at the expense of the country’s economic wellbeing & foreign investors’ confidence?

Maverick SM said...

Anon 10:30am,

Good pointers. I hope someone answer it on behalf of Bursa.

Anonymous said...

This is Anon 11:12 pm responding to Anon 10:30 am.

1. Hello, FT is a fault-tolerant system, better than that stupid HA architecture. (Wanna show off your IT skills, show properly la.) Replacing faulty hardware is not a big issue. But when data cannot be copied over to the replacement disk at 2:00 am Thursday morning, how? Bring another hard disk la. (That also got problem.) By 4:00 am, all decided to go ahead and run the system on one leg (the other leg broken.) Not a problem with FT systems. Hardware cold-load (HP NSK term for reboot) and platform is up. Application's turn. And middle-ware gave problem. Also not a problem, because got DR facilities. And Bursa IT staff were deployed to the DR site. There, they also could not get the system to come up. Hardware ok, but application could not be brought up.

Around 11 am, HP came back with the cause: a swap file checksum error was preventing the replication services from copying the mirror disk to the primary disk. So deleted the offending file and rebooted again and we good to go for afternoon trade. Application start-up gave problem again. They had to recall their 'expert' who had been sent to the DR site back to oversee the start-up procedures. By the time all systems were up, it was already 3.30 pm or so. What is there left to trade?

2. Bursa cannot get DR up? After reading no 1, why do you think? Still hardware failure?

3. Mis-conduct? Only if you classify incompetency in that category. Mis-management? Aren't all failures the result of one?

Put it this way, I would not like to be in YKK's shoes. (YKK? Those of you in the industry will know what I am talking about.)

In the final analysis, all precautions were taken to design a trading system that is resilient and reliable, from the fault-tolerant machines to the disaster recovery facilities that would cater for hot standby or load sharing. In reality, the systems (and this has nothing to do with physical hardware or software) did not work.

Good planning, but no idea how to carry it out! What to do?

Maverick SM said...

Anon 2:51pm,

Thank you; now I got a better picture of the whole situation. BTW, what is YKK? Please tell me!

I hope more I.T. experts can share their knowledge here about this subject matter so that I can collate these comments and make a posting about it. Please share your knowledge, please.

Monsterball said...

The organisation I work in also has an FT system from Hewlett Packard.
Not as big a beast as the one Bursa has, but the availability and criticality of the system data is just as important to us.

A heavily used hard disk will crash at some point in time. The last time we had this, our mirror hard disk took over in less than 5 minutes. We lost the last 10 minutes of data, that's all. Had to be manually re-keyed in. Even then it was a pain for me (One of my responsibilities is Head of Internal Audit) to go and audit to make sure 100% of transactions had been accounted for. Have to sign off for Board of Directors.

It is obvious to me there's been a major screw-up in the fault recovery process at Bursa Malaysia. It's not a hardware or software fault, per se.

You normally can't do a Disaster recovery drill on the live system, but it can be practiced on a backup system during a holiday.
You make sure your procedures will work, and actually walk through it.

Internal Audit (that's me in my organisation) has to certify that it works, so my arse is on the line if a failure actually occurs and the recovery procedures DON'T work.

Anonymous said...

Maverick,

YKK is not a politician or public figure. No point in raising it here. Neither are the people from the vendors and Bursa who were tirelessly working their asses off (under tremendous pressure) to get the system up. For example, the decision not to do the replication with verification, in order to complete the replication in time for market opening, contributed to the cause.
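
To make the "replication with verification" trade-off concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not how the HP system works; the block contents, the `replicate` function and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Return a SHA-256 hex digest for one block of data."""
    return hashlib.sha256(block).hexdigest()


def replicate(blocks, expected, verify=True):
    """Copy blocks to a new list and report the indices of bad blocks.

    With verify=True, a block whose checksum does not match is not copied
    (None is stored in its place) and its index is reported.
    With verify=False, everything is copied as-is, corrupt or not.
    """
    copy, bad = [], []
    for i, block in enumerate(blocks):
        if verify and checksum(block) != expected[i]:
            copy.append(None)
            bad.append(i)
        else:
            copy.append(block)
    return copy, bad


if __name__ == "__main__":
    good = [b"orders", b"quotes", b"trades"]
    expected = [checksum(b) for b in good]
    mirror = [b"orders", b"qu0tes", b"trades"]  # block 1 is corrupted

    print("with verification, bad blocks:", replicate(mirror, expected)[1])
    print("without verification, bad blocks:",
          replicate(mirror, expected, verify=False)[1])
```

Skipping verification is faster, but a corrupted block then travels to the new disk without anyone being told; with verification the copy takes longer but the bad block is named up front.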

The point is:

1. For an organization that provides a critical service to the nation, and has acknowledged that with appropriate expenditure on capital equipment, this event should not have happened.

2. No matter how much money you throw on technology, it still takes humans to run and manage it properly.

3. Treat the people like you treat machines and you may end up with this kinda shit.

4. Errr call Discovery Channel and do an episode for 'Seconds from Disaster'? Then all can learn from this, instead of blaming hardware failure. Hardware failure was just the trigger that cause a chain of events which led to suspension of trade the whole day.

5. If we are to develop a world-class nation, then we have to develop world-class standards. Malaysians have a habit of tolerating failure (just look at our PM) or at best hoping that the person responsible follows in the footsteps of Ong Ka Ting or Chua Soi Lek.

6. Let's begin with asking the whole Board of Bursa to resign. Not because it's their fault, but because it's their responsibility. (The Board is full of non-industry-based personalities. Filling it up with new non-industry-based personalities is not an issue.)

Anonymous said...

This is Anon 10:30 am responding to Anon 11:12 pm,

Just to clarify, I too am in IT & I know what FT means.

Btw FT is not the holy grail for a fail-safe setup. It's just the front end of a complete system like HA.

Heard of RAID-5, RAID-1, RAID 0+1, mirroring, clustering, hot-swapped replication, data protection and disaster recovery plans?

I'm sure Anon 11:12 pm has! But no need to confuse the Joe M'sians with technical jargon. I'm a seasoned professional, who likes to cover All Aspects of my projects.

The fact remains that THERE WAS A FAILURE WITHIN A MISSION CRITICAL OPERATION, WHICH IS SUPPOSED TO BE BACKED UP BY A REDUNDANT SECONDARY, WHICH ALSO FAILED TO PERFORM!

Just remember - HW & SW are just the means to make the system work. People & planning are still the keys.

SO why the failure? Why was it NOT anticipated at the planning stage? Wasn't anticipation of such a failure part of the DR plan?

Did the DR plan fail because of planning, implementation, operation or personnel?

I opine that the DR failed because of all of these. On top of that there are elements of corner cutting due to profiteering by the parties involved.

This is just a sample of the scenario played out in almost all the IT projects involving the authorities. The foreign IT companies just play along, as more work means more profit, regardless of the nincompoops that they work with.

Same-shit-different-day! Disasters waiting to happen - with money down the drain, time wasted, confidence lost & yet they make up stories and point fingers to save face!

Anonymous said...

To dear anonymous 11:12pm ,

please don't confuse the public with marketing jargon. HA (high availability) and FT (fault tolerant) are interchangeable jargon endorsed by different vendors.

I will be amazed if the Bursa Saham system is not implemented with RAID 1+0/5+0. For FT/HA servers, the hard disks are configured as RAID 1+1+0 or 1+5+0.

Remember, the annual budget for the computers is in the multiple tens of millions.

I am surprised to see words like "can't bring up spare hard disk". My dear, RAID 1+0 disk storage DOESN'T NEED to wait for the spare disk; it JUST RUNS. What the admin needs to do is replace the spoilt disk, or assign a spare to replace it.
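
The "it just runs" claim about RAID 1+0 can be sketched in a few lines of Python. This is a toy model with hypothetical function names, not a description of the storage Bursa actually used: each mirror pair is represented by two booleans saying whether its disks are healthy.

```python
def raid10_online(pairs):
    """RAID 1+0 stays online while every mirror pair has at least one good disk.

    pairs: list of (disk_a_ok, disk_b_ok) booleans, one tuple per mirror pair.
    """
    return all(a or b for a, b in pairs)


def degraded_pairs(pairs):
    """Indices of pairs running on a single disk (a replacement is needed)."""
    return [i for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    array = [(True, True), (True, False), (True, True)]  # one disk has failed
    print("online:", raid10_online(array))
    print("pairs needing a replacement disk:", degraded_pairs(array))
```

In this model a single failed disk only marks its pair as degraded; the array goes offline only when both disks of the same pair are gone.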

Well, well, unless somebody cut corners on the hardware spending.
Instead of implementing RAID 1+0, they cut corners, pocketed the money and used the so-called, WTF, cheapo RAID 5 system.
Or even better, when the expert told them to implement RAID 1+5, some smart people didn't hear it properly and cut it to RAID 5.

And for the disaster recovery (DR) site, they must be using an interesting 10-year-old Malaysia Boleh method.

As I already mentioned, with today's infrastructure the data can SYNC DIRECTLY with the standby server. So there is no F*CKING "cannot bring up the standby server application", because the standby is just f*cking running.

In addition, even banks must comply with the DR standard and simulate the DR procedure annually.

However, after reading more of the story from the anonymous commenter, if the "multiple hardware failure" thing "really" happened, then only "cutting corners while profiteering on the IT purchase decision" explains it.

And that is nothing new.

Anonymous said...

After digging through the comments again, Anon 11:12 might have shed some light:

"Speculate la all of you. We sold the system to Bursa in the mid 90s. It was a hard disk failure"

Wink, wink. A mid-90's server. But wait, according to Wikipedia on RAID:
"The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley in 1987."

In the mid 90's, disk storage prices were dropping and RAID technology was getting popular.

Maverick SM said...

Kittykat46,

So the Anon was right that he would not want to be in the position of YKK as CIO. And now I hear you seem to be in that position.

Anon 4:45pm,

I tend to agree with you, but I think asking the whole Board to resign would not solve the problem, as a new group of Board Directors who have no passion for the job and lack competence will make the system worse or, at best, the same. I think what is needed is strong leadership; that is, the CEO must be prepared to put in a stable and reliable system and install a team with the necessary experience and competency, together with the necessary tools and support system.

Anon 6:25pm,

You have a good point; I strongly agree that it is entirely due to a lack of adequate planning, and to paying more and getting less because of mark-ups and commissions. Any vendor can give you a good system if honesty and integrity are present.

Moo_t,

Considering all the comments and analysis made, I gather we can conclude that the system was short-changed by the procurement management and somebody just made some money in the hope that such a disaster would not happen; and it did.

Moo_t, you may be able to expand on your knowledge here for the benefit of all; and I think we should treat these discussions as a discourse. Personally, I have a lot to learn from all of you.

Anonymous said...

Sorry, cannot respond earlier. Was on vacation.

Maverick,

Asking the Board to resign, as I said, was for them to accept responsibility. Khazanah can still re-appoint them or not accept their resignation if they feel they are not to blame. Likewise, Koh, Ong or Chua resigned because they have to take the responsibility. How else are we going to educate our children about ethics?



Moo_t and Anon 6:25 pm,

"Btw FT is not the holy grail for a fail-safe setup. It's just the front end of a complete system like HA.

Heard of RAID-5, RAID-1, RAID 0+1, mirroring, clustering, hot-swapped replication, data protection and disaster recovery plans?

I'm sure Anon 11:12 pm has! But no need to confuse the Joe M'sians with technical jargon. I'm a seasoned professional, who likes to cover All Aspects of my projects.

The fact remains that THERE WAS A FAILURE WITHIN A MISSION CRITICAL OPERATION, WHICH IS SUPPOSED TO BE BACKED UP BY A REDUNDANT SECONDARY, WHICH ALSO FAILED TO PERFORM!"

"I am surprised to see words like "can't bring up spare hard disk". My dear, RAID 1+0 disk storage DOESN'T NEED to wait for the spare disk; it JUST RUNS. What the admin needs to do is replace the spoilt disk, or assign a spare to replace it."


Why would I want to confuse the public with jargon? And I think it is you with the 'egg on the face.' Were you people involved in that problem? Or is it like all your other postings, good at talking but not actually knowing anything?

We sell both FT and HA systems. We know the difference. And we sold the system to Bursa and were part of the problem there. Now, what's the basis for your comments or opinion on their problem again?

You people cannot seem to accept it when a 'system' fails? Is your mindset so narrow as to assume the system is just the hardware?

Bursa's suspension of trade on Thursday was a gross failure of the systems put in place to guarantee non-disruption of trade, be it hardware, recovery, communication or standard operating procedures. All of them failed on the same day. Amazing, isn't it?

Maverick, you seem to be envious of the vendors or Bursa, who are going to make money out of this. What is your problem? Bottom line is: Bursa has to take responsibility for its systems working 24/7, whether people make money or not. (Of course, we hope no one takes advantage of this.)

Anonymous said...

Addendum..

When all systems fail, we are looking at human failure. Hence, when people fail, people should take responsibility!

Otherwise, the stupid establishment only knows how to talk. Compliance and corporate governance, but when the time comes to take responsibility, everyone deflects here and deflects there. They don't practise what they preach!

End of story!

Maverick SM said...

Anon 11:11pm,

I agree with your point about taking responsibility, but I do not agree that the board of directors should resign, for resignation does not provide a solution; it is important to deal with this problem head-on and come up with a solution that seeks to eliminate the problem and ensure the system is efficient and effective.

In your last para you said that I seem to envy the vendors or Bursa; I don't and I don't have to; I don't have any interest in the business and have nothing to gain by being envious. I am an investor and my interest is that Bursa has a system that is efficient and efficacious, so that investors can remain confident in it. I am a researcher, and my interest in discussing this topic is to listen to all the experts and stakeholders who are well-versed in I.T. systems, so that the discussion provides me with good information for use in case exercises and case studies with my students.

Anonymous said...

Maverick,

Come up with a solution? As I said, millions were spent on planning and design of a fail-safe solution for Bursa. But it did not work on July 3rd, 2008. It worked during trial runs and rehearsals. What more can be done?

Bursa started out very long ago as KLSE, a not-for-profit organization managed by industry players to provide Malaysia with a secondary market for equities. Under such a scenario it is easy to understand their mission and vision. Today, Bursa is a commercial organization managed by government appointees with an added objective: make money.

They make money from the participants of the industry including people like you and me who have interest in the equity markets directly or indirectly.

Bursa made money out of this business model and it is a monopoly. How can we let this failure of their mission slide? Yes, we can tolerate mistakes, but not when people continue to exploit wealth at the expense of a fair market.

If people understand the consequences of failure, we all would live in a better community.

Bursa Board should resign to set an example of proper corporate governance and ethics. More importantly, their resignation or termination would provide a benchmark or standard to other corporations in Malaysia so that similar mistakes, not just at Bursa but other critical businesses like TNB or TM etc will not occur again.

You say you are a teacher. Teach properly then. Don't say one thing and do another.

Maverick SM said...

Anon 9:52am,

You have good points; but I may not agree with your perception on the resignation idea. As for me, my work is to deal with post-graduates and under-graduates, not business bosses.

Anonymous said...

We were all once undergraduates. How did we learn if not from the teachers.

Maybe my standards are higher, but since the job goes with the reward, I feel that it's fair. Go do some research and find out how much the CEO of Bursa makes, then tell me if they deserve mercy.

Points of view, I can understand and appreciate. Standards and code of conduct are something we cannot afford to compromise. Unless we want our future generations to suffer the same fate as us.

That is why our country is now in this shit. Cuz for us Malaysians, everything can be discussed and arranged.

People can talk and blog about changes, yet they fail to look at the mirror and start changing themselves first.
