Wednesday, May 20, 2009

Java data structures

The following are random thoughts on some of the key Java data structures and their importance.

Primitive Types

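What are the sizes of byte, short, int, long, float, double, boolean and char? In Java the sizes are fixed by the language, regardless of platform: byte is 8 bits, short 16, int 32, long 64, float 32 and double 64. The size of boolean is not precisely defined. char is 16 bits (one UTF-16 code unit; Unicode code points outside the Basic Multilingual Plane need two chars).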
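As a quick check, the wrapper classes expose these widths as constants. The class name below is only for illustration; boolean is left out because it has no SIZE constant.

```java
// Prints the bit widths that the Java language defines for its primitive types.
class PrimitiveSizes {
    // Order: byte, short, int, long, float, double, char
    static int[] bitWidths() {
        return new int[] {
            Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE,
            Float.SIZE, Double.SIZE, Character.SIZE
        };
    }

    public static void main(String[] args) {
        String[] names = {"byte", "short", "int", "long", "float", "double", "char"};
        int[] widths = bitWidths();
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + widths[i] + " bits");
        }
    }
}
```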

Encoding
EBCDIC (mainly on mainframes) uses a different encoding scheme from ASCII (UNIX, PC, etc). When you FTP a file from an EBCDIC machine to an ASCII machine in text (ASCII) mode, the conversion happens automatically. In binary mode the bytes are transferred unchanged and you have to convert them explicitly.


Byte ordering
Byte ordering can be either big-endian or little-endian. If you migrate an application to a different processor family (e.g. from SPARC to x86) and the application code has logic that depends on byte locations, transformation is needed. If you store numbers in binary format (e.g. as 4-byte ints) or text in a multi-byte encoding (e.g. UTF-16) in a file and transfer the file to a machine with different byte ordering, transformation is needed again.
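A minimal sketch of the problem, using java.nio.ByteBuffer to write and read a 4-byte int in an explicit byte order. The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Encodes a 4-byte int in the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decodes 4 bytes using the requested byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN); // bytes 0D 0C 0B 0A
        // Reading the same bytes with the wrong order gives a different number.
        System.out.printf("%08X%n", decode(little, ByteOrder.BIG_ENDIAN));    // 0D0C0B0A
        System.out.printf("%08X%n", decode(little, ByteOrder.LITTLE_ENDIAN)); // 0A0B0C0D
    }
}
```

Stating the byte order explicitly when writing and reading the file avoids depending on whatever the hardware happens to use.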


String

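String and StringBuffer. A String is immutable, so every modification creates a new object. StringBuffer (or its unsynchronized counterpart StringBuilder, available since Java 5) should be used if the value of the string could change.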
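A small sketch of building a string in a loop with one mutable buffer. It uses StringBuilder, the unsynchronized counterpart of StringBuffer; the class and method names are illustrative only.

```java
class StringBuildDemo {
    // Builds "0,1,2,...,n-1" in one mutable buffer instead of creating
    // a new immutable String on every concatenation.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(4)); // 0,1,2,3
    }
}
```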

Collections Framework
There are 3 major interfaces in the Collections framework: List, Map and Set. There are specific implementations for each of them. For example, there are HashSet, TreeSet and LinkedHashSet implementations for the Set interface.

All the general-purpose implementations in the Collections framework are designed to be unsynchronized for performance reasons. However, we can use the synchronization wrappers (e.g. Collections.synchronizedList) to return a synchronized (thread-safe) version of each collection implementation. You can also refer to the external link in my comment for an interesting article on how JVMs do synchronization optimization.
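A minimal sketch of a synchronization wrapper in use: several threads add to one ArrayList through Collections.synchronizedList. The class name and the thread counts are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncWrapperDemo {
    // Adds 'perThread' items from each of 'threads' threads to a wrapped list
    // and returns the final size.
    static int fillConcurrently(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each add() is synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // 4000
    }
}
```

Note that the wrapper only protects individual calls. Iterating over the wrapped list still needs a synchronized block on the list itself.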

Vector and Hashtable
These are legacy implementations that are synchronized. ArrayList and HashMap are the corresponding unsynchronized versions of Vector and Hashtable.

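hashCode() and equals()
The default implementations inherited from Object are identity-based: two keys are equal only if they are the same object, and the hash code is typically derived from the object's identity (e.g. its address). If keys should be compared by value, or if hashing performance is important, a custom implementation can be evaluated. The two methods must stay consistent: equal objects must return the same hash code.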
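A short sketch of a value-based key. AccountKey and its fields are hypothetical; the point is only that equals() and hashCode() are overridden together over the same fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical map key that compares by value instead of by identity.
class AccountKey {
    private final String branch;
    private final int number;

    AccountKey(String branch, int number) {
        this.branch = branch;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AccountKey)) return false;
        AccountKey other = (AccountKey) o;
        return number == other.number && branch.equals(other.branch);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals(), so equal keys hash alike.
        return Objects.hash(branch, number);
    }

    public static void main(String[] args) {
        Map<AccountKey, String> owners = new HashMap<>();
        owners.put(new AccountKey("NY", 42), "Alice");
        // A different but equal key object finds the entry.
        System.out.println(owners.get(new AccountKey("NY", 42))); // Alice
    }
}
```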

Friday, May 15, 2009

Under the hood of sorting

We all use library routines for sorting data. I suspect that for more than 99% of applications, there is no good reason to justify a custom implementation. I did some reading to refresh some important points about the popular sorting algorithms.

Popular sorting algorithms
The most popular sorting algorithms used now are still the same as 20 years ago. Merge sort, quick sort, heap sort and insertion sort have been used for a very long time. What has changed drastically during the last 20 years is the amount of physical memory we have in computers (we are spoiled now). The abundance of memory makes these internal sorting algorithms even more popular. When the amount of memory was limited, we had to study external sorting algorithms, which rely on external I/O for large data sets.

Time performance

It is important to understand the average and worst case performance of an algorithm. For example, merge sort, quick sort and heap sort all have average O(n log n) runtime. However, the performance of quick sort degrades to O(n^2) for an already sorted list when the 1st element is selected as the pivot. There are workarounds (e.g. a random or median-of-three pivot) that reduce the chance of hitting the worst case scenario.
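One common workaround is to pick the pivot at random, so that a sorted input is no more likely to hit the worst case than any other input. The following is a minimal sketch for int arrays, not a replacement for the library sort; all names are illustrative.

```java
import java.util.Random;

class RandomPivotQuickSort {
    private static final Random RNG = new Random();

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            // Recurse into the smaller side and loop on the larger one,
            // which keeps the stack depth at O(log n).
            if (p - lo < hi - p) {
                sort(a, lo, p - 1);
                lo = p + 1;
            } else {
                sort(a, p + 1, hi);
                hi = p - 1;
            }
        }
    }

    private static int partition(int[] a, int lo, int hi) {
        // Move a randomly chosen pivot to the end, then partition around it.
        swap(a, lo + RNG.nextInt(hi - lo + 1), hi);
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```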

If the time consumed in sorting is relatively small (e.g. 5% of total), it might not be worthwhile for us to tune its performance. We should review the other 95% first. Only if there is no room for improvement there (highly unlikely) should we determine whether it is still worth our money to tune the remaining 5%.

We should also be aware of whether a stable sort is important or not. A stable sort keeps elements with equal keys in their original relative order. Some implementations gain performance by not guaranteeing stability. Merge sort is a stable sort. Quick sort and heap sort are not.
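To illustrate stability: List.sort is documented as a stable sort, so records with the same key keep their input order. The "name:dept" record format and the class name are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class StableSortDemo {
    // Sorts "name:dept" records by dept only. Records in the same dept
    // keep their original relative order because the sort is stable.
    static List<String> sortByDept(List<String> records) {
        List<String> copy = new ArrayList<>(records);
        copy.sort(Comparator.comparing((String r) -> r.split(":")[1]));
        return copy;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("Carol:IT", "Alice:HR", "Bob:IT", "Dave:HR");
        System.out.println(sortByDept(records)); // [Alice:HR, Dave:HR, Carol:IT, Bob:IT]
    }
}
```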

Memory usage
Nowadays, in most situations, memory will not be the limiting factor. However, it is still important to know the memory requirement. For example, heap sort requires O(1) extra space, quick sort requires O(log n) on average (for the recursion stack) and merge sort requires O(n). Since all three algorithms have similar average time complexity, your decision might change if memory is the primary factor.

Language support
Java uses merge sort, quick sort or insertion sort as the default, depending on the data type and array size. Perl moved from quick sort to merge sort as of Perl 5.8.

Most of us also do a lot of sorting in relational databases using the SQL ORDER BY construct. The obvious question is how the database sort compares to the in-memory sorts. We can easily do some research on the internet and verify it by running benchmarks on production level hardware. The application server hardware available to perform in-memory sorts could be very different from the database servers.

Takeaways
There is always a benefit in understanding some details of a library function with respect to what we are trying to solve. This will help us better understand what we are doing.

Sorting (or other data transformation and enrichment) using parallel CPUs and computers has gained popularity in recent years. There are a lot of parallel processing algorithms and tools available. MapReduce and Hadoop are two examples.

Please let me know if you have any thoughts or different views on this.

Wednesday, May 13, 2009

Loosely coupled parallel processing and scheduling

Batch processing or data integration is mostly about loosely coupled parallel processing and scheduling. As I mentioned before, we need to understand the business problem to make the project successful.

Segregation of data
This analysis should start with a high level data flow diagram. Understanding the source of data and their availability will help us make critical design decisions. Ideally, we should have data from different sources, usable as they become available. We can design our data flow architecture using parallel processing with different servers, databases, etc. Database replication can be used to move data. A simple extraction layer can also be built to get data out from the transaction oriented database for subsequent processing.

Critical path analysis
Once we have decided on the data flow, we should understand the critical paths of our processing. There will always be a few steps that make us nervous. These are the areas that we should focus on. We need to tune their performance, add instrumentation to trend the growth, and explore relevant new technologies to improve processing time or reliability.

Optimal scheduling is also critical. Unnecessary dependencies can potentially cause a much bigger impact on availability than a poorly optimized database query. In my experience, some jobs could actually start half an hour earlier after all unnecessary dependencies were removed. No one can tune a 10-minute database query to save 30 minutes. On the other hand, missing dependencies could also happen, and they could sometimes cause data corruption. We should always review and strive to get the optimal scheduling dependencies.

Rerunnability of an individual job or a group of jobs
Each step in the processing flow should ideally be rerunnable. This means that if the job fails, we can just restart it and continue. It is very difficult to decide during a production outage whether it is safe to rerun one or more jobs. Worse, if they are not safe to rerun, we need to quickly come up with some ad-hoc solution for handling the failure.

For example, if we have a script inserting data into a database, this script should have a cleanup step that deletes any data left by a previous run before the insert. Whether we run the job one or ten times, the database should end up with the same data.
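The delete-then-insert idea can be sketched as follows. To keep the example self-contained, an in-memory list stands in for the database table, and the batch date is assumed to be the key that identifies one run's data; both are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RerunnableLoad {
    // Loads one batch into 'table' (a stand-in for a database table whose
    // rows are {batchDate, value}). The cleanup step makes reruns safe.
    static void loadBatch(List<String[]> table, String batchDate, List<String> values) {
        // Cleanup: remove whatever an earlier (possibly failed) run left behind.
        table.removeIf(row -> row[0].equals(batchDate));
        // Insert: load the full batch again.
        for (String value : values) {
            table.add(new String[] {batchDate, value});
        }
    }

    public static void main(String[] args) {
        List<String[]> table = new ArrayList<>();
        List<String> batch = Arrays.asList("trade-1", "trade-2");
        loadBatch(table, "2009-05-13", batch);
        loadBatch(table, "2009-05-13", batch); // rerun: same end state
        System.out.println(table.size()); // 2
    }
}
```

With a real database, the delete and the insert would normally run in one transaction so that a failure between the two steps does not leave the table empty.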


Use of technology
Fault tolerance or self recovery is important in some cases. Think about whether our infrastructure can automatically disconnect from a faulty server and retry that piece of work on another server in a retry loop. This will save the manual support effort of responding to a failure and restarting a job by hand.
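A minimal sketch of such a retry loop, assuming the unit of work can be expressed as a function of the server name and signals failure with an exception. The server names and the helper are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FailoverRetry {
    // Tries the work on each server in turn, making up to 'maxRounds'
    // passes over the list before giving up.
    static <R> R runWithFailover(List<String> servers, int maxRounds, Function<String, R> work) {
        RuntimeException last = null;
        for (int round = 0; round < maxRounds; round++) {
            for (String server : servers) {
                try {
                    return work.apply(server);
                } catch (RuntimeException e) {
                    last = e; // remember the failure and move on to the next server
                }
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }

    public static void main(String[] args) {
        String result = runWithFailover(Arrays.asList("server-a", "server-b"), 2, server -> {
            if (server.equals("server-a")) {
                throw new RuntimeException("server-a is down");
            }
            return "done on " + server;
        });
        System.out.println(result); // done on server-b
    }
}
```

This only works safely when the piece of work itself is rerunnable, as discussed above.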

Also, in-memory databases and file-based processing should be used when appropriate. A relational database is a very powerful, simple and general solution, but it might not be the best tool for a very specific problem. For example, if you need to sequentially process all the data, a file-based solution will usually be faster than a database solution, because there is none of the overhead of indexing or managing transactions that a general purpose DBMS has.

If multithreading is the right technology, we should always refresh the important considerations.

Tuesday, May 12, 2009

Concurrency basics

Please refer to multi-core computing for a very basic overview of the topic.

Multi-processing or multi-threading

It is easy to scale using multi-processing when there is no need to share data. For example, we can have a splitter for an input file and have multiple formatting jobs running. At the end, a merge function needs to consolidate the data back into one unit. We need a scheduler to manage the dependencies and scheduling, either a script file or a tool.

On the other hand, we can have a program that has different threads for different purposes and controls all the dependencies and scheduling of the threads within the program. Starting a thread is also much faster and less resource intensive than starting a new process.

Threading considerations
Let's review some key areas.


Thread safety
This is the most important aspect of concurrent processing. We should not have race conditions or data corruption. The classic example of a race condition is the simultaneous deposit/withdrawal used in many textbooks. Moreover, we cannot have thread A, handling the data for client A, overlap with and corrupt the data of thread B for client B. Think clearly about global, class, instance and local variables, their scopes, and how each thread uses them.
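The textbook deposit/withdraw case can be sketched like this. BankAccount is a made-up class; the synchronized methods make each read-modify-write of the balance atomic, so concurrent updates are not lost.

```java
class BankAccount {
    private long balance;

    // Without 'synchronized', two threads could read the same old balance
    // and one of the two updates would be lost.
    synchronized void deposit(long amount) {
        balance += amount;
    }

    synchronized void withdraw(long amount) {
        balance -= amount;
    }

    synchronized long balance() {
        return balance;
    }

    // Runs equal numbers of deposits and withdrawals in two threads.
    static long simulate(int iterations) {
        BankAccount account = new BankAccount();
        Thread depositor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) account.deposit(1);
        });
        Thread withdrawer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) account.withdraw(1);
        });
        depositor.start();
        withdrawer.start();
        try {
            depositor.join();
            withdrawer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println(simulate(100000)); // 0
    }
}
```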

Locking mechanism

Semaphores, mutexes, monitors, locks: each of them has different uses, with its own pros and cons in concurrency.
When we are using a library to implement concurrent data access, we should understand which locking mechanism it uses and why.

Deadlock and livelock

We need to avoid both. We can use prevention (e.g. by always acquiring locks in the same order) or detection (and killing one of the participants to recover) mechanisms.
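Prevention by ordering can be sketched as follows: both locks are always taken in ascending id order, so two opposite transfers cannot each hold one lock while waiting for the other. The Account class and its id field are assumptions for this example.

```java
class OrderedTransfer {
    static class Account {
        final int id;
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    static void transfer(Account from, Account to, long amount) {
        // Always lock the account with the smaller id first.
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions at the same time.
    static long[] simulate(int iterations) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.balance, b.balance};
    }

    public static void main(String[] args) {
        long[] balances = simulate(10000);
        System.out.println(balances[0] + " " + balances[1]); // 1000 1000
    }
}
```

If each thread locked "from" first and "to" second instead, the two threads could deadlock.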


How many threads to run

In general, for CPU-bound work, we should keep the number of threads less than or equal to the number of CPUs. However, in some cases we can increase the number of threads somewhat if we know some threads are not CPU bound and will block on I/O or other activities.
Do some testing and performance benchmarking to determine the optimal number.
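As a starting point for such benchmarking, a fixed thread pool can be sized from the number of available processors. This is only a sketch with a toy workload; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolSizing {
    // Starting point for CPU-bound work: one thread per available processor.
    static int cpuBoundPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Toy workload: computes 1^2 + 2^2 + ... + n^2 using 'threads' workers.
    static long sumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long x = i;
                results.add(pool.submit(() -> x * x));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, cpuBoundPoolSize())); // 385
    }
}
```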

Operating system support
It is important for the kernel to have native support for multi-threading. Sometimes, the language or tool opts for user level threads instead of kernel level threads. User level threads in general cannot utilize multiple CPUs, as the kernel treats them as one process. There may also be different threading library implementations for the same operating system. In Linux, there are LinuxThreads and NPTL. In LinuxThreads (found in older Linux versions), each thread actually has a unique PID, so we need to take that into consideration in coding.

Code review, testing and logging

It is not easy to spot and catch every possible threading issue. Reviewing the code carefully, focusing on the business logic, variable scopes and locking mechanisms, can help. Performance testing may detect some bugs by luck. Since it is almost impossible to reproduce production issues that involve the timing of execution of different threads, detailed logging may be desirable for important critical sections.


Java
The most widely used concurrency control in Java is the "synchronized" keyword, which is basically a monitor. However, we should only synchronize the critical sections of the code. In the extreme case, if you synchronize one big function, all threads running that function will be serialized. You can also increase concurrency if you use wait() and notify() correctly.
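A minimal sketch of wait() and notifyAll() on a one-slot mailbox shared by a producer and a consumer. Mailbox is a made-up class; the while loops guard against spurious wakeups.

```java
class Mailbox {
    private Integer slot; // null means the slot is empty

    synchronized void put(int value) throws InterruptedException {
        while (slot != null) {
            wait(); // releases the monitor until the slot is emptied
        }
        slot = value;
        notifyAll(); // wake up a waiting consumer
    }

    synchronized int take() throws InterruptedException {
        while (slot == null) {
            wait(); // releases the monitor until the slot is filled
        }
        int value = slot;
        slot = null;
        notifyAll(); // wake up a waiting producer
        return value;
    }

    // A producer thread puts 1..count; the calling thread takes and sums them.
    static int transferSum(int count) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) box.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) sum += box.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transferSum(100)); // 5050
    }
}
```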

Monday, May 11, 2009

Bank stress tests results

We have all heard the headline numbers on the additional capital required for 10 of the 19 bank holding companies. It is important to understand the basic assumptions and the evaluation process to get some perspective on these numbers: the $75 billion of additional capital required and the potential additional loss of $600 billion through the end of 2010.

Assumptions
I am a little bit surprised that the conditions of the stress tests were not clearly listed together in headlines with the results. The macroeconomic scenarios (considered to be worse than expected) used for the stress tests were:
  • Unemployment: 8.9% in 2009 and 10.3% in 2010
  • GDP: -3.3% in 2009 and 0.5% in 2010
  • House Price: -22% in 2009 and -7% in 2010
With these macroeconomic scenarios, the loss rates of the 12 categories of loans were projected.

Process and Methodology

With the above mentioned assumptions, each bank used its own risk models and calculated its potential loss. The same was done to project revenue, profits and cash flows to determine available capital. The 180-person federal team then audited each bank's models and results, and requested additional data for clarity. An important observation of mine: the final published results of some banks differed drastically from those of just 2 weeks earlier. This showed how uncertain these "estimates" were.

I also read in an article that the trading exposure of only 5 banks was included in the stress tests. $100 billion of trading assets seemed to be the threshold.

Questions
I only spent a couple of hours following the news on the stress test results. I am interested in knowing some basics of "how" these numbers were established. If I had a real need to know (which I don't now), I could drill down and analyze the following further.
  • Are the worst case scenarios really bad enough? A peak unemployment of 10.3% and GDP growth of 0.5% in 2010 do not seem like an extremely dire scenario now. The April 2009 unemployment rate already equaled the 2009 rate of the more adverse scenario in the stress test.
  • How was the potential loss of each loan category for each bank estimated and approved by the federal team? This is the heart of the problem: how much of each type of asset each bank has on its books and how to value them.
  • I am also assuming the stress tests did not include any future trading risk of the 5 companies. The potential loss was all based on their existing exposure. Each company can and will change its strategy in the future. Also, is it true that only the trading exposure of banks with at least $100 billion of trading assets was included? What if a bank had $80 billion of trading assets and (hypothetically) lost them all next year? Was that risk not considered in the stress tests? I must be wrong here.
  • What assumptions were used to evaluate the revenue, market share, profitability and growth of each company? If we look back at the last few quarters, there were way too many surprises in the sector.

Friday, May 8, 2009

Apple: from iPod to iPhone

I have only used 2 types of smart phones, Blackberry and iPhone, so I may not have a complete view on this.

iPhone 3G
I now use my iPhone for cell phone calls, VoIP phone calls, email, iPod music player, GPS, news reader (Bloomberg, NY Times, Wall Street Journal), checking weather, internet radio, calendar appointments, some occasional casual games (with good motion sensing controls), and entertaining my 2 daughters when needed.

The web browser is nice, but I use it less frequently now as most websites have a native application available. I also use the camera; it is a decent backup. Then there is a voice recorder, a Chinese/English translator, and a Wi-Fi keyboard/mouse to control the Mac mini hooked up to my TV. I also started reading some very basic Amazon Kindle books on the iPhone (but the screen is just too small for serious reading).

Vision by Apple
An extremely user friendly touch screen, multi-touch input method is the heart of the iPhone or iPod touch. Adding an always on 3G Internet connection and the well thought out App Store, the possibilities are just endless. The included iPod was an early hit, and it will continue to be.

Technologies

There are now so many similar devices that use solely the touch screen for input, but it was Apple that pioneered this niche. Who would have thought that there would be more than 17 million iPhones (plus ~15 million iPod Touches) worldwide in less than 2 years?
The user interface is the key technology differentiator for the iPhone. The business model for the App Store is amazing. It attracts so many individual developers to publish very low cost applications (many are free). Recently, Apple has been heavily promoting the 1 billion application downloads from about 40,000 applications.

Wish List
Adding turn-by-turn voice prompt GPS and a video camcorder to the iPhone would make it the "one and only" device you need to carry. Both features should be feasible as software only updates. Improved battery life will also help, as all of us are using the device more and more. Background apps, if done right without impacting battery life, could also open up a lot of different opportunities for the iPhone to act as an agent that reminds you of everything.

Of course, with Google (Android), Palm (Pre) and Research in Motion (Blackberry) also targeting the same market, more innovations will come. We as consumers will benefit from this healthy competition.

Thursday, May 7, 2009

How to lead a team

I worked for a company that, like many other companies, has an annual anonymous 360-degree performance review process. I have received input from my direct reports, peers and managers throughout the years. I have been managing different teams for the past 9 years. I have worked with independent consultants, outsourced consultants (both on-site and offshore), full time employees and summer interns.

Flexibility and Integrity
A manager, in many ways, should be called a coach. Like a professional sports coach, a good manager uses his/her experience to guide and motivate everyone and to get the best out of the team. It is a little bit more complex in the business world, as we are generally dealing with people with diverse cultural backgrounds, different personal goals and values, and different experience and skill levels. There are also differences by age group in how people perceive work and family. There is no simple answer on what works best. The coach needs to adapt based on the team structure, the company culture and the upper management style. Consultants and employees generally have different objectives and motivations and need different coaching styles.

These days, every company has very aggressive schedules and deadlines. Every level of management feels the same pressure and is trying to do more with less. We have to lead the team as fairly as possible while balancing the company's needs.

Skills and knowledge
Let's start with good observation and listening skills. It is so important to know your team well, what each person's strengths, interests, and their longer term ambitions are.

Motivation is also at the top of the list. It is the best of both worlds when an employee is motivated. He/she will automatically line up his/her interests with yours and the company's. Like a "winner" in professional sports, this is the type of person who performs best when needed. Giving credit, and not taking credit away from anyone, is a key principle that I use. I find it very useful in building up trust, which will in turn motivate the whole team and create success and recognition for everyone.

A quick note on remote and global teams. In my experience, it takes commitment and personal sacrifice to motivate a remote team. You have to show the remote team members that you genuinely treat them as part of the team. Many companies have offshore teams, but the real productivity and success are in the hands of the direct manager and his/her day to day execution.

I am a true believer in "lead by example" and delegation. You should not preach something you would not do yourself if you were in their roles. If you are a team lead or manager, it is impractical for you to know the details of everyone's work. You have your own responsibilities. So, having a big picture view, knowing the status of the team's work, and having the ability and interest to drill down into the details as needed may be a winning combination.


As a manager, you need to be able to explain the vision to the team and to help them make good decisions. In career development, you should help them grow by pointing out the important skills they need to acquire to succeed. Open and honest feedback, if done right, will help the employee grow.

Matching up assignments to available team members requires careful planning. There are assignments that most people would like to do, and there are always some areas that are perceived as grunt work, but are critical and have to be done.

Be a good team member
Besides working with your team, remember you are also part of your manager's team. All the points we discussed here also apply to your manager. You have to play the role of a good team member too. You need to be flexible and support the goals of your management.
Put yourself in their shoes and you will probably understand and appreciate their actions more.

Wednesday, May 6, 2009

OTC derivatives

Nowadays, when you turn on CNN or CNBC, or read the news on the internet, chances are you will hear something about TARP and the bank stress tests. You may also hear about CDOs, CDSs, etc. These financial products certainly have a bad reputation. Let's try to get some basics of what they are and some of their properties.

OTC and Derivatives
The common exchange-traded financial products, like stocks, bonds, futures and currencies, are traded through a broker via an exchange. An exchange can be physical (e.g. part of NYSE) or electronic (e.g. NASDAQ). One of the main advantages of using an exchange is that it eliminates counter-party risk. It also improves market liquidity.

When we have an over-the-counter (OTC) contract, it is purely between 2 parties. Broker A can sell a contract tracking the performance of one or more financial products to Client B. They are bound by the terms of the legal agreement. Because of the resulting counter-party risk, there is always a need for collateral. Periodic mark-to-market and contract resets are also used to determine if additional collateral needs to be posted.

Derivative is a general term for a product that is traded based on some other underlying product(s), i.e. its value is derived from those product(s). A call option on IBM is a derivative whose value is derived from IBM stock. Futures and swaps are other examples of derivatives.

Similar to exchange-traded derivatives, OTC derivatives can be used to speculate or hedge. The leverage ratio of these contracts can also vary. The buyer may not need to come up with all the principal for the contract. The seller can determine what risk it is willing to take and only require a percentage to be posted as collateral.

It is how a company uses these financial instruments that makes them risky. For example, if a company buys a Credit Default Swap (CDS) to hedge against a default by the issuer of one of its junk bond holdings, it will stabilize its portfolio. Of course, it now has to think about whether it wants to hedge the default risk of the counter-party writing the CDS.

Integrated Reporting of OTC Derivatives
There are lots of non-standard attributes for different OTC derivatives products. If we understand what some of the key attributes and their meanings are, we can design our processing flows accordingly. I am not going to cover all the different attributes here, but would like to point out that there are usually 3 main components: the "exposure" part, the "interest" part, and the "collateral" part. In contrast, the exchange-traded products usually only contain the "exposure" attributes.

For example, if we do not need the collateral and interest rate information on the OTC derivatives for some custody or accounting reports, they can be optional attributes for those applications.
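One way to sketch this in a data model is to make the exposure attributes mandatory and the interest and collateral attributes optional. All field and method names below are hypothetical and greatly simplified.

```java
import java.util.Optional;

// Hypothetical, simplified record for one OTC derivatives position.
class OtcPosition {
    final String contractId;
    final long exposure;                   // "exposure" part, always present
    final Optional<Double> interestRate;   // "interest" part, optional
    final Optional<Long> collateralPosted; // "collateral" part, optional

    OtcPosition(String contractId, long exposure,
                Optional<Double> interestRate, Optional<Long> collateralPosted) {
        this.contractId = contractId;
        this.exposure = exposure;
        this.interestRate = interestRate;
        this.collateralPosted = collateralPosted;
    }

    // A custody-style report line that needs only the exposure attributes.
    static String custodyLine(OtcPosition p) {
        return p.contractId + "," + p.exposure;
    }

    // A collateral report line that shows "n/a" when nothing was loaded.
    static String collateralLine(OtcPosition p) {
        return p.contractId + "," + p.collateralPosted.map(c -> Long.toString(c)).orElse("n/a");
    }

    public static void main(String[] args) {
        OtcPosition swap = new OtcPosition("SWAP-1", 1000000L, Optional.empty(), Optional.empty());
        System.out.println(custodyLine(swap));    // SWAP-1,1000000
        System.out.println(collateralLine(swap)); // SWAP-1,n/a
    }
}
```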

Tuesday, May 5, 2009

From HTTP/HTML to Web 2.0

The internet was revolutionary. It changed our lives as individuals. Companies changed strategies to adapt to this new channel. Let's look at how the technologies evolved since the 1990s.

Early Days
Netscape released Navigator, one of the first widely available public web browsers, in late 1994. It helped start the internet revolution. The worldwide network used for email and other electronic file exchanges served as the internet backbone. This universal connectivity was critical for the success of the internet.

The browsers and the world wide web servers exchange information through the HTTP protocol. HTTP is a stateless request/response protocol, and the transport is mostly TCP/IP. The rendering was done in a very simple markup language, HTML. Besides rendering text and images, HTML was mostly about hyperlinks and forms. The beauty of HTML was its simplicity.

The sandbox security model of the popular browsers was also critical in the exponential growth of the internet. What you did from your browser was meant to be safe: the code in a page could not do anything harmful to your computer or secretly retrieve information from it.

Mass adoption
In the meantime, consumers started to subscribe to internet connections, via dial-up services such as AOL or via cable companies, hooking up directly to the internet from their homes for the first time. Everyone gradually caught on to the internet. Companies recognized that they could use the platform to launch new businesses and extend their existing ones. Then came the dot-com boom, and eventually the dot-com crash. Amazon, eBay and Yahoo are a few of the successful companies born during that period that still remain dominant forces now.

Extensions like applets, ActiveX controls, Netscape plug-ins, Netscape Communicator channels and Adobe Flash were introduced to achieve a richer and more user friendly environment. There was also a lot of hype during the dot-com boom that browsers would take over from operating systems. The main driver for these new technologies was to gain market share for their respective companies. You can make your own judgment on how well most of these non-standard technologies held up over time.

Web 2.0

Nowadays, the internet is used mostly as an interactive and collaborative tool. The request/response model and the simplistic HTML rendering that worked very well in the early days have also shown signs of aging. AJAX, or Asynchronous JavaScript And XML, is the buzzword now, with some credibility. It enables web pages to have more dynamic content without the need to hit "refresh" (e.g. think about type-ahead form suggestions and continuous updates to stock quotes). It also allows layers of information on top of each other (e.g. think about Google Maps). Mashups also seem to be gaining good momentum.

Tools also play a big role in Web 2.0. On the server and content side, we now have many frameworks to choose from. The Spring framework combines inversion of control (dependency injection) with aspect-oriented programming. The details will be in another post. Microsoft has the ASP.NET framework with C#, mainly to make work easier for people in the Microsoft camp. Then there is the high level Ruby on Rails. Lastly, don't forget the still popular servlet-based MVC frameworks such as Struts. There are also a lot of web servers, database access and caching tools and technologies to choose from. There is no one size fits all in selecting the right technologies. Each company will need to assess its unique requirements, its existing infrastructure and its IT staff's skill sets to determine the best technologies to use.

On the presentation side, information sharing is becoming bigger and better. New blogs, tweets and social networking sites are springing up very quickly. With the availability of these user friendly sites, a wider group of users can effectively share knowledge and information. This is certainly a good use of technology. I am very interested to see what we will have 5 or 10 years from now.

Monday, May 4, 2009

Data integration challenges

Data is everywhere and is expanding at a very fast pace. Companies are dealing with gigabytes and terabytes of data.

Business Needs
Every business wants the final processed data to be accurate and delivered on time. As a technologist working very closely with the business to serve our clients, I fully understand that.

It is the responsibility of the technology department to point out the complexity (i.e. cost) and potential problems (i.e. cost again) so that contingency plans can be agreed upon and established. This will take some serious effort and trust to achieve, as each team has different domain knowledge and perspectives. If the business team does not understand what is technically feasible, they will naturally try to ask for more. If the technology team does not understand the business drivers, they will focus on the wrong problem to solve. Ultimately, only if the business and technology staff truly work together as one team can the results be substantially better and create a huge competitive advantage for the company.

We absolutely do not want to over-engineer for the not-so-critical functions or exceptional cases. However, we want to make sure we use the best and most effective technology to handle the most critical scenarios.

Technologies
We need to think about data formats, processing methods, hardware and network bandwidth, scalability, data integrity checks, contingency sources, etc. Let me touch upon each of them briefly.
  • Formats - XML, flat files, relational database, other industry standards (e.g. FIX, SWIFT, FpML)
  • Processing methods - when we should use pre-processors, in-memory databases, replications, DOM vs SAX parsers for XML. Archiving, compression, purging schemes. Design from transaction processing to data warehouse. Database normalization and performance tuning.
  • Hardware/Network bandwidth - is CPU, memory or I/O the bottleneck? NAS/SAN or local disk implications.
  • Scalability - how can the infrastructure scale to the anticipated growth (2x, 10x, or 100x)? Based on realistic projections, we can make significantly different design decisions. Can we just horizontally scale the application by adding more instances or hardware? Some designs will NOT allow us to do that.
  • Data Integrity checks - sanity checks, row count checks, mandatory vs optional fields.
  • Contingency - critical path analysis, checkpoints and how to partially re-run a batch, alternative data sources or algorithms.
It takes a lot of planning to get things right, and it takes only one unanticipated issue to create a critical problem. Any data integration process is also an on-going project. Volume growth, new data sources, and new use cases will test your initial design and show how flexible and extensible it really is.
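As one concrete illustration of the data integrity checks listed above, here is a minimal sketch of a row count check and a mandatory field check on an incoming batch. The row layout (string arrays) and all names are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IntegrityChecks {
    // Returns a list of problems found; an empty list means the batch passed.
    static List<String> validate(List<String[]> rows, int expectedRowCount, int[] mandatoryColumns) {
        List<String> problems = new ArrayList<>();
        // Row count check, e.g. against a count sent in a trailer record.
        if (rows.size() != expectedRowCount) {
            problems.add("row count " + rows.size() + " != expected " + expectedRowCount);
        }
        // Mandatory field check: these columns must be present and non-blank.
        for (int r = 0; r < rows.size(); r++) {
            String[] row = rows.get(r);
            for (int c : mandatoryColumns) {
                if (c >= row.length || row[c] == null || row[c].trim().isEmpty()) {
                    problems.add("row " + r + ": mandatory column " + c + " is empty");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"ACC-1", "100.00"},
            new String[] {"ACC-2", ""});
        // One problem is reported: row 1 has an empty mandatory column 1.
        System.out.println(validate(rows, 2, new int[] {0, 1}));
    }
}
```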

I will drill down into more details for some of the technical considerations listed above in future posts.

Friday, May 1, 2009

Netbook has a winning formula

I have been using a 1.66GHz, 10 inch Netbook since March. Besides using it on the road, the Netbook is also my favorite computer at home for most simple tasks.

Understanding the basics
Battery life and weight should rank at the top of most people's priority list for a simple on-the-road computing device. So, what about a "laptop" with 7 hours of battery life (with Wi-Fi on), weighing only about 3 pounds?

Creative packaging
The model of my netbook is the ASUS Eee PC 1000HE. It uses a low power consumption, hyper-threading Intel processor. It runs Windows XP instead of Vista, and has a 1024-pixel-wide screen (wide enough for most web pages). The chiclet (Apple style) keyboard, at 92% of full size, is very nice too. The performance button can overclock or underclock the CPU as needed. The 2-finger scrolling on the touch pad is very user friendly. To round out a nice netbook, ASUS includes a 160GB hard drive, 1 GB of memory, 802.11n Wi-Fi and Bluetooth.

Drawbacks and workarounds
I am slightly disappointed with the 600-pixel vertical resolution. It is acceptable once I switch to full-screen mode for most programs. To work around the missing DVD drive, I mounted the DVD drive of my Vista laptop on this netbook over Wi-Fi, for the occasional program installation.

Conclusion

This new invention is quickly taking market share from traditional laptops. It will continue to get better and more powerful.

Thursday, April 30, 2009

Cloud Computing

I attended a financial services industry forum on Cloud Computing today. This is a very interesting concept.

Since I have no actual work experience in implementing any production cloud computing application, I will list out some very general points.

What is cloud computing?
It depends on whom you ask. One way for me to look at cloud computing is as a Platform-as-a-Service. It hides all the infrastructure details inside a cloud and lets the application team focus on the business logic they need to address. The cloud can be located inside or outside of the company. If you decide to use a public cloud, you can think of it as outsourcing the infrastructure. Amazon's EC2, which uses virtualization to run a specific instance of a specific OS, is one example of a popular public cloud.

Attractiveness of cloud computing
Capacity needs can be dynamic. If a company needs to run a computationally intensive process only once in a while, it can "rent" some CPU cycles from the computing farm, say once a month.

For the public cloud model, each company also needs to review whether a particular service level agreement is acceptable. For example, are you satisfied with a refund of the service fee if the provider does not meet its SLA of 99.95% uptime per year, which allows about 4.4 hours of downtime per year? Can you afford an even longer (however unlikely) outage without any direct control over fixing the problem? Is having a different backup cloud as a business continuity plan good enough? Or is it justified for you to manage your own infrastructure?
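The downtime figure behind an uptime percentage is simple arithmetic, and it is worth running for a few SLA levels before signing anything. Below is a small sketch; the class and method names are mine, and it assumes a 365-day year measured around the clock.

```java
/** Converts an uptime percentage into the downtime it permits per year. */
class SlaDowntime {
    // 365 days x 24 hours = 8,760 hours, ignoring leap years.
    static final double HOURS_PER_YEAR = 365 * 24;

    static double allowedDowntimeHours(double uptimePercent) {
        return HOURS_PER_YEAR * (100.0 - uptimePercent) / 100.0;
    }

    public static void main(String[] args) {
        for (double sla : new double[] {99.0, 99.9, 99.95, 99.99}) {
            System.out.printf("%.2f%% uptime allows about %.2f hours of downtime per year%n",
                    sla, allowedDowntimeHours(sla));
        }
    }
}
```

For 99.95% this works out to 8,760 x 0.0005, or a little under four and a half hours a year. How the provider measures that window (per month, per year, scheduled maintenance included or not) is something the SLA itself has to spell out.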

Getting back to basics
We still need to address the same problems we have always had. What are the business use cases, processes, and data? Which functions should we move to the "cloud", and why would that be a more effective solution? We also need to address the typical questions on data transfer throughput, latency, authentication, privacy, etc. A cloud computing framework gives you another tool as a means to an end. The end can be cost saving, reducing operating complexity or load balancing between multiple applications. The due diligence is to establish the advantages of picking this technology over others.

I will discuss my view on this topic with more specific examples and technical details in a future blog.

Wednesday, April 29, 2009

Hedge Funds and Prime Brokerage

This is a very high level overview of Hedge Funds and Prime Brokerage, with information that you can find on the web.

What is a hedge fund?
A hedge fund can use leverage, trade multiple asset classes, and hold short positions. Hedge funds usually target high-net-worth individuals and charge both a management fee and a performance fee.

Growth of hedge funds
There are many different measures, but total assets for all hedge funds grew from about $0.3 trillion in 2000 to a peak of around $2 trillion in 2008. For comparison, US GDP is about $14 trillion and TARP is $0.7 trillion.
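To put the growth numbers in perspective, the compound annual growth rate can be worked out from the two rough endpoints quoted above. This is only a back-of-the-envelope sketch; the class and method names are mine, and the inputs are the approximate figures from this post, not audited data.

```java
/** Compound annual growth rate from a start value, an end value and a number of years. */
class GrowthRate {
    static double cagr(double startValue, double endValue, int years) {
        return Math.pow(endValue / startValue, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        // Roughly $0.3 trillion in 2000 growing to roughly $2 trillion in 2008.
        double rate = cagr(0.3, 2.0, 8);
        System.out.printf("Approximate annual growth: %.1f%%%n", rate * 100);
    }
}
```

With these inputs the rate comes out at roughly 27% a year, sustained over eight years.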

The biggest hedge funds are huge. The top 10 global hedge funds each have over $10 billion in assets.

Prime Brokerage business model
At the highest level, Prime Brokers help their clients, the hedge funds, with clearing and settlement, consolidated reporting, financing and securities lending. There are other services like capital introduction, technology start-up consulting, fund administration, etc.

Revenues for Prime Brokers are mainly earned from spreads on long and short balances, transaction fees for clearing, and securities lending. With its understanding of the clients' needs, a Prime Broker can also help bring in additional revenue for the company's various trading desks.

Technology needs
Technology is an integral part of the Prime Brokerage business.

A Prime Brokerage firm uses web, FTP, FIX, SWIFT, FpML and other standards for client interfaces. It leverages firm-wide reference data like pricing, product and account information. It interfaces with front offices, middle offices and back offices for position keeping, pricing and settlement. It works with other firm-wide systems for risk, compliance and funding. PB technology also provides margin, client risk, accounting, custody and data integration functions. It provides post-execution trade processing and allocations. It interfaces with the firm's general ledger and client sub-ledger systems. All of this is done to support global markets with large daily trading volumes. Data snapshots are also needed to generate month-end reports.

It is critical for a prime broker to provide accurate and timely information to all of its global clients in their local time zones. For example, a London-based hedge fund should get the consolidated reports and data feeds for all its positions by 7am local time, or 2am in New York. A Tokyo-based hedge fund will need its data by 7am local time, or 6pm the previous day in New York (5pm when daylight saving time is not in effect). Different Prime Brokers can use different strategies and technical designs to address this business requirement.
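The local deadline arithmetic is easy to get wrong by hand, especially around daylight saving changes, so it is better left to a time zone library. Here is a minimal sketch using the java.time API (Java 8 or later); the class name, method name and the sample date are my own choices for illustration.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Translates a client's local report deadline into New York time. */
class ReportDeadline {
    static final ZoneId NEW_YORK = ZoneId.of("America/New_York");

    static ZonedDateTime inNewYork(LocalDate date, LocalTime localDeadline, ZoneId clientZone) {
        // Build the deadline in the client's zone, then view the same instant from New York.
        return ZonedDateTime.of(date, localDeadline, clientZone)
                .withZoneSameInstant(NEW_YORK);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2009, 5, 1); // sample date, daylight saving in effect
        LocalTime sevenAm = LocalTime.of(7, 0);
        for (String zone : new String[] {"Europe/London", "Asia/Tokyo"}) {
            System.out.println(zone + " 07:00 -> "
                    + inNewYork(date, sevenAm, ZoneId.of(zone)));
        }
    }
}
```

For the sample date, the London deadline lands at 02:00 in New York on the same day and the Tokyo deadline at 18:00 on the previous day, which is why the Tokyo batch effectively has to finish before the New York business day is even over.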

In a nutshell, technology projects in PB are done to satisfy any of the following business needs:
  • Improve operational efficiency
  • Scale systems to maintain SLAs as clients' volumes keep growing
  • Improve business offerings to retain existing clients and win new business
  • Respond to new industry initiatives and trends
  • Measure and control internal risk
  • Support and serve existing clients
Prime Brokerage is an exciting and demanding client-driven business.

Tuesday, April 28, 2009

64-bit and multi-core computing

All software runs on hardware, and today's CPUs can be 64-bit and multi-core. To fully utilize the upgraded hardware, we need OS support and some knowledge of how to write high-level application code that takes advantage of it.

64-bit computing
32-bit computing is fine if you don't have any specific needs. In my very simplified view, there are 2 main considerations for moving to 64-bit computing.
  • Addressable virtual memory space. This is my #1 reason. If ~2GB is good enough for your application (after subtracting the memory footprint of the OS and VM from 4GB), stay put until there is another reason.
  • Wider registers. This matters if you need to store bigger values or can benefit from faster computation with the larger 64-bit registers.
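The 4GB ceiling mentioned above comes straight from the pointer width: 32 bits can address 2^32 bytes. The sketch below does that arithmetic and also prints the maximum heap the running JVM is willing to use. The class and method names are mine; the heap figure depends entirely on your JVM and its settings (for example -Xmx), so I make no claim about what it will print.

```java
import java.math.BigInteger;

/** Shows how pointer width bounds the addressable virtual memory. */
class AddressSpace {
    /** Number of distinct byte addresses for a given pointer width. */
    static BigInteger addressableBytes(int pointerBits) {
        return BigInteger.ONE.shiftLeft(pointerBits); // 2^pointerBits
    }

    public static void main(String[] args) {
        BigInteger gib = BigInteger.ONE.shiftLeft(30); // 2^30 bytes = 1 GiB
        System.out.println("32-bit: " + addressableBytes(32).divide(gib) + " GiB");
        System.out.println("64-bit: " + addressableBytes(64).divide(gib) + " GiB");

        // What this particular JVM may use for its heap (configuration dependent).
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM max heap: " + maxHeapMiB + " MiB");
    }
}
```

The 64-bit number is a theoretical limit; actual operating systems and processors expose far less, but still vastly more than 4 GiB.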
Multi-core computing
  • Performance is the most critical consideration here. Given the slowdown in improvements to CPU clock speed and to per-instruction processing efficiency, we have to rely on multi-processing or multi-threading to improve performance.
As an important side note, processors with different architectures will perform differently at the same clock speed, mainly because of how many cycles each needs to process different instructions.

There is a whole new set of concurrency issues that we need to be aware of when we start to do multi-core concurrent processing. This is very different from using time-slicing to achieve virtual concurrency on single-core processors. If you are interested, look up more on kernel threads, LWP and the (old) green threads.
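One of the simplest of these issues is the lost update. The sketch below has several threads increment a plain counter and an AtomicLong side by side; the class name and the thread and iteration counts are arbitrary choices of mine. The atomic counter always reaches the expected total, while the plain one may fall short on a multi-core machine because "read, add, write" is not a single step.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Contrasts an unsynchronized counter with an atomic one under concurrent updates. */
class CounterRace {
    static long unsafeCount = 0;                           // plain field: updates can be lost
    static final AtomicLong safeCount = new AtomicLong();  // atomic read-modify-write

    static void run(int threads, int incrementsPerThread) {
        unsafeCount = 0;
        safeCount.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    unsafeCount++;               // three steps: read, add, write
                    safeCount.incrementAndGet(); // one indivisible step
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all workers so the totals are final before they are read.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run(4, 1_000_000);
        System.out.println("expected: " + 4_000_000);
        System.out.println("atomic:   " + safeCount.get());
        System.out.println("unsafe:   " + unsafeCount + " (may be lower)");
    }
}
```

The unsafe result varies from run to run, which is exactly what makes this class of bug hard to reproduce and debug.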

Monday, April 27, 2009

Application development and team structure

Application development, or software engineering, is a creative activity that happens to require a lot of different technical skills.

Structure of application development team
In a typical project, there will be roles like business analyst, project manager, developer, quality assurance tester, support engineer, etc. All of these are usually classified as IT job functions. There are additional business job functions associated with typical application development projects, for example product development, graphic design and marketing.

Each company or industry tackles this differently. Usually, smaller companies try to have one person perform more than one function, while bigger companies have a different team for each function.

Most of the time, the structures are standardized within the company or organization and will not vary that much across different projects.

Divide and Conquer?
No matter what role(s) you are playing, understanding the overall structure will help you appreciate what others are doing and let you leverage any help from them.

Different skills and personalities are needed for each function. A business analyst needs to have excellent business domain knowledge, a project manager needs to be very organized, a software developer needs to understand the necessary technologies, a tester needs to be thorough, a support engineer needs to be able to handle pressure and be creative, etc.

But we should remember that bringing in more people also means more interaction and communication. So it is critical to make sure the added expertise outweighs the inevitable overhead of having multiple people learn the common background of a project. Setting up clear responsibilities and accountabilities is also critical.

Personally, I believe that having a person (or team) involved with multiple functions is more effective overall, mainly because of the "end-to-end" responsibility that comes with it. However, it is more difficult to find people with all the right skill sets (and the willingness) to do multiple functions. It is also much more difficult to scale to larger projects.

From Telecommunication to Banking

Fresh out of college, I joined AT&T. Like a lot of college graduates at that time, I was attracted to the company due to Bell Labs and their invention of UNIX and the C Programming language.

The position that I started with was a System Engineer in the Billing Planning team for Long Distance Services. The main role for this position was analysis and planning.

After about 6 years with the company, an exciting opportunity came up for me to work for Goldman Sachs. The reason I was hired obviously was my UNIX and C++ programming skills.

I ended up spending my next 12 years with the company, acquiring all the business knowledge that I needed in the fascinating Prime Brokerage business, servicing hedge fund clients.

Each person's career journey is unique. For me, looking back, every step along the way prepared me for the next challenge and helped me appreciate it more.

AT&T / Lucent Technologies Bell Labs
In order to write requirements for introducing new enhanced long distance services, I learned about the high-level network architecture, the 4ESS/5ESS network switches, and the Call Detail Recording message layouts. I did not fully appreciate the work initially. However, this position gave me a unique, early-career, big-picture perspective.

I moved on to work on software development for the Digital Cross Connect system. I was excited to work on the database and on concurrent data access using various locking schemes. I then transferred to the Hybrid Fiber Coax system, doing C++ coding on a real-time OS.

Goldman Sachs
I remember my excitement, with a little nervousness, when I started with Goldman Sachs. It was nothing close to what I had been doing, including the company culture. I settled down quickly and a new opportunity came up. I was asked to work on the client-facing website for Prime Brokerage. It was the early days of the Internet. I studied all the latest technologies (HTTP, CGI, HTML, authentication, PGP encryption) and successfully released the first firm-wide integrated offering. The new technologies enabled reporting and analytics for our hedge fund clients, many still using dial-up connections to Goldman Sachs.

Fast forward to the last few years: I learned about Contracts for Difference, Credit Default Swaps and many other financial products, and how we could integrate all these products into the core Prime Brokerage offerings. I was also the first in the organization to lead an offshore India development team in delivering the core offerings.

All of these roles are very different. That's the beauty of it. I used previous experience and mastered more new knowledge along the way, like a snowball that grows bigger and bigger with ever more diversified skills inside.

Friday, April 24, 2009

Technology: More revolutions or evolutions?

Let's consider a 20-year span, 1990 to 2010. Give it some thought before you read on. I will focus more on the software side of technology here. My view may not be the one you have.

If you look at the Computer Science curriculum at universities, fundamental courses like Operating Systems, Programming Languages, Networking, Databases and Software Engineering that were taught in the 1990s are still the core offerings today.

My main point is that it is always important for those of us working in the software industry to have a solid understanding of the fundamentals. They are like a good foundation for building a nice house. It is also good to know that most of our knowledge has a very long shelf life, if you believe that software technologies are mostly evolutions. We keep learning new technologies to add to our solid foundations.

Nowadays, to improve productivity, there are a lot of high-level tools, languages and frameworks that help shield us from all the gory details. This works perfectly most of the time for day-to-day tasks. But the fundamentals are always critical when you try to address some specific and difficult problems. I will spend more time in subsequent posts on some important examples I have in mind.

Revolutions
  • Internet. By Internet, I mean HTTP and HTML. Even with all the latest Web 2.0 advances, we are not replacing HTTP and HTML.
  • Smart Phones/Wireless. They are really highly portable "computers". Who would have imagined 20 years ago what we can do nowadays on a "portable telephone"?
There are definitely other revolutionary software-related technologies, but what I am getting at is that there are not a lot of them, considering the long time frame.

Evolutions
  • Operating Systems. Think about UNIX, from System V at AT&T to open source Linux now. How much has fundamentally changed in the kernel design and the shell?
  • Programming Languages. C#, Ruby, Java: aren't they all similar to C++ in their use of object-oriented concepts?
  • Networking. TCP/IP, Ethernet. They are still the basis of networking.
  • Databases. Relational technology: how different is it now from a decade ago?
So when the next hot new technology comes to mind, ask yourself this question. Is this truly a revolution? Or is it just an evolution that can help us solve the same problem better and more effectively?

Hardware is getting so much cheaper and more powerful that it enables all these revolutions and evolutions to happen. In my mind, if microprocessor power had remained constant, the software industry would surely have needed more "revolutions" to make all the progress that we made in the past 2 decades. However, I believe this trend is changing, so concurrency will become much more important in the future in order to leverage the power of multi-core processors. Another future topic.


All of this ties back to my philosophy: use the right technology to innovate a business or solve a problem. Next week, I will touch on some examples in the other 2 areas, business knowledge and management.

I am glad to hear your different viewpoints. Have a good weekend!

Thursday, April 23, 2009

Role of technology in business

Welcome! In this blog, I am going to discuss my own experience of how to use technology effectively as a business differentiator. It is obvious that good use of technology can create new business opportunities, improve operational efficiency, contribute to the bottom line and increase competitive advantage.

I am very fortunate to have had the chance to work on many business-critical initiatives with talented colleagues and managers. I would like to say thank you to everyone I have worked with. I have picked up a tremendous amount of business domain knowledge and have also learned about various project and team management styles.

So, what are the areas I will touch on?

  • Significance of different technologies. Without a good understanding, we cannot effectively evaluate how and when to use them as tools to create business value.
  • Project and team management. How to get things done is obviously critical.
  • Business domain knowledge. Ultimately, the reward of the hard work of a successful software project is to deliver something important and of good value with respect to the time we spend.
  • Something fun about consumer technologies (once in a while!). I have recently been impressed with the iPhone and the Netbook that I purchased. This also shows how quickly the technology landscape can change.
Everything I am going to write will be based on my own principles and values. I welcome any comments.

Significance of different technologies
This list will change over time. Every couple of years, new buzzwords come up and old ones fade. I will discuss my views on some of these technologies in future posts. Java/J2EE, ASP.NET/C#, CGI, Servlets, Struts, Spring, Ruby on Rails, inversion of control, dependency injection, Web 2.0, AJAX, XHTML, multi-processor architecture, concurrency and thread safety, UNIX family (Solaris/Linux/Mac OS X), POSIX, XML, Sybase performance, database normalization, Object Oriented Programming, modeling tools (UML, ER diagrams), SOA, SaaS, Web Services (SOAP, REST), cloud computing, grid computing, ETL, OLAP, OLTP, data warehouse, NAS, SAN, NFS, ext3, 32-bit versus 64-bit computing. The list goes on and on; you name them.

Project and team management
How do we fairly and objectively size a software project? How do we motivate the whole team and maximize morale and productivity? When should we use agile, scrum, extreme programming, or the traditional waterfall model? What team structure works best to cover the different functions of business analysis, project management, application development, quality assurance and code migration, support and customer feedback? What about the concept of technical debt? How do we help individual team members grow? Company culture and the geographic composition of the team are also interesting to understand.

Business domain knowledge
Are we using a technology because it is cool? Remember when everyone wanted to use EJB? We need to be able to explain why a particular technology is the best tool for a business problem. Without a deep understanding of what we want to achieve and why we are doing it, we may spend our energy and precious time ineffectively. We can get the best technical expert to tune the performance of a complex database and its queries, but do we need all the data, and can we reduce the amount of data? What is the impact and fallback plan if we do not get the latest pricing for some exotic products, versus pricing for some very common stocks? How will portfolio margining help the business? I might also write about industry-specific news and its technology implications.

Thank you, everyone, for taking the time to read my first post. Please let me know if you have any feedback.

Sincerely,

Timur King