Business Needs
Every business wants the final processed data to be accurate and delivered on time. As a technologist working very closely with the business to serve our clients, I fully understand that.
It is the responsibility of the technology department to point out the complexity (i.e. cost) and potential problems (i.e. cost again) so that contingency plans can be agreed upon and established. This takes serious effort and trust, because each team has different domain knowledge and perspectives. If the business team does not understand what is technically feasible, it will naturally ask for more. If the technology team does not understand the business drivers, it will focus on solving the wrong problem. Ultimately, only when the business and technology staff truly work together as one team can the results be substantially better, creating a huge competitive advantage for the company.
We absolutely do not want to over-engineer for the not-so-critical functions or exceptional cases. However, we do want to make sure we use the most effective technology to handle the most critical scenarios.
Technologies
We need to think about data formats, processing methods, hardware and network bandwidth, scalability, data integrity checks, contingency sources, etc. Let me touch briefly on each.
- Formats - XML, flat files, relational database, other industry standards (e.g. FIX, SWIFT, FpML)
- Processing methods - when to use pre-processors, in-memory databases, replication, and DOM vs. SAX parsers for XML; archiving, compression, and purging schemes; design from transaction processing through to the data warehouse; database normalization and performance tuning.
- Hardware/Network bandwidth - is CPU, memory or I/O the bottleneck? What are the implications of NAS/SAN versus local disk?
- Scalability - how can the infrastructure scale to the anticipated growth (2x, 10x, or 100x)? Depending on realistic projections, we may make significantly different design decisions. Can we simply scale the application horizontally by adding more instances or hardware? Some designs will NOT allow us to do that.
- Data Integrity checks - sanity checks, row count checks, mandatory vs optional fields.
- Contingency - critical path analysis, checkpoints and how to partially re-run a batch, alternative data sources or algorithms.
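To make the DOM vs. SAX trade-off concrete, here is a minimal Python sketch using only the standard library. It assumes a hypothetical trade feed where each `<trade>` element carries a `qty` attribute (the element and attribute names are illustrative, not from any real standard). DOM parses the whole document into an in-memory tree before you query it; SAX streams callbacks and keeps only what the handler chooses to retain.

```python
import io
import xml.sax
from xml.dom import minidom

# Hypothetical sample feed; element and attribute names are illustrative only.
SAMPLE_XML = b"""<trades>
  <trade id="T1" qty="100"/>
  <trade id="T2" qty="250"/>
  <trade id="T3" qty="50"/>
</trades>"""


def total_qty_dom(xml_bytes: bytes) -> int:
    """DOM: build the full tree in memory, then query it.
    Convenient for random access, but memory grows with document size."""
    doc = minidom.parseString(xml_bytes)
    return sum(int(node.getAttribute("qty")) for node in doc.getElementsByTagName("trade"))


class _QtyHandler(xml.sax.ContentHandler):
    """SAX: react to elements as they stream past; only the running
    total is kept, so memory stays flat even for very large files."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        if name == "trade":
            self.total += int(attrs.getValue("qty"))


def total_qty_sax(xml_bytes: bytes) -> int:
    handler = _QtyHandler()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.total


if __name__ == "__main__":
    print(total_qty_dom(SAMPLE_XML), total_qty_sax(SAMPLE_XML))  # prints: 400 400
```

Both functions give the same answer here; the difference shows up with multi-gigabyte feeds, where holding the whole DOM tree in memory may not be an option.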
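Horizontal scaling is only straightforward when the work can be split into independent partitions. The sketch below assumes records are keyed by a hypothetical account id and have no cross-partition dependencies; each key is routed to one of N instances with a stable hash, so adding instances spreads the load. `partition_for` and `split_work` are illustrative names, not an existing API.

```python
import zlib


def partition_for(key: str, num_instances: int) -> int:
    """Route a record key to one of num_instances workers.

    Python's built-in hash() is randomized per process for strings, so
    separate processes would disagree; zlib.crc32 is deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_instances


def split_work(keys, num_instances: int) -> dict:
    """Group keys into one bucket per instance (hypothetical helper)."""
    buckets = {i: [] for i in range(num_instances)}
    for key in keys:
        buckets[partition_for(key, num_instances)].append(key)
    return buckets


if __name__ == "__main__":
    accounts = [f"ACC{i}" for i in range(10)]
    for instance, keys in split_work(accounts, 3).items():
        print(instance, keys)
```

One design caveat: with plain modulo hashing, changing the instance count remaps most keys, which matters if partitions hold state. Consistent hashing reduces that reshuffling; designs that need a single global ordering or a shared lock will not partition like this at all.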
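The data integrity checks listed above can be sketched as a small validation pass over a batch. This is a minimal Python example under stated assumptions: rows are dicts, the expected row count comes from a control total such as a trailer record, and the mandatory field names (`trade_id`, `account`, `quantity`) are hypothetical placeholders for whatever the real feed specification says.

```python
from dataclasses import dataclass, field

# Hypothetical mandatory fields; any field not listed is treated as optional.
MANDATORY_FIELDS = ("trade_id", "account", "quantity")


@dataclass
class CheckResult:
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def check_batch(rows: list, expected_count: int) -> CheckResult:
    """Run simple integrity checks on a batch of dict rows:
    - row count check against a control total
    - mandatory fields must be present and non-empty
    - sanity check: quantity must be a positive number
    """
    result = CheckResult()
    if len(rows) != expected_count:
        result.errors.append(f"row count {len(rows)} != expected {expected_count}")
    for i, row in enumerate(rows):
        for name in MANDATORY_FIELDS:
            if row.get(name) in (None, ""):
                result.errors.append(f"row {i}: missing mandatory field '{name}'")
        qty = row.get("quantity")
        if qty not in (None, ""):
            try:
                if float(qty) <= 0:
                    result.errors.append(f"row {i}: non-positive quantity {qty}")
            except ValueError:
                result.errors.append(f"row {i}: quantity {qty!r} is not numeric")
    return result


if __name__ == "__main__":
    rows = [
        {"trade_id": "T1", "account": "A1", "quantity": "100"},
        {"trade_id": "T2", "account": "", "quantity": "-5", "comment": "optional"},
    ]
    for err in check_batch(rows, expected_count=3).errors:
        print(err)
```

Collecting every error rather than stopping at the first one makes it easier to decide whether a bad batch can be partially accepted or must be rejected outright.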
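Checkpoints are what make a partial re-run possible. Below is a minimal Python sketch, assuming a batch is an ordered list of named steps and that each step is safe to skip once it has completed. Completed step names are recorded in a JSON checkpoint file (the file name and step names are hypothetical), so after a failure the re-run resumes from the failed step instead of starting over.

```python
import json
import os
import tempfile


def run_batch(steps, checkpoint_path: str) -> list:
    """Run (name, func) steps in order, skipping any step recorded as done
    in the checkpoint file. Returns the names executed in this run."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    executed = []
    for name, func in steps:
        if name in done:
            continue  # completed in a previous run
        func()  # an exception stops the batch; earlier steps stay checkpointed
        done.append(name)
        executed.append(name)
        # Write to a temp file and rename, so a crash mid-write cannot
        # leave a corrupt checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, checkpoint_path)
    return executed


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "batch_checkpoint.json")
    upstream_late = True  # simulate a late upstream file on the first run

    def load():
        print("load")

    def transform():
        if upstream_late:
            raise RuntimeError("upstream file late")
        print("transform")

    def publish():
        print("publish")

    steps = [("load", load), ("transform", transform), ("publish", publish)]
    try:
        run_batch(steps, path)
    except RuntimeError as exc:
        print("first run failed:", exc)
    upstream_late = False
    print("re-run executed:", run_batch(steps, path))
```

In practice the checkpoint granularity is a trade-off: finer checkpoints shorten re-runs but require each step to be independently restartable.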
I will drill down into some of these technical considerations in more detail in future posts.