
Student Projects 2006-2007: Topics

Work at TRDDC is organized into groups, each specializing in a key area; projects of an interdisciplinary nature are also carried out. With expertise in process engineering, software engineering tools and technologies, advanced techniques, and systems engineering methodologies, TRDDC provides solutions within TCS and for major clients.

The research work is academically rigorous; researchers from TRDDC regularly present their work at international symposia and publish their papers in reputed journals.

For the year 2006-07, TRDDC plans to offer projects for final-year engineering students in the following three areas:

  • Software Engineering

  • Business and IT alignment

  • Business and Data Analytics

1. Consistency Checking of Requirements Document

A requirements document consists of a data dictionary, modeled as a business entity diagram, and a detailed description of the business operations written in English. The objective is to check that the words used in the English description are consistent with the terms in the data dictionary. This requires identifying the nouns and verbs in the English text and checking whether the data dictionary has an entry for each noun or verb, or for one of its variants. The tool should also check whether each verb has been used consistently.
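
As a rough illustration of the dictionary look-up step, the Python sketch below assumes that an earlier step (for example, a part-of-speech tagger) has already tagged each word, and it handles variants with a crude suffix-stripping rule rather than a real stemmer. The function names and the sample dictionary are hypothetical, and the verb-consistency check is not covered.

```python
# Hypothetical sketch of the dictionary look-up step of the consistency check.
# Assumes an earlier step (e.g. a part-of-speech tagger) has labelled each
# word; only "NOUN" and "VERB" tokens are checked here.

SUFFIXES = ("ing", "ed", "es", "s", "d")  # crude inflection handling, not a real stemmer


def candidate_forms(word):
    """Return the word plus the forms obtained by stripping one common suffix."""
    w = word.lower()
    forms = {w}
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            forms.add(w[: -len(suffix)])
    return forms


def undefined_terms(tagged_words, dictionary):
    """List nouns/verbs for which no variant appears in the data dictionary."""
    entries = {term.lower() for term in dictionary}
    return [
        word
        for word, tag in tagged_words
        if tag in ("NOUN", "VERB") and not (candidate_forms(word) & entries)
    ]


dictionary = {"Account", "Customer", "deposit", "withdraw"}
sentence = [("The", "DET"), ("customer", "NOUN"), ("deposits", "VERB"),
            ("money", "NOUN"), ("into", "ADP"), ("accounts", "NOUN")]
print(undefined_terms(sentence, dictionary))  # ['money']
```

Generating candidate base forms for the text word, instead of stemming both sides, avoids mangling dictionary entries such as "process" that happen to end in a suffix.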


2. Development of a predicate abstraction prototype tool for C programs

Many software engineering requirements arising from quality assurance call for some form of model checking over the source code. Because a general program can have an infinite state space, it cannot be model checked directly. Instead, an abstraction of the program is required with respect to the properties of interest. The objective of this project is to create an abstraction of a given program (which will itself be a program) with respect to properties of interest, represented as predicates over program entities.

The tool is expected to take as input the predicates of interest, defined in terms of existing program variables. The objective is to generate another program that is equivalent to the original with respect to these predicates: every possible path in the original program corresponds to a path in the generated program, and the predicate values computed along both are the same. The project will use our existing static data flow analysis capabilities to produce an abstract program that is as precise as possible. The work will be along the lines of the c2bp tool in Microsoft's SLAM toolkit.
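
To make the idea concrete, here is a minimal sketch of abstracting a single statement with respect to a set of predicates. It assumes each variable ranges over a tiny bounded domain, so the abstract transition relation can be computed by enumerating concrete states; c2bp instead uses weakest preconditions and a theorem prover, so enumeration is only a stand-in, and the result is exact only for the chosen domain. All names are hypothetical.

```python
# Hypothetical sketch: abstract one C-style statement with respect to a set of
# predicates by enumerating concrete states over a small bounded domain.
# (c2bp uses weakest preconditions and a theorem prover instead.)
from itertools import product


def abstract_statement(stmt, predicates, variables, domain):
    """Map each abstract state (tuple of predicate values) to the set of
    abstract states reachable by executing `stmt` from a matching state."""
    transitions = {}
    for values in product(domain, repeat=len(variables)):
        state = dict(zip(variables, values))
        before = tuple(p(state) for p in predicates)
        after_state = stmt(dict(state))  # run the statement on a copy
        after = tuple(p(after_state) for p in predicates)
        transitions.setdefault(before, set()).add(after)
    return transitions


def increment_x(state):
    """Models the C statement `x = x + 1;`."""
    state["x"] += 1
    return state


def x_positive(state):
    """Predicate b1: (x > 0)."""
    return state["x"] > 0


abstraction = abstract_statement(increment_x, [x_positive], ["x"], range(-3, 4))
for before, afters in sorted(abstraction.items()):
    print(before, sorted(afters))
# (False,) [(False,), (True,)]
# (True,) [(True,)]
```

The result corresponds to a boolean-program update of the form: if b1 then b1 := true, else b1 := a nondeterministic choice, which is what an abstraction of `x = x + 1;` over (x > 0) should give.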

3. Design Abstractions

Representing software systems as abstract design models (free of program details) is a known but complex problem. With a focus on legacy systems, we are working towards defining and arriving at abstractions that represent different aspects of software systems. Example abstractions for online programs are GUI, Navigations, Services and Calculations. We aim to extract these abstractions as automatically as possible. Given the source code of a system, the challenge is to use different program analysis techniques (static data-flow analysis, constraint analysis, etc.) and combine them to build the abstractions.

The aspects of a software system that we wish to explore, and for which we wish to build prototype extraction tools, this year are:

  • Use-Cases: With the basic concepts already worked out, we wish to define extraction methods based on static program analysis and implement them as prototype tools.
  • Components: Using prototype tools of design model extraction, we wish to explore the re-factored ‘design’ of a legacy system to identify component boundaries.

4. Java Program Analysis

To support the continuous evolution of Java programs and prevent them from becoming legacy systems, the programs must continue to exhibit certain properties, for example modularity, security and performance. We propose to apply program analysis techniques to Java programs to identify programming patterns that exhibit such properties. For each evolved (modified) version of a program, we would like to check and measure these properties and ensure that they are retained.

The specific work areas that we would like to explore this year are:

  • Java analysis: Specialize the generic program analyzer to analyze Java programs for control flow and data flow properties.
  • Security properties: For a few specific security properties of Java programs, we would like to build prototype tools that measure these properties and compare them across multiple versions of a set of Java programs.
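
The control- and data-flow analysis in the first bullet can be illustrated with a classic reaching-definitions analysis. The sketch below is hypothetical: it assumes a control flow graph for a small Java loop has already been built by hand (no Java parsing is shown) and uses a standard worklist iteration.

```python
# Hypothetical sketch: reaching-definitions analysis over a hand-built CFG.
# Each node is one statement; `defs` maps a node to the variable it assigns.


def reaching_definitions(succ, defs):
    """Return IN[n]: the set of (variable, node) definitions reaching node n."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    worklist = list(nodes)
    while worklist:
        n = worklist.pop()
        in_sets[n] = set().union(*(out_sets[p] for p in pred[n]))
        new_out = set(in_sets[n])
        if n in defs:
            var = defs[n]
            # A new definition of `var` kills every other definition of it.
            new_out = {d for d in new_out if d[0] != var} | {(var, n)}
        if new_out != out_sets[n]:
            out_sets[n] = new_out
            worklist.extend(succ[n])  # successors must be re-examined
    return in_sets


# Hand-built CFG for this Java fragment (node ids in comments):
#   int x = 0;               // 1
#   while (x < 10) {         // 2
#       x = x + 1;           // 3
#   }
#   System.out.println(x);   // 4
succ = {1: [2], 2: [3, 4], 3: [2], 4: []}
defs = {1: "x", 3: "x"}
ins = reaching_definitions(succ, defs)
print(sorted(ins[4]))  # [('x', 1), ('x', 3)]
```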

5. Implementation of an architectural framework for semantically correct systems integration

Enterprises are placing increasing emphasis on collaboration and on integrating existing applications to provide value-added services across the entire supply chain. Future enterprise systems are likely to be assembled, within a service-oriented architecture, from customized off-the-shelf offerings and harvested legacy systems. The traditional organization of an enterprise as a set of functionally distinct departments leads, over time, to a set of isolated applications that provide point solutions, each constructed for a specific purpose with context-specific assumptions hard-coded in its implementation. We propose an architectural framework for integrating these disparate applications into a semantically consistent whole. The framework is based on a component abstraction that augments the conventional component abstraction with data models, process models, constraints, assertions, and pre- and post-conditions. The framework has three layers:

  • Enterprise layer that specifies the desired integrated system,

  • Application layer that specifies existing applications being integrated, and 

  • Integration layer that specifies the integration requirements of these applications.

We propose a set of properties that must be satisfied for integration to be semantically correct, along with a set of techniques for verifying them. The proposed architecture provides a foundation for a systematic method of executing systems integration projects.
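
One simple way to picture the pre- and post-conditions attached to a component is as run-time checks at the integration boundary. The Python sketch below is only an illustration under that assumption; the `contract` decorator, `ContractViolation` and the `debit` operation are hypothetical, and the framework's own verification techniques are not specified here.

```python
# Hypothetical sketch: a component operation augmented with pre- and
# post-conditions that are checked at the integration boundary at run time.
import functools


class ContractViolation(Exception):
    pass


def contract(pre, post):
    """Wrap a function so `pre(*args)` is checked before the call and
    `post(result, *args)` after it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            if not pre(*args):
                raise ContractViolation(f"precondition of {func.__name__} failed")
            result = func(*args)
            if not post(result, *args):
                raise ContractViolation(f"postcondition of {func.__name__} failed")
            return result
        return wrapper
    return decorate


# Hypothetical operation exposed by a legacy billing application.
@contract(pre=lambda balance, amount: 0 < amount <= balance,
          post=lambda result, balance, amount: result == balance - amount)
def debit(balance, amount):
    return balance - amount


print(debit(100, 30))  # 70
```

A call such as `debit(100, 150)` raises `ContractViolation` before the legacy code runs, so the integration fault is reported at the boundary instead of surfacing later as corrupted data.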

6. Business IT Alignment

Information Technology is meant to deliver the information systems needed to achieve the business objectives of an enterprise. The business domain is made up of sub-domains, namely its strategies and the infrastructure that implements those strategies. Similarly, the IT domain is made up of its own sub-domains, namely its strategies and the information systems that implement those strategies in support of the business. These domains must be aligned so that they support each other and the business objectives can be achieved effectively.

One reason this alignment is often missing today is that the specifications of these domains are not held in a single container and lack bi-directional traceability. Holding all specifications in a single container with bi-directional traceability would therefore support better alignment. However, the actors operating within these domains have different needs and competencies. The challenge is to build a single specification mechanism with bi-directional traceability that is usable by these different kinds of actors. The Ajax Web development technique provides a way to develop such a specification mechanism.

This project would involve building a prototype Web application, using the Ajax approach, to capture models of multiple domains with bi-directional traceability. The domains of interest are the business infrastructure domain and the information systems domain. The modeling notation for each domain needs to be decided as part of the project. The required traceability and usability needs will have to be defined, and a Web application implementing them developed.
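
Independent of the Ajax front end, bi-directional traceability needs a data model in which every link can be followed from either side. A minimal Python sketch of such a store follows; the class name, methods and sample elements are hypothetical, and a real tool would persist the links and attach them to the chosen modeling notations.

```python
# Hypothetical sketch of the data model behind bi-directional traceability:
# every link is indexed in both directions so either side can be queried.
from collections import defaultdict


class TraceStore:
    def __init__(self):
        self.forward = defaultdict(set)   # business element -> IS elements
        self.backward = defaultdict(set)  # IS element -> business elements

    def link(self, business_element, is_element):
        self.forward[business_element].add(is_element)
        self.backward[is_element].add(business_element)

    def unlink(self, business_element, is_element):
        self.forward[business_element].discard(is_element)
        self.backward[is_element].discard(business_element)

    def supported_by(self, business_element):
        """Information-system elements that support a business element."""
        return set(self.forward.get(business_element, ()))

    def supports(self, is_element):
        """Business elements that an information-system element supports."""
        return set(self.backward.get(is_element, ()))

    def untraced(self, business_elements):
        """Business elements with no supporting information system."""
        return [b for b in business_elements if not self.forward.get(b)]


store = TraceStore()
store.link("Branch network", "Core banking system")
store.link("Call centre", "Core banking system")
store.link("Call centre", "CRM system")
print(sorted(store.supports("Core banking system")))  # ['Branch network', 'Call centre']
print(store.untraced(["Branch network", "Call centre", "ATM network"]))  # ['ATM network']
```

Keeping both indexes updated in the same method is what makes the traceability bi-directional: a question from the business side ("what supports this?") and one from the IT side ("what does this support?") are both single look-ups.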

7. Player performance tracking system

This project involves developing an application that implements the methodology devised by TRDDC for evaluating player performance in cricket. It involves taking ball-by-ball statistics available as public-domain information and converting them into player performance indicators. It also includes developing new algorithms for normalizing these indicators over time and across matches. Another important aspect of the project is developing intuitive visualizations of these performance indicators.

Skills required/to be learnt: C++/Java programming, visualization technologies (e.g. Flash), statistical analysis, technical report writing.
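
TRDDC's methodology is not described here, so the sketch below uses stand-in indicators: it derives a strike rate per batsman per match from ball-by-ball records and normalizes it with a per-match z-score. The record layout `(match_id, batsman, runs_off_bat, is_wide)` and all function names are assumptions for illustration only.

```python
# Hypothetical sketch: derive simple batting indicators from ball-by-ball
# records and normalize them across matches with z-scores. This is NOT the
# TRDDC methodology, which the project description does not spell out.
from collections import defaultdict
from statistics import mean, pstdev


def batting_indicators(balls):
    """balls: iterable of (match_id, batsman, runs_off_bat, is_wide).
    Returns {(match_id, batsman): strike_rate}."""
    runs = defaultdict(int)
    faced = defaultdict(int)
    for match_id, batsman, runs_off_bat, is_wide in balls:
        key = (match_id, batsman)
        runs[key] += runs_off_bat
        if not is_wide:  # wides do not count as balls faced
            faced[key] += 1
    return {k: 100.0 * runs[k] / faced[k] for k in faced if faced[k] > 0}


def normalize_per_match(indicators):
    """Replace each value by its z-score among players in the same match."""
    by_match = defaultdict(dict)
    for (match_id, batsman), value in indicators.items():
        by_match[match_id][batsman] = value
    result = {}
    for match_id, players in by_match.items():
        mu = mean(players.values())
        sigma = pstdev(players.values())
        for batsman, value in players.items():
            result[(match_id, batsman)] = 0.0 if sigma == 0 else (value - mu) / sigma
    return result


balls = [(1, "A", 4, False), (1, "A", 0, False), (1, "A", 0, True),
         (1, "B", 1, False), (1, "B", 1, False)]
strike_rates = batting_indicators(balls)  # A: 200.0, B: 100.0
print(normalize_per_match(strike_rates))
```

Normalizing within each match removes match-level effects such as a flat pitch inflating everyone's scoring rate; normalizing "over time", as the project asks, would need a further step that this sketch does not attempt.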

8. Intelligent data analytics backboard

This project involves developing a generalized architecture for facilitating data analysis. It aims to provide data analytics experts with a visual application for rapidly analyzing a given set of data. It involves developing a formal syntax for composing analytics execution chains and validating them, a drag-and-drop interface for blocks of data analytics library functions (some of which are already available), and, more importantly, an advisory system that suggests next steps to the data analytics expert based on the nature of the data.

Skills required/to be learnt: C++/Java programming, statistical analysis, artificial intelligence techniques, formal specification, technical report writing.
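
A minimal sketch of chain validation, assuming each library block declares the type of data it consumes and produces: a chain is valid when every block's input type matches the previous block's output. The `Block` type, the type names and the sample blocks are hypothetical; the project's formal syntax would be richer.

```python
# Hypothetical sketch: validate an analytics execution chain by checking that
# each block's input type matches the output type of the step before it.
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    input_type: str
    output_type: str


def validate_chain(chain, source_type):
    """Return a list of error messages; an empty list means the chain is valid."""
    errors = []
    current = source_type
    for position, block in enumerate(chain):
        if block.input_type != current:
            errors.append(
                f"step {position} ({block.name}) expects {block.input_type}, "
                f"got {current}"
            )
        current = block.output_type  # continue so later mismatches are reported too
    return errors


# Hypothetical library blocks.
impute = Block("impute_missing", "table", "table")
normalize = Block("normalize", "table", "table")
pca = Block("pca", "table", "matrix")
k_means = Block("k_means", "matrix", "labels")
scatter = Block("scatter_plot", "table", "figure")

print(validate_chain([impute, normalize, pca, k_means], "table"))  # []
print(validate_chain([impute, pca, scatter], "table"))
```

Continuing after a mismatch, using the failing block's declared output type, lets one pass report every broken link rather than only the first; the same type information is also a natural input for an advisory system that proposes compatible next blocks.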

9. Parallel Support Vector Machines using high-performance computing

Current software and hardware configurations limit the computation and storage available for building large support vector machines (SVMs). This project deals with leveraging the state of the art in high-performance computing (HPC) to address these computational limitations of SVMs. Possible solutions include parallel processing of SVMs, multi-threading, and the use of new software and hardware resources such as different compilers and platform-specific optimized libraries.
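
For kernel SVMs, computing the kernel (Gram) matrix is a large and easily parallelized part of training. The sketch below splits the rows of an RBF kernel matrix across workers. It uses threads only to keep the example portable; in CPython, threads will not speed up this pure-Python arithmetic because of the global interpreter lock, so a real implementation would hand the same row blocks to processes, MPI ranks or optimized native libraries. The function names and the `gamma` default are assumptions.

```python
# Hypothetical sketch: compute an RBF (Gaussian) kernel matrix, a dominant cost
# in kernel SVM training, by splitting its rows into blocks handled by workers.
import math
from concurrent.futures import ThreadPoolExecutor


def rbf(x, y, gamma):
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_rows(data, rows, gamma):
    """Compute the kernel matrix rows whose indices are in `rows`."""
    return [[rbf(data[i], data[j], gamma) for j in range(len(data))] for i in rows]


def parallel_kernel_matrix(data, gamma=0.5, workers=4):
    """Split the rows of the n x n kernel matrix into blocks, one task each."""
    n = len(data)
    chunk = max(1, math.ceil(n / workers))
    blocks = [range(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    matrix = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so rows stay in place.
        for part in pool.map(lambda rows: kernel_rows(data, rows, gamma), blocks):
            matrix.extend(part)
    return matrix


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
K = parallel_kernel_matrix(points, gamma=0.5, workers=2)
print(K[0][1])
```

Row blocks are independent, so the same decomposition carries over unchanged to a process pool or to a cluster where each node computes and stores its own slice of the matrix.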


10. Data reduction techniques for faster SVM model selection

This problem addresses the time taken to carry out model selection for SVMs. Long model-building times are a limitation, particularly in applications where large-scale databases are frequently updated. Down-sampling the data and/or limiting the search space for parameter optimization are common ways to speed up model building, but the resulting speed-accuracy trade-off needs to be investigated. This project would investigate various data reduction techniques to expedite model selection.
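
A minimal sketch of the down-sampling approach: the regularization parameter is chosen using only a random fraction of the data, and the chosen model is then retrained on everything. A linear SVM trained with the Pegasos sub-gradient method stands in for a full SVM package, and the data is synthetic; the function names, `fraction` and the candidate `lambdas` are hypothetical. Measuring the speed-accuracy trade-off would mean repeating the search on the full data and comparing, which is not shown.

```python
# Hypothetical sketch: speed up SVM model selection by running the parameter
# search on a random subsample, then retraining the chosen model on all data.
import random


def train_pegasos(X, y, lam, epochs=20, seed=0):
    """Linear SVM via Pegasos sub-gradient steps; labels must be +1 or -1.
    A constant 1.0 feature in each row plays the role of the bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:  # hinge loss active: step towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w, X, y):
    correct = sum(1 for xi, yi in zip(X, y)
                  if yi * sum(wj * xj for wj, xj in zip(w, xi)) > 0)
    return correct / len(X)


def select_on_subsample(X, y, lambdas, fraction=0.2, seed=0):
    """Choose lambda on a random subsample, then retrain on the full data."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), max(4, int(fraction * len(X))))
    half = len(idx) // 2
    Xt, yt = [X[i] for i in idx[:half]], [y[i] for i in idx[:half]]
    Xv, yv = [X[i] for i in idx[half:]], [y[i] for i in idx[half:]]
    best = max(lambdas,
               key=lambda lam: accuracy(train_pegasos(Xt, yt, lam), Xv, yv))
    return best, train_pegasos(X, y, best)


# Synthetic, well-separated two-class data with a constant bias feature.
data_rng = random.Random(1)
X, y = [], []
for _ in range(200):
    label = data_rng.choice([1, -1])
    X.append([label * data_rng.uniform(1, 3), label * data_rng.uniform(1, 3), 1.0])
    y.append(label)

best_lambda, w = select_on_subsample(X, y, [0.001, 0.01, 0.1])
print(best_lambda, accuracy(w, X, y))
```

Only the candidate search runs on the subsample, so its cost shrinks roughly with `fraction`, while the single final training run still sees all the data.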


11. Data visualization

This is a general data analysis problem that involves investigating techniques for displaying and exploring multidimensional data. It also includes surveying the tools available for data visualization. There are two broad approaches to visualizing data. The first aggregates the elements of the data into new information using methods such as PCA and hierarchical clustering. The second maps the data elements into a two- or three-dimensional space. We are more interested in the second approach. The task would be to research a few such techniques.
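
RadViz is one well-known technique of the second, mapping kind: each of the d variables becomes an anchor point on a unit circle, and each record is placed at the average of the anchors weighted by its min-max scaled values. The sketch below is an illustration in plain Python; the function name is hypothetical.

```python
# Hypothetical sketch of RadViz: anchors for the d variables sit evenly on the
# unit circle, and each record is drawn at the weighted average of the anchors.
import math


def radviz(rows):
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    anchors = [(math.cos(2 * math.pi * j / d), math.sin(2 * math.pi * j / d))
               for j in range(d)]
    points = []
    for r in rows:
        # Min-max scale each value to [0, 1]; constant columns contribute 0.
        w = [0.0 if highs[j] == lows[j] else (r[j] - lows[j]) / (highs[j] - lows[j])
             for j in range(d)]
        total = sum(w)
        if total == 0:
            points.append((0.0, 0.0))  # all-minimum record: place at the centre
            continue
        x = sum(wj * ax for wj, (ax, _) in zip(w, anchors)) / total
        y = sum(wj * ay for wj, (_, ay) in zip(w, anchors)) / total
        points.append((x, y))
    return points


print(radviz([[0, 0, 0], [1, 0, 0], [1, 1, 1]]))
```

One known limitation is that different records can land on the same point (for example, any record whose scaled values are all equal maps to the centre), so RadViz is usually combined with other views.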


12. Dimensionality reduction and variable selection

This problem investigates techniques for reducing the dimensionality of data while retaining the most valuable variables. Choosing the most important attributes is key to building an efficient classification or regression model. The task would be to implement techniques such as Principal Component Analysis (PCA) and correlation analysis, and to compare their performance on high-dimensional data.
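
The correlation technique can be sketched as a simple filter: rank variables by their absolute Pearson correlation with the target, then drop any variable that is almost a copy of one already chosen. The redundancy threshold, function names and toy data are assumptions; PCA, the other technique named above, builds new combined variables instead of selecting existing ones and is not shown.

```python
# Hypothetical sketch of correlation-based variable selection: rank variables
# by |Pearson correlation| with the target, then skip any variable strongly
# correlated with one already selected (a redundancy filter).
import math


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def select_variables(columns, target, k, redundancy=0.9):
    """columns: {name: list of values}. Returns up to k selected names."""
    ranked = sorted(columns, key=lambda c: abs(pearson(columns[c], target)),
                    reverse=True)
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        if all(abs(pearson(columns[name], columns[s])) < redundancy
               for s in selected):
            selected.append(name)
    return selected


target = [1, 2, 3, 4, 5]
columns = {
    "a": [1, 2, 3, 4, 5],
    "a_copy": [2, 4, 6, 8, 10],  # perfectly correlated with "a": redundant
    "b": [5, 3, 4, 1, 2],
    "noise": [1, 3, 1, 3, 1],
}
print(select_variables(columns, target, k=2))  # ['a', 'b']
```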

General relevant areas: Machine Learning, Pattern Recognition, Artificial Intelligence, Data Mining, Mathematical Modeling, Data Visualization, Parallel Processing, High Performance Computing, Cascading, and Multi-Threading.

13. Integration Verification

Consider an application A that wants to use component B. We are concerned with how to verify that A is using B properly and that the two are integrated properly. Whether B has been tested properly, and whether A functions correctly, are separate concerns that are not addressed in this note. Moreover, we assume that adequate methods have been used to test component B before it is used with application A.

Initially, we only look at the situation where functions in A invoke functions in B. For now, we do not consider situations where A and B communicate only through a common state. We also assume that B does not in turn call functions in A. These dimensions of the problem will be considered once the basic problem has been addressed.

The following are some of the common problems in integration:

  • Functions are invoked without proper parameter values

  • Function results are not interpreted correctly

  • Errors and exceptions are either not handled or improperly handled

  • Called function behaves differently from what is expected by the caller

  • Functions are not always invoked in the proper sequence, leading to unexpected and often non-repeatable behaviour

The impact of such problems is felt in many ways:

  • Defects may appear unexpectedly, even after exhaustive testing

  • When a defect is encountered, it is not easy to identify whether it is due to a fault in the calling program or the called function

  • When the defects are due to incorrect calling sequences, the cause of the problem is hard to locate
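
One way to catch incorrect calling sequences early is to describe B's legal call order as a finite-state machine and route A's calls through a checking proxy that also records the call trace. The sketch below is hypothetical (the `SequenceChecker` class, the `Connection` component and its protocol are invented for illustration) and, like the text, covers only calls from A into B.

```python
# Hypothetical sketch: describe the legal calling sequence of component B as a
# finite-state machine and check every call that A makes through a proxy.


class ProtocolError(Exception):
    pass


class SequenceChecker:
    def __init__(self, component, transitions, start):
        self._component = component
        self._transitions = transitions  # {(state, method_name): next_state}
        self.state = start
        self.trace = []                  # accepted calls, in order

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy, i.e. B's methods.
        target = getattr(self._component, name)

        def checked(*args, **kwargs):
            key = (self.state, name)
            if key not in self._transitions:
                raise ProtocolError(
                    f"{name}() called in state '{self.state}'; "
                    f"earlier calls: {self.trace}")
            self.trace.append(name)
            result = target(*args, **kwargs)
            self.state = self._transitions[key]
            return result

        return checked


class Connection:  # stands in for component B
    def open(self):
        return "opened"

    def send(self, data):
        return len(data)

    def close(self):
        return "closed"


PROTOCOL = {
    ("closed", "open"): "open",
    ("open", "send"): "open",
    ("open", "close"): "closed",
}

conn = SequenceChecker(Connection(), PROTOCOL, start="closed")
conn.open()
conn.send("hello")
conn.close()
try:
    conn.send("late")  # violates the protocol: connection already closed
except ProtocolError as err:
    print(err)
```

Because the error reports the current state and the trace of earlier calls, it points at the caller's sequence rather than at B's implementation, which helps with telling faults in the calling program apart from faults in the called function.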

14. Data Generation for Load and Performance Testing

Load and performance testing requires very large amounts of data, both in the database and as inputs. This data has to be consistent so that the application under test will process it correctly. The data also has to match a target profile (for example, with 2 million accounts, 20000 withdrawal transactions, 5000 balance inquiries and 10 cheque book requests in an hour). Currently, this setup process can take several months. The objective is to reduce this time and effort significantly.

This project will explore methods for setting up a synthetic database whose data is consistent with the application's needs and conforms to a given profile. Typical test data generation methods do not work in such real-life situations, and there is no product on the market that can do this.
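
A toy sketch of profile-driven generation follows. It assumes a single account table and three transaction types, uses a scaled-down version of the example profile, and enforces two simple consistency rules: every transaction references an existing account, and the withdrawals against an account never add up to more than its opening balance. The table layout, function name and amounts are hypothetical; a real generator would have to cover the application's full schema and constraints.

```python
# Hypothetical sketch: generate a synthetic, internally consistent data set
# (opening balances plus input transactions) that matches a target profile.
import random


def generate(num_accounts, profile, seed=0):
    """Return (opening_balances, transactions) for profile = {type: count}."""
    rng = random.Random(seed)
    balances = {f"ACC{i:07d}": rng.randint(1_000, 100_000) for i in range(num_accounts)}
    remaining = dict(balances)  # running balance, used only while generating
    ids = list(balances)
    transactions = []
    for kind, count in profile.items():
        for _ in range(count):
            account = rng.choice(ids)  # only existing accounts are referenced
            amount = 0
            if kind == "withdrawal":
                # Cap at the remaining balance: an account's withdrawals never
                # exceed its opening balance, so they can replay in any order.
                amount = min(remaining[account], rng.randint(100, 10_000))
                remaining[account] -= amount
            transactions.append((kind, account, amount))
    rng.shuffle(transactions)  # interleave the types as in live traffic
    return balances, transactions


# Scaled-down version of the example profile.
profile = {"withdrawal": 200, "balance_inquiry": 50, "cheque_book_request": 1}
balances, transactions = generate(2_000, profile)
print(len(transactions))  # 251
```

Keeping the opening balances separate from the running totals matters: the database loaded for the test must hold the state before the load run, while the transactions are the inputs the application will apply.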

This project will give very good exposure to databases and to the needs of performance testing in a real-life environment.

15. Stub Generation

Very often, when we are testing our own piece of code, we need someone else’s code that may not have been developed yet. Instead of waiting, we would like to create a stub for the unavailable code and use it in our testing.

The key challenge in this project is how to generate stubs that behave intelligently, as though the actual code were available, and how to write simple specifications for a stub that meet our testing needs. This builds upon our earlier work in test data generation.
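
A simple form of such a specification is an ordered list of rules, each pairing a condition on the arguments with a canned result, a computed result or an exception to raise. The `SpecStub` class and the `get_credit_limit` function it stands in for are hypothetical; the sketch only illustrates the idea.

```python
# Hypothetical sketch: a stub driven by a small specification. Each rule pairs
# a condition on the arguments with a canned result, a function computing the
# result, or an exception to raise.


class SpecStub:
    """Callable stand-in for unavailable code, driven by ordered rules."""

    def __init__(self, rules, default=None):
        self.rules = rules      # list of (condition, result) pairs, tried in order
        self.default = default  # returned when no rule matches
        self.calls = []         # recorded arguments, for later assertions

    def __call__(self, *args):
        self.calls.append(args)
        for condition, result in self.rules:
            if condition(*args):
                if isinstance(result, Exception):
                    raise result
                return result(*args) if callable(result) else result
        return self.default


# Stub for a hypothetical, not-yet-written get_credit_limit(customer_id, income).
get_credit_limit = SpecStub([
    (lambda cid, income: income < 0, ValueError("income cannot be negative")),
    (lambda cid, income: income < 20_000, 0),
    (lambda cid, income: True, lambda cid, income: income // 10),
])

print(get_credit_limit("C1", 50_000))  # 5000
print(get_credit_limit("C2", 10_000))  # 0
```

Exception rules let the caller's error handling be exercised, and the recorded `calls` let a test check how the code under test used the missing component.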

16. Mock Database

Imagine that you are a developer using MasterCraft or any other model-based development tool. Let's say you have modeled your classes, written your queries and are ready to test your services or functions written in Java.

Are you faced with the following issues?

  • Do you have to wait a long time for someone to set up a database environment for you?

  • How do you ensure that the data in the database has been created correctly?

  • Do you have to spend a lot of time to restore your database after one round of testing?

  • How do you avoid conflicts with other people who are testing?

  • Do you have a hard time creating data to satisfy all possible test conditions?

  • Are you able to test your exception handling mechanisms well?

What if you did not have to set up a database, but your database queries still returned meaningful results as you tested the service or function? And what if a log recorded the database reads and writes that took place, making it easy to verify what the function did?

The initial testing that a developer does could then perhaps be completed very rapidly, with more elaborate integration testing done later in a real database environment.
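
The idea can be sketched as an in-memory store that service code talks to instead of a real database: each test starts from a private copy of the seed data, `reset()` restores it after a run, and every read and write is logged for later verification. The class, its methods and the sample `close_dormant_accounts` service are hypothetical; the text concerns Java services built with MasterCraft, and Python is used here only to keep the examples in one language.

```python
# Hypothetical sketch of a mock database: tables live in memory, every read
# and write is logged, and reset() restores the seed data between test runs.
import copy


class MockDatabase:
    """In-memory stand-in for a database, with a log of reads and writes."""

    def __init__(self, seed_tables):
        self._seed = copy.deepcopy(seed_tables)  # private copy: no shared state
        self.reset()

    def reset(self):
        """Restore the seed data and clear the log."""
        self.tables = copy.deepcopy(self._seed)
        self.log = []

    def select(self, table, predicate=lambda row: True):
        rows = [dict(r) for r in self.tables[table] if predicate(r)]
        self.log.append(("READ", table, len(rows)))
        return rows

    def insert(self, table, row):
        self.tables.setdefault(table, []).append(dict(row))
        self.log.append(("INSERT", table, dict(row)))

    def update(self, table, predicate, changes):
        count = 0
        for row in self.tables[table]:
            if predicate(row):
                row.update(changes)
                count += 1
        self.log.append(("UPDATE", table, count))
        return count


def close_dormant_accounts(db, before_year):
    """Hypothetical service under test."""
    dormant = db.select("account", lambda r: r["last_txn_year"] < before_year)
    if dormant:
        db.update("account", lambda r: r["last_txn_year"] < before_year,
                  {"status": "closed"})
    return len(dormant)


db = MockDatabase({"account": [
    {"id": 1, "status": "open", "last_txn_year": 2003},
    {"id": 2, "status": "open", "last_txn_year": 2006},
]})
print(close_dormant_accounts(db, 2005))  # 1
print(db.log)  # [('READ', 'account', 1), ('UPDATE', 'account', 1)]
```

Because each `MockDatabase` holds its own deep copy of the seed data, developers testing in parallel cannot interfere with one another, and restoring the data after a round of testing is a single call.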