Projects
R Dugad, K Ratakonda
Given a video sequence, identify those frames that represent changes in the parts of the scene being imaged
Combine spatial and temporal filtering to suppress noise in each frame of a video sequence
Linear Transforms over arbitrary supports
R Dugad, K Ratakonda
We present a novel iterative approach to define any multidimensional linear transform over an arbitrary shape given that we know its definition over a hypercube
R Dugad, K Ratakonda
To develop a watermarking method that does not use the original image for watermark detection and is resilient to image quality degradation
Transform Domain Magnification or Superresolution
R Dugad, K Ratakonda
To develop fast algorithms for magnification or demagnification of a compressed image by a given amount directly in the compressed image format
Given a video sequence with missing frames, generate the missing frames by interpolating nearby available frames
Segmentation Based Video Coding
K Ratakonda, S Yoon
Given a video sequence and a hierarchical segmentation representing the natural multiscale spatiotemporal structure, identify the interframe redundancy for efficient video coding which is adaptive to the desired level of coded detail
Structure Based Image Denoising
P Ishwar, P Moulin, K Ratakonda, M Singh
Given an image and its hierarchical segmentation representing its natural multiscale structure, use the knowledge of the image regions to smooth out the noise in the region interiors without blurring borders
Structure Based Image Magnification or Superresolution
K Ratakonda
Given an image and its hierarchical segmentation representing its natural multiscale structure, use the explicit, known geometry for image scaling, e.g., image expansion for superresolution
Structure Based Image Compression
K Ratakonda
Given an image and its hierarchical segmentation representing its natural multiscale structure, use the compactness of the structural description for best image compression
Video Encoding using Coset Codes

A. Sehgal, A. Jagmohan, N. Ahuja
This project deals with scalable coding and robust Internet streaming of predictively encoded media. We frame the problem of predictive coding as a variant of the Wyner-Ziv problem in Information theory. Subsequently, LDPC based coset code constructions are used to compress the media in a scalable, error-resilient manner.
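The coset idea behind Wyner-Ziv style coding can be illustrated with a toy scalar example. This is only a sketch under stated assumptions: the project uses LDPC-based coset codes, whereas the modulo arithmetic below is a deliberately simplified stand-in, and the function names and the `modulus` parameter are hypothetical. The encoder sends only the coset index of a sample, and the decoder resolves the ambiguity using side information (for example, a prediction from a previously decoded frame).

```python
def coset_encode(x: int, modulus: int) -> int:
    """Transmit only the coset index of x, i.e. its residue modulo `modulus`."""
    return x % modulus


def coset_decode(index: int, side_info: int, modulus: int) -> int:
    """Return the member of the coset that lies closest to the side information."""
    # Coset members are index + k * modulus; choose the k nearest to side_info.
    k = round((side_info - index) / modulus)
    return index + k * modulus


# Toy usage: the decoder's prediction (135) is close to the true value (137).
sample, prediction, modulus = 137, 135, 8
index = coset_encode(sample, modulus)          # only 3 bits are needed for modulus 8
recovered = coset_decode(index, prediction, modulus)
```

Decoding succeeds in this toy whenever the side information differs from the true value by less than half the modulus, which is why the same transmitted index can tolerate different predictions at the decoder.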
Compression of Image-based Rendering Data


A. Jagmohan, A. Sehgal, N. Ahuja
The design of compression techniques for streaming of image-based rendering data to remote viewers. A compression algorithm based on the use of Wyner-Ziv codes is proposed, which satisfies the key constraints for IBR streaming, namely those of random access for interactivity, and precompression.
Multiscale Texture Element Detection
In an image containing texture elements at a range of scales, detect all elements, their relative locations and mutual containment relationships.
Given the image of a homogeneously textured planar surface at unknown orientation relative to the camera, and the output of a multiscale image region detector, estimate the surface orientation.
Non-Lambertian Surface Reconstruction and Reflectance Modeling
Non-Lambertian surfaces cause difficulties for many stereo systems. We describe methods to recover both the 3D surface shape and the reflectance model of an object from multiple views. We use an iterative method, based on multi-view shape from shading, to estimate shape and reflectance models. The estimated models can be used to generate objects in new views and under new lighting conditions using computer graphics techniques.
3D Surfaces and Illumination from Stereo and Shading
Given multiple images of a scene, estimate the scene surfaces, illumination and/or reflectance map.
3D surface from multiple views
Given multiple calibrated pictures of a real world object captured from different viewpoints, reconstruct a three-dimensional model of the object.
Dense Stereo Mapping Using Kernel Maximum Likelihood Estimation
A robust stereo matching algorithm using kernel representation of the probability density functions (pdf’s) of the sources that generate the stereoscopic images. Matching is done using either a Maximum Likelihood framework or using correlation in the pdf domain and an MRF prior to model the disparity function.
Surfaces from Binocular Spatial Stereo
Given multiple images of a scene, taken from multiple cameras and different viewpoints, find the 3D depth map and surfaces
Bandwidth Selection for Kernel Density Estimators

A regression-based model that provides a realistic framework for automatically choosing bandwidth parameters that minimize a global error criterion. This is used for automatic segmentation of images at any input resolution scale (e.g., the wavelet decomposition scale).
Estimation and Segmentation of Images Using Parametric Image Models

Models of spatial variation in images are central to a large number of low-level computer vision problems including segmentation, registration, and 3D structure detection. Often, images are represented using parametric models that characterize the (noise-free) image variation, plus additive noise. However, the noise model may be unknown and parametric models may only be valid on individual segments of the image. Consequently, we model noise using a nonparametric kernel density estimation framework and use a locally or globally linear parametric model to represent the noise-free image pattern. This results in a novel, robust, redescending M-parameter estimator for the above image model, which we call the Kernel Maximum Likelihood (KML) estimator. We also provide a provably convergent, iterative algorithm for the resultant optimization problem. The estimation framework is empirically validated on synthetic data and applied to the task of range image segmentation.
Tele-collaboration in interactive augmented environments
To demonstrate the featured capabilities of the HMPD technology, and explore its application for distance collaboration in interactive augmented environments
Head-mounted projective display technology
To optimize a novel visualization device referred to as head-mounted projective display (HMPD), and develop a multi-user interactive workbench with tele-presence capability
Object Category Modeling using Interest Points
An automatic object detection, localization and segmentation system is proposed for object categories. Object categories are modelled as templates of patches around interest points, encoding both location and appearance information. The automatic segmentation algorithm integrates the localization information with the edge information in the image.
Extracting subimages of an unknown category from a set of images
Given a set of images, possibly containing objects from an unknown category, determine if a category is present. If a category is present, learn a spatial and photometric model of the category. Given an unseen image, segment all occurrences of the category.
Sparse Lumigraph Relighting by Illumination and Reflectance Estimation from Multi-View Images
A novel relighting approach that does not assume that the illumination is known or controllable. Instead, we estimate the illumination and texture from multi-view images captured under a single illumination setting, given the object shape.
J Cocatre-Zilgien, F Delcomyn, Z Ding, J Hart, G Kremesec, L Lu, M Nelson, J Reichler, K Tan
Design and implementation of a pneumatic, six-legged robot with the geometry, number of leg joints, and joint functionality modeled after the American cockroach
Videoshop: A New Framework for Video Editing in Gradient Domain
J Chuang, Y Hwang, R Ruff
Given a mobile object required to move from a source location/orientation to a destination location/orientation, compute a path that it can follow and the orientation and velocity values it must assume along the path to efficiently and smoothly move from the source to the destination.
Sketch-Based Object Selection in Images
To assist humans in referring to specific parts of an image, and in performing desired operations on these parts, through natural, interpersonal-style communication, e.g., by freely drawing sketches over the image that denote specific editing operations such as move, expand, and delete.
The GIST (Gesture Interpretation using Spatio-Temporal analysis) project is an attempt to recognize and interpret sign gestures of American Sign Language from a video sequence based on an integrated method of motion segmentation, shape, size and color. A multi-scale motion segmentation based on Ahuja’s New Transform is applied to a video sequence to get motion regions and their correspondence across frames. Regions of interest, such as fingertip, palm and elbow, are extracted from motion segmented images by formulating and solving a constraint satisfaction problem. From these joints, pixel trajectories are extracted. A spatio-temporal analysis based on a time-delay neural network is applied to classify these patterns. The ultimate goal of GIST is to allow content-based video retrieval based on video clips and better understanding of motion segmentation.
We present a probabilistic method to detect human faces using a mixture of factor analyzers. One characteristic of this mixture model is that it concurrently performs clustering and, within each cluster, local dimensionality reduction. A wide range of face images that consists of faces in different poses, faces in different expressions and faces under different lighting conditions is used as the training set to capture the variations of human faces. In order to fit the mixture model to the sample face images, the parameters are estimated using an EM algorithm. Experimental results show that faces in different poses, with facial expressions, and under different lighting conditions are detected by our method.
N Bridwell, J Chuang, F Kishino, Y Kitamura, H Takemura, R Yen, R Chien
Given a set of objects moving in a known fashion and a set of still obstacles, detect or predict collisions between specific pairs of objects.
Efficient Algorithms and Architectures
A Choudhary, S Das, C Debrunner, J Patel, M Sharma, S Swamy
To develop computationally efficient, e.g., divide-and-conquer or DSP chip oriented, algorithms for different classes of computer vision algorithms, and to define special purpose, e.g., parallel multiprocessor, architectures that efficiently execute the algorithms
M Aggarwal
To improve the Throughput of Flexible-Precision DSPs via Algorithm Transformation
Learning to Recognize 3D Objects
A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. The key assumption underlying this work is that objects can be recognized (or, discriminated) using simple representations in terms of “syntactically” simple relations over the raw image. Although the potential number of these simple relations could be huge, only a few of them are actually present in each observed image and a fairly small number of those observed is relevant to discriminating an object. We show that these properties can be exploited to yield an efficient learning approach in terms of sample and computational complexity, within the PAC model. No assumptions are needed on the distribution of the observed objects and the learning performance is quantified relative to its past experience. Most importantly, the success of learning an object representation is naturally tied to the ability to represent it as a function of some intermediate representations extracted from the image. We evaluate this approach in a large scale experimental study in which the SNoW learning architecture is used to learn representations for the 100 objects in the Columbia Object Image Database (COIL-100). Experimental results exhibit very good generalization and robustness properties of the SNoW-based method relative to other approaches. SNoW’s recognition rate degrades more gracefully when the training data contains fewer views and it shows similar behavior also in some preliminary experiments with partially occluded objects.
Out-of-Core Tensor Approximation of Multidimensional Matrices of Visual Data
An algorithm for memory (core) efficient tensor approximation that obtains a compact representation of multidimensional visual data for efficient image-based rendering. The algorithm manages with a small memory size. We apply it to 6D Bidirectional Texture Functions (BTFs), 7D Dynamic BTFs and 4D temporal volume sequences.
To develop methods to tell the identity of a person from a frontal image and evaluate their performance against state-of-the-art methods
Predictive Multiple Description Coding using Wyner-Ziv Codes

Two-channel predictive multiple description coding is posed as a variant of the Wyner-Ziv coding problem. Practical code constructions are proposed within this framework, and the performance of the proposed codes is compared with conventional approaches, for communication of a first-order Gauss-Markov source over erasure channels with independent failure probabilities.
Learning for Object Recognition
A learning algorithm for object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. We evaluate this approach using the COIL-100 database and show its advantages over conventional methods.
Learning of Low-level Spatiotemporal Structural Patterns
Given an image or a video sequence, a prespecified set of low level, spatial and/or temporal descriptors of the image/video structure, and a higher level interpretation of the structure, use computational learning methods to derive a succinct relationship between the interpretation and the low level structural description.
Image Ensembles/ Video analysis Using Image-As-Matrix Representation
The goal of this project is to explore new algorithms based on multilinear algebra for representation of multidimensional data in computer vision.
Facial Expression Decomposition
New algorithms for facial image analysis based on multilinear algebra. We learn the expression subspace and person subspace from a corpus of images based on Higher-Order Singular Value Decomposition, and investigate their applications in facial expression synthesis, face recognition and facial expression recognition.
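As a minimal sketch of the Higher-Order Singular Value Decomposition mentioned above, the NumPy code below computes one orthonormal factor matrix per tensor mode and the corresponding core tensor. The data layout (persons × expressions × pixels), the random stand-in data, and all function names are assumptions made for illustration; they are not taken from the project.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` so that `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the mode to the last axis
    return np.moveaxis(moved @ matrix.T, -1, mode)


def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = [
        np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
        for m in range(tensor.ndim)
    ]
    core = tensor
    for m, u in enumerate(factors):
        core = mode_product(core, u.T, m)   # project onto each mode's basis
    return core, factors


# Hypothetical corpus: 4 persons, 3 expressions, 5 pixels per image.
rng = np.random.default_rng(0)
faces = rng.standard_normal((4, 3, 5))
core, (person_basis, expression_basis, pixel_basis) = hosvd(faces)

# Multiplying the core by every factor matrix reconstructs the data.
reconstruction = core
for m, u in enumerate((person_basis, expression_basis, pixel_basis)):
    reconstruction = mode_product(reconstruction, u, m)
```

In this layout the columns of `person_basis` and `expression_basis` span the person and expression subspaces, respectively; truncating them would give a lower-rank approximation rather than an exact reconstruction.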
A New Omni-directional Stereo Vision System Using Single Camera
We describe a new omnidirectional stereo imaging system that uses a concave lens and a convex mirror to produce a stereo pair of images on the sensor of a conventional camera. The light incident from a scene point is split and directed to the camera in two parts. One part reaches the camera directly after reflection from the convex mirror and forms a single-viewpoint omnidirectional image. The second part is formed by passing a subbeam of the reflected light from the mirror through a concave lens and forms a displaced single-viewpoint image where the disparity depends on the depth of the scene point. A closed-form expression for depth is derived. Since the optical components used are simple and commercially available, the resulting system is compact and inexpensive. This, and the simplicity of the required image processing algorithms, make the proposed system attractive for real-time applications, such as autonomous navigation and object manipulation. The experimental prototype we have built is described.
In developing the new opto-geometric configurations, we have found that certain classical models and approaches cease to be adequate. For example, the long-established Gaussian model of image formation fails to adequately predict the acquired images, and the optical and geometric phenomena ignored in the traditional characterization of the most focused scene point make the traditional methods of focus analysis unacceptable. We have replaced the old models with new, more rigorous, and more satisfactory models. These new models are also useful in contexts other than next generation camera designs: they help improve the performance of currently “acceptable” systems and extend the applicability of computer vision methods to many scenarios and applications that were previously out of reach.

We have developed a camera which is capable of acquiring very large field of view (FOV) images at high and uniform resolution, from a single viewpoint, at video rates. The FOV can range from being nearly hemispherical, to being nearly omni-directional, barring some small scene parts being obstructed by image sensors themselves. The camera consists of multiple imaging sensors and a hexagonal prism made of planar mirror faces. Each sensor is paired with a planar face of the prism. The sensors are positioned in such a way that they image different parts of the scene from a single virtual viewpoint, either directly or after reflections off the prism. A panoramic image is constructed by concatenating the images taken by different sensors. The resolution of the panoramic image is proportional to the number of sensors used and therefore a multiple of that of an individual sensor. Further, the resolution is substantially uniform across the entire panoramic image.
We propose a novel depth sensing imaging system composed of a single camera along with a parallel planar plate rotating about the optical axis of the camera. Compared with conventional stereo systems, only one camera is utilized to capture stereo pairs, which can improve the accuracy of correspondence detection, as is the case for any single-camera stereo system. The proposed system is able to capture multiple images by simply rotating the plate. With multiple stereo pairs, it is possible to obtain precise depth estimates, without encountering matching ambiguity problems, even for objects with low texture. Given the large number of resulting images, in conjunction with the estimated depth map, we show that the proposed system is also capable of acquiring super-resolution images. Finally, experimental results on reconstructing 3D structures and recovering high-resolution textures are presented.
Multiview Double Mirror Pyramid Panoramic Cameras
Panoramic images and video are useful in many applications such as special effects, immersive virtual reality environments, and video games. Among the numerous devices proposed for capturing panoramas, mirror pyramid-based camera systems are a promising approach for video rate capture, as they offer single-viewpoint imaging, and use only flat mirrors that are easier to produce than curved mirrors. Past work has focused on capturing panoramas from a single viewpoint.
In this work, we extend our Double Mirror Pyramid Panoramic Camera, which acquires panoramic images from a single viewpoint, to multiple viewpoints.
High-Resolution Double Pyramid Panoramic Cameras

High-resolution panoramic capture is highly desirable in many applications such as immersive virtual environments, tele-conferencing, surveillance, and robot navigation. In addition, a single viewpoint for all viewing directions, a large depth-of-field (omni-focus), and real-time acquisition are desired in some imaging applications (e.g. 3D reconstruction and rendering). The FOV of a conventional camera is limited by the size of its sensor and the focal length of its lens. For example, a typical 16mm lens with 2/3″ CCD sensor has a
FOV. The number of pixels on the sensor (640 x 480 for NTSC camera) determines the resolution. The depth-of-field is limited and is determined by various imaging parameters such as aperture, focal length, and the scene location of the object.
Many approaches have been presented to achieve various subsets of these properties: wide FOV, high resolution, large depth-of-field, a single viewpoint, and real-time acquisition. Among these, mirror-pyramid (MP)-based camera systems offer a promising approach to capturing high-resolution,
Standard imaging sensors have limited dynamic range and hence are sensitive to only a part of the illumination range present in a natural scene. The dynamic range can be improved by acquiring multiple images of the same scene under different exposure settings and then combining them. We have developed a multi-sensor camera design, called Split-Aperture Camera, to acquire registered, multiple images of a scene, at different exposure, from a single viewpoint, and at video-rate. The resulting multiple exposure images are then used to construct a high dynamic range image.
There are three main steps in composing the high dynamic range image. First, we transform the intensities recorded by each sensor into the actual sensor irradiance values. This mapping can be obtained using radiometric calibration techniques applicable to normal cameras. Second, since the irradiance at corresponding points on different sensors can be different, we need a correction factor to represent a scene point by a unique value independent of the sensor where it gets imaged. This factor is spatially variant and differs from sensor to sensor. The third and last step is fusing the intensity-transformed images into a single high dynamic range mosaic. For every pixel on a canvas (an empty image of the same dimensions as any of the sensors), we have a set of transformed intensity values, one from each of the images. We discard the values from images in which those locations were either saturated or clipped. Since the values not discarded may be noisy, we combine them to obtain the final value.
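The three steps can be sketched in NumPy as follows. This is an illustrative sketch, not the project's implementation: the combination rule (a plain average of the surviving values), the clipping thresholds `low` and `high`, the linear toy response functions, and all names are assumptions. The inverse response functions and correction maps are taken as already known from calibration.

```python
import numpy as np


def fuse_exposures(images, inverse_responses, corrections, low=5, high=250):
    """Fuse registered 8-bit exposures into one irradiance map.

    images            : list of HxW uint8 arrays, one per sensor
    inverse_responses : list of callables mapping intensity to irradiance
                        (assumed known from radiometric calibration)
    corrections       : list of HxW per-sensor, spatially varying factors
    low, high         : hypothetical thresholds for clipped/saturated pixels
    """
    total = np.zeros(images[0].shape, dtype=float)
    count = np.zeros(images[0].shape, dtype=float)
    for image, inverse_response, correction in zip(
        images, inverse_responses, corrections
    ):
        valid = (image > low) & (image < high)               # step 3: discard
        irradiance = inverse_response(image.astype(float))   # step 1
        irradiance = irradiance * correction                 # step 2
        total += np.where(valid, irradiance, 0.0)
        count += valid
    # Average the surviving values; pixels with no valid sample become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


# Toy usage: two sensors with linear responses and exposures 1.0 and 0.25.
long_exposure = np.array([[100, 255, 2]], dtype=np.uint8)
short_exposure = np.array([[25, 100, 0]], dtype=np.uint8)
hdr = fuse_exposures(
    [long_exposure, short_exposure],
    [lambda v: v / 1.0, lambda v: v / 0.25],
    [np.ones((1, 3)), np.ones((1, 3))],
)
```

In the toy data the second pixel is saturated in the long exposure, so its value comes from the short exposure alone; the third pixel is clipped in both images and is left undefined.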
Omnifocus Imaging Using Graph Cuts

We discuss how to generate omnifocus images from a sequence of images taken at different focus settings. We first show that existing focus measures encounter difficulty in detecting which frame is most focused for pixels in the regions between intensity edges and uniform areas. We then propose a new focus measure that handles this problem. In addition, after computing focus measures for every pixel in all images, we construct a three-dimensional (3D) node-capacitated graph and apply a graph cut based optimization method to estimate a spatio-focus surface that minimizes the summation of the new focus measure values on this surface. An omnifocus image can be directly generated from this minimal spatio-focus surface. Experimental results with simulated and real scenes are provided.
Panoramic Imaging with Infinite Dynamic Range
Most imaging sensors have a limited dynamic range and hence can respond satisfactorily to only a part of the illumination levels present in a scene. This is particularly disadvantageous for omnidirectional and panoramic cameras since larger fields of view have larger brightness ranges. We propose a simple modification to existing high resolution omnidirectional/panoramic cameras in which the process of increasing the dynamic range is coupled with the process of increasing the field of view. This is achieved by placing a graded transparency (mask) in front of the sensor, which allows every scene point to be imaged under multiple exposure settings as the camera pans, a process required anyway to capture large fields of view at high resolution. The sequence of images is then mosaiced to construct a high resolution, high dynamic range panoramic/omnidirectional image. Our method is robust to alignment errors between the mask and the sensor grid and does not require the mask to be placed on the sensing surface. We have designed a panoramic camera with the proposed modifications and discuss various theoretical and practical issues encountered in obtaining a robust design. We show an example of a high resolution, high dynamic range panoramic image obtained from the camera we designed.
Near Omnifocused Imaging of Scenes with limited motion
Nicam achieves omnifocus panoramic imaging only for static scenes since the camera pans across the visual field. As panning takes time, successive images of the same moving object are from different effective viewpoints. Fusion of these images for omnifocus leads to registration problems. A straightforward extension to image dynamic scenes in omnifocus would require elimination of panning. We have developed an intermediate solution, which retains panning but yields objects imaged in less than perfect focus.
Recall that mosaicing the set of images taken by Nicam as it pans requires correspondence of a scene point across the set. The presence of moving objects upsets the correspondence between images in the sequence, resulting in a distorted appearance of the moving objects in the final mosaic (see part (a) of the figure). We avoid these artifacts and create large depth of field mosaics of scenes with moving objects. The basic idea is to combine the sequence of images in a manner such
Omnifocus Nonfrontal Imaging Camera
The concept of the omnifocus nonfrontal imaging camera, OMNICAM or NICAM, initiated a new chapter in imaging and digital cameras. NICAM has introduced hitherto nonexistent imaging capabilities, in addition to overcoming some problems with previous methods. NICAM is capable of acquiring seamless panoramic images and range estimates of wide scenes with all objects in focus, regardless of their locations. To understand the impact of NICAM, first consider imaging with conventional cameras. The camera’s field of view is generally much smaller than the entire visual field of interest. Consequently, the camera must pan across the scene of interest, focus on a part at a time, and acquire an image of each part. All the resulting images together then capture the complete scene. As a byproduct of focusing, the range of the objects in the scene can also be estimated. Usual methods for focusing, as well as for range estimation from focusing, mechanically relocate the sensor plane, thereby varying the focus distance setting in the camera. When a scene point appears in sharp focus, the corresponding depth and focus distance values satisfy the lens law. The depth of the scene point can then be calculated from the focal length and the focus distance.
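The lens law referred to above is the thin-lens relation 1/f = 1/u + 1/v, where f is the focal length, u the depth of the scene point, and v the focus distance (lens to sensor plane). Solving for u gives u = f·v / (v − f). A small sketch of this calculation follows; the function name and the example numbers are hypothetical, and the thin-lens idealization is an assumption.

```python
def depth_from_focus(focal_length: float, focus_distance: float) -> float:
    """Depth u of a sharply focused point from the thin-lens law 1/f = 1/u + 1/v."""
    if focus_distance <= focal_length:
        # With v <= f no real scene point at finite positive depth is in focus.
        raise ValueError("focus distance must exceed the focal length")
    return focal_length * focus_distance / (focus_distance - focal_length)


# Example: a 50 mm lens with the sensor plane 52 mm behind the lens.
depth_mm = depth_from_focus(50.0, 52.0)
```

All quantities must be in the same unit; the result is in that unit as well.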




