
Visualization technology developed at Linköping University

Anders Ynnerman, professor of scientific visualization at Linköping University and director of Visualization Center C, describes the technology behind the visualization in an article written together with colleagues from Linköping University, Interspectral AB, the Interactive Institute Swedish ICT, and the British Museum.

The Gebelein Man, who was mummified by natural processes, and the collaboration with the British Museum form the framework for the article, which focuses on the development of the technology behind the visualization table, a display that has attracted a great deal of attention.

“It was challenging to achieve sufficiently high performance that visitors can interact with the table in real time without experiencing delays. Further, the interaction must be both intuitive and informative,” says Anders Ynnerman.

Several thousand images of the mummy taken by computed tomography (CT) are stored in the table. In this case, 10,000 virtual slices through the complete mummy have been imaged, each one as thin as 0.3 mm. Fast graphics processors can then create volumetric 3D images in real time to display whatever the visitors want to look at.
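As a rough picture of what such a stack of slices looks like as data, the sketch below loads a series of slice images into a single 3D volume array. The file layout, image format, and in-plane spacing are assumptions for illustration; only the 0.3 mm slice spacing comes from the article.

```python
import glob

import numpy as np
from PIL import Image  # assumes the slices are stored as ordinary image files

# Hypothetical file layout: one image per virtual slice through the mummy.
slice_paths = sorted(glob.glob("mummy_ct/slice_*.png"))
slices = [np.asarray(Image.open(p), dtype=np.float32) for p in slice_paths]

# Stacking the slices gives the 3D volume that the rendering works on.
volume = np.stack(slices, axis=0)      # shape: (num_slices, height, width)

# 0.3 mm slice spacing along the scan axis (from the article);
# the in-plane spacing is an assumed value.
voxel_spacing_mm = (0.3, 0.5, 0.5)
print(volume.shape)
```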

The degree to which the mummy reflects and absorbs the X-rays is recorded by the CT scanner and converted, with the aid of a specially developed transfer function, to different colours and degrees of transparency. Bone, for example, gives a signal that is converted to a light grey colour, while soft tissue and metal objects give completely different signals that are represented by other colours or structures.
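A transfer function of this kind can be thought of as a lookup from scanned intensity to colour and opacity. The sketch below is a minimal illustration; the thresholds and colours are assumptions, not the values used in the actual table.

```python
import numpy as np

def transfer_function(intensity):
    """Map a CT intensity value to an RGBA colour (illustrative thresholds only)."""
    if intensity < 100:       # air and background: fully transparent
        return np.array([0.0, 0.0, 0.0, 0.0])
    if intensity < 300:       # soft tissue: warm tone, mostly transparent
        return np.array([0.8, 0.5, 0.4, 0.15])
    if intensity < 1500:      # bone: light grey, largely opaque
        return np.array([0.9, 0.9, 0.85, 0.8])
    return np.array([1.0, 0.85, 0.3, 1.0])   # metal objects: bright and opaque
```

Hiding the skin, as described further down, then amounts to setting the opacity of the soft-tissue range to zero.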

“The table displays 60 images per second, which our brain interprets as continuous motion. Sixty times each second, virtual beams, one for each pixel on the screen, are projected through the dataset and a colour contribution for each is determined. We use the latest type of graphics processor, the type that is used in gaming computers,” says Patric Ljung, senior lecturer in immersive visualization at Linköping University.
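The per-pixel beams described in the quote can be sketched as a front-to-back compositing loop along one ray. The version below runs on the CPU and only illustrates the principle; in the table this work is done on the GPU for every pixel, sixty times a second. The step length, number of steps, and the volume_sample helper are assumptions, and transfer_function is the sketch above.

```python
import numpy as np

def cast_ray(volume_sample, origin, direction, num_steps=500, step=0.5):
    """Accumulate colour along one ray with front-to-back alpha compositing.

    volume_sample(pos) is assumed to return the interpolated intensity at a
    3D position in the volume; transfer_function is the mapping sketched above.
    """
    color = np.zeros(3)
    alpha = 0.0
    pos = np.asarray(origin, dtype=np.float32).copy()
    direction = np.asarray(direction, dtype=np.float32)
    for _ in range(num_steps):
        r, g, b, a = transfer_function(volume_sample(pos))
        color += (1.0 - alpha) * a * np.array([r, g, b])
        alpha += (1.0 - alpha) * a
        if alpha > 0.99:        # early termination: the pixel is already opaque
            break
        pos += step * direction
    return color, alpha
```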

This makes it possible for visitors to interact with the table. The desiccated skin of the mummy can be peeled away in the image so that only the parts consisting of bone are displayed. When this is done, it becomes clear that the Gebelein Man was killed by a stab through the shoulder.

The principles that have determined the design of the table are also described in the article. The design arose in close collaboration between the personnel at the museum and Interactive Institute Swedish ICT, working within the framework of Visualization Center C in Norrköping.

The design is minimalist and intuitive. The display table must be rapid, and no delay in the image can be tolerated. It must be able to withstand use by the six million visitors to the museum each year, and much emphasis has been placed on creating brief narrative texts with the aid of information points. Simple and self-explanatory icons have been used, and several favourable viewpoints and parameters have been preprogrammed in order to increase the table’s robustness.

“Allowing a broader public to visualize scientific phenomena and results makes it possible for them to act as researchers themselves. We allow visitors to investigate the same data that the researchers have used. This creates incredible possibilities for new ways to communicate knowledge, to stimulate interest, and to engage others. It is a wonderful experience to watch the next generation of young researchers be inspired by our technology,” says Anders Ynnerman.

Links data scattered across files

The age of big data has seen a host of new techniques for analyzing large data sets. But before any of those techniques can be applied, the target data has to be aggregated, organized, and cleaned up.

That turns out to be a shockingly time-consuming task. In a 2016 survey, 80 data scientists told the company CrowdFlower that, on average, they spent 80 percent of their time collecting and organizing data and only 20 percent analyzing it.

An international team of computer scientists hopes to change that, with a new system called Data Civilizer, which automatically finds connections among many different data tables and allows users to perform database-style queries across all of them. The results of the queries can then be saved as new, orderly data sets that may draw information from dozens or even thousands of different tables.

“Modern organizations have many thousands of data sets spread across files, spreadsheets, databases, data lakes, and other software systems,” says Sam Madden, an MIT professor of electrical engineering and computer science and faculty director of MIT’s bigdata@CSAIL initiative. “Civilizer helps analysts in these organizations quickly find data sets that contain information that is relevant to them and, more importantly, combine related data sets together to create new, unified data sets that consolidate data of interest for some analysis.”

The researchers presented their system last week at the Conference on Innovative Data Systems Research. The lead authors on the paper are Dong Deng and Raul Castro Fernandez, both postdocs at MIT’s Computer Science and Artificial Intelligence Laboratory; Madden is one of the senior authors. They’re joined by six other researchers from Technical University of Berlin, Nanyang Technological University, the University of Waterloo, and the Qatar Computing Research Institute. Although he’s not a co-author, MIT adjunct professor of electrical engineering and computer science Michael Stonebraker, who in 2014 won the Turing Award — the highest honor in computer science — contributed to the work as well.

Pairs and permutations

Data Civilizer assumes that the data it’s consolidating is arranged in tables. As Madden explains, in the database community, there’s a sizable literature on automatically converting data to tabular form, so that wasn’t the focus of the new research. Similarly, while the prototype of the system can extract tabular data from several different types of files, getting it to work with every conceivable spreadsheet or database program was not the researchers’ immediate priority. “That part is engineering,” Madden says.

The system begins by analyzing every column of every table at its disposal. First, it produces a statistical summary of the data in each column. For numerical data, that might include a distribution of the frequency with which different values occur; the range of values; and the “cardinality” of the values, or the number of different values the column contains. For textual data, a summary would include a list of the most frequently occurring words in the column and the number of different words. Data Civilizer also keeps a master index of every word occurring in every table and the tables that contain it.
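Without quoting the paper's exact statistics, a column profile of the kind described might be sketched as follows: numeric columns get a range, cardinality, and value histogram, textual columns get a word summary, and every word is also added to a master index mapping it to the tables and columns that contain it. The names and cut-offs here are assumptions.

```python
from collections import Counter, defaultdict

import pandas as pd

# Master index: word -> set of (table, column) locations where it occurs.
word_index = defaultdict(set)

def profile_column(table_name, col_name, series):
    """Build a lightweight summary of one column of a pandas DataFrame."""
    if pd.api.types.is_numeric_dtype(series):
        return {
            "kind": "numeric",
            "min": series.min(),
            "max": series.max(),
            "cardinality": series.nunique(),
            "histogram": series.value_counts().head(20).to_dict(),
        }
    words = Counter(str(v) for v in series.dropna())
    for word in words:
        word_index[word].add((table_name, col_name))
    return {
        "kind": "text",
        "cardinality": series.nunique(),
        "top_words": dict(words.most_common(20)),
    }
```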

Then the system compares all of the column summaries against each other, identifying pairs of columns that appear to have commonalities — similar data ranges, similar sets of words, and the like. It assigns every pair of columns a similarity score and, on that basis, produces a map, rather like a network diagram, that traces out the connections between individual columns and between the tables that contain them.
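Given such profiles, the pairwise comparison and the resulting map can be sketched as a similarity score between every two column summaries, with an edge added to a graph whenever the score passes a threshold. The scoring rules and threshold below are illustrative guesses, not the measures used by Data Civilizer.

```python
from itertools import combinations

import networkx as nx

def similarity(p, q):
    """Crude similarity between two column profiles (illustrative only)."""
    if p["kind"] != q["kind"]:
        return 0.0
    if p["kind"] == "numeric":
        overlap = min(p["max"], q["max"]) - max(p["min"], q["min"])
        span = max(p["max"], q["max"]) - min(p["min"], q["min"]) or 1.0
        return max(0.0, overlap / span)                 # overlap of value ranges
    shared = set(p["top_words"]) & set(q["top_words"])
    union = set(p["top_words"]) | set(q["top_words"])
    return len(shared) / len(union) if union else 0.0

def build_linkage_graph(profiles, threshold=0.5):
    """profiles: {(table, column): profile}. Returns a graph of likely links."""
    graph = nx.Graph()
    graph.add_nodes_from(profiles)
    for (a, pa), (b, pb) in combinations(profiles.items(), 2):
        score = similarity(pa, pb)
        if score >= threshold:
            graph.add_edge(a, b, weight=score)
    return graph
```

Database-style queries across many tables, as described above, can then follow paths in such a graph to decide which columns are worth joining.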