Category Archives: Internet

When computer science means effective data storage

In recent years, the massive generation of data, coupled with frequent storage failures, has increased the popularity of distributed storage systems such as Dropbox, Google Drive and Microsoft OneDrive, which allow data to be replicated across different, geographically dispersed storage devices. A significant advance in this field has been achieved through the recently concluded Marie Curie Intra-European Fellowship (MC-IEF) project ATOMICDFS, conducted at IMDEA Networks Institute. The project was led by Dr. Antonio Fernández Anta, Research Professor at the Institute, as Principal Investigator, with Dr. Nicolas Nicolaou as the Marie Curie Fellow.

Because data is disseminated across multiple hosts, one of the major problems distributed storage systems face is maintaining the consistency of data that is accessed concurrently by multiple operations. In simpler terms, the scenario to resolve could be: what value should a reader in Australia retrieve when a writer in Spain is concurrently changing that value? Conventional distributed storage systems fail to provide strong consistency guarantees in such cases, due to the high cost that consistent operations inflict on the system. The algorithms developed by ATOMICDFS minimize this cost, demonstrating that consistent storage systems can be practical. In addition, the project proposes solutions for manipulating large shared objects (such as files).

ATOMICDFS takes a big step towards a new generation of highly reliable, highly consistent, highly collaborative, practical and global distributed storage systems, and a small, albeit decisive, step towards a future global computing platform. With this project, IMDEA Networks places Europe amongst the worldwide leaders in this research area.

Building Highly Consistent Distributed File Systems

One of the key ideas developed in ATOMICDFS is the notion of ‘coverability’. On top of atomicity guarantees, coverability defines the exact properties that version-dependent objects (such as files) must possess in a highly concurrent environment. For example, once a new version of a file has been written to storage, no subsequent operation may overwrite it with an older version of the same file. To speed up operations on the storage, the research team focused on reducing both the communication and the computation cost incurred by each operation. The new algorithms match the optimal communication performance while reducing the computation cost by an exponential factor. Simulations of the proposed algorithms clearly illustrate their performance gains over previously proposed approaches.
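To make the idea concrete, here is a minimal Python sketch of the version-ordering rule that coverability adds on top of atomic reads and writes. The class and method names are illustrative and are not taken from the ATOMICDFS code.

```python
# Minimal sketch (not the ATOMICDFS implementation): a versioned register that
# refuses to "cover" a newer version with an older one, illustrating the
# version-ordering property coverability adds on top of atomic reads/writes.

class CoverableRegister:
    def __init__(self):
        self.version = 0          # monotonically increasing version tag
        self.value = None         # current file contents (or a reference to them)

    def read(self):
        """Return the latest version and its value."""
        return self.version, self.value

    def write(self, base_version, new_value):
        """Apply new_value only if it extends the version the writer read.

        A write based on a stale version is rejected, so no operation can
        overwrite a file with an older version of itself.
        """
        if base_version != self.version:
            return False, self.version      # stale write: caller must re-read
        self.version += 1
        self.value = new_value
        return True, self.version


reg = CoverableRegister()
v, _ = reg.read()
ok, v = reg.write(v, "draft 1")         # succeeds: based on the latest version
ok2, _ = reg.write(0, "old draft")      # rejected: would cover a newer version
```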

Another factor the team investigated in order to improve operation latency was reducing the size of each message exchanged over the network. To cut message costs, ATOMICDFS introduced two file-manipulation techniques: first, a simple division of the file into data blocks, and second, the use of a journal (log) of file operations. These techniques allow operations to be applied to parts of a file instead of the file object as a whole, enabling faster operations without compromising consistency.
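The block-and-journal idea can be illustrated with a small sketch; the structure below is an assumption made for exposition, not the project's implementation.

```python
# Illustrative sketch: a file split into fixed-size blocks plus an append-only
# journal of block-level operations, so a writer touches only the blocks it
# changes instead of shipping the whole file.

BLOCK_SIZE = 4  # tiny blocks for illustration

def split_into_blocks(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

class JournaledFile:
    def __init__(self, data: bytes):
        self.blocks = split_into_blocks(data)
        self.journal = []                     # log of (block_index, new_block)

    def write_block(self, index: int, new_block: bytes):
        """Record and apply an update to a single block."""
        self.journal.append((index, new_block))
        self.blocks[index] = new_block

    def replay(self, base_blocks):
        """Rebuild the current state by replaying the journal over a base copy."""
        blocks = list(base_blocks)
        for index, new_block in self.journal:
            blocks[index] = new_block
        return b"".join(blocks)


f = JournaledFile(b"hello world!")
f.write_block(1, b"WORL")    # only one small block travels over the network
```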

Visualization technology developed at Linköping University

Anders Ynnerman, professor of scientific visualization at Linköping University and director of Visualization Center C, together with colleagues from Linköping University, Interspectral AB, the Interactive Institute Swedish ICT, and the British Museum, describes the technology behind the visualization in a newly published article.

The Gebelein Man, who was mummified by natural processes, and the collaboration with the British Museum form the framework for the article, which focuses on the development of the technology behind the visualization table, which has received a great deal of attention.

“It was challenging to obtain sufficiently high performance of the visualization such that visitors can interact with the table in real-time, without experiencing delays. Further, the interaction must be both intuitive and informative,” says Anders Ynnerman.

Several thousand images of the mummy taken by computed tomography (CT) are stored in the table. In this case, 10,000 virtual slices through the complete mummy have been imaged, each as thin as 0.3 mm. Fast graphics processors can then create volumetric 3D images in real time to display whatever the visitors want to look at.

The degree of reflection and absorption of the X-rays by the mummy is recorded by the CT scanner and converted, with the aid of a specially developed transfer function, into different colours and degrees of transparency. Bone, for example, gives a signal that is converted to a light grey colour, while soft tissue and metal objects give completely different signals that are represented by other colours or structures.
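A transfer function of this kind can be sketched as a simple lookup from scanned intensity to colour and opacity. The thresholds and colours below are illustrative guesses, not the values used in the actual table.

```python
# Simplified sketch of a transfer function: CT intensities are mapped to a
# colour and an opacity. All thresholds and colours are illustrative.

def transfer_function(intensity):
    """Map a scalar CT sample in [0, 1] to (r, g, b, alpha) in [0, 1]."""
    if intensity < 0.2:          # air / background: fully transparent
        return (0.0, 0.0, 0.0, 0.0)
    if intensity < 0.5:          # soft tissue: semi-transparent brownish tone
        return (0.8, 0.6, 0.5, 0.15)
    if intensity < 0.8:          # bone: light grey, mostly opaque
        return (0.85, 0.85, 0.8, 0.7)
    return (0.9, 0.9, 0.3, 0.95) # metal objects: bright and nearly opaque
```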

“The table displays 60 images per second, which our brain interprets as continuous motion. Sixty times each second, virtual beams, one for each pixel on the screen, are projected through the dataset and a colour contribution for each is determined. We use the latest type of graphics processor, the type that is used in gaming computers,” says Patric Ljung, senior lecturer in immersive visualization at Linköping University.
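The per-pixel ray marching Ljung describes can be sketched as follows. The toy volume, step counts and transfer function are placeholders, and a real implementation runs on the graphics processor rather than in Python.

```python
# Sketch of the per-pixel ray marching loop with front-to-back alpha
# compositing. A real GPU implementation runs one such ray per screen pixel,
# 60 times per second; here everything is a toy stand-in.

import numpy as np

volume = np.random.rand(64, 64, 64)          # placeholder CT volume, values in [0, 1]

def transfer(sample):
    """Toy transfer function: grey colour, opacity proportional to intensity."""
    return np.array([sample, sample, sample]), sample * 0.1

def cast_ray(origin, direction, steps=128, step_size=0.5):
    colour = np.zeros(3)
    alpha = 0.0
    pos = np.array(origin, dtype=float)
    for _ in range(steps):
        i, j, k = int(pos[0]), int(pos[1]), int(pos[2])
        if 0 <= i < 64 and 0 <= j < 64 and 0 <= k < 64:
            c, a = transfer(volume[i, j, k])
            colour += (1.0 - alpha) * a * c   # front-to-back compositing
            alpha += (1.0 - alpha) * a
            if alpha > 0.99:                  # early ray termination
                break
        pos += step_size * np.array(direction)
    return colour, alpha

pixel_colour, _ = cast_ray(origin=(32, 32, 0), direction=(0, 0, 1))
```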

This makes it possible for visitors to interact with the table. The desiccated skin of the mummy can be peeled away in the image and only the parts that consist of bone displayed. When this is done, it becomes clear that the Gebelein Man was killed by a stab through the shoulder.

The principles that have determined the design of the table are also described in the article. The design arose in close collaboration between the personnel at the museum and Interactive Institute Swedish ICT, working within the framework of Visualization Center C in Norrköping.

The design is minimalist and intuitive. The display table must be rapid, and no delay in the image can be tolerated. It must be able to withstand use by the six million visitors to the museum each year, and much emphasis has been placed on creating brief narrative texts with the aid of information points. Simple and self-explanatory icons have been used, and several favourable viewpoints and parameters have been preprogrammed in order to increase the table’s robustness.

“Allowing a broader public to visualize scientific phenomena and results makes it possible for them to act as researchers themselves. We allow visitors to investigate the same data that the researchers have used. This creates incredible possibilities for new ways to communicate knowledge, to stimulate interest, and to engage others. It’s an awesome experience — watching the next generation of young researchers be inspired by our technology,” says Anders Ynnerman.

System links data scattered across files

The age of big data has seen a host of new techniques for analyzing large data sets. But before any of those techniques can be applied, the target data has to be aggregated, organized, and cleaned up.

That turns out to be a shockingly time-consuming task. In a 2016 survey, 80 data scientists told the company CrowdFlower that, on average, they spent 80 percent of their time collecting and organizing data and only 20 percent analyzing it.

An international team of computer scientists hopes to change that, with a new system called Data Civilizer, which automatically finds connections among many different data tables and allows users to perform database-style queries across all of them. The results of the queries can then be saved as new, orderly data sets that may draw information from dozens or even thousands of different tables.

“Modern organizations have many thousands of data sets spread across files, spreadsheets, databases, data lakes, and other software systems,” says Sam Madden, an MIT professor of electrical engineering and computer science and faculty director of MIT’s bigdata@CSAIL initiative. “Civilizer helps analysts in these organizations quickly find data sets that contain information that is relevant to them and, more importantly, combine related data sets together to create new, unified data sets that consolidate data of interest for some analysis.”

The researchers presented their system last week at the Conference on Innovative Data Systems Research. The lead authors on the paper are Dong Deng and Raul Castro Fernandez, both postdocs at MIT’s Computer Science and Artificial Intelligence Laboratory; Madden is one of the senior authors. They’re joined by six other researchers from Technical University of Berlin, Nanyang Technological University, the University of Waterloo, and the Qatar Computing Research Institute. Although he’s not a co-author, MIT adjunct professor of electrical engineering and computer science Michael Stonebraker, who in 2014 won the Turing Award — the highest honor in computer science — contributed to the work as well.

Pairs and permutations

Data Civilizer assumes that the data it’s consolidating is arranged in tables. As Madden explains, in the database community, there’s a sizable literature on automatically converting data to tabular form, so that wasn’t the focus of the new research. Similarly, while the prototype of the system can extract tabular data from several different types of files, getting it to work with every conceivable spreadsheet or database program was not the researchers’ immediate priority. “That part is engineering,” Madden says.

The system begins by analyzing every column of every table at its disposal. First, it produces a statistical summary of the data in each column. For numerical data, that might include a distribution of the frequency with which different values occur; the range of values; and the “cardinality” of the values, or the number of different values the column contains. For textual data, a summary would include a list of the most frequently occurring words in the column and the number of different words. Data Civilizer also keeps a master index of every word occurring in every table and the tables that contain it.
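A rough sketch of this profiling step might look as follows; the function names and summary fields are assumptions made for illustration rather than Data Civilizer's actual code.

```python
# Rough sketch of per-column profiling in the spirit described above: numeric
# columns get range/cardinality statistics, text columns get frequent words,
# and a master index maps words to the tables containing them.

from collections import Counter

def profile_column(values):
    numeric = []
    for v in values:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            numeric = None
            break
    if numeric:                                   # numerical column
        return {
            "type": "numeric",
            "min": min(numeric),
            "max": max(numeric),
            "cardinality": len(set(numeric)),
        }
    words = Counter(w for v in values for w in str(v).split())
    return {
        "type": "text",
        "top_words": [w for w, _ in words.most_common(10)],
        "cardinality": len(set(values)),
    }

def build_word_index(tables):
    """Master index: word -> set of table names containing that word."""
    index = {}
    for name, columns in tables.items():
        for values in columns.values():
            for v in values:
                for w in str(v).split():
                    index.setdefault(w, set()).add(name)
    return index
```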

Then the system compares all of the column summaries against each other, identifying pairs of columns that appear to have commonalities — similar data ranges, similar sets of words, and the like. It assigns every pair of columns a similarity score and, on that basis, produces a map, rather like a network diagram, that traces out the connections between individual columns and between the tables that contain them.
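The pairwise comparison and the resulting map can be sketched along these lines, again with illustrative scoring rather than the system's real similarity measures.

```python
# Sketch of the pairwise comparison step: every pair of column profiles gets a
# similarity score, and pairs above a threshold become edges in a graph linking
# columns (and hence the tables that contain them).

from itertools import combinations

def similarity(profile_a, profile_b):
    if profile_a["type"] != profile_b["type"]:
        return 0.0
    if profile_a["type"] == "numeric":            # overlap of value ranges
        overlap = min(profile_a["max"], profile_b["max"]) - max(profile_a["min"], profile_b["min"])
        span = max(profile_a["max"], profile_b["max"]) - min(profile_a["min"], profile_b["min"])
        return max(0.0, overlap / span) if span else 1.0
    a, b = set(profile_a["top_words"]), set(profile_b["top_words"])
    return len(a & b) / len(a | b) if (a | b) else 0.0   # Jaccard on frequent words

def build_column_graph(profiles, threshold=0.5):
    """profiles: dict mapping (table, column) -> profile. Returns a weighted edge list."""
    edges = []
    for (col_a, p_a), (col_b, p_b) in combinations(profiles.items(), 2):
        score = similarity(p_a, p_b)
        if score >= threshold:
            edges.append((col_a, col_b, score))
    return edges
```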

The invisible chaos of superluminous supernovae

Sightings of a rare breed of superluminous supernovae — stellar explosions that shine 10 to 100 times brighter than normal — are perplexing astronomers. First spotted only in the last decade, these events confound scientists with their extraordinary brightness and poorly understood explosion mechanisms.

To better understand the physical conditions that create superluminous supernovae, astrophysicists are running two-dimensional (2D) simulations of these events using supercomputers at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) and the CASTRO code developed at Lawrence Berkeley National Laboratory (Berkeley Lab).

“This is the first time that anyone has simulated superluminous supernovae in 2D; previous studies have only modeled these events in 1D,” says Ken Chen, an astrophysicist at the National Astronomical Observatory of Japan. “By modeling the star in 2D we can capture detailed information about fluid instability and mixing that you don’t get in 1D simulations. These details are important to accurately depict the mechanisms that cause the event to be superluminous and explain their corresponding observational signatures such as light curves and spectra.”

Chen is the lead author of an Astrophysical Journal paper published in December 2016. He notes that one of the leading theories in astronomy posits that superluminous supernovae are powered by highly magnetized neutron stars, called magnetars.

How a star lives and dies depends on its mass — the more massive a star, the more gravity it wields. All stars begin their lives fusing hydrogen into helium; the energy released by this process supports the star against the crushing weight of its gravity. If a star is particularly massive it will continue to fuse helium into heavier elements like oxygen and carbon, and so on, until its core turns to nickel and iron. At this point fusion no longer releases energy, and electron degeneracy pressure kicks in to support the star against gravitational collapse. When the core of the star exceeds its Chandrasekhar mass — approximately 1.5 solar masses — electron degeneracy can no longer support it. At this point the core collapses, producing neutrinos that blow up the star and create a supernova.

This iron core collapse occurs with such extreme force that it breaks apart nickel and iron atoms, leaving behind a chaotic stew of charged particles. In this frenzied environment, negatively charged electrons are shoved into positively charged protons to create neutral neutrons. Because neutrons now make up the bulk of this core, it’s called a neutron star. A magnetar is essentially a type of neutron star with an extremely powerful magnetic field.

In addition to being insanely dense (a sugar-cube-sized amount of neutron-star material would weigh more than 1 billion tons), such a star can also spin up to a few hundred times per second. The combination of this rapid rotation, density and complicated physics in the core creates some extreme magnetic fields. The magnetic field can tap the rotational energy of the neutron star and turn that energy into energetic radiation. Some researchers believe this radiation can power a superluminous supernova. These are precisely the conditions that Chen and his colleagues are trying to understand with their simulations.
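As a rough order-of-magnitude illustration (not a figure from the paper), the rotational energy available from such a spin rate, assuming a typical neutron-star moment of inertia of about $10^{45}\,\mathrm{g\,cm^2}$, is

```latex
E_{\mathrm{rot}} = \tfrac{1}{2} I \omega^{2}
  \approx \tfrac{1}{2}\,\bigl(10^{45}\ \mathrm{g\,cm^{2}}\bigr)\,\bigl(2\pi \times 300\ \mathrm{s^{-1}}\bigr)^{2}
  \approx 2 \times 10^{51}\ \mathrm{erg},
```

which is broadly in the range needed to supply the extra radiation of a superluminous event, provided the magnetic field can extract a sizeable fraction of it.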

“By doing a more realistic 2D simulation of superluminous supernovae powered by magnetars, we are hoping to get a more quantitative understanding about its properties,” says Chen. “So far, astronomers have spotted less than 10 of these events; as we find more we’ll be able to see if they have consistent properties. If they do and we understand why, we’ll be able to use them as standard candles to measure distance in the Universe.”

He also notes that because stars this massive may easily form in the early cosmos, they could provide some insights into the conditions of the distant Universe.

The potential of metal grids for electronic components

Nanometer-scale magnetic perforated grids could create new possibilities for computing. Together with international colleagues, scientists from the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) have shown how a cobalt grid can be reliably programmed at room temperature. In addition, they discovered that three magnetic states can be configured around every hole (“antidot”). The results have been published in the journal “Scientific Reports.”

Physicist Dr. Rantej Bali from the HZDR, together with scientists from Singapore and Australia, designed a special grid structure in a thin layer of cobalt in order to program its magnetic properties. His colleagues from the National University of Singapore produced the grid using a photolithographic process similar to that currently used in chip manufacture. Holes approximately 250 nanometers in size, so-called antidots, were created at regular intervals in the cobalt layer, with interspaces of only 150 nanometers. To enable stable programming, the Singapore experts followed the Dresden design, which specified a metal layer thickness of approximately 50 nanometers.

At these dimensions the cobalt antidot grid displayed interesting properties: Dr. Bali’s team discovered that with the aid of an externally applied magnetic field three distinct magnetic states around each hole could be configured. The scientists called these states “G,” “C” and “Q.” Dr. Bali: “Antidots are now in the international research spotlight. By optimizing the antidot geometry we were able to show that the spins, or the magnetic moments of the electrons, could be reliably programmed around the holes.”

Building blocks for future logic

Since the individually programmable holes are situated in a magnetic metal layer, the grid geometry has potential use in computers that would work with spin-waves instead of electric current. “Spin-waves are similar to the so-called Mexican waves you see in a football stadium. The wave propagates through the stadium, but the individual fans, in our case the electrons, stay seated,” explains Dr. Bali. Logic chips utilizing such spin-waves would use far less power than today’s processors, because no electrical current is involved.

Falcons predicted to win Super Bowl

“Increasingly, we are seeing NFL coaches and executives embracing analytics to improve their overall knowledge of the game and give them data-driven competitive advantages over their opponents. I believe this study is yet another step in that direction,” said Konstantinos Pelechrinis, an associate professor in Pitt’s School of Information Sciences.

Pelechrinis’s study, published in PLOS, analyzed 1,869 regular-season and postseason games from 2009 to 2015. Through in-depth analysis, he identified key in-game factors — turnover differential and penalty yardage, among others — that directly correlate with winning probability. The analysis found that committing one fewer turnover than the opposition yielded a 20 percent gain in winning probability, while a 10-yard advantage in penalty yardage corresponded to a 5 percent difference.
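As an illustration of how such factors might feed a win-probability estimate, here is a sketch with made-up coefficients chosen only to roughly echo the reported effect sizes near an even game; it is not the model fitted in the study.

```python
# Illustrative logistic-style win-probability sketch. The weights are
# hypothetical, tuned so that one extra forced turnover adds roughly 20
# percentage points and 10 penalty yards add roughly 5 points near 50/50.

import math

def win_probability(turnover_diff, penalty_yard_diff):
    """turnover_diff: opponent turnovers minus own; penalty_yard_diff: yards in our favour."""
    score = 0.85 * turnover_diff + 0.02 * penalty_yard_diff   # hypothetical weights
    return 1.0 / (1.0 + math.exp(-score))                     # logistic link

print(win_probability(0, 0))    # ~0.50 for an even game
print(win_probability(1, 0))    # ~0.70: one extra forced turnover
print(win_probability(0, 10))   # ~0.55: a 10-yard penalty edge
```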

He then used a probability model to create a Football Prediction Matchup (FPM) engine to compare teams. Pelechrinis compared the Patriots’ and the Falcons’ performances in those key in-game factors during the 2016 regular season. Finally, Pelechrinis ran 10,000 simulations of the game in order to draw his conclusion: The Atlanta Falcons have a 54 percent probability of prevailing in Super Bowl 51.
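The simulation step can be sketched in a similarly hedged way: draw the key factors for each simulated game from assumed team-level distributions and count how often each side comes out ahead. The mechanics and numbers below are illustrative, not the FPM engine itself.

```python
# Sketch of the Monte Carlo step: sample the key in-game factors per simulated
# game, convert them to a win probability, and tally 10,000 outcomes. All
# distributions and weights are assumed for illustration.

import math
import random

def win_probability(turnover_diff, penalty_yard_diff):
    # same illustrative logistic model as in the previous sketch
    return 1.0 / (1.0 + math.exp(-(0.85 * turnover_diff + 0.02 * penalty_yard_diff)))

def simulate(n_games=10_000, seed=1):
    rng = random.Random(seed)
    falcons_wins = 0
    for _ in range(n_games):
        turnover_diff = rng.gauss(0.1, 1.4)    # hypothetical per-game turnover edge
        penalty_diff = rng.gauss(2.0, 15.0)    # hypothetical penalty-yard edge
        if rng.random() < win_probability(turnover_diff, penalty_diff):
            falcons_wins += 1
    return falcons_wins / n_games

print(simulate())   # share of simulated games won by the Falcons
```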

“I believe both die-hard football fans and casual viewers will be in for an exciting game this Sunday. The Patriots and the Falcons are two dynamic, high-scoring football teams that perform extraordinarily well in the key areas of the game that most impact winning,” said Pelechrinis. “However, we are confident that it will be the Atlanta Falcons walking away with that franchise’s first Vince Lombardi Trophy.”

When Pelechrinis ran his model on the 2017 NFL Playoffs, the FPM had an accuracy rate of 90 percent. Pelechrinis said the system can reliably predict the outcomes of NFL games with an accuracy of 63 percent. This rate is comparable to existing state-of-the-art prediction systems and outperforms expert NFL analysts more than 60 percent of the time.

In addition to predicting upcoming game matchups, an expanded version of the study explored strategic on-field decision-making. Most notably, it found that coaches are overly conservative in key situations such as fourth-down conversions and point-after-touchdown options, which reduces their team’s winning probability. Pelechrinis points to fourth-down conversion attempts deep within an opponent’s territory as a prime example of coaches being too cautious with their in-game decisions.

“When faced with, let’s say, a fourth-and-1 from an opponent’s 25-yard line, conventional football wisdom says a field-goal attempt — potentially resulting in three points — would be a coach’s best option. The research shows that continuing to pursue a touchdown — eventually resulting in six to eight points — would be best for maximizing this scoring opportunity as well as the overall goal of winning the game,” Pelechrinis said.
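A back-of-the-envelope expected-points comparison shows the shape of that argument; every probability below is an assumed round number, not a value from the study.

```python
# Toy expected-points comparison for the fourth-and-1 example. All
# probabilities are illustrative assumptions.

def expected_points_field_goal(p_make=0.90):
    """Kick now: make the field goal with probability p_make for 3 points."""
    return p_make * 3

def expected_points_go_for_it(p_convert=0.70, p_td_if_converted=0.50,
                              p_fg_if_stalled=0.90):
    """Go for it: if converted, the drive ends in a touchdown or usually still a
    field-goal try. Field position after a failed attempt is ignored for simplicity."""
    return p_convert * (p_td_if_converted * 7
                        + (1 - p_td_if_converted) * p_fg_if_stalled * 3)

print(expected_points_field_goal())   # ~2.7 expected points from kicking
print(expected_points_go_for_it())    # ~3.4 expected points under these guesses
```

Under these assumed numbers, going for it yields more expected points than kicking, which is the direction of the study's conclusion, though the exact gap depends entirely on the probabilities plugged in.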