Monthly Archives: October 2016

Differentiate between people with the same name

This conundrum occurs in a wide range of settings, from bibliography (which Anna Hernandez authored a specific study?) to law enforcement (which Robert Jones is attempting to board a flight?).

Two computer scientists from the School of Science at Indiana University-Purdue University Indianapolis and a Purdue University doctoral student have developed a novel machine-learning method to provide better solutions to this perplexing problem. They report that the new method improves on existing approaches to name disambiguation because it works on streaming data, which enables it to identify previously unencountered John Smiths, Maria Garcias, Wei Zhangs and Omar Alis.

Existing methods can disambiguate an individual only if that person’s records are present in the machine-learning training data. The new method, by contrast, performs non-exhaustive classification: it can detect that a new record appearing in the streaming data belongs to a fourth John Smith even if the training data contains records of only three different John Smiths. Non-exhaustiveness matters for name disambiguation because training data can never be exhaustive; it is impossible to include records of all living John Smiths.
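The idea of non-exhaustive classification can be illustrated with a deliberately simple sketch. This is not the Bayesian model from the paper: it swaps in a Jaccard-similarity rule with a fixed threshold, and the class name `OnlineDisambiguator`, the `threshold` value and the feature strings are all hypothetical. Each incoming record is either assigned to the most similar known person or, if nothing is similar enough, treated as a person not seen before.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineDisambiguator:
    """Toy non-exhaustive classifier for records that share one name."""

    threshold: float = 0.2  # hypothetical similarity cut-off (assumption)
    profiles: list = field(default_factory=list)  # one feature set per person

    @staticmethod
    def _jaccard(a: set, b: set) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def assign(self, features: set) -> int:
        """Return a person id for the record, creating a new id if needed."""
        best_id, best_sim = -1, 0.0
        for person_id, profile in enumerate(self.profiles):
            sim = self._jaccard(features, profile)
            if sim > best_sim:
                best_id, best_sim = person_id, sim
        if best_id == -1 or best_sim < self.threshold:
            # No known person is similar enough: open a new class.
            self.profiles.append(set(features))
            return len(self.profiles) - 1
        # Otherwise grow the matched person's profile with the new features.
        self.profiles[best_id] |= features
        return best_id


# A small stream of records, all for the same (hypothetical) author name.
model = OnlineDisambiguator()
stream = [
    {"coauthor:lee", "kw:genomics"},
    {"coauthor:patel", "kw:robotics"},
    {"coauthor:lee", "kw:sequencing", "kw:genomics"},
    {"coauthor:okafor", "kw:macroeconomics"},
]
labels = [model.assign(record) for record in stream]
print(labels)
```

The third record shares features with the first and joins that person, while the fourth matches nobody and opens a new identity. The paper's method replaces the fixed threshold with a Bayesian model whose parameters adapt to the data.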

“Bayesian Non-Exhaustive Classification — A Case Study: Online Name Disambiguation using Temporal Record Streams” by Baichuan Zhang, Murat Dundar and Mohammad al Hasan is published in Proceedings of the 25th International Conference on Information and Knowledge Management. Zhang is a Purdue graduate student. Dundar and Hasan are IUPUI associate professors of computer science and experts in machine learning.

“We looked at a problem applicable to scientific bibliographies using features like keywords and co-authors, but our disambiguation work has many other real-life applications, in the security field, for example,” said Hasan, who led the study. “We can teach the computer to recognize names and disambiguate information accumulated from a variety of sources (Facebook, Twitter and blog posts, public records and other documents) by collecting features such as Facebook friends and keywords from people’s posts using the identical algorithm. Our proposed method is scalable and will be able to group records belonging to a unique person even if thousands of people have the same name, an extremely complicated task.

“Our innovative machine-learning model can perform name disambiguation in an online setting instantaneously and, importantly, in a non-exhaustive fashion,” Hasan said. “Our method grows and changes when new persons appear, enabling us to recognize the ever-growing number of individuals whose records were not previously encountered. Also, some names are more common than others, so the number of individuals sharing that name grows faster than for other names. While working in a non-exhaustive setting, our model automatically detects such names and adjusts the model parameters accordingly.”

Machine learning employs algorithms — sets of steps — to train computers to classify records belonging to different classes. Algorithms are developed to review data, to learn patterns or features from the data, and to enable the computer to learn a model that encodes the relationship between patterns and classes so that future records can be correctly classified. In the new study, for a given name value, computers were “trained” by using records of different individuals with that name to build a model that distinguishes between individuals with that name, even individuals about whom information had not been included in the training data previously provided to the computer.

Making artificial and real cells talk

The classic Turing test evaluates a machine’s ability to mimic human behavior and intelligence. To pass, a computer must fool the tester into thinking it is human — typically through the use of questions and answers. But single-celled organisms can’t communicate with words. So this week in ACS Central Science, researchers demonstrate that certain artificial cells can pass a basic laboratory Turing test by “talking” chemically with living bacterial cells.

Sheref S. Mansy and colleagues proposed that artificial life would need to have the ability to interact seamlessly with real cells, and this could be evaluated in much the same way as a computer’s artificial intelligence is assessed. To demonstrate their concept, the researchers constructed nano-scale lipid vessels capable of “listening” to chemicals that bacteria give off. The artificial cells showed that they “heard” the natural cells by turning on genes that made them glow. These artificial cells could communicate with a variety of bacterial species, including V. fischeri, E. coli and P. aeruginosa. The authors note that more work must be done, however, because only one of these species engaged in a full cycle of listening and speaking in which the artificial cells sensed the molecules coming from the bacteria, and the bacteria could perceive the chemical signal sent in return.

Developing monitoring system for seniors

“When faced with problems of the elderly in our closest family, it is us who experience major stress, not them,” says Egidijus Kazanavicius, Professor at Kaunas University of Technology (KTU) and Director of the Centre of Real Time Computer Systems. Kazanavicius heads a team of researchers from KTU and the Lithuanian University of Health Sciences (LSMU) who are developing a monitoring system for seniors: when it registers that a person has fallen, the system sends a notification to the carers.

“Falls are the leading cause of death in the elderly population and are a very common problem in geriatrics, symptomatic of a wide variety of health conditions. Besides causing physical injuries, falls lower a person’s self-confidence to move independently, and are often a cause of various psychological problems,” says Dr Vita Lesauskaite, researcher at LSMU.

Working together, KTU and LSMU researchers created a prototype of a monitoring system for seniors, GRIUTIS, which consists of a set of fixed sensors placed around the premises and accompanying software. When the sensors register a change in a person’s behaviour or position, an alert is sent to the person’s family and/or carers.

The next step for the researchers is to patent the technologies and commercialise the product. The senior monitoring system GRIUTIS is planned to be in use in geriatric clinics as early as next year. The Lithuanian Research Council has allocated funds for the project.

When computer science means effective data storage

In recent years, the massive generation of data coupled with frequent storage failures has increased the popularity of distributed storage systems such as Dropbox, Google Drive or Microsoft OneDrive, which allow data to be replicated across different, geographically dispersed storage devices. A significant advance in this field has been achieved through the recently concluded Marie Curie Intra-European Fellowship (MC-IEF) project ATOMICDFS, conducted at IMDEA Networks Institute. The project was led by Dr. Antonio Fernández Anta, Research Professor at the Institute, as the Principal Investigator, and Dr. Nicolas Nicolaou, as the Marie Curie Fellow.

Because data are spread across multiple hosts, one of the major problems that distributed storage systems face is keeping the data consistent when they are accessed concurrently by multiple operations. In simpler terms, a scenario to resolve could be: what value should a reader in Australia retrieve when a writer concurrently changes the value in Spain? Conventional distributed storage systems fail to provide strong consistency guarantees in such cases because of the high cost that consistent operations impose on the system. The algorithms developed by ATOMICDFS provide a means of minimizing that cost, demonstrating that consistent storage systems can be practical. In addition, the project proposes solutions that allow the manipulation of large shared objects (such as files).

ATOMICDFS makes a big step towards a new generation of highly reliable, highly consistent, highly collaborative, practical, and global, distributed storage systems, and a small, albeit decided, step towards a future global computing platform. With this project IMDEA Networks places Europe amongst the worldwide leaders in this research area.

Building Highly Consistent Distributed File Systems

One of the key ideas developed in ATOMICDFS is the notion of ‘coverability’. On top of atomic guarantees, coverability defines the exact properties that version-dependent objects (such as files) must possess in a highly concurrent environment. For example, once a version of a file has been written to storage, no subsequent operation may overwrite it with an older version of the same file. To speed up operations on the storage, the research team focused on reducing both the communication and the computation costs incurred by each operation. The new algorithms match the optimal communication performance while reducing the computation cost by an exponential factor. Simulations of the proposed algorithms clearly illustrate their performance gains over previously proposed approaches.
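The version rule in the example above can be sketched in a few lines. This is a single-process illustration only, not the ATOMICDFS algorithms (which operate across distributed replicas); the names `VersionedFile` and `StaleVersionError` are hypothetical. A write must state which version it was based on, and it is rejected if a newer version has been written since.

```python
class StaleVersionError(Exception):
    """Raised when a write is based on an outdated version of the file."""


class VersionedFile:
    """Toy versioned object: writes must extend the latest version."""

    def __init__(self, content: bytes = b"") -> None:
        self.version = 0
        self.content = content

    def read(self):
        """Return the current (version, content) pair."""
        return self.version, self.content

    def write(self, based_on: int, content: bytes) -> int:
        """Install new content only if the writer saw the latest version."""
        if based_on != self.version:
            raise StaleVersionError(
                f"write based on version {based_on}, current is {self.version}"
            )
        self.version += 1
        self.content = content
        return self.version


# Two writers read the same version; only the first write may succeed.
shared = VersionedFile(b"draft")
seen_version, _ = shared.read()
shared.write(seen_version, b"edit A")
try:
    shared.write(seen_version, b"edit B")  # based on a stale version
    rejected = False
except StaleVersionError:
    rejected = True
print(shared.read(), rejected)
```

The second writer has to re-read the file and reapply its change on top of the newer version, so an older state can never silently replace a newer one.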

Another factor the team needed to address to improve operation latency was the size of each message exchanged on the network. To reduce message costs, ATOMICDFS introduced two file manipulation techniques: first, a simple division of the file into data blocks, and second, the use of a journal (log) of file operations. These techniques allow operations to be applied to parts of a file instead of to the file object as a whole, and thus enable faster operations without compromising consistency.
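A minimal sketch of the two techniques, under stated assumptions: the block size, the `BlockFile` class and the `replay` helper are hypothetical and are not taken from the project. It shows why only the changed block and a short journal entry need to travel over the network, rather than the whole file.

```python
BLOCK_SIZE = 4  # deliberately tiny, hypothetical block size for illustration


class BlockFile:
    """Toy file split into fixed-size blocks, with a journal of block writes."""

    def __init__(self, data: bytes, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks = [
            data[i:i + block_size] for i in range(0, len(data), block_size)
        ]
        self.journal = []  # list of (block index, new payload) entries

    def write_block(self, index: int, payload: bytes) -> None:
        """Replace one block and record the operation in the journal."""
        if not 0 <= index < len(self.blocks):
            raise IndexError("block index out of range")
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        self.journal.append((index, payload))
        self.blocks[index] = payload

    def contents(self) -> bytes:
        return b"".join(self.blocks)


def replay(original: bytes, journal, block_size: int = BLOCK_SIZE) -> bytes:
    """Bring a replica holding the original data up to date from the journal."""
    replica = BlockFile(original, block_size)
    for index, payload in journal:
        replica.write_block(index, payload)
    return replica.contents()


original = b"aaaabbbbcccc"
primary = BlockFile(original)
primary.write_block(1, b"XXXX")  # only this 4-byte block changes
replica_contents = replay(original, primary.journal)
print(primary.contents(), replica_contents)
```

Here a replica that already holds the original data catches up by applying one journal entry, so the update message carries a single block instead of the full file.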