It is time we started "looking at the data" to solve our problems, says one of the world's leading experts in data science.
In 2006, the then head of the Computer Science Department at Carnegie Mellon University (USA), Jeannette Wing, published an influential essay entitled "Computational Thinking", in which she argued that we would all benefit from using the conceptual tools of computer science to solve problems in every area of human activity.
Wing herself had no intention of studying computer science. In the mid-1970s, she enrolled at MIT (USA) to study electrical engineering, inspired by her father, who was a professor in that field. When she discovered her interest in computer science, Wing called her father to ask him whether it was a passing fad. At the time there were not even textbooks on the subject. Her father assured her that it wasn't. Wing changed majors and never looked back.
Wing, who was also a corporate vice president at Microsoft Research and is currently executive vice president for research at Columbia University (USA), now leads the promotion of data science across multiple disciplines.
Anil Ananthaswamy recently spoke with Wing about her ambitious plan to promote "trustworthy artificial intelligence (AI)", one of the 10 research challenges she has set out in her effort to ensure that AI systems are fairer and less biased.
Do you see a transformation in how computing is done today?
Absolutely. Moore's law took us very far. We knew we were going to hit a ceiling with it, and that is why parallel computing became relevant. But cloud computing was a phase change. The first distributed file systems were a kind of birth of cloud computing, since the files were no longer on your computer but somewhere else, on a server. Cloud computing builds on that and amplifies it even further: the data is not even near your computer, and the computing is done far away from us.
The next change had to do with data. For a long time, we focused on cycles to make things run faster: processors, CPUs, GPUs, and parallel servers. We ignored the data side. Now we have to look at it.
That is the field of data science. How would you define it? What are the challenges of working with data?
I have a very concise definition. Data science is the study of extracting value from data.
You cannot simply hand me a pile of raw data, have me press a button, and get its value out. It all starts with the collection, processing, storage, management, analysis, and visualization of the data, and then comes the interpretation of the results. I call this the data life cycle. Every step of that cycle is a lot of work.
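As a concrete (and deliberately toy) illustration of that life cycle, the sketch below walks a small table through collection, processing, storage, analysis, and visualization with pandas; the file names and column names are hypothetical, not anything from the interview.

```python
# A minimal sketch of the data life cycle, assuming a hypothetical
# CSV source with "site" and "value" columns.
import pandas as pd

# Collection: ingest raw records.
raw = pd.read_csv("measurements.csv")

# Processing: drop missing rows and obviously invalid values.
clean = raw.dropna().query("value >= 0")

# Storage/management: persist the cleaned table for downstream users.
clean.to_csv("measurements_clean.csv", index=False)

# Analysis: a simple aggregate per group.
summary = clean.groupby("site")["value"].mean()

# Visualization: plot the summary (requires matplotlib). Interpretation
# of the result is the human step that follows, not captured in code.
summary.plot(kind="bar", title="Mean value per site")
```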
With the use of big data, concerns about privacy, security, fairness, and bias often arise. How are these problems being addressed, especially in AI?
I have a new research agenda that I am promoting. I call it trustworthy AI, after the decades of progress we achieved in trustworthy computing. By trustworthiness, we generally mean security, reliability, availability, privacy, and functionality. Over the last two decades we have advanced a lot. We have formal methods that can guarantee the correctness of a fragment of code; we have security protocols that increase the security of a particular system. And we have certain notions of privacy that have been formalized.
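Wing does not name a particular privacy notion here; differential privacy is the best-known formalization, and the Laplace mechanism below is a minimal sketch of how it is typically realized for a numeric query (all parameters are illustrative).

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of a numeric query.

    Adding Laplace noise with scale sensitivity/epsilon gives
    epsilon-differential privacy for a query whose output changes by
    at most `sensitivity` when one person's record changes.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: privately release a count of 1203 records (sensitivity 1).
private_count = laplace_mechanism(1203, sensitivity=1.0, epsilon=0.5)
```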
Trustworthy AI goes further, and it does so in two ways. All of a sudden, we talk about robustness and fairness. Robustness means that if the input is perturbed, the output is not perturbed too much. And we talk about interpretability. In computing, we never thought about these things before.
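That informal definition of robustness can be turned into an executable check: perturb the input slightly and verify the output barely moves. The sketch below is a naive empirical test under assumed tolerances, not any system from the interview; the model f is a stand-in.

```python
import numpy as np

def is_robust(f, x, epsilon=0.01, delta=0.05, trials=100):
    """Empirically check that perturbing x by at most epsilon per
    coordinate never moves f(x) by more than delta."""
    y = f(x)
    for _ in range(trials):
        perturbation = np.random.uniform(-epsilon, epsilon, size=x.shape)
        if abs(f(x + perturbation) - y) > delta:
            return False
    return True

# Toy model: a smooth function passes this check for small epsilon.
smooth = lambda x: float(np.tanh(x.sum()))
print(is_robust(smooth, np.zeros(4)))  # True
```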
[In addition,] AI systems are probabilistic. The computer systems of the past were basically deterministic machines: on or off, true or false, yes or no, zeros or ones. The outputs of our AI systems are fundamentally probabilistic. If an AI tells you that your X-ray indicates cancer, there is, for example, a probability of 0.75 that the little white spot it saw is malignant.
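The X-ray example can be made concrete: a probabilistic classifier returns a degree of belief rather than a yes/no answer. A minimal sketch with a logistic model, where the features, weights, and bias are made up purely for illustration:

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical features of the white spot: size, brightness, irregularity.
features = np.array([0.8, 0.6, 0.9])
weights = np.array([1.2, 0.5, 1.5])  # invented, for illustration only
bias = -1.51

# A deterministic system would answer True/False; a probabilistic
# model answers with a degree of belief.
p_malignant = sigmoid(weights @ features + bias)
print(f"P(malignant) = {p_malignant:.2f}")  # 0.75
```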
So, at present, we have to live in this world of probabilities. From a mathematical point of view, that means probabilistic logic, a lot of statistics, stochastic reasoning, and so on. Computer scientists are not trained to think that way. AI systems have really complicated our formal reasoning about these systems.
Trustworthy AI is one of the 10 research challenges you have laid out for data scientists. Causality seems to be another big one.
I think causality is the next frontier for AI and machine learning. At the moment, machine learning algorithms and models are good at finding patterns, correlations, and associations. But they cannot tell us: did this cause that? Or, if I did this, what would happen? So there is a whole other area of activity around causal inference and causal reasoning in computer science. Statisticians have been analyzing causality for a long time. Sometimes they get a little annoyed with the computing community for thinking "oh, this idea is new". So I want to give credit to statisticians for their fundamental contributions to causality. The combination of big data and causal reasoning can really produce advances in the field.
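A small simulation makes the gap between correlation and causation tangible: a confounder induces a strong correlation between two variables with no causal link, and only an intervention (fixing one variable directly, in the spirit of the do-operator) exposes that. The scenario and numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Confounder: temperature drives both ice-cream sales and drownings.
temperature = rng.normal(25, 5, n)
ice_cream = 2.0 * temperature + rng.normal(0, 2, n)
drownings = 0.5 * temperature + rng.normal(0, 2, n)

# Observational data: strong correlation, yet no causal link.
print(np.corrcoef(ice_cream, drownings)[0, 1])  # roughly 0.8

# Intervention do(ice_cream = high): force sales up regardless of
# temperature. Drownings do not move, because sales never caused them.
drownings_when_forced = 0.5 * temperature + rng.normal(0, 2, n)
print(drownings.mean(), drownings_when_forced.mean())  # essentially equal
```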
Is there excitement about data science?
Everyone is going crazy about data science, because they see their fields being transformed by applying data science methods to the digital data they generate, produce, collect, and so on. It is a very exciting moment.