Multivariate Statistical Machine Learning Methods for Genomic Prediction
Thanks to advances in digital technologies like electronic devices and networks, it is
possible to automate and digitize many jobs, processes, and services, which are
generating huge quantities of data. These “big data” are transmitted, collected,
aggregated, and analyzed to deliver deep insights into processes and human behavior. For this reason, data are called the new oil, since “data are to this century what oil
was for the last century”; that is, a driver for change, growth, and success.
Statistical and machine learning algorithms extract information from raw data;
that information can be used to create knowledge, knowledge leads to understanding,
and understanding leads to wisdom (Sejnowski 2018). We have the tools and
expertise to collect data from diverse sources and in any format, which is the
cornerstone of a modern data strategy that can unleash the power of artificial
intelligence. Every single day we are creating around 2.5 quintillion bytes of data
(McKinsey Global Institute 2016). By some estimates, almost 90% of the data in the
world has been generated over the last 2 years. This unprecedented capacity to
generate data has increased connectivity and global data flows through numerous
sources like tweets, YouTube, blogs, sensors, internet, Google, emails, pictures, etc.
For example, Google processes more than 40,000 searches every second (about 3.5
billion searches per day), and every minute 456,000 tweets are sent, 4,146,600
YouTube videos are watched, 154,200 Skype calls are made, 156 million emails are
sent, and 16 million text messages are written. In other words, big data keep
growing day by day in terms of volume, velocity, variety, veracity, and “value.”