What do cows have to do with Data Science?

Automation and Data Science make one think of high-tech production environments or mechanical engineering. But in agriculture, too, the intelligent analysis and use of data can uncover considerable added value. An example from the field: in dairy cows, algorithms based on sensor data help predict infections long before the farmer notices them. This helps prevent contagion and the economic damage that follows.

Katharina Kober, Data Scientist at Körber Digital, tells us more: “Our customer uses fully automated milking systems, also called milking robots. The cows can move freely in the barn and enter the milking station whenever they need to. The machine attaches itself to their udders without human involvement. This is comfortable for both the cows and the farmer, but it has a catch: the farmer may not detect udder diseases such as mastitis promptly.”

And that can be expensive, because udder diseases affect not only the cows’ well-being but also the quantity and quality of the milk. Milk from diseased cows has an increased cell count, which leads to lower prices from the dairies. According to the Upper Austrian Chamber of Agriculture, the loss per sick cow exceeds 600 euros. “Data Science allows us to monitor the animals on the basis of data, rather than visually,” says the data expert, “and protects farmers from losses.” The task was therefore twofold: to detect sick animals early enough to keep the infection from spreading to the entire herd, and to trigger as few false alarms as possible.
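
The two goals pull in opposite directions: a more sensitive alarm catches more sick cows but also raises more false alarms. The following minimal sketch, assuming a hypothetical risk score between 0 and 1 per cow and made-up example values, counts both effects for a given alarm threshold. The names `AlertStats` and `evaluate_alerts` are illustrative only, not Körber Digital's code.

```python
from dataclasses import dataclass


@dataclass
class AlertStats:
    detected: int      # sick cows that triggered an alarm (true positives)
    missed: int        # sick cows without an alarm (false negatives)
    false_alarms: int  # healthy cows that triggered an alarm (false positives)
    quiet: int         # healthy cows without an alarm (true negatives)

    @property
    def detection_rate(self) -> float:
        sick = self.detected + self.missed
        return self.detected / sick if sick else 0.0

    @property
    def false_alarm_rate(self) -> float:
        healthy = self.false_alarms + self.quiet
        return self.false_alarms / healthy if healthy else 0.0


def evaluate_alerts(scores, is_sick, threshold):
    """Raise an alarm whenever a cow's risk score reaches the threshold, then tally outcomes."""
    tp = fn = fp = tn = 0
    for score, sick in zip(scores, is_sick):
        alarm = score >= threshold
        if sick and alarm:
            tp += 1
        elif sick:
            fn += 1
        elif alarm:
            fp += 1
        else:
            tn += 1
    return AlertStats(detected=tp, missed=fn, false_alarms=fp, quiet=tn)


# Hypothetical risk scores for six cows, three of which are actually sick.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
is_sick = [True, True, True, False, False, False]

for threshold in (0.5, 0.25):
    stats = evaluate_alerts(scores, is_sick, threshold)
    print(threshold, stats.detection_rate, stats.false_alarm_rate)
```

With these made-up numbers, a threshold of 0.5 misses one sick cow but raises no false alarm, while 0.25 catches all three sick cows at the cost of one false alarm. Where to set the threshold depends on what a missed case costs compared with an unnecessary check.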

Valuable data straight from the milk

For the early detection of mastitis, sensors on the milking robot take measurements on the milk. Unfortunately, the cell count (the most important indicator of inflammation) cannot be measured directly. However, milk temperature, milk yield and the conductivity of the milk can be. “And that’s where Data Science comes in: we can use the measured data to predict the cell count.”
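
To make the setup concrete, here is a minimal sketch of what a single milking session might look like as a record. The schema, field names, units and example values are assumptions for illustration; the article only names the three measured quantities. The point it shows is that the cell count is absent from the inputs and has to be predicted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MilkingRecord:
    """One milking session as measured by the robot's sensors (hypothetical schema)."""
    cow_id: str
    temperature_c: float       # milk temperature in °C
    yield_kg: float            # milk yield in kg
    conductivity_ms_cm: float  # electrical conductivity of the milk in mS/cm
    # Deliberately no cell-count field: it cannot be measured directly
    # and is what the model has to predict.

    def features(self) -> list[float]:
        """Return the measured values in a fixed order, as input for a classifier."""
        return [self.temperature_c, self.yield_kg, self.conductivity_ms_cm]


record = MilkingRecord(cow_id="cow-042", temperature_c=38.6, yield_kg=12.4, conductivity_ms_cm=5.1)
print(record.features())  # [38.6, 12.4, 5.1]
```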

Healthy or ill: complex decisions made by classification

Körber Digital Data Science experts use the Random Forest classification method to analyse and evaluate the data. Why classification? Katharina Kober explains the background. “A well-known classification problem is, for example, the spam filter in our email inboxes. Based on the words in emails from unknown senders, algorithms assign each message to one of two classes: spam or non-spam. We also have a classification problem with the cows. Here, the question is whether the milk temperature, milk yield and conductivity data indicate an increased cell count and thus a disease.”

Random Forest: Many decision trees are smarter than one

Like any forest, a Random Forest consists of trees, in this case decision trees. “Decision trees are algorithms that we use to classify data objects automatically. A decision tree should tell us: if the milk temperature is this, and the milk quantity and conductivity are these, then the cell count is probably in this range.”
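
The if-then structure of such a tree can be written out by hand. In a real model the split thresholds are learned from training data; in this sketch they are picked arbitrarily, purely to show the shape of the decisions. The function name, thresholds and class names are hypothetical and are not clinical reference values.

```python
def classify_by_tree(temperature_c: float, yield_kg: float, conductivity_ms_cm: float) -> str:
    """A tiny hand-written decision tree; every threshold below is hypothetical."""
    if conductivity_ms_cm > 6.0:       # first question: is conductivity raised?
        if temperature_c > 39.0:       # second question: is the milk warmer than usual?
            return "high cell count"
        return "elevated cell count"
    if yield_kg < 8.0:                 # normal conductivity, but is the yield unusually low?
        return "elevated cell count"
    return "normal cell count"


print(classify_by_tree(38.6, 12.4, 5.1))  # normal cell count
print(classify_by_tree(39.4, 9.0, 6.8))   # high cell count
```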

A Random Forest makes this classification more reliable: instead of just one decision tree, an entire forest is used. The special feature: “This is where machine learning comes in: the individual decision trees in the Random Forest are grown from the data. Thanks to so-called randomisation, they vary in structure and are independent of one another, so different trees can classify the same data differently,” says the data expert. “The final classification by the Random Forest then follows the principle of swarm intelligence: the Random Forest chooses the class that most of the trees vote for.”
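
The two ingredients from the quote, randomisation and a majority vote, can be sketched from scratch in a few dozen lines. This is a simplified illustration, not Körber Digital's implementation: to stay short it uses depth-one trees (decision stumps) split by Gini impurity, whereas libraries such as scikit-learn's `RandomForestClassifier` grow much deeper trees. Each tree is trained on a bootstrap sample (rows drawn with replacement) and may only split on a random subset of the features; since a stump has a single split, drawing features per tree here is the same as drawing them per split. The training data, class labels and function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels are the same)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Train a depth-one tree: the single split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    _, f, threshold, left, right = best
    fallback = Counter(y).most_common(1)[0][0]  # used if one side of the split is empty
    left_label = Counter(left).most_common(1)[0][0] if left else fallback
    right_label = Counter(right).most_common(1)[0][0] if right else fallback
    return f, threshold, left_label, right_label


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Randomisation 1: bootstrap sample, drawn with replacement from the training data.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        X_boot = [X[i] for i in idx]
        y_boot = [y[i] for i in idx]
        # Randomisation 2: this tree may only split on a random subset of the features.
        feature_ids = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump(X_boot, y_boot, feature_ids))
    return forest


def predict(forest, row):
    """Majority vote: the class predicted by the most trees wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical training data: [milk temperature °C, milk yield kg, conductivity mS/cm]
X = [[38.5, 12.0, 5.0], [38.6, 11.5, 5.2], [38.4, 13.0, 4.9], [38.7, 12.5, 5.1],
     [39.3, 8.0, 6.9], [39.5, 7.5, 7.2], [39.2, 9.0, 6.6], [39.4, 8.5, 7.0]]
y = ["normal"] * 4 + ["elevated"] * 4

forest = fit_forest(X, y)
print(predict(forest, [38.3, 13.5, 4.8]))
print(predict(forest, [39.6, 7.0, 7.4]))
```

No single stump is trusted on its own; because each one sees slightly different data and features, their individual mistakes tend to be outvoted by the rest of the forest.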

Sharper than the human eye: early detection via Data Science

Back to the cows and daily practice. In the first step, the Körber Digital Data Science experts clean the measured data. Outlier detection corrects or removes corrupted values, such as those caused by an interrupted milking process or misaligned milking cups. Next, a Random Forest classifier is applied to the cleaned data and assigns a cell class to each milking session: the lower the cell count, the lower the cell class.
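
The article does not say which outlier-detection method is used. As one common choice, the sketch below drops values whose modified z-score, computed from the median and the median absolute deviation (MAD), exceeds 3.5; unlike the process described, it only removes values and never corrects them. It also shows how cell counts could be binned into cell classes so that a lower count means a lower class, for example to turn separately measured cell counts into training labels. The class boundaries, function names and example values are hypothetical.

```python
from bisect import bisect_right
from statistics import median

# Hypothetical cell-class boundaries in cells per millilitre (not taken from the article).
CELL_CLASS_BOUNDS = [100_000, 200_000, 400_000]


def cell_class(cells_per_ml: float) -> int:
    """Return 0-3: the lower the cell count, the lower the cell class."""
    return bisect_right(CELL_CLASS_BOUNDS, cells_per_ml)


def drop_outliers(values, cutoff=3.5):
    """Remove values whose modified z-score (median/MAD based) exceeds `cutoff`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # at least half the values equal the median: this rule cannot flag anything
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]


# Hypothetical milk yields (kg) of one cow; 2.3 kg stands for an interrupted milking.
yields = [12.1, 11.8, 12.4, 12.0, 2.3, 11.9]
print(drop_outliers(yields))  # [12.1, 11.8, 12.4, 12.0, 11.9]
print([cell_class(c) for c in (50_000, 150_000, 500_000)])  # [0, 1, 3]
```

Median and MAD are used instead of mean and standard deviation because a single corrupted session, like the 2.3 kg value here, would otherwise distort the very statistics used to detect it.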

The results are very informative: “With sensor measurements and machine-learning algorithms, we recognise udder diseases before they become visible,” says Katharina Kober. “This lets us intervene sooner than a purely visual inspection by the farmer would allow, and so keep the infection from spreading to the entire herd.”

The principle applies to many manufacturing challenges

“Such improvements and automation could be achieved in many areas of manufacturing. To get there, digitization must advance further, and Data Science, with its machine-learning algorithms, must be used more widely in practice. It is important that manufacturing companies and mechanical engineers engage more closely with the possibilities of Data Science, because the example mentioned here is only a small part of what can be achieved with data. We can easily imagine, for example, that the principle presented here could also be used for prediction and monitoring in quality control,” concludes Katharina Kober.