bigdata

Let's start by calculating the minimum, maximum, and average values of a set of values.

statistic 2

In the representation in the figure I wanted to give a functional logical position to the topics discussed by putting them in relation to the ultimate goal, which in turn is the starting point for the subsequent goals.

Note that the figure does not define the relationship between the arguments, it indicates the logic of the path developed in this post and in the following: the "statistics" and "probability" arguments provide the tools to analyze the collected data; By means of these analyzes it is possible to formulate "hypotheses" from which the subsequent scenario can be inferred by "inference".

In the first step of this path, you must have a set of sample data on which to perform a statistical analysis. The example I chose is the set of weights of young students of a primary school belonging to the same year.

From the Swift manual, Collection Types chapter we find that we have three types of containers:

  1. Set: a collection of unordered unique values.
  2. Dictionary: an unordered collection of association of key-value.
  3. Array: an ordered collection of values.

Thinking on which container is best suited for collecting our data we must exclude the Set and Dictionary. The first since our set could have doubles with students of equal weight and the Set container must have unique values. The second could be useful if I wanted to use an arbitrary key to uniquely identify each student, such as the ID code, but this is not the case: in this analysis, it is not relevant to identify each individual student but to make a job on the whole.

I then use the array container and I fill it with the weights of 90 students (for example, three prime classes). To fill the array with almost random values, use the arc4random_uniform() function to return values between 15 and 35 (don't forget to insert at the beginning of the code: "import Foundation").

var studentsWeight = [Int]()

for index in 0...89 {

    studentsaWeight.append(Int(arc4random_uniform(20)+15))

 


arc4random_uniform(20) returns a random value between 0 and 20, this value is added to 15 so the range is between 15 and 35. This is an example of the result obtained in the casual generation of studentsWeight array:

[32, 27, 21, 24, 31, 33, 26, 28, 18, 27, 20, 23, 29, 20, 15, 26, 27, 30, 20, 24, 33, 28, 31, 28, 29, 17, 33, 24, 17, 31, 31, 29, 18, 23, 18, 29, 16, 24, 24, 27, 28, 24, 33, 27, 26, 24, 30, 24, 28, 25, 17, 16, 30, 15, 20, 27, 19, 24, 24, 30, 32, 30, 18, 34, 28, 29, 28, 16, 34, 15, 30, 15, 27, 31, 25, 34, 19, 24, 24, 18, 22, 18, 20, 31, 15, 34, 34, 32, 18, 22] 


A first easy step to do with this set of data is to figure out what is the lower and upper weight. These values are derived with their respective functions.

let minimumWeight = studentsWeight.min()

let maximumWeight = studentsWeight.max()

 

These are the values that are obtained from the sample array above.

minimumWeight: 15
maximumWeight: 34


The min() and max() functions are part of the array type. While this information may seem trivial for statistical purposes it is possible to calculate the measurement of the dispersion which is the range between the maximum and the minimum (34-15 = 19); in the specific case, the dispersion of the population weight of the three classes of children is 19Kg.

Più interessante potrebbe essere la media dei pesi degli studenti dell'anno in corso, se venisse paragonata ai valori medi ricavati negli anni precedenti potrebbe fornire un'informazione sulle abitudini alimentari. La media viene anche definita come misura della tendenza centrale, riporto questa definizione per enfatizzare l'importanza del valore medio. Il tipo array non dispone di una funzione apposita per calcolare la media, è però possibile aggiungerla tramite le extensions.

More interesting could be the average of the student weights for the current year because if compared to the average values obtained in previous years could provide information on eating habits. The average is also defined as the measure of the central tendency, I'm carrying this definition to emphasize the importance of the average value. The array type does not have a dedicated function to calculate the average, but it can be added through extensions.

extension Collection where Iterator.Element == Int, Index == Int {

    var average: Double {

        if isEmpty {

            return 0

        }

        return Double(reduce(0, +)) / Double(endIndex-startIndex)

    }

}

 

let weightAverage = studentsWeight.average

 

From the above values, we obtain this average.

 weightAverage: 25.1

If there is still no confidence with Swift, the above code may seem disarming. Collection is a protocol to which set, dictionary, and array types conform. You can extend the Collection type by adding a computed property that calculates the average of the collection elements. The average variable, if the Collection is not empty, returns a calculated Double value by dividing the value obtained by reduce function with the number of items in the Collection. I remember that reduce reduces all array elements to a single value in this case by the sum operation (indicated by the + symbol); the sum is then divided by the number of elements, calculated here by subtracting the final index of the array to the initial one.

The average is calculated In the last row; this is made possible because the studentsWeight variable has been previously defined as an array, and therefore conforms to the Collection protocol, than it automatically acquires the new computed property average.

Finally we consider the statement where that it places constraints on the elements managed by the procedure. Iterator is a type that Swift uses to cycle on elements of a set, in this case an array. The constraint that the where imposes is that the element of the returned array during each cycle is of the Int type, as well as the array index. In practice, the above-defined extension can be executed every time the constraint is met. Do not specify this constraint makes the computed property average not compliant and consequently generates a Swift error.

One last observation. By relaunching the program several times the random generation of values will lead to the creation of different arrays but that, in many cases, will have the same minimum and maximum values in common. It follows that minimum and maximum are not particularly useful tools for analyzing a set of values. The media provides more effective help to deduce the tendency of a set of values. In the next posts, we will discover new tools that have this purpose.