I want gnuplot to do the stats function only for a given range of the data.
My data looks like:
24.12.2014-08:00,34,35,44
25.12.2014-08:00,33,35,44
26.12.2014-08:00,32,32,48
27.12.2014-08:00,31,36,41
28.12.2014-08:00,34,35,44
I now have this in my plot script:
...
set datafile separator ","
stats 'out.csv' u 2 prefix "A"
set xdata time
set timefmt "%d.%m.%Y-%H:%M"
set format x "%d.%m"
set xrange["24.12.2014":"28.12.2014"]
set label 1 gprintf("Max = %g", A_max) font "-Bold" at "24.12.2014",A_max-1
...
But this calculates the stats over all dates. I want the stats calculated only for the range from 26.12 to 28.12, while the chart itself still shows the whole range, because I want to split my chart into different time periods and compute stats for each.
The stats function does not like time data†, but you can make it work with time data by using gnuplot's functions for manipulating times. Two methods follow.
startrange = strptime("%d.%m.%Y","26.12.2014")
endrange = strptime("%d.%m.%Y","29.12.2014")
validdate(x) = (curdate=strptime("%d.%m.%Y-%H:%M",x),curdate>=startrange&&curdate<endrange)
stats 'out.csv' u (validdate(strcol(1))?$2:1/0) prefix "A"
Which produces
* FILE:
Records: 3
Out of range: 0
Invalid: 2
Blank: 0
Data Blocks: 1
* COLUMN:
Mean: 32.3333
Std Dev: 1.2472
Sample StdDev: 1.5275
Skewness: 0.3818
Kurtosis: 1.5000
Avg Dev: 1.1111
Sum: 97.0000
Sum Sq.: 3141.0000
Mean Err.: 0.7201
Std Dev Err.: 0.5092
Skewness Err.: 1.4142
Kurtosis Err.: 2.8284
Minimum: 31.0000 [1]
Maximum: 34.0000 [2]
Quartile: 31.0000
Median: 32.0000
Quartile: 34.0000
on your sample data (the first two lines are out of range and the last three are not). Here we force the out-of-range values to be invalid, so the report shows 2 invalid records and 0 out of range.
The way that this works is that we use the strptime function which converts a date into an internal representation (in gnuplot 5, this is the number of seconds since the Unix Epoch, and is the number of seconds since Jan 1st, 2000 in versions prior). The first two lines thus get the internal value of midnight on December 26th, 2014 and midnight on December 29th, 2014 (we adjust to the next day so that we can fit all of December 28th in range).
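As a quick sanity check on the conversion described above, the following sketch prints the internal value and then formats it back into a readable date with strftime. The numeric value printed depends on the gnuplot version, as noted, so no particular number is assumed here.

```gnuplot
# Convert a date string to gnuplot's internal time value (seconds).
startrange = strptime("%d.%m.%Y", "26.12.2014")

# The raw number differs between gnuplot 4.x and 5.x (different epochs).
print startrange

# Formatting it back should give midnight on the 26th in either version.
print strftime("%d.%m.%Y-%H:%M", startrange)
```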
The validdate function converts the date of interest to the internal representation and compares it against these two markers. Because the definition uses the comma (serial evaluation) operator, its value is that of the final comparison: 1 (true) if the date is in range and 0 (false) if it isn't. Note that the first comparison uses >= to test whether the date is at or after midnight of the start date, and the second uses a strict < to check that the date falls before the start of the day after the end date. If you have specific times in mind on those days, slight modifications can be made.
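For instance, one possible modification is to give the markers a time of day and make both ends inclusive. This is only a sketch: the 08:00 bounds are chosen to match the sample data and are an assumption, not part of the original script.

```gnuplot
set datafile separator ","

# Markers now include a time of day (assumed bounds, chosen for illustration).
startrange = strptime("%d.%m.%Y-%H:%M", "26.12.2014-08:00")
endrange   = strptime("%d.%m.%Y-%H:%M", "28.12.2014-08:00")

# Both comparisons are inclusive here, so a record at exactly 08:00
# on the 28th is still counted.
validdate(x) = (curdate = strptime("%d.%m.%Y-%H:%M", x), curdate >= startrange && curdate <= endrange)

stats 'out.csv' u (validdate(strcol(1)) ? $2 : 1/0) prefix "A"
```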
Finally, we run the stats command on a conditional value. If the date in the first column (we need to use the strcol function to load it as a string to feed to the validdate function) is in range, we use the second column value. If the date is not in range, we use the invalid value 1/0. The stats function will not use the invalid values in its analysis.
Additionally, if it is more convenient, we can accept the start and end dates as parameters in the function:
validdate(x,start,end) = (startrange=strptime("%d.%m.%Y",start),endrange=strptime("%d.%m.%Y",end),curdate=strptime("%d.%m.%Y-%H:%M",x),curdate>=startrange&&curdate<endrange)
and call the stats function like
stats 'out.csv' u (validdate(strcol(1),"26.12.2014","29.12.2014")?$2:1/0) prefix "A"
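Since the goal in the question is to get separate stats for different time periods, the parameterized form can be called once per period with a different prefix each time. A minimal sketch, assuming the sample data above; the second prefix "B" and the choice of periods are illustrative only.

```gnuplot
set datafile separator ","

# Same parameterized test as above, repeated so this fragment stands alone.
validdate(x,start,end) = (startrange = strptime("%d.%m.%Y", start), endrange = strptime("%d.%m.%Y", end), curdate = strptime("%d.%m.%Y-%H:%M", x), curdate >= startrange && curdate < endrange)

# Period 1: 24.12 and 25.12 (end marker is midnight on the 26th, exclusive).
stats 'out.csv' u (validdate(strcol(1), "24.12.2014", "26.12.2014") ? $2 : 1/0) prefix "A" nooutput

# Period 2: 26.12 through 28.12.
stats 'out.csv' u (validdate(strcol(1), "26.12.2014", "29.12.2014") ? $2 : 1/0) prefix "B" nooutput

# Each prefix gets its own set of variables.
print "Period 1 max: ", A_max, "  mean: ", A_mean
print "Period 2 max: ", B_max, "  mean: ", B_mean
```

The nooutput option only suppresses the printed report; the A_ and B_ variables are still set.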
Gnuplot has a timecolumn function which can read a column as a time and date. This gives us an alternative method which is simpler, but not necessarily as powerful.
We can do
set timefmt "%d.%m.%Y-%H:%M"
stats [startrange:endrange] 'out.csv' u (timecolumn(1)):2
This will read the first column as a time using the timefmt.‡
This version works similarly to the above, except the endrange value is accepted instead of rejected (the above version is more powerful if we need more complex tests of our dates and times) and the discarded values are listed as "Out of range" instead of "Invalid".
We can also specify the start and end range inline using
stats [strptime("%d.%m.%Y","26.12.2014"):strptime("%d.%m.%Y","29.12.2014")] 'out.csv' u (timecolumn(1)):2
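Putting this second method together with the plot from the question might look like the sketch below. One detail to watch: because stats is given two columns here, the statistics for the value column carry a _y suffix (A_max_y rather than A_max). The label position and plot style are assumptions for illustration.

```gnuplot
set datafile separator ","
set timefmt "%d.%m.%Y-%H:%M"

# Run stats BEFORE entering time mode; the range is given in internal time values.
stats [strptime("%d.%m.%Y","26.12.2014"):strptime("%d.%m.%Y","29.12.2014")] 'out.csv' u (timecolumn(1)):2 prefix "A"

# Two-column stats: x statistics end in _x, y statistics end in _y.
print A_max_y

# Now switch to time mode and plot the whole file.
set xdata time
set format x "%d.%m"
set label 1 gprintf("Max = %g", A_max_y) at "26.12.2014-08:00", A_max_y + 1
plot 'out.csv' u 1:2 with linespoints title "column 2"
```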
† Note that you MUST NOT be in time mode when you use the stats function, otherwise it will just complain. Thus, the above must be run before calling set xdata time, or after restoring normal mode with set xdata.
‡ In version 5, the timecolumn function can also take a second argument that specifies the format to use, like timecolumn(1,"%d.%m.%Y-%H:%M"), instead of relying on the timefmt command, which is then not necessary.
Note that the version 5 documentation describes only the two-argument form and mentions the one-argument form only as the previous format, not as an accepted alternative. The one-argument form continues to work for now, but it may stop working in some later version. I would expect that to be unlikely, as gnuplot tends to preserve backwards compatibility, and the one-argument form is useful in cases like the above (so that the time format needs to be specified in only one place in the script).