 

Post-Process a Stata %tw date in R

Tags: date, r, stata

Stata's %tw format displays weekly dates in the form 1960w1, which has no equivalent in R. Therefore %tw dates must be post-processed.

When a .dta file is imported into R, such a date arrives as an integer like 1304 (instead of 1985w5) or 1426 (instead of 1987w23). If the data were a simple time series, you could set a starting date as follows:

ts(df, start= c(1985,5), frequency=52) 

Another possibility would be:

as.Date(Camp$date, format= "%Yw%W" , origin = "1985w5") 

But if each row carries its own date, rather than the data forming one regular series, each value must be converted.
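A minimal base-R sketch of that per-row conversion, assuming the column holds raw Stata weekly integers counted from 1960w1 and that every Stata year has exactly 52 weeks (the Stata rule discussed in the answer below). The data frame is a hypothetical stand-in for the imported file, reusing the `Camp$date` name from the attempt above.

```r
# Hypothetical data frame standing in for the imported .dta file:
# `date` holds raw Stata weekly integers (weeks since 1960w1).
Camp <- data.frame(date = c(1304, 1426))

# Stata years always contain exactly 52 weeks, so integer division
# recovers the year and the remainder recovers the week (1..52).
Camp$year  <- 1960 + Camp$date %/% 52
Camp$week  <- Camp$date %% 52 + 1
Camp$label <- paste0(Camp$year, "w", Camp$week)

Camp$label   # "1985w5" and "1987w23"
```

This only recovers the Stata labels; turning them into R dates still requires choosing which day each week maps to.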

The ISOweek package is based on ISO 8601, with week strings of the form "1985-W05", and does not handle Stata %tw values.

The lubridate package does not handle this format either. Its week() function returns the number of complete seven-day periods that have occurred between the date and January 1st, plus one (see the lubridate week() documentation).

In Stata, week 1 of any year starts on 1 January, whatever day of the week that is (see the Stata documentation on dates).
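Under that rule a date's Stata week depends only on its day of the year. Below is a base-R sketch going the other way (R date to Stata week); the helper names `stata_week_of()` and `stata_tw_of()` are hypothetical. The cap at 52 follows Stata's rule, described in the answer below, that week 52 absorbs the last 8 or 9 days of the year. Going by the lubridate week() definition quoted above, that function should give the same numbers except on the last one or two days of a year, where it returns 53.

```r
# Stata-style week for an R Date: week 1 starts on 1 January, a new week
# starts every 7 days, and any leftover days at year end stay in week 52.
stata_week_of <- function(d) {
  yday <- as.POSIXlt(d)$yday + 1L        # day of year, 1..366
  pmin((yday - 1L) %/% 7L + 1L, 52L)     # cap at 52: Stata has no week 53
}

# Full Stata weekly integer (weeks since 1960w1), i.e. what a %tw variable stores.
stata_tw_of <- function(d) {
  year <- as.POSIXlt(d)$year + 1900L
  (year - 1960L) * 52L + stata_week_of(d) - 1L
}

d <- as.Date(c("1985-01-29", "1987-06-04", "1985-12-31"))
stata_week_of(d)   # weeks 5, 23 and 52
stata_tw_of(d)     # 1304, 1426 and 1351
```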

In R's %W format, weeks start on Monday (numbered 00–53, with the first Monday of the year starting week 1).

The R documentation for strptime describes %V as:

"Week of the year as decimal number (00–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)"
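To see how these conventions diverge, here is a small base-R check on 1 January 1985, which fell on a Tuesday. %U (Sunday-based) and %W (Monday-based) put it in week 00 because the year's first Sunday or Monday has not yet arrived, %V (ISO 8601) puts it in week 01, and Stata's rule always puts it in week 1. This is an illustration only; %V output relies on R's date formatting support on the platform.

```r
# 1 January 1985 was a Tuesday.
d <- as.Date("1985-01-01")
format(d, "%u")   # "2": ISO weekday number, Monday = 1
format(d, "%U")   # "00": Sunday-based weeks, first Sunday is 6 January
format(d, "%W")   # "00": Monday-based weeks, first Monday is 7 January
format(d, "%V")   # "01": ISO week running Monday 31 December 1984 to 6 January
# Stata: week 1, because Stata's week 1 always starts on 1 January.
```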

Larmarange noted on GitHub that haven does not interpret these Stata date formats properly:

months, week, quarter and halfyear are specific format from Stata, respectively %tm, %tw, %tq and %th. I'm not sure that there are corresponding formats available in R. So far they are imported as integers.

Is there a way to convert Stata %tw to a date format R understands? Here is a Stata file with dates

Kvasir EnDevenir asked Feb 01 '26 01:02


1 Answer

This won't be an answer in terms of R code, but it is commentary on Stata weeks that can't be fitted into a comment.

Strictly, dates in Stata are not defined by the display formats that make them intelligible to people. A date in Stata is always a numeric variable, scalar or macro, counted from an origin at the first period of 1960 (1 January 1960 for daily dates, 1960w1 for weekly dates, 1960q1 for quarterly dates). Thus talking about "%tw dates" and the like is at best shorthand. We can use display to see the effects of different date display formats:

. di %td 0
01jan1960

. di %tw 0
 1960w1

. di %tq 0
1960q1

. di %td 42
12feb1960

. di %tw 42
1960w43

. di %tq 42
1970q3

A subtle point made explicit above is that changing the display format will not change what is stored, i.e. the numeric value.

Otherwise put, dates in Stata are not distinct data types; they are just integers made intelligible as dates by a pertinent display format.
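The same point can be sketched in R: one stored integer, read three ways, as days since 1 January 1960 (%td), weeks since 1960w1 (%tw) and quarters since 1960q1 (%tq). The helper names are hypothetical, and the labels imitate rather than exactly reproduce Stata's display (Stata shows 12feb1960 where R prints 1960-02-12).

```r
# Interpret one stored integer under three Stata display formats.
from_td <- function(x) as.Date("1960-01-01") + x                  # days since 01jan1960
from_tw <- function(x) paste0(1960 + x %/% 52, "w", x %% 52 + 1)  # 52 weeks per year
from_tq <- function(x) paste0(1960 + x %/% 4, "q", x %% 4 + 1)    # 4 quarters per year

for (x in c(0, 42)) {
  cat(x, ":", format(from_td(x)), from_tw(x), from_tq(x), "\n")
}
# 0 : 1960-01-01 1960w1 1960q1
# 42 : 1960-02-12 1960w43 1970q3
```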

The question presupposes that it was correct to describe some weekly dates in terms of Stata weeks. This seems unlikely, as I know of no instance in which a body outside StataCorp uses Stata's week rules: not only does week 1 always start on 1 January, but week 52 always includes either 8 or 9 days, so there is never a week 53 in a calendar year.
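The 8-or-9-day figure is quick to check: week k starts on day 1 + 7(k - 1) of the year, so week 52 starts on day 358 and runs to day 365 or, in a leap year, day 366. A base-R sketch (the function name is hypothetical):

```r
# Number of days in Stata week 52 of a given year: from day 358 to 31 December.
week52_length <- function(year) {
  jan1  <- as.Date(paste0(year, "-01-01"))
  start <- jan1 + 357                      # day 358 of the year
  end   <- as.Date(paste0(year, "-12-31"))
  as.integer(end - start) + 1L             # inclusive count of days
}

week52_length(c(1985, 1987, 1988))   # 8, 8 and 9 (1988 is a leap year)
```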

So you need to go upstream and find out what the data should have been. Failing any explanation, my best advice is to map the 52 weeks of each year to the days that start them, namely days 1(7)358 of each calendar year (that is, days 1, 8, 15, ..., 358).
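In R, that mapping can be sketched as follows, assuming the imported column holds raw Stata weekly integers (weeks since 1960w1); the function name `stata_tw_to_date()` is hypothetical.

```r
# Map Stata weekly integers (weeks since 1960w1) to the first day of each week.
# Week k of a year starts on day 1 + 7 * (k - 1), i.e. days 1, 8, ..., 358.
stata_tw_to_date <- function(w) {
  year <- 1960 + w %/% 52          # Stata years always have 52 weeks
  week <- w %% 52 + 1              # 1..52
  as.Date(paste0(year, "-01-01")) + 7 * (week - 1)
}

stata_tw_to_date(c(1304, 1426))   # 1985-01-29 and 1987-06-04
```

Each week is represented by its start date; the last days of December all fall under the week-52 date, which is exactly the many-to-one behaviour that keeps Stata weeks from matching other week schemes.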

Stata weeks won't map one-to-one to any other scheme for defining weeks.

There is more on Stata weeks in this article.

Nick Cox answered Feb 02 '26 16:02


