
gsub string before first number with upper and lowercase characters

Tags:

r

I want to remove everything after the first number (keeping a letter suffix such as the A in 1A). The data I have looks like:

[1] NA                                   "ITEM 1. BUSINESS"                  
[3] "ITEM 1A. RISK FACTORS"              "ITEM 1B. UNRESOLVED STAFF COMMENTS"
[5] "ITEM 2. PROPERTIES"                 "ITEM 3. LEGAL PROCEEDINGS"       

I am trying to keep only the item label, so that I have

NA           ITEM1
ITEM1A      ITEM1B
ITEM2       ITEM3

(or even keeping the spaces between ITEM 1, ITEM 2 etc.)

I have tried the following without any luck.

x <- toupper(x)
x <- gsub("[^[:alnum:][:space:]]","", x)
x <- gsub(" ", "", x)
x <- substr(x, start = 1, stop = 7)
x <- gsub("\\[digits]*","", x)

Also tried:

y <- str_extract(x, "Item")
y <- str_extract(toupper(words$item), "ITEM")

Data:

c(NA, "ITEM 1. BUSINESS", "ITEM 1A. RISK FACTORS", "ITEM 1B. UNRESOLVED STAFF COMMENTS", 
"ITEM 2. PROPERTIES", "ITEM 3. LEGAL PROCEEDINGS", "ITEM 4. MINE SAFETY DISCLOSURES", 
"ITEM 5. MARKET FOR REGISTRANT’S COMMON EQUITY, RELATED STOCKHOLDER MATTERS AND ISSUER PURCHASES OF EQUITY SECURITIES", 
"ITEM 6. SELECTED FINANCIAL DATA ", "ITEM 7. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS ", 
"ITEM 7A. QUANTITATIVE AND QUALITATIVE DISCLOSURES ABOUT MARKET RISK", 
"ITEM 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA", "ITEM 9. CHANGES IN AND DISAGREEMENTS WITH ACCOUNTANTS ON ACCOUNTING AND FINANCIAL DISCLOSURE", 
"ITEM 9A. CONTROLS AND PROCEDURES", "ITEM 9B.  OTHER INFORMATION", 
"ITEM 10. DIRECTORS, EXECUTIVE OFFICERS AND CORPORATE GOVERNANCE", 
"ITEM 11. EXECUTIVE COMPENSATION", "ITEM 12. SECURITY OWNERSHIP OF CERTAIN BENEFICIAL OWNERS AND MANAGEMENT AND RELATED STOCKHOLDER MATTERS", 
"ITEM 13. CERTAIN RELATIONSHIPS AND RELATED TRANSACTIONS, AND DIRECTOR INDEPENDENCE", 
"ITEM 14. PRINCIPAL ACCOUNTING FEES AND SERVICES", "ITEM 15. EXHIBITS, FINANCIAL STATEMENT SCHEDULE", 
"Item 1.    Business", "Item 1A.    Risk Factors", "Item 1B.    Unresolved Staff Comments", 
"Item 2.    Properties", "Item 3.    Legal Proceedings", "Item 4.    Mine Safety Disclosure", 
"Item 5.    Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities", 
"Item 6.    Selected Financial Data", "Item 7.    Management’s Discussion and Analysis of Financial Condition and Results of Operations", 
"Item 7A.    Quantitative and Qualitative Disclosures About Market Risk", 
"Item 8.    Financial Statements and Supplementary Data", "Item 9.    Changes in and Disagreements with Accountants on Accounting and Financial Disclosure", 
"Item 9A.    Controls and Procedures", "Item 9B.    Other Information", 
"Item 10.    Directors, Executive Officers and Corporate Governance", 
"Item 11.    Executive Compensation", "Item 12.    Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters", 
"Item 13.    Certain Relationships and Related Transactions, and Director Independence", 
"Item 14.    Principal Accountant Fees and Services", "Item 15.    Exhibits and Financial Statement Schedules(a)(1) and (2).  The following documents have been included in Part II, Item 8. Report of Ernst & Young LLP, Independent Registered Public Accounting Firm, on Financial Statements Consolidated Statements of Financial Position — As of December 31, 2017 and 2016 Consolidated Statements of Income — Years Ended December 31, 2017, 2016 and 2015 Consolidated Statements of Comprehensive Income — Years Ended December 31, 2017, 2016 and 2015 Consolidated Statements of Shareholders’ Equity — Years Ended December 31, 2017, 2016 and 2015 Consolidated Statements of Cash Flows — Years Ended December 31, 2017, 2016 and 2015 Notes to Consolidated Financial Statements", 
"Item 1.  Business.", "Item 1A.  Risk Factors.", "Item 1B.  Unresolved Staff Comments.", 
"Item 2.  Properties.", "Item 3.  Legal Proceedings.", "Item 4.  Mine Safety Disclosures.", 
"Item 5.  Market for Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities.", 
"Item 6.  Selected Financial Data.", "Item 7.  Management's Discussion and Analysis of Financial Condition and Results of Operations. ", 
"Item 7A.  Quantitative and Qualitative Disclosures About Market Risk.", 
"Item 8.  Financial Statements and Supplementary Data.", "Item 9.  Changes in and Disagreements with Accountants on Accounting and Financial Disclosure.", 
"Item 9A.  Controls and Procedures.", "Item 9B.  Other Information.", 
"Item 10.  Directors, Executive Officers and Corporate Governance.", 
"Item 11.  Executive Compensation.", "Item 12.  Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters.", 
"Item 13.  Certain Relationships and Related Transactions, and Director Independence.", 
"Item 14.  Principal Accounting Fees and Services.", "Item 15.  Exhibits, Financial Statement Schedules.", 
"Item 16. Form 10-K Summary.", "Item 4.    Mine Safety Disclosures", 
"Item 4A.    Executive Officers", "Item 5.    Market for the Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities", 
"Item 6.    Selected Financial Data", "Item 7.   Management's Discussion and Analysis of Financial Condition and Results of Operations", 
"Item 8.   Financial Statements and Supplementary Data", "Item 15.    Exhibits, Financial Statement Schedules"
)
user8959427 asked Nov 19 '25 02:11



1 Answer

Here's another way to do it. We can use \\U in the replacement, along with perl = TRUE, to capitalize the captured text:

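# test is the character vector from the question's Data section
s1 <- gsub("^(.*?)\\..*", "\\U\\1", test, perl = TRUE)  # keep the text before the first period, uppercased
s2 <- gsub("\\s+", "", s1)  # remove the remaining whitespace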

[1] NA       "ITEM1"  "ITEM1A" "ITEM1B" "ITEM2"  "ITEM3"
[7] "ITEM4"  "ITEM5"  "ITEM6"  "ITEM7"  "ITEM7A"

My first expression cuts each string at its first period, keeping only the "Item ..." label in front of it. The second expression removes the whitespace that is left.
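
If you would rather not depend on the period, one possible variation is to capture the word "item" and the number directly. This is only a sketch on a small sample of the question's data: the names pattern, compact and spaced are made up for illustration, and it assumes every label has the form "item", optional whitespace, digits, and at most one letter suffix.

```r
# Small sample taken from the question's data
x <- c(NA, "ITEM 1. BUSINESS", "Item 1A.    Risk Factors",
       "Item 9B.  Other Information.", "Item 16. Form 10-K Summary.")

# (?i) makes the match case-insensitive (PCRE syntax, so perl = TRUE is needed).
# Group 1 is the word "item"; group 2 is the number plus an optional letter.
pattern <- "(?i)^\\s*(item)\\s*(\\d+[a-z]?).*$"

# \\U uppercases everything that follows it in the replacement.
compact <- sub(pattern, "\\U\\1\\2", x, perl = TRUE)   # "ITEM1A" style
spaced  <- sub(pattern, "\\U\\1 \\2", x, perl = TRUE)  # "ITEM 1A" style

print(compact)
print(spaced)
```

The spaced version covers the "keeping the spaces" option from the question. NA values pass through unchanged, and a string that does not match the pattern is returned as is, so it is worth checking the result for unexpected entries.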

Mako212 answered Nov 21 '25 17:11



