String Parsing in C++

COSC 2150
String parsing in C++ with
regular expressions.
String parsing
• One of the main tasks that a program may need to
do is take a string and parse it to determine the
next step in the program.
—Command line applications
—Search applications (like Bing and Google).
—Most network applications, send and receive data as
—Too many more to even begin to name.
How to parse
• As with all things in C/C++, you can do it any
number of ways.
—Develop your own functions and algorithms to parse a string
—Use the member functions of the string class
– String parsing.
—Use the sscanf function
– More like regular expressions.
—Use the regex library
– Which is regular expressions.
+ Requires Visual Studio 2010 or later; with GCC, a working <regex> needs gcc 4.9+ (earlier releases ship the header, but it is largely unimplemented)
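As a rough illustration of the "string class member functions" option, here is a minimal sketch that splits a line into whitespace-separated tokens. The function name splitWords and the choice of space and tab as delimiters are assumptions made for this example, not part of the course material.

```cpp
#include <string>
#include <vector>

// Split a line into tokens separated by spaces or tabs, using only
// std::string member functions (find_first_not_of, find_first_of, substr).
std::vector<std::string> splitWords(const std::string& line) {
    std::vector<std::string> words;
    const std::string delims = " \t";

    // Skip any leading delimiters to find the start of the first token.
    std::string::size_type start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter (or at the end of the string).
        std::string::size_type end = line.find_first_of(delims, start);
        // When end is npos, substr clamps the length to the end of the string.
        words.push_back(line.substr(start, end - start));
        // Move to the start of the next token, if there is one.
        start = line.find_first_not_of(delims, end);
    }
    return words;
}
```

Calling splitWords on a command line gives the command and its arguments as separate strings, which is often enough for simple command-line style input.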
Reading a line of input.
• The standard cin >> reads up to a space and then stops.
—This is not always the functionality we want.
• getline function: (2 methods)
—cin.getline(c_str, 256)
– Reads to the end-of-line mark or the character limit, whichever
comes first.
– Requires a c-string (char array) instead of a string.
– Example:
char stuff[256];
cin.getline(stuff, 256);
—But this is still not the method we want since it
requires c-strings.
Reading a line of input (2)
• getline, second method, which is the method we
want to use, since it reads into a string.
—Declared in the <string> header
—getline(cin, string)
string stuff;
getline(cin, stuff);
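To see the difference between the two ways of reading, here is a small sketch. The helper names firstToken and wholeLine are made up for this example; both take any input stream, so they work with cin as well as with a string stream.

```cpp
#include <iostream>
#include <sstream>   // lets a string stand in for cin when experimenting
#include <string>

// Reads with operator>>, which stops at the first whitespace.
std::string firstToken(std::istream& in) {
    std::string token;
    in >> token;
    return token;
}

// Reads with getline, which takes everything up to the end of the line.
std::string wholeLine(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}
```

With the input line "hello world", firstToken(cin) yields only hello, while wholeLine(cin) yields the full hello world. For testing without typing, a std::istringstream holding the text can be passed in place of cin.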
Regular Expressions
• Regex for short.
—Likely the most powerful way to do any string parsing.
• Use:
—Create a pattern that you want to match with
—Run the “match”
—If it returns true, then the string matched the pattern
– You can also get all the matches into an array to use.
• Problem:
—Regex patterns can be very complex and we don’t
have the time (about 6 lectures) to cover the entire
regex set. This will only cover the very basics.
Code for regex
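• Include the regex header, which was introduced in TR1 and is standard since C++11
#include <regex>
• Define the pattern
—Note pattern is a variable!
std::tr1::regex pattern ( … string pattern…);
—With a C++11 compiler, drop the tr1: std::regex, std::match_results, std::regex_match, std::regex_search
• Object that will contain the sequence of submatches (optional)
std::tr1::match_results<std::string::const_iterator> result;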
Code for regex (2)
• regex_match to match the full string
if (std::tr1::regex_match(string, result, pattern))
• If true there was a match
—If capturing matches, result should hold them: check
result.size() > 0 or !result.empty()
• regex_search to match any part of a string
if (std::tr1::regex_search(string, result, pattern))
—Same as match, with result.
– Assume we are using regex_search unless otherwise noted.
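The difference between regex_match and regex_search can be sketched as follows. This example assumes a C++11 compiler and therefore uses std::regex rather than std::tr1::regex; the helper names matchesWhole, matchesAnywhere, and firstMatch are invented for illustration.

```cpp
#include <regex>
#include <string>

// True only if the pattern matches the entire string.
bool matchesWhole(const std::string& text, const std::string& pat) {
    std::regex pattern(pat);
    return std::regex_match(text, pattern);
}

// True if the pattern matches any part of the string.
bool matchesAnywhere(const std::string& text, const std::string& pat) {
    std::regex pattern(pat);
    return std::regex_search(text, pattern);
}

// Returns the text of the first match, or an empty string if none.
std::string firstMatch(const std::string& text, const std::string& pat) {
    std::regex pattern(pat);
    // std::smatch is std::match_results<std::string::const_iterator>
    std::smatch result;
    if (std::regex_search(text, result, pattern) && !result.empty()) {
        return result[0].str();   // result[0] is the whole match
    }
    return "";
}
```

For the text "hello world" and the pattern "ello", matchesAnywhere is true but matchesWhole is false, because regex_match requires the pattern to cover the whole string.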
• Matching text
regex ex1("ello"); //matches anything with ello
//such as "hello world"
• alternation
regex ex2("Fred|Wilma|Pebbles");
//true if string contains Fred, Wilma, or Pebbles
• alternation and grouping
regex ex3("(p|g|m|s|b)et");
//true if string contains: pet, get, met, set, or bet
//note () are also used to capture the match
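The alternation and grouping patterns above can be tried out with a short sketch like the following. It assumes C++11 std::regex, and the helper names are made up for this example.

```cpp
#include <regex>
#include <string>

// True if the text contains Fred, Wilma, or Pebbles (pattern ex2).
bool mentionsFlintstone(const std::string& text) {
    static const std::regex ex2("Fred|Wilma|Pebbles");
    return std::regex_search(text, ex2);
}

// Returns the first pet/get/met/set/bet found (pattern ex3), or "".
std::string findEtWord(const std::string& text) {
    static const std::regex ex3("(p|g|m|s|b)et");
    std::smatch result;
    if (std::regex_search(text, result, ex3)) {
        return result[0].str();   // the whole match, e.g. "get"
    }
    return "";
}

// Returns only the letter captured by the () group, or "".
std::string capturedLetter(const std::string& text) {
    static const std::regex ex3("(p|g|m|s|b)et");
    std::smatch result;
    if (std::regex_search(text, result, ex3)) {
        return result[1].str();   // the first capture group, e.g. "g"
    }
    return "";
}
```

Note how result[0] holds the full match while result[1] holds just what the parentheses captured.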
pattern (2)
• single character or’d matching, using []
—regex ex4("[0-9]"); //match a single digit
– note the dash is a range operator, i.e. 0 to 9
—regex ex5("[a-zA-Z0-9.]");
– match any one “character”: a through z, A through Z, 0 to 9, or the period
• match quantifiers
—+ 1 or more times
—? zero or 1 time
—* zero or more times
—regex ex6("[0-9]+"); //find 1 or more digits
—regex ex7("[a-z]*"); //find zero or more lowercase letters
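A short sketch of the + and * quantifiers in action, assuming C++11 std::regex. The helper names are hypothetical. It also shows one thing to watch for: because * allows zero repetitions, a search with [a-z]* always succeeds, possibly with an empty match at the very start of the string.

```cpp
#include <regex>
#include <string>

// Returns the first run of one or more digits (pattern ex6), or "".
std::string firstNumber(const std::string& text) {
    static const std::regex ex6("[0-9]+");
    std::smatch result;
    if (std::regex_search(text, result, ex6)) {
        return result[0].str();
    }
    return "";
}

// Returns what [a-z]* (pattern ex7) matches first. The search always
// succeeds; if the text does not begin with a lowercase letter, the
// match is the empty string at position 0.
std::string leadingLetters(const std::string& text) {
    static const std::regex ex7("[a-z]*");
    std::smatch result;
    std::regex_search(text, result, ex7);
    return result[0].str();
}
```

So leadingLetters gives "abc" for "abc123" but an empty string for "123abc", even though letters appear later in the text.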
pattern (3)
• matching quantifiers {}
—{min number, max number}
—regex ex8("[0-9]{1,3}");
– find 1 to 3 digits
• regex ex9("fo*ba?r{1,2}");
—matches f, 0 or more o's, b, 0 or 1 a, then 1 or 2 r's
—match: fobar, fbr, fbrr, fooobr, fooobarr, etc…
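The {min,max} quantifier can be checked with a sketch like this one, assuming C++11 std::regex. The helper names are invented. matchesEx9 uses regex_match so that the whole string has to fit the pattern, which makes the upper limit of two r's visible.

```cpp
#include <regex>
#include <string>

// True if the entire string fits pattern ex9: f, any number of o's,
// b, an optional a, then one or two r's.
bool matchesEx9(const std::string& text) {
    static const std::regex ex9("fo*ba?r{1,2}");
    return std::regex_match(text, ex9);
}

// Returns the first run of 1 to 3 digits (pattern ex8), or "".
// The quantifier is greedy, so it takes up to 3 digits when available.
std::string firstUpToThreeDigits(const std::string& text) {
    static const std::regex ex8("[0-9]{1,3}");
    std::smatch result;
    if (std::regex_search(text, result, ex8)) {
        return result[0].str();
    }
    return "";
}
```

With regex_search instead of regex_match, a string such as "fobarrr" would still be reported as a match, because the search only needs to find "fobarr" somewhere inside it.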
pattern (4)
• metasymbols
—Match anything using the period
– regex ex10(".+"); //find 1 or more characters (any character except a newline)
– i.e. "123", " atr", "\t there" all match
+ unless the string is empty, this will match.
—\d match a Digit
– regex ex11("\\d+"); //match 1 or more digits
– note the backslash is doubled inside a C++ string literal
—\D match a Non-digit
—\s match whitespace [ \t\n\r\f]
—\S match a Non-whitespace [^ \t\n\r\f]
—\w match a Word character
– regex ex12("\\w+"); //match 1 or more word characters
—\W match a Non word Character
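The metasymbols can be exercised with a small sketch such as the one below, assuming C++11 std::regex. The helper names are hypothetical. Each backslash in the pattern is written twice, since the C++ compiler consumes one level of escaping before the regex library sees the string.

```cpp
#include <regex>
#include <string>

// True if the whole string is one or more digits (\d+).
bool allDigits(const std::string& text) {
    static const std::regex ex11("\\d+");
    return std::regex_match(text, ex11);
}

// Returns the first run of word characters (\w+), or "".
std::string firstWord(const std::string& text) {
    static const std::regex ex12("\\w+");
    std::smatch result;
    if (std::regex_search(text, result, ex12)) {
        return result[0].str();
    }
    return "";
}

// True if the text contains any whitespace character (\s).
bool hasWhitespace(const std::string& text) {
    static const std::regex space("\\s");
    return std::regex_search(text, space);
}
```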
pattern (5)
• capturing the matches
—use the () around the part you want to capture
—regex ex13("(\\w+)");
– find 1 or more word characters and capture the resulting match
—regex ex14("(\\w+)\\s+(\\w+)");
– find 1 or more word characters, then white space, then 1 or
more word characters. Capture the word character matches
– example: “hi there”
– result[1]=“hi”, result[2]=“there”
—regex ex15("(\\d+) (.*)");
– What does this capture? How might this be useful with the
• tr1::regex pattern1("(\\d+) (.*)");
• tr1::regex pattern2("load M\\((\\d+)\\)");
• tr1::regex_match(str,result,pattern1);
—result[1] =
• tr1::regex_match(str,result,pattern2);
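One possible way to use pattern1 and pattern2 together is sketched below, assuming C++11 std::regex. The helper names splitNumbered and loadAddress, the return value -1 for "no match", and the sample input "10 load M(42)" are all assumptions made for this illustration.

```cpp
#include <cstdlib>
#include <regex>
#include <string>

// Splits "<number> <rest of line>" using pattern1.
// Returns false if the line does not have that shape.
bool splitNumbered(const std::string& line,
                   std::string& number, std::string& rest) {
    static const std::regex pattern1("(\\d+) (.*)");
    std::smatch result;
    if (!std::regex_match(line, result, pattern1)) {
        return false;
    }
    number = result[1].str();   // first capture: the digits
    rest = result[2].str();     // second capture: everything after the space
    return true;
}

// Extracts the number inside "load M(<digits>)" using pattern2.
// Returns -1 if the command does not match.
int loadAddress(const std::string& cmd) {
    static const std::regex pattern2("load M\\((\\d+)\\)");
    std::smatch result;
    if (!std::regex_match(cmd, result, pattern2)) {
        return -1;
    }
    return std::atoi(result[1].str().c_str());
}
```

For the assumed input "10 load M(42)", splitNumbered separates the leading 10 from the command text, and loadAddress then pulls 42 out of the remaining "load M(42)".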
Regex reference
• Patterns
Converting strings to integers (1)
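• Can use the sscanf function:
#include <cstdlib>
#include <cstdio>
#include <string>
using namespace std;
int GetIntVal2(string strConvert) {
int intReturn = 0;
// read a decimal integer from the start of the string
sscanf(strConvert.c_str(), "%d", &intReturn);
//if sscanf fails, because no digits, intReturn is already set to zero.
return (intReturn);
}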
Converting strings to integers (2)
• Use the atoi function
int GetIntVal(string strConvert) {
int intReturn;
// NOTE: You should probably do some checks to ensure that
// this string contains only numbers. atoi stops at the first
// non-digit; if the string has no leading digits, zero is returned.
intReturn = atoi(strConvert.c_str());
return (intReturn);
}
Converting integers to strings
• Uses ostringstream (in the <sstream> header)
—Put the integer into the stream, then get it back out
as a string.
#include <sstream>
#include <iostream>
string GetStrVal(int intConvert) {
ostringstream cstr; //create the stream
cstr << intConvert; //put integer into the stream
return cstr.str(); //get the string back out
}
Converting string to integer example:
int main() {
string str, str2;
str = "12";
str2 = "1d2";
cout <<"atoi method str: "<<GetIntVal(str)<<endl;
// prints out 12
cout <<"atoi method str2: "<<GetIntVal(str2)<<endl;
//prints out 1
cout <<"sscanf method str: "<<GetIntVal2(str)<<endl;
// prints out 12
cout <<"sscanf method str2: "<<GetIntVal2(str2)<<endl;
//prints out 1
return 0;
}
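Both atoi and sscanf quietly accept "1d2" as 1. If the program needs to reject such input, one option is strtol, which reports where parsing stopped. The following is a minimal sketch; the function name GetIntValStrict and its bool-plus-reference interface are assumptions for this example and not part of the course code.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Converts strConvert to an int only if the string as a whole is a valid
// integer. Returns true and stores the result in value on success.
// Note: strtol itself skips leading whitespace and accepts a sign.
bool GetIntValStrict(const std::string& strConvert, int& value) {
    if (strConvert.empty()) {
        return false;
    }
    const char* begin = strConvert.c_str();
    char* end = 0;
    errno = 0;
    long parsed = std::strtol(begin, &end, 10);
    if (end == begin) {
        return false;               // no digits were read at all
    }
    if (*end != '\0') {
        return false;               // leftover characters, as in "1d2"
    }
    if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX) {
        return false;               // value does not fit in an int
    }
    value = static_cast<int>(parsed);
    return true;
}
```

With this version, "12" converts to 12, while "1d2" is rejected instead of being read as 1.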
