C++ text file processing summary

Summary

Read a text file line by line in C++ into a vector of strings, then process each line with string functions and regular expressions. The processing removes spaces, tabs, empty lines, comments, and any line that is neither a hex string nor "RESET".

Read line by line to Vector

Opening the file and pushing each line into the string vector is straightforward (refer to this Link). Use std::getline to read each line and store.push_back(textLine) to append it to the vector, then iterate over the vector with a for loop and process each line, as the code below shows:

vector<string> store;
std::string textLine;
//sPath contains the file name and path.
std::ifstream inputFile (sPath);
//make sure file is properly opened.
if(!inputFile)
{
    szMsg.Format(_T("Failed to open the script file!"));
    throw szMsg;
}
//read the file line by line and push each line into the vector "store"
while (std::getline(inputFile, textLine))
{
    store.push_back(textLine);
}

//process the string Vector one line by one line
for (std::vector<string>::iterator it = store.begin() ; it != store.end(); ++it)
{
    //remove all spaces and tabs from the line
    std::string apdu_str_tmp = reduce(*it, "");
    //convert to upper case (::toupper selects the <cctype> version)
    transform(apdu_str_tmp.begin(), apdu_str_tmp.end(), apdu_str_tmp.begin(), ::toupper);
    //skip the line if it is empty, or if its first character is not a hex digit and the line is not "RESET"
    if(apdu_str_tmp.empty() || (!isxdigit(static_cast<unsigned char>(apdu_str_tmp[0])) && apdu_str_tmp.compare("RESET") != 0))
    {
        //cout<<"it not hex and not RESET, illegal!!!!";
        continue;
    }
    //remove the comment: everything from "//" to the end of the line
    std::tr1::regex regex_apdu ("(.*)(\\/\\/.*)");
    //keep only capture group 1, the text before the "//"
    apdu_str_tmp = tr1::regex_replace (apdu_str_tmp,regex_apdu,std::string("$1"));

    LPWSTR new_apdu_str;
    //convert to LPWSTR and display in the GUI.
    new_apdu_str = ConvertString(apdu_str_tmp); 
    szMsg.Format(_T("%s"), new_apdu_str);   
    LOG_INFO_APPEND(szMsg);
    //free new_apdu_str to avoid a memory leak.
    delete[] new_apdu_str;
}
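
The reading step can also be written without the MFC-specific pieces. The following is a minimal sketch in standard C++11, not the code used above: readLines is a hypothetical helper that accepts any std::istream, so it works with std::ifstream as well as std::istringstream. It also drops a trailing '\r', on the assumption that script files may arrive with Windows line endings.

```cpp
#include <cassert>   // only for the check mentioned in the note below
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every line of `in` into a vector of strings.
std::vector<std::string> readLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
    {
        // Assumption: remove the '\r' left over from CRLF line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

Given a std::istringstream holding "Reset\r\n00A4\n\nlast", readLines returns four entries: "Reset", "00A4", an empty string, and "last". Empty lines are kept at this stage and filtered out later by the processing loop.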

Function to remove the white space

To remove or replace the spaces and tabs in a string (refer to this Link), work in two steps: first, a trim function removes the leading and trailing whitespace; second, the whitespace in the middle of the string is removed or replaced. The two functions are shown below:

std::string trim(const std::string& str,
                 const std::string& whitespace = " \t")
{
    std::size_t strBegin = str.find_first_not_of(whitespace);
    if (strBegin == std::string::npos)
        return ""; // no content

    std::size_t strEnd = str.find_last_not_of(whitespace);
    std::size_t strRange = strEnd - strBegin + 1;

    return str.substr(strBegin, strRange);
}

std::string reduce(const std::string& str,
                   const std::string& fill = " ",
                   const std::string& whitespace = " \t")
{
    // trim first
    std::string result = trim(str, whitespace);

    // replace sub ranges
    std::size_t beginSpace = result.find_first_of(whitespace);
    while (beginSpace != std::string::npos)
    {
        std::size_t endSpace = result.find_first_not_of(whitespace, beginSpace);
        std::size_t range = endSpace - beginSpace;

        result.replace(beginSpace, range, fill);

        std::size_t newStart = beginSpace + fill.length();
        beginSpace = result.find_first_of(whitespace, newStart);
    }

    return result;
}
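
When the fill string is empty, as in the processing loop, all whitespace is simply deleted. For that case a shorter alternative is the erase-remove idiom. The sketch below is a different technique from trim/reduce, and stripAllWhitespace is a hypothetical name. Note that std::isspace also removes '\r', '\n', '\v' and '\f', whereas reduce removes only the characters listed in its whitespace argument.

```cpp
#include <algorithm>
#include <cassert>   // only for the check mentioned in the note below
#include <cctype>
#include <string>

// Hypothetical helper: delete every whitespace character from `s`.
std::string stripAllWhitespace(std::string s)
{
    // remove_if moves the characters to keep to the front and returns
    // the new logical end; erase then cuts off the leftover tail.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}
```

For example, assert(stripAllWhitespace("  00 A4\t04  ") == "00A404") holds. The lambda takes an unsigned char because passing a negative char value to std::isspace is undefined behavior.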

Regular Expressions to remove comments

A regular expression finds the comments and removes them. Everything from “//” to the end of the line is treated as a comment and removed. The regular expression is:

std::tr1::regex regex_apdu ("(.*)(\\/\\/.*)"); 

When the expression matches, three results are available to the replacement string: “$0” is the whole match, “$1” is the text before the comment, and “$2” is the comment itself (including the “//”). Replacing the match with “$1” therefore yields the line with the comment removed, as the code below shows:

apdu_str_tmp = tr1::regex_replace (apdu_str_tmp,regex_apdu,std::string("$1"));

With the TR1 regex_replace used above, the third argument must be a std::string object, not a string literal, so the literal is wrapped in std::string(...). (The C++11 std::regex_replace also accepts a string literal directly.)
Refer to Link1, Link2 and Link3 for more examples.
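
One detail worth knowing: the leading (.*) is greedy, so if a line contains “//” more than once, “$1” keeps everything up to the last “//”. The sketch below shows a simpler variant in standard C++11 (std::regex instead of TR1) that matches from the first “//” onward and replaces it with nothing. stripComment is a hypothetical helper, not part of the original code.

```cpp
#include <cassert>   // only for the checks mentioned in the note below
#include <regex>
#include <string>

// Hypothetical helper: remove a trailing "//" comment from one line.
std::string stripComment(const std::string& line)
{
    // "//" followed by everything up to the end of the line.
    static const std::regex comment("//.*");
    return std::regex_replace(line, comment, "");
}
```

With this helper, stripComment("AB//x//y") gives "AB", while the pattern (.*)(//.*) with “$1” turns the same input into "AB//x". A line without “//” is returned unchanged by both.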

Header and namespace usage

The code uses the following headers and namespaces:

#include <vector>
#include <iterator>
#include <fstream>
#include <string>
#include <sstream>
#include <algorithm>
#include <regex>
#include <cctype>
using namespace std;
using namespace std::tr1;

Process Result

Before processing, the text read from the file is as follows (line 6 contains some spaces):

Reset
Res
00A4040007A0000000041010
//This is comments 
80500000081122334455667788//this is comment1



not a hex
8482000010404142434445464748494A4B4C4D4E4F404142434445464748494A4B4C4D4E4F404142434445464748494A4B4C4D4E4F  //this is a comments2

After processing, the result is as follows:

RESET
00A4040007A0000000041010
80500000081122334455667788
8482000010404142434445464748494A4B4C4D4E4F404142434445464748494A4B4C4D4E4F404142434445464748494A4B4C4D4E4F
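
The whole transformation can be reproduced in standard C++11 without MFC or TR1. The following is a sketch under those assumptions rather than the code used above: processScript is a hypothetical function, it uses std::regex in place of std::tr1::regex, and it removes whitespace with std::isspace instead of reduce.

```cpp
#include <algorithm>
#include <cassert>   // only for the check mentioned in the note below
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: apply the steps described in this post to each
// line and return only the lines that survive.
std::vector<std::string> processScript(const std::vector<std::string>& lines)
{
    static const std::regex comment("//.*");
    std::vector<std::string> out;
    for (std::string s : lines)   // work on a copy of each line
    {
        // 1. Remove spaces, tabs and any other whitespace.
        s.erase(std::remove_if(s.begin(), s.end(),
                    [](unsigned char c) { return std::isspace(c) != 0; }),
                s.end());
        // 2. Convert to upper case.
        std::transform(s.begin(), s.end(), s.begin(),
                    [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        // 3. Keep only lines that start with a hex digit or equal "RESET".
        if (s.empty())
            continue;
        if (!std::isxdigit(static_cast<unsigned char>(s[0])) && s != "RESET")
            continue;
        // 4. Drop everything from the first "//" to the end of the line.
        out.push_back(std::regex_replace(s, comment, ""));
    }
    return out;
}
```

Fed the input lines listed above, this sketch returns the same four lines as the result shown: "Res", the comment-only line, the blank lines, and "not a hex" are all rejected in step 3.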
