extract columns by matching ids in two files

Discussion in 'Perl' started by sheen, Apr 7, 2012.

  1. sheen

    sheen New Member

    Joined:
    Apr 7, 2012
    Code: Cpp
    Hello,

    I want to extract columns from file2 into file3 by matching the ids in file1 against the first column of file2. The extracted lines should be in the same order as the ids in file1.

    for example:

    file1.txt
    1823
    607
    R2A9
    802
    771

    file2.txt
    1823 1 2 4
    22 11 4 29
    607 12 3 3
    R2A9 34 4 9
    D33 2 1 0
    802 30 8 1
    771 3 0 9
    3RE 6 3 1



    output file3.txt should be printed in this way

    1823 1 2 4
    607 12 3 3
    R2A9 34 4 9
    802 30 8 1
    771 3 0 9

    Please suggest something.

    Thanks,
    /S
     
  2. dearvivekkumar

    dearvivekkumar New Member

    Joined:
    Feb 21, 2012
    Code:
    /*
    file1.txt 
    1823
    607
    R2A9
    802
    771
    
    file2.txt
    1823 1 2 4
    22 11 4 29
    607 12 3 3
    R2A9 34 4 9
    D33 2 1 0
    802 30 8 1
    771 3 0 9
    3RE 6 3 1
    
    
    
    output file3.txt should be printed in this way
    
    1823 1 2 4
    607 12 3 3
    R2A9 34 4 9
    802 30 8 1
    771 3 0 9
    */
    
    #include <fstream>
    #include <string>
    #include <vector>
    #include <map>
    
    void ExtractCol()
    {
    	do
    	{
    		/*
    		 * Open file1 and collect its ids line by line in a vector of strings.
    		 */
    		std::fstream file;
    		file.open("file1.txt", std::ios::in);
    		if(!file)
    		{
    			break;
    		}
    		std::vector<std::string> file1Data;
    		std::string line("");
    		// Loop on getline itself: testing eof() first would push a spurious empty last line.
    		while(std::getline(file, line))
    		{
    			if(!line.empty())
    				file1Data.push_back(line);
    		}
    		file.close();
    		file.clear();	// reset eofbit/failbit before the stream is reused
    
    		/*
    		 * Open file2 and collect its data in a string-to-string map.
    		 * The first word of each line in file2 acts as the key
    		 * and the rest of the line is stored as its value.
    		 */
    		file.open("file2.txt", std::ios::in);
    		if(!file)
    			break;
    
    		typedef std::pair<std::string, std::string> strstrpair;
    		typedef std::map<std::string, std::string> strstrmap;
    		strstrmap file2Data;
    		while(std::getline(file, line))
    		{
    			if(line.empty())
    				continue;
    			size_t found = line.find_first_of(" ");
    			if(found == std::string::npos)
    				file2Data.insert(strstrpair(line, ""));	// id with no columns after it
    			else
    				file2Data.insert(strstrpair(line.substr(0, found), line.substr(found + 1)));
    		}
    		file.close();
    		file.clear();	// reset eofbit/failbit before the stream is reused
    
    		/*
    		 * Prepare the data for file3.
    		 * For each id in file1, in file1's order, copy the line of
    		 * file2 whose first word is that id.
    		 */
    		std::string file3Data("");
    		for(std::vector<std::string>::iterator it = file1Data.begin(); it != file1Data.end(); ++it)
    		{
    			strstrmap::iterator it2;
    			it2 = file2Data.find(*it);
    			if(it2 != file2Data.end())
    			{
    				file3Data.append(*it);
    				file3Data.append(" ");
    				file3Data.append(it2->second);
    				file3Data.append("\n");
    			}
    		}
    
    		/* 
    		 * finally create file 3.
    		 */
    		file.open("file3.txt", std::ios::out|std::ios::trunc);
    		if(!file)
    			break;
    		file.write(file3Data.c_str(), file3Data.length());
    		file.close();
    	}while(false);
    }
    
    int main()
    {
    	ExtractCol();
    	return 0;
    }
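
The same lookup can be tried without touching the disk. The following is a minimal sketch of the map-based approach in the post above, assuming the two inputs arrive as streams; `SelectRowsInIdOrder` is a hypothetical name and is not part of the posted code. It indexes the table by the first field of each line, then walks the id list in order.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the thread): returns the lines of `table`
// whose first field appears in `ids`, in the order the ids are listed.
std::vector<std::string> SelectRowsInIdOrder(std::istream& ids, std::istream& table)
{
    // Index every non-blank table line by its first whitespace-delimited field.
    std::unordered_map<std::string, std::string> rowById;
    std::string line;
    while (std::getline(table, line))
    {
        std::istringstream fields(line);
        std::string id;
        if (fields >> id)
            rowById.emplace(id, line);   // the first occurrence of an id wins
    }

    // Walk the id list in order and collect the matching rows.
    std::vector<std::string> result;
    while (std::getline(ids, line))
    {
        std::istringstream fields(line);
        std::string id;
        if (!(fields >> id))
            continue;                    // skip blank lines
        auto hit = rowById.find(id);
        if (hit != rowById.end())
            result.push_back(hit->second);
    }
    return result;
}
```

Passing `std::ifstream` objects for file1.txt and file2.txt and writing each returned string to file3.txt should give the layout asked for. If an id occurs more than once in the table, this sketch keeps only the first line for it.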
    
     
  3. ccharley

    ccharley New Member

    Joined:
    Apr 30, 2012
    Hello Sheen,

    Perl could solve this problem with code like that below. Notice the $trie variable (pronounced 'try'). Starting with perl 5.10, I believe, the regex engine uses a trie to match alternations of literal strings. As I understand it, the match time then does not grow with the number of alternatives (roughly O(1) in that count), so it scales well.

    My code builds an alternation of the ids in file1, which perl can compile into a trie. Then it reads file2, and if the beginning of any line matches, it prints out that line from file2. Note that the lines come out in file2's order, which happens to match file1's order in your example. If you want the result in a third file, simply open a file for writing and print there. My example just prints to STDOUT (the console window).

    Chris

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.014;
    
    my $file1 = <<EOF;
    1823
    607
    R2A9
    802
    771
    EOF
    
    my $file2 = <<EOF;
    1823 1 2 4
    22 11 4 29
    607 12 3 3
    R2A9 34 4 9
    D33 2 1 0
    802 30 8 1
    771 3 0 9
    3RE 6 3 1
    EOF
    
    my $trie;
    {
    	local $/;
    	open my $fh, "<", \$file1 or die $!;
    	$trie = join "|", map { quotemeta($_) } split /\n/, <$fh>;
    	close $fh or die $!;
    }
    
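    open my $fh, "<", \$file2 or die $!;
    # (?!\S) makes the id match the whole first field, so "1823" does not match "18234 ..."
    /^(?:$trie)(?!\S)/ && print  while <$fh>;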
    close $fh or die $!;
    
    The output is:

    Code:
    C:\Old_Data\perlp>perl t.pl
    1823 1 2 4
    607 12 3 3
    R2A9 34 4 9
    802 30 8 1
    771 3 0 9
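
For a C++ reader, the trie idea from this post can be sketched by hand. The class below is a minimal illustration under my own assumptions, not a description of how perl's regex engine implements it; `IdTrie`, `Insert` and `MatchesFirstField` are hypothetical names. A lookup costs time proportional to the length of the id being checked, independent of how many ids were inserted.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal trie over id strings (names are illustrative only).
class IdTrie
{
    struct Node
    {
        std::map<char, std::unique_ptr<Node>> next;
        bool isId = false;   // true if an inserted id ends at this node
    };
    Node root_;

public:
    // Adds one id, creating a node per character as needed.
    void Insert(const std::string& id)
    {
        Node* node = &root_;
        for (char c : id)
        {
            std::unique_ptr<Node>& child = node->next[c];
            if (!child)
                child.reset(new Node());
            node = child.get();
        }
        node->isId = true;
    }

    // True if the first space- or tab-delimited field of `line` is an inserted id.
    bool MatchesFirstField(const std::string& line) const
    {
        const Node* node = &root_;
        for (char c : line)
        {
            if (c == ' ' || c == '\t')
                break;                       // end of the first field
            auto it = node->next.find(c);
            if (it == node->next.end())
                return false;                // no id continues with this character
            node = it->second.get();
        }
        return node->isId;
    }
};
```

Used as a filter over the lines of file2, this keeps file2's order, so it does not by itself put the rows in file1's order when the two files are ordered differently.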
     
  4. ccharley

    ccharley New Member

    Joined:
    Apr 30, 2012
    Oh, just saw that you were looking for a Cpp solution.
     
