extract columns by matching ids in two files

sheen · Apr 7, 2012

Hello,

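I want to extract columns from file2 into file3 by matching ids between file1 and file2: every line of file2 whose id (the first column) also appears in file1 should be copied to file3. The extracted lines should be in the same order as the ids in file1.

For example: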

file1.txt
1823
607
R2A9
802
771

file2.txt
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1

The output file3.txt should look like this:

1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9

Please suggest an approach.

Thanks,
/S

dearvivekkumar · Apr 12, 2012

Code:

/*
file1.txt 
1823
607
R2A9
802
771

file2.txt
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1



output file3.txt should be printed in this way

1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9
*/

#include <fstream>
#include <string>
#include <vector>
#include <map>

void ExtractCol()
{
	/*
	 * Open file1 and collect its ids line by line in a vector of
	 * strings, which preserves their order.
	 */
	std::ifstream file1("file1.txt");
	if(!file1)
		return;
	std::vector<std::string> file1Data;
	std::string line;
	// Loop on the result of getline rather than on eof(), so that no
	// spurious empty line is stored after the last read.
	while(std::getline(file1, line))
	{
		if(!line.empty())
			file1Data.push_back(line);
	}
	file1.close();

	/*
	 * Open file2 and collect its data in a string-to-string map.
	 * The first word of each line acts as the key and the rest of
	 * the line is stored as its value.
	 */
	std::ifstream file2("file2.txt");
	if(!file2)
		return;

	typedef std::map<std::string, std::string> strstrmap;
	strstrmap file2Data;
	while(std::getline(file2, line))
	{
		if(line.empty())
			continue;
		std::string::size_type found = line.find(' ');
		if(found == std::string::npos)
		{
			// The line holds only an id and no further columns.
			file2Data.insert(strstrmap::value_type(line, std::string()));
		}
		else
		{
			file2Data.insert(strstrmap::value_type(line.substr(0, found), line.substr(found + 1)));
		}
	}
	file2.close();

	/*
	 * Prepare the data for file3.
	 * For every id of file1, in file1's order, copy the line of
	 * file2 that starts with the same id.
	 */
	std::string file3Data;
	for(std::vector<std::string>::const_iterator it = file1Data.begin(); it != file1Data.end(); ++it)
	{
		strstrmap::const_iterator it2 = file2Data.find(*it);
		if(it2 != file2Data.end())
		{
			file3Data.append(*it);
			if(!it2->second.empty())
			{
				file3Data.append(" ");
				file3Data.append(it2->second);
			}
			file3Data.append("\n");
		}
	}

	/*
	 * Finally create file3.
	 */
	std::ofstream file3("file3.txt", std::ios::out | std::ios::trunc);
	if(!file3)
		return;
	file3 << file3Data;
}

int main()
{
	ExtractCol();
	return 0;
}
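
The same lookup can also be written against generic streams, which makes it easy to try out without touching the disk. This is only a sketch: the function name extractRows and its stream parameters are made up for illustration and are not part of the code above. It assumes that ids contain no whitespace and that, if an id occurs more than once in file2, the first occurrence wins.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: copies to `out` the lines of `data` whose first
// word appears in `ids`, in the order in which the ids are listed.
void extractRows(std::istream& ids, std::istream& data, std::ostream& out)
{
	// Index the data lines by their first whitespace-delimited word.
	std::unordered_map<std::string, std::string> rows;
	std::string line;
	while (std::getline(data, line))
	{
		std::istringstream words(line);
		std::string key;
		if (words >> key)
			rows.emplace(key, line); // emplace keeps the first line seen for a key
	}

	// Walk the ids in their original order and emit the matching lines.
	std::string id;
	while (ids >> id)
	{
		auto it = rows.find(id);
		if (it != rows.end())
			out << it->second << '\n';
	}
}
```

Called with std::ifstream objects for file1.txt and file2.txt and a std::ofstream for file3.txt, it should produce the file asked for in the first post. Ids that are missing from file2 are skipped silently.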

ccharley · Apr 30, 2012

Hello Sheen,

Perl could solve this problem with code like that below. Notice the $trie (pronounced 'try') variable. Starting with Perl 5.10, I believe, the regex engine uses a trie to search an alternation of literal strings. Lookup time is then essentially constant, O(1), with respect to the number of alternatives, so it scales well.

My code builds an alternation of the values in file1. Then it reads file2, and if the beginning of any line matches the alternation, it prints that line from file2. Note that the lines are printed in file2's order, which matches file1's order for this sample data. If you want the result in a third file, simply open a file for writing and print there. My example just prints to STDOUT (the console window).

Chris
Code:
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my $file1 = <<EOF;
1823
607
R2A9
802
771
EOF

my $file2 = <<EOF;
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1
EOF

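my $trie;
{
	local $/;	# slurp mode: read all of file1 at once
	open my $fh, "<", \$file1 or die $!;
	# Escape each id so it is matched literally, then join into an alternation.
	$trie = join "|", map { quotemeta($_) } split /\n/, <$fh>;
	close $fh or die $!;
}

open my $fh, "<", \$file2 or die $!;
# The lookahead stops an id such as "18" from matching a line that starts with "1823".
/^(?:$trie)(?!\S)/ && print  while <$fh>;
close $fh or die $!;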
The output is:
Code:
C:\Old_Data\perlp>perl t.pl
1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9

ccharley · Apr 30, 2012

Oh, I just saw that you were looking for a C++ solution.
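
For a C++ counterpart to the Perl approach, a trie can be built by hand. The sketch below is illustrative only: the IdTrie class and its member names are hypothetical, and it is not a description of how Perl implements its regex optimisation. It stores each id character by character and checks whether a line begins with a stored id that is followed by whitespace or the end of the line. The cost of a check grows roughly with the length of the id, not with the number of ids stored.

```cpp
#include <cctype>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal trie over id strings.
class IdTrie
{
	struct Node
	{
		bool terminal = false;                       // an id ends at this node
		std::map<char, std::unique_ptr<Node>> next;  // one child per character
	};
	Node root;

public:
	void insert(const std::string& id)
	{
		if (id.empty())
			return; // an empty id would match every blank line
		Node* node = &root;
		for (char c : id)
		{
			std::unique_ptr<Node>& child = node->next[c];
			if (!child)
				child.reset(new Node());
			node = child.get();
		}
		node->terminal = true;
	}

	// True if `line` starts with a stored id that is followed by
	// whitespace or by the end of the line.
	bool matchesPrefix(const std::string& line) const
	{
		const Node* node = &root;
		for (std::string::size_type i = 0; i <= line.size(); ++i)
		{
			bool atBoundary = (i == line.size()) ||
				std::isspace(static_cast<unsigned char>(line[i]));
			if (atBoundary)
				return node->terminal;
			auto it = node->next.find(line[i]);
			if (it == node->next.end())
				return false;
			node = it->second.get();
		}
		return false; // not reached: the loop always returns at the end of the line
	}
};
```

A filter loop would insert every line of file1 into the trie, then read file2 with std::getline and print each line for which matchesPrefix returns true. Like the Perl script, that prints the lines in file2's order, not file1's.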
