
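A reader emailed to ask why their Perl script failed to extract survival data from TCGA. I (Jin-ge) had never used Perl, so I decided to learn it on the spot. The script is below. Save it as survival_time.pl and run it in a terminal with: perl survival_time.pl clinical.cart.2022-08-06.json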

#!/usr/bin/perl -w
use strict;
use warnings;

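# Path to the clinical JSON file, given as the first command-line argument
my $file=$ARGV[0] or die "Usage: perl survival_time.pl clinical.json\n";

#use Data::Dumper;
use JSON;
my $json = JSON->new;

# Read the whole file into one string, then decode it into a Perl structure
open(my $jfile, '<', $file) or die "Cannot open $file: $!";
my $js = do { local $/; <$jfile> };
close($jfile);
my $obj = $json->decode($js);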
#print $obj->[0]->{'cases'}->[0]->{'diagnoses'}->[0]->{'vital_status'} . "\n";

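# Output table: case ID, follow-up time in days, status (0 = alive, 1 = dead)
open(WF, '>', 'time.txt') or die "Cannot write time.txt: $!";
print WF "id\tfutime\tfustat\n";
# Case IDs that have already been written
my %hash=();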
for my $i(@{$obj})
{
	my $vitalsStatus=$i->{'demographic'}->{'vital_status'};
	my $submitterId=$i->{'demographic'}->{'submitter_id'};
	# Skip cases that have no demographic record
	next unless defined $submitterId;

	# The case ID is the part of submitter_id before the first underscore
	my @subId=split(/_/,$submitterId);
	# Keep only the first record seen for each case ID
	next if $hash{$subId[0]}++;
	
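	if(defined $vitalsStatus && $vitalsStatus eq 'Alive')
	{
		# Censored case: take the last defined days_to_last_follow_up among the diagnoses
		my $days_to_last_follow_up;
		for my $item(@{$i->{'diagnoses'} || []})
		{
			$days_to_last_follow_up=$item->{'days_to_last_follow_up'} if defined $item->{'days_to_last_follow_up'};
		}

		# Skip cases whose follow-up time is missing or zero
		if(defined $days_to_last_follow_up && $days_to_last_follow_up != 0)
		{
			print WF "$subId[0]\t$days_to_last_follow_up\t0\n";
		}
	}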
	else
	{
		# Death event: days_to_death is the survival time
		my $days_to_death=$i->{'demographic'}->{'days_to_death'};
		if(defined $days_to_death)
		{
			print WF "$subId[0]\t$days_to_death\t1\n";
		}
	}
}
close(WF);
#print Dumper $obj

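The test data, clinical.cart.2022-08-06.json, is a clinical JSON file downloaded from the official TCGA website. The script writes its results to time.txt, a tab-separated table with the columns id, futime, and fustat.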
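To make the extraction logic easier to follow, here is a minimal sketch that runs the same steps on two hand-written records instead of a downloaded file. The records and their IDs are made up for illustration and are not real TCGA data; the "_demographic" suffix on submitter_id is an assumption based on the script splitting on an underscore. It uses the core module JSON::PP rather than the JSON module used in the script, only so that it runs without installing anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module; the script above uses JSON instead

# Hypothetical records, made up for illustration (not real TCGA data).
my $text = <<'JSON';
[
  {"demographic": {"vital_status": "Alive",
                   "submitter_id": "TCGA-XX-0001_demographic"},
   "diagnoses": [{"days_to_last_follow_up": 365}]},
  {"demographic": {"vital_status": "Dead", "days_to_death": 120,
                   "submitter_id": "TCGA-XX-0002_demographic"},
   "diagnoses": [{"days_to_last_follow_up": null}]}
]
JSON

my @rows;
for my $case (@{ decode_json($text) }) {
    my $demo = $case->{demographic} or next;    # skip cases without demographics
    my ($id) = split /_/, $demo->{submitter_id};

    if (($demo->{vital_status} // '') eq 'Alive') {
        # Censored: last defined follow-up time among the diagnoses
        my @days = grep { defined }
                   map  { $_->{days_to_last_follow_up} }
                   @{ $case->{diagnoses} || [] };
        push @rows, "$id\t$days[-1]\t0" if @days && $days[-1] != 0;
    }
    elsif (defined $demo->{days_to_death}) {
        # Event: survival time is days_to_death
        push @rows, "$id\t$demo->{days_to_death}\t1";
    }
}

print "id\tfutime\tfustat\n";
print "$_\n" for @rows;
```

The first record is alive, so its time comes from the diagnoses and its status is 0. The second is dead, so its time comes from days_to_death and its status is 1.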

2 Replies to "Extracting survival data from the new TCGA clinical_json with a Perl script"

  1. Hi Jin-ge, suppose I have a list of Sample IDs. Is it possible to extract the survival data from clinical.json for only the samples in that list?
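One possible approach to this question, offered only as a sketch: load the wanted IDs into a hash and keep only the rows whose case ID is in it. The IDs and rows below are made up for illustration. The sketch also assumes that the list holds TCGA sample barcodes whose first 12 characters are the case ID, which is the level at which the script reports survival; if the list already contains case IDs, the substr step is unnecessary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample list (made up for illustration).
my @sample_ids = ("TCGA-XX-0001-01A", "TCGA-XX-0003-01A");

# Assumption: the first 12 characters of a sample barcode are the case ID.
my %wanted;
$wanted{ substr($_, 0, 12) } = 1 for @sample_ids;

# Rows in the same layout the script writes to time.txt (made-up values).
my @rows = (
    "TCGA-XX-0001\t365\t0",
    "TCGA-XX-0002\t120\t1",
    "TCGA-XX-0003\t48\t1",
);

# Keep a row only if its first column is a wanted case ID.
my @kept = grep { my ($id) = split /\t/; $wanted{$id} } @rows;

print "$_\n" for @kept;
```

Inside the script itself, the same effect could be had by building %wanted before the loop and adding `next unless $wanted{$subId[0]};` right after the split on submitter_id.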
