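A reader emailed to ask why a Perl script for extracting TCGA survival data was failing. Jin Ge had never used Perl, so he decided to learn it on the spot. The script is below. Save it as survival_time.pl and run it in a terminal with: perl survival_time.pl clinical.cart.2022-08-06.json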
#!/usr/bin/perl -w
use strict;
use warnings;
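# Path to the clinical JSON file, passed as the first command-line argument
my $file=$ARGV[0] or die "Usage: perl survival_time.pl <clinical.json>\n";
#use Data::Dumper;
use JSON;
my $json = JSON->new;
# Read the whole file into one string; stop with a message if it cannot be opened
open(JFILE, "<", $file) or die "Cannot open $file: $!";
my $js = do { local $/; <JFILE> };
close(JFILE);
# The decoded top level is an array with one element per case
my $obj = $json->decode($js);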
#print $obj->[0]->{'cases'}->[0]->{'diagnoses'}->[0]->{'vital_status'} . "\n";
open(WF,">time.txt") or die $!;
print WF "id\tfutime\tfustat\n";
my %hash=();
for my $i(@{$obj})
{
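# vital_status and submitter_id are read from the "demographic" block
my $vitalsStatus=$i->{'demographic'}->{'vital_status'};
my $submitterId=$i->{'demographic'}->{'submitter_id'};
# Skip cases without demographic data instead of triggering undef warnings
next unless defined $vitalsStatus && defined $submitterId;
# Keep the part of submitter_id before the first "_" as the case ID
my @subId=split(/_/,$submitterId);
print "$subId[0]\t$vitalsStatus\n";
# Write each case only once
next if exists $hash{$subId[0]};
$hash{$subId[0]}=1;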
if($vitalsStatus eq 'Alive')
{
# Censored case: use days_to_last_follow_up from the last diagnosis entry
my $days_to_last_follow_up;
for my $item(@{$i->{'diagnoses'} || []})
{
$days_to_last_follow_up= $item->{'days_to_last_follow_up'};
}
# Skip cases whose follow-up time is missing (null) or 0
if(defined $days_to_last_follow_up && $days_to_last_follow_up !=0)
{
print WF "$subId[0]\t$days_to_last_follow_up\t0\n";
}
}
else
{
# Death event: days_to_death is read from the "demographic" block; cases without it are skipped
my $days_to_death=$i->{'demographic'}->{'days_to_death'};
if(defined $days_to_death)
{
print WF "$subId[0]\t$days_to_death\t1\n";
}
}
}
close(WF);
#print Dumper $obj
The test data, clinical.cart.2022-08-06.json, is a clinical JSON file downloaded from the official TCGA website. The extracted results are as follows:
Hi Jin Ge, suppose I have a list of Sample IDs. Is it possible to extract survival data from clinical.json for only the samples in that list?
Of course. The JSON file is essentially a table with one record per case: read it into R and extract the rows you need.
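For readers who would rather stay in Perl than switch to R, here is a minimal sketch of the same filtering idea, using the extraction rules of survival_time.pl. Several things in it are assumptions rather than part of the original script: the function name extract_survival, the inline demo records and sample barcodes (all made up for illustration), and the rule that the first 12 characters of a sample barcode give the case ID. It uses the core module JSON::PP so that it runs without installing anything extra.

```perl
use strict;
use warnings;
use JSON::PP;    # core module; decode_json stands in for the JSON module used above

# Return [id, futime, fustat] rows for cases whose ID is a key of %$wanted.
# (extract_survival is a hypothetical helper name, not part of survival_time.pl.)
sub extract_survival {
    my ($cases, $wanted) = @_;
    my (%seen, @rows);
    for my $case (@$cases) {
        my $demo   = $case->{demographic} or next;
        my $status = $demo->{vital_status};
        my $sid    = $demo->{submitter_id};
        next unless defined $status && defined $sid;
        my ($id) = split /_/, $sid;      # part before the first "_"
        next unless $wanted->{$id};      # keep only the requested cases
        next if $seen{$id}++;            # write each case only once
        if ($status eq 'Alive') {
            # Censored: take days_to_last_follow_up from the last diagnosis entry
            my $days;
            $days = $_->{days_to_last_follow_up} for @{ $case->{diagnoses} || [] };
            push @rows, [$id, $days, 0] if defined $days && $days != 0;
        } else {
            # Event: days_to_death from the demographic block
            my $days = $demo->{days_to_death};
            push @rows, [$id, $days, 1] if defined $days;
        }
    }
    return @rows;
}

# Hypothetical records containing only the fields the script reads
my $demo_json = <<'END_JSON';
[
 {"demographic": {"vital_status": "Alive", "submitter_id": "TCGA-AA-0001_demographic"},
  "diagnoses": [{"days_to_last_follow_up": 420}]},
 {"demographic": {"vital_status": "Dead", "submitter_id": "TCGA-AA-0002_demographic", "days_to_death": 310},
  "diagnoses": [{"days_to_last_follow_up": null}]},
 {"demographic": {"vital_status": "Alive", "submitter_id": "TCGA-AA-0003_demographic"},
  "diagnoses": [{"days_to_last_follow_up": 95}]}
]
END_JSON

# Hypothetical sample barcodes; assumption: the first 12 characters are the case ID
my @sample_ids = ('TCGA-AA-0001-01A', 'TCGA-AA-0002-01A');
my %wanted;
$wanted{ substr($_, 0, 12) } = 1 for @sample_ids;

my @rows = extract_survival(decode_json($demo_json), \%wanted);
print "id\tfutime\tfustat\n";
print join("\t", @$_), "\n" for @rows;
```

To use this on real data, replace the inline JSON with the contents of clinical.json and fill @sample_ids from your own list, for example one ID per line read from a text file. If your list already holds case IDs rather than sample barcodes, the substr step can be dropped.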