Is it suitable for de-redundancy in transcriptome assembly?

Hello, I am having trouble removing redundancy from my assemblies. Could you give me some advice?

I am doing transcriptome-based virus discovery. After removing host sequences, I assemble each dataset downloaded from the SRA separately. My plan is to pool these assemblies and then align them to identify candidate viruses. I found two redundancy-removal workflows in mmseqs2, easy-cluster and easy-linclust, and I also came across cd-hit-est for the same purpose. I am not sure whether mmseqs2 is suitable for removing redundancy when merging transcriptome assemblies. If I want a stricter clustering threshold, which parameters should I pay attention to? I would appreciate your help.

As a first test I ran mmseqs2 easy-linclust, which is much faster than cd-hit-est:

`mmseqs easy-linclust virus.candidate.fasta mmseqs.cluster ./mmseqs.tmp --threads 60`

This produced three files: mmseqs.cluster_all_seqs.fasta, mmseqs.cluster_cluster.tsv, and mmseqs.cluster_rep_seq.fasta. I understand that mmseqs.cluster_rep_seq.fasta is the de-redundant result, but I also want the cluster membership so that I can see how each virus sequence is distributed across samples. Which file should I look at, or which parameters should I set?
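To make the goal concrete, here is a minimal sketch of the per-sample summary I am after. It assumes that mmseqs.cluster_cluster.tsv is a two-column, tab-separated table (representative ID, then member ID), and that each sequence ID was prefixed with its sample accession before pooling, as in `SRR001|contig_1`. That prefix convention, the made-up accessions, and the helper names `read_clusters`, `sample_of`, and `samples_per_cluster` are all illustrative assumptions, not part of mmseqs2.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Map each representative ID to the list of its member IDs.

    Assumes a two-column TSV: representative ID, member ID.
    """
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rep, member = row[0], row[1]
        clusters[rep].append(member)
    return dict(clusters)


def sample_of(seq_id, sep="|"):
    """Extract the sample name from an ID like '<sample>|<contig>'.

    This naming convention is hypothetical; adapt it to your own headers.
    """
    return seq_id.split(sep, 1)[0]


def samples_per_cluster(clusters, sep="|"):
    """For each representative, list the distinct samples among its members."""
    return {
        rep: sorted({sample_of(member, sep) for member in members})
        for rep, members in clusters.items()
    }


# Tiny in-memory example with made-up IDs standing in for the real TSV.
example_tsv = (
    "SRR001|contig_1\tSRR001|contig_1\n"
    "SRR001|contig_1\tSRR002|contig_7\n"
    "SRR003|contig_4\tSRR003|contig_4\n"
)
example_clusters = read_clusters(io.StringIO(example_tsv))
example_samples = samples_per_cluster(example_clusters)
```

On the real output, the same functions could be called with `open("mmseqs.cluster_cluster.tsv", newline="")` in place of the in-memory example.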

Is it suitable for de-redundancy in transcriptome assembly? #1064
