Alaska: FAQ

What is Alaska?

Alaska is a program that is meant to help you perform RNA-sequence analysis. At the moment, Alaska can help you analyze RNA-seq experiments that compare mutants-vs-wild-type as well as two-factor designs where there are two distinct classes of perturbations.

How do I start using Alaska?

Go to the homepage and follow the instructions!

What organisms does Alaska support?

At this moment, Alaska supports any genome serviced by WormBase.

Are there plans to support more organisms?

Yes. Eventually.

How do I get help? How do I contact WormBase?

Click on the Contact us links and follow the instructions!

Is Alaska free?

Yes. Alaska is completely free to use.

Is Alaska open-source?

Yes, as is all the software that Alaska uses. You may access the full source code on GitHub here.

How do I help improve Alaska?

Please contact us if you run into problems. We need to know about bugs so we can fix them.

I closed my browser while I was preparing my project! How do I get back to where I left off?

You may return to your project at any time using the unique URL that was generated for your project. This unique URL can be found in the email that was sent to you when the project was initialized. If you know your project ID, you may manually enter the URL into your browser: http://alaska.caltech.edu/?id=[PROJECT_ID], replacing the [PROJECT_ID] with your actual project ID.

How do I upload my reads?

Raw reads can be uploaded using an FTP client. Alternatively, you can upload your reads from the command line by setting up an FTP connection. In the past, Cyberduck has worked well for us.

How long will my reads be stored?

We will store your reads for up to 10 days after the time your project was created. After that, Alaska will automatically remove them, so that we can free up space for other users to upload their reads as well. Your analysis results, including the Read Quantification & Alignment and Differential Expression Analysis, will be stored indefinitely.

If I upload my reads, does that mean they are open to the public?

No. None of your data will be made public.

What file extensions are supported by Alaska?

Alaska supports a variety of compression formats. Raw reads may be either .fastq or .fastq.gz. Archives may be .tar.gz, .tar, .zip, or .rar.

Alaska doesn't show some of my files! What should I do?

Please ensure all of the files you uploaded are in one of the supported formats.
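As a rough way to check file names before uploading, the sketch below flags any name that does not end in one of the extensions listed above. The function names are illustrative and not part of Alaska, and the comparison is case-sensitive here, which is an assumption about how strictly names are matched.

```python
# Extensions taken from the list of supported formats in this FAQ.
READ_EXTENSIONS = (".fastq", ".fastq.gz")
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar", ".zip", ".rar")
SUPPORTED_EXTENSIONS = READ_EXTENSIONS + ARCHIVE_EXTENSIONS


def is_supported(filename):
    """Return True if the file name ends with a supported extension.

    The match is case-sensitive (an assumption, not documented behavior).
    """
    return filename.endswith(SUPPORTED_EXTENSIONS)


def unsupported_files(filenames):
    """Return the file names that would not be recognized, in input order."""
    return [name for name in filenames if not is_supported(name)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in unsupported_files(["wt_rep1/reads.fastq.gz", "wt_rep1/reads.fq"]):
        print("Unsupported format:", name)
```

Files with a common but unlisted extension, such as .fq, would need to be renamed or repackaged before upload.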

What are MD5 checksums?

An MD5 checksum is a short fingerprint computed from the contents of a file. If the checksum computed after a transfer matches the one computed before it, the file was transferred correctly.

Some of the MD5 checksums are incorrect. What should I do?

You will need to reupload your reads. When the MD5 checksums do not agree, it is a sign the file was corrupted during transfer.

Alaska detects the wrong number of samples. What should I do?

Alaska identifies biological replicates using the directory structure you provide. Please make sure each biological replicate or sample is in its own, uniquely named folder, and that all of your uploaded files are in one of the supported formats.
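The sketch below illustrates the idea of one folder per sample: it groups a list of uploaded file paths by their parent folder and counts the groups. It is only an illustration of the layout described above, not Alaska's actual detection code, and the folder names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_reads_by_folder(paths):
    """Map each parent folder to the list of file names it contains."""
    samples = defaultdict(list)
    for path in paths:
        parsed = PurePosixPath(path)
        samples[str(parsed.parent)].append(parsed.name)
    return dict(samples)


if __name__ == "__main__":
    # Hypothetical upload layout: three samples, one folder each.
    uploaded = [
        "wt_rep1/reads_1.fastq.gz",
        "wt_rep1/reads_2.fastq.gz",
        "wt_rep2/reads_1.fastq.gz",
        "mut_rep1/reads_1.fastq.gz",
    ]
    groups = group_reads_by_folder(uploaded)
    print(len(groups), "samples:", sorted(groups))
```

If the number of folders in your upload does not match the number of samples you expect, two samples probably share a folder or one sample is split across several.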

What are "factor"s?

A “factor” is a class of variable under study in your experiment. For example, if we wish to find the differentially expressed genes between a mutant genotype and a control genotype, we are comparing samples along a single factor: “genotype”. If we had three distinct mutant genotypes that we wanted to compare to a control genotype, our statistical comparison would still consist of a single "factor", since "genotype" is the only distinguishing feature among all our samples.

On the other hand, if we wished to find genes that respond to heat-shock AND simultaneously find the genes that change between two genotypes, then our experiment would have two "factors": temperature and genotype. Conceptually, this is very similar to splitting our experiment into two single-factor experiments (one comparing genotypes and one comparing heat-shock status). However, the two-factor design is particularly powerful if we think that the two variables we are studying may interact in an interesting fashion.

For example, imagine the following experiment with two C. elegans strains: an unc-54 (paralyzed) mutant, and a strain carrying an extrachromosomal array that expresses wild-type unc-54 in response to heat-shock. In this case, we need to use a 2-factor design to deconvolute the effect of heat-shock from the effects of unc-54 expression post-heat-shock. If we split up this design, we won't be able to perform this deconvolution (see next question).

What are 1-factor and 2-factor designs?

A 1-factor design only compares a single kind of experimental perturbation to a baseline. An example of a 1-factor design is a design where we find the differentially expressed genes of several single mutant genotypes relative to a wild-type control.

In a 2-factor design, we now have two kinds of experimental perturbations. For example, we may be interested in simultaneously exploring the effects of mutation and hypoxia. In this case, ‘genotype’ and ‘oxygen status’ are the two factors under investigation.

2-factor designs are interesting because they allow us to explore the independent effects of two perturbations simultaneously without increasing the number of comparisons we are doing. So, instead of finding the differentially expressed genes in mutant animals, hypoxic animals and mutant hypoxic animals relative to the wild-type through three pairwise comparisons, we could fit a 2-factor design to identify the effects of hypoxia and mutation that combine additively.

In some cases, however, factor effects don’t add. In biology, an important kind of non-additivity is called epistasis and happens when both factors under exploration break the same module. In this case, the power of a 2-factor design lies in its ability to include a third parameter that identifies the genes where the two factors under investigation do not add. See this paper for a thorough introduction to 2-factor designs.

NOTE: If you want to quantify non-additivity, you will need to have samples for the entire matrix of the 2-factor design. That means you need to have at least 2 replicates for each square in the grid below:

	Factor 1 Control	Factor 1 perturbation
Factor 2 Control	Wild-type	Perturbation 1 phenotype
Factor 2 perturbation	Perturbation 2 phenotype	Epistasis!
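
One common way to write down the idea behind this grid is a linear model with an interaction term. The form below is a generic sketch and not necessarily the exact model Alaska fits; the symbols are introduced here for illustration. For a gene g, let y_g be its (log) expression level, and let x_1 and x_2 be 0 for the control and 1 for the perturbation of Factor 1 and Factor 2, respectively:

```latex
\[
y_g = \beta_{0,g} + \beta_{1,g}\,x_1 + \beta_{2,g}\,x_2 + \beta_{12,g}\,x_1 x_2 + \varepsilon_g
\]
```

The four squares of the grid then correspond to \beta_0 (wild-type), \beta_0 + \beta_1 (Factor 1 perturbation only), \beta_0 + \beta_2 (Factor 2 perturbation only), and \beta_0 + \beta_1 + \beta_2 + \beta_{12} (both perturbations). The interaction coefficient \beta_{12} is the "third parameter": it is zero when the two effects simply add and non-zero when they do not, and it can only be estimated when all four squares have been measured.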

I want to do a 2-factor analysis, but the button is disabled!

Have you filled out the complete experimental design matrix? For a 2-factor analysis, there are 4 experimental conditions that must be measured with at least 2 samples per condition (see above).
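A quick way to check your own design before submitting is to count the samples in each square of the grid. The sketch below does this for a list of (factor 1 level, factor 2 level) labels; the function name and the example labels are hypothetical, and the threshold of 2 replicates comes from the requirement stated above.

```python
from collections import Counter
from itertools import product


def incomplete_conditions(samples, min_replicates=2):
    """Return (condition, count) pairs that have too few replicates.

    `samples` is a list with one (factor1_level, factor2_level) tuple
    per sample. Every combination of the observed levels is checked,
    so a combination with no samples at all is reported with count 0.
    """
    counts = Counter(samples)
    levels_1 = sorted({s[0] for s in samples})
    levels_2 = sorted({s[1] for s in samples})
    return [
        (condition, counts[condition])
        for condition in product(levels_1, levels_2)
        if counts[condition] < min_replicates
    ]


if __name__ == "__main__":
    # Hypothetical design: the double perturbation has only one replicate.
    design = (
        [("wt", "normoxia")] * 2
        + [("mut", "normoxia")] * 2
        + [("wt", "hypoxia")] * 2
        + [("mut", "hypoxia")]
    )
    for condition, count in incomplete_conditions(design):
        print(condition, "has", count, "replicate(s); at least 2 are needed")
```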

What tool(s) are used in the Quality Control step?

What tool(s) are used in the Read Alignment & Quantification step?

Kallisto

What tool(s) are used in the Differential Expression Analysis step?

How can I see the arguments used for each tool?

The output of each analysis contains lines starting with a hash symbol "#" which contain important information about the exact command and arguments used to run a specific tool. The output can be viewed either at the Analysis Progress page or in a .txt file within the project download.
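If you have downloaded the .txt output and only want the command information, the lines beginning with "#" can be pulled out with a few lines of Python. This is a minimal sketch; the function name is illustrative and the sample text in the usage section is made up, not real Alaska output.

```python
def command_lines(output_text):
    """Return the content of every line that starts with '#'.

    The leading hash symbol(s) and surrounding whitespace are removed.
    """
    return [
        line.lstrip("#").strip()
        for line in output_text.splitlines()
        if line.startswith("#")
    ]


if __name__ == "__main__":
    # Made-up output for illustration; real output will look different.
    example = "# example-tool --threads 4\nprogress message\n# another-step input.txt\n"
    for command in command_lines(example):
        print(command)
```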

I got a message saying there was an error during my analysis. What should I do?

Most issues are resolved by either refreshing the page or retrying the analysis. There will be a red retry icon located next to the project status indicator. If the issue persists, please contact WormBase with your project ID and analysis output.

Are there plans to support other tools?

Not at the moment. Alaska is meant as a minimalist tool for molecular biologists and is developed by a very small group of people.

How long does analysis usually take?

After you've filled out the metadata, analysis can take anywhere from 30 minutes to a couple of hours. This does not include the time your project may be held in the queue.

Do I have to submit my reads to the GEO through Alaska?

No. You may download a pre-archived file with everything you need to submit to the GEO yourself.
