A parser for Google Scholar's citations list

Posted on Nov 18, 2012

Python scripts that parse the "citations page" of Google Scholar, built on scholar.py.
Download them from GitHub: https://github.com/carlosp420/google_scholar_parser

Usage:

  • Get citations for a publication using its DOI:
    python scholar.py -c 1 10.1111/j.1096-3642.2009.00627.x
  • Output:
    Title The radiation of Satyrini butterflies (Nymphalidae: Satyrinae)...
    URL http://onlinelibrary.wiley.com/doi/10.1111/j.1096-3642.2009.00627.x/full 
    Citations 14
    Versions 6
    Citations list http://scholar.google.com/scholar?cites=13407052944292989945&as_sdt=2005&sciodt=0,5&hl=en&num=1
    Versions list http://scholar.google.com/scholar?cluster=13407052944292989945&hl=en&num=1&as_sdt=0,5
    Year 2011
    
  • Grab the Citations list page:
    http://scholar.google.com/scholar?cites=13407052944292989945
    And feed it to the script scholar_cites.py:
    python scholar_cites.py http://scholar.google.com/scholar?cites=13407052944292989945
    This prints the DOIs of the publications citing your article (up to 100 DOIs):
    10.1111/j.1463-6409.2010.00421.x
    10.1146/annurev-ecolsys-102710-145024
    10.1111/j.1420-9101.2011.02352.x  
    10.1111/j.1439-0469.2010.00587.x
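
The two manual steps above (shortening the "Citations list" URL and collecting DOIs from text) can be sketched in plain Python. This is not the code of scholar_cites.py: the function names `short_cites_url` and `extract_dois` and the DOI regular expression are assumptions made for illustration, and nothing here contacts Google Scholar.

```python
import re
from urllib.parse import urlparse, parse_qs

# Loose DOI pattern (an assumption, not an official grammar): "10.", a
# 4-9 digit registrant code, "/", then a run of characters that are not
# whitespace, quotes or angle brackets.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def short_cites_url(url):
    """Reduce a 'Citations list' URL to its cites= parameter only."""
    query = parse_qs(urlparse(url).query)
    cluster_id = query["cites"][0]
    return "http://scholar.google.com/scholar?cites=" + cluster_id


def extract_dois(text):
    """Return the unique DOIs found in text, in order of first appearance."""
    seen = []
    for match in DOI_RE.findall(text):
        doi = match.rstrip(".,;)")  # drop trailing punctuation
        if doi not in seen:
            seen.append(doi)
    return seen
```

Note that this pattern is greedy: a DOI embedded in a URL such as .../doi/10.1111/j.1096-3642.2009.00627.x/full would be captured together with the trailing /full, so real pages need extra cleaning.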
    


VoSeq: a voucher and DNA sequence database for phylogenetic analysis

Posted on May 14, 2012

VoSeq is a database to store voucher and DNA sequence data for phylogenetic analysis.
Please let us know if you find bugs: Carlos Peña (mycalesis@gmail.com) or Tobias Malm (tobias.malm@uef.fi).

Citation:

Peña, C. & Malm, T. 2012. VoSeq: a Voucher and DNA Sequence Web Application. PLoS ONE, 7(6): e39071. doi:10.1371/journal.pone.0039071

Documentation
Please find the documentation here http://nymphalidae.utu.fi/cpena/VoSeq_docu.html
A test installation containing sample data is here:
VoSeq is multi-platform software.

Requirements
* Apache web server
* PHP
* MySQL

Contributing
VoSeq is an open source project, and you are welcome to contribute.
Source code and downloads are available here: https://github.com/carlosp420/VoSeq/tags

molecular phylogenetics, database, web application


Partitioned Bremer Support script for TNT: pbsup.run

Posted on Sept 23, 2011

If you have a dataset with several partitions (morphological and molecular data, or several gene sequences) and you want to calculate the Partitioned Bremer Support (PBS) values for nodes using the cladistics software TNT, you may want to try out my pbsup.run script.

It seems that there is more than one way to calculate Bremer support and Partitioned Bremer Support values. This script uses the methodology proposed by:
Gatesy, J. et al. 1999. Cladistics, 15, 271-313. doi:10.1111/j.1096-0031.1999.tb00268.x

This script was first used in our paper:

Peña, C., Wahlberg, N., Weingartner, E., Kodandaramaiah, U., Nylin, S., Freitas, A. V. L., & Brower, A. V. Z. (2006). Higher level phylogeny of Satyrinae butterflies (Lepidoptera: Nymphalidae) based on DNA sequence data. Molecular Phylogenetics and Evolution, 40(1), 29-49. doi:10.1016/j.ympev.2006.02.007

Please cite it if you find the pbsup.run script useful.

* Download the latest version of the script [Updated Feb 2012]: pbsup.run

* An early version of the script can be found at the TNT website:
http://www.zmuc.dk/public/phylogeny/TNT/scripts/pbsup.run

  1. Your data should be in a file named dataset.tnt
  2. Each of your partitions should be in a separate "block". The script doesn't work if you have all your partitions in one continuous line, so your data should be interleaved. It is important to precede each block with &[dna] (if the partition is molecular data) or &[num] (if it is numerically coded data such as morphology). Something like this:
    nstates dna
    xread
    'Exported by .......'
    4494 95
    
    &[dna]
    Aus_sp1 ACTAGACAGGATTA
    Aus_sp2 AGCAGAGCCAATAA
    ...
    ...
    
    &[dna]
    Aus_sp1 AACTTATATTGATAGCA
    Aus_sp2 AAGACGATAGACAGTAT
    ...
    ;
    proc/;
    
  3. Find all your most parsimonious trees and save them in parenthetical notation in a file named alltrees.tre (Commands: tsave *alltrees.tre; save; tsave/;)
  4. Calculate a strict consensus tree and save it in a file named base.tre (Commands: nelsen*; tsave *base.tre; save/; tsave/;)
  5. All files (including the script pbsup.run) should be in the same folder.
  6. Enter TNT and type in the command line: pbsup N;
  7. N is the number of partitions you have in your dataset.
  8. When the script is done, you will see a new file, pbs.out, which is the strict consensus tree with the Partitioned Bremer Support values attached to their respective nodes.
  9. To see the tree and PBS values, enter TNT, open your dataset, load your pbs.out tree (Command: proc pbs.out) and type the command ttag; or select Trees/MultipleTags/ShowSave if you are using the Windows point-and-click version of TNT.
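
Because the N given to pbsup has to match the number of blocks in dataset.tnt, it can help to count the block markers before starting TNT. The following Python sketch does that; the function name `count_partitions` is hypothetical, and it assumes that every block starts with a line beginning with &[dna] or &[num], as in the example of step 2.

```python
import re

# A block marker is a line that starts with &[dna] or &[num] (see step 2).
BLOCK_RE = re.compile(r'^\s*&\[(?:dna|num)\]', re.MULTILINE | re.IGNORECASE)


def count_partitions(tnt_text):
    """Count the data blocks in the text of a TNT matrix file."""
    return len(BLOCK_RE.findall(tnt_text))


# Typical use (assumes dataset.tnt is in the current folder):
#     with open("dataset.tnt") as handle:
#         n = count_partitions(handle.read())
#     print("Blocks found: %d, so run: pbsup %d;" % (n, n))
```

If the count differs from the number of partitions you expect, the matrix is probably not interleaved the way step 2 requires.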

Troubleshooting

Posted on Feb 16, 2012

If you have big datasets (more than 250 taxa), you will need to increase the amount of RAM allocated to TNT. By default, TNT uses only 16MB of your computer's RAM.
  1. For example, you can give up to 200MB of RAM to TNT with the command:    mxram 200;
  2. For 300 taxa, the pbsup.run script needs a little more than 600 array cells (by default around 350 are available). Use this command to set up to 15 loops and 2000 array cells:    macro*15 2000;
  3. You can include these commands in your scripts provided that they are executed before macros are enabled and before you read your data (otherwise the data will be lost and the macro* command will not take effect).
  4. The algorithms and parameters for the tree search strategy are on line 55 of the script; you can change them according to your needs:
    xmult: level 10 fuse 5 drift 30 rss css xss rat 50;
  5. You might notice that the sum of the PBS values doesn't exactly match the Bremer values; the sum might be off by 0.1-0.3 units. The most likely reason is that pbsup.run rounds to 1 decimal place during the calculations. One way to fix this is to change the rounding in the script to 2 or 3 decimals. For example, in the script, look for the code:
    macfloat 1
    and change it to
    macfloat 3
    (it appears twice in the code).
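
The effect of the rounding described in point 5 can be illustrated with a small Python sketch. The three PBS values are made up for the example, the function name `rounded_sum` is hypothetical, and half-up rounding is an assumption; this is not a reproduction of TNT's internal arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded_sum(values, decimals):
    """Round each value to `decimals` places, then add the rounded values."""
    quantum = Decimal(10) ** -decimals  # 0.1 for decimals=1, 0.001 for 3
    return sum(Decimal(v).quantize(quantum, rounding=ROUND_HALF_UP)
               for v in values)


# Hypothetical PBS values of three partitions at one node; they add up
# to exactly 4, which would be the Bremer support of that node.
pbs = ["2.333", "1.333", "0.334"]

one_decimal = rounded_sum(pbs, 1)     # 2.3 + 1.3 + 0.3
three_decimals = rounded_sum(pbs, 3)  # 2.333 + 1.333 + 0.334
print(one_decimal, three_decimals)
```

With one decimal the sum falls 0.1 short of the Bremer value, while with three decimals it matches; more partitions give more room for such small differences to accumulate.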
+Carlos Peña