87. iNaturalist Identification 1.2 a

iNaturalist Identification 1.2 is available for download from “Transfer Big Files”:

http://tbf.me/a/BxAlcw (The link expires on February 10th).

Fixes included: configurable main window height; API daily quota enforced (10k requests/day); “only_id=true” and “id_below” used in observation page requests; grouped download of observation data (halving the number of API requests); observations that already have identifications are now ignored, since they are still returned despite the filter “identified=false” when reviewing only the “Unknown” (cf. “User has opted-out of Community Taxon”).

Tracking: in every API request, the header ‘user-agent’ identifies the tool as “iNaturalist Identification/1.2”.
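As an illustration only (the tool itself is a Windows executable and its code is not shown here), this is how a client could attach such an identifying header in Python. The helper name `build_request` is hypothetical; the header value is the one quoted above. Nothing is sent over the network.

```python
from urllib.request import Request

USER_AGENT = "iNaturalist Identification/1.2"  # value quoted in the post


def build_request(url: str) -> Request:
    """Return a Request carrying the identifying User-Agent header (nothing is sent)."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://api.inaturalist.org/v1/observations?per_page=1")
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))
```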

How to reduce the number of API requests?

Select “Skip observations submitted after: YYYY” at application startup.

Or use the same option in the settings file, for instance: SkipObservationsSubmittedAfter=2017

Fewer results require fewer requests to keep these results up to date.
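A minimal sketch of how such a setting could be read and turned into a date filter. The function names are hypothetical, and two things are assumptions rather than facts about the tool: that `created_d2` is the API parameter bounding the submission date from above, and that the year given (2017 here) is itself still included.

```python
from datetime import date


def parse_settings(text: str) -> dict[str, str]:
    """Parse 'Key=Value' lines; lines without '=' are ignored."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def submitted_before_params(settings: dict[str, str]) -> dict[str, str]:
    """Build a query filter keeping observations submitted up to the end of the year.

    Assumption: 'created_d2' is the upper bound on the submission date,
    and the configured year is included.
    """
    year = settings.get("SkipObservationsSubmittedAfter")
    if year is None:
        return {}
    return {"created_d2": date(int(year), 12, 31).isoformat()}


settings = parse_settings("SkipObservationsSubmittedAfter=2017\n")
print(submitted_before_params(settings))  # {'created_d2': '2017-12-31'}
```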

Available for download:

iNaturalist Identification 1.2 - Deliverable.zip

It contains a minimal setup of the tool.

iNaturalist Identification 1.2 - Deliverable ; Observations preloaded.zip

It contains a setup of the tool and 7 search queries (Benin, Bolivia, Denmark, France, Netherlands, Philippines, Taiwan) and 8 AI based filters configured, and the data of all “Unknown” observations matching these search queries (51850 observations). About these search queries, see also:

https://forum.inaturalist.org/t/are-there-too-many-new-observations-to-identify/16109/92

iNaturalist Identification 1.2 - Other observations preloaded.zip

22 search queries (Russia, and many places in America: Central America, South America, Mexico, Bahamas, Greater Antilles, Lesser Antilles, Alabama, Arizona, Arkansas, California, Florida, Georgia, Hawaii, Louisiana, Mississippi, New Mexico, North Carolina, Oklahoma, South Carolina, Tennessee, Texas) and the data of all “Unknown” observations matching these search queries (201000 observations).

Setup:

Unzip. No installation or uninstallation process.

Get a token from this URL and copy/paste it in the “iNatIdentify - Settings.txt” file:

https://www.inaturalist.org/users/api_token

The token is required and enables the tool to submit IDs and comments on your behalf.

Simply double-click on “iNatIdentify.exe” to run the tool.

Presentations of the tool while developing it:

https://forum.inaturalist.org/t/amount-of-unknown-records-is-decreasing/8594/394

https://forum.inaturalist.org/t/amount-of-unknown-records-is-decreasing/8594/405

https://forum.inaturalist.org/t/amount-of-unknown-records-is-decreasing/8594/455

https://forum.inaturalist.org/t/search-and-filter-identifications/1304/50

[…] endpoint – one request for each unknown observation in a given set – which may not be a good thing. it probably also hammers the observation API endpoint to get the observations, though in this case at least it can do this as one request per 200 observations. the response from the observation endpoint is still fairly large. so there’s probably a lot of data being retrieved and transferred in that case.)


I made every effort possible to spare the server resources. This tool makes 2 requests for each observation in a result page (200 observations/page): 1 request to get the observation data and 1 request to get the AI suggestions (including the taxa descriptions). The process is limited to a total of 60 requests/minute. Parallelization helps to reach this limit, even though each request takes several seconds to get a response.
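To illustrate how a global limit of 60 requests/minute can coexist with parallel requests, here is a minimal sketch (not the tool's actual code): each worker reserves the next free time slot under a lock, then waits for it outside the lock, so request starts are spaced 1 second apart while slow responses overlap. The class name and the injectable `clock` and `sleep` parameters are hypothetical.

```python
import threading
import time


class RateLimiter:
    """Space calls evenly so that at most `per_minute` of them start per minute."""

    def __init__(self, per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self._interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def acquire(self) -> float:
        """Reserve the next free slot, wait until it arrives, and return its time."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # wait outside the lock so other workers can reserve
        return slot


limiter = RateLimiter(per_minute=60)
# Each worker thread would call limiter.acquire() just before sending a request.
```

With the figures given above, one result page costs 2 × 200 = 400 requests, which is about 6.7 minutes at 60 requests/minute.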

Note that I distribute many preloaded observations (observations data + AI suggestions + taxonomy) with the tool, so that you can start using the tool and ID many observations while downloading almost nothing. (In that case it would spare the server resources better than using the web application instead.)

Should this tool become very successful (?..), and should server resources become an issue, it would then be possible to create a feature request for bulk data downloads (which would NOT need to be updated often). I mean files to download similar to those presently generated by the tool, for every place defined in the search queries:

[Image: files generated by the tool for each place defined in the search queries]

About the web application:

There are things in the web application that do not spare the server resources as much as possible:

https://forum.inaturalist.org/t/ideas-for-a-revamped-explore-observations-search-page/8439/104

https://forum.inaturalist.org/t/on-observation-detail-page-show-on-the-map-any-taxon-selected/19561/5

(Ultimately, only a server-side measurement could tell what the resources are spent on.)

true, but there are some recommended practices (https://www.inaturalist.org/pages/api+recommended+practices), and in the context of what you’ve written about your tool, these points seem particularly relevant:

  • Please keep requests to about 1 per second, and around 10k API requests a day
  • We may block IPs that consistently exceed these limits
  • The API is meant to be used for building applications and for fetching small to medium batches of data. It is not meant to be a way to download data in bulk

it looks to me like a user just running your tool without understanding the nuances of what it is doing might easily inadvertently exceed the 10k req/day limit, leading to a block of their IP. so i guess something like this just makes me nervous for the user who uses it in unexpected ways.
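One way a client could guard against exceeding the guideline of about 10k requests/day is a simple per-day counter. This is a minimal sketch under stated assumptions, not the tool's implementation: the class name is hypothetical, and the day boundary is taken from the local calendar date, which may differ from how the server counts.

```python
from datetime import date


class DailyQuota:
    """Count requests per calendar day and refuse new ones once the limit is reached."""

    def __init__(self, limit: int = 10_000, today=date.today):
        self._limit = limit
        self._today = today
        self._day = today()
        self._used = 0

    def try_consume(self) -> bool:
        """Return True and count one request if quota remains today, else False."""
        current = self._today()
        if current != self._day:  # a new day started: reset the counter
            self._day, self._used = current, 0
        if self._used >= self._limit:
            return False
        self._used += 1
        return True


quota = DailyQuota()
if quota.try_consume():
    pass  # safe to send one more request today
```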

Not all the photos are referenced in A, but they are all referenced in B.

This is a reason to go on performing B, and not only A.

It does take a very long time to download many observations.

Only then can the user-defined taxon-based filters be applied to the downloaded observations.

(An observation (data + AI) is downloaded only once and saved on the disk).

I never got my IP blocked because of the amount downloaded, even when downloading observations almost non-stop for several days. So, we need not be nervous about possible consequences for the users of the tool. (And I tested the tool for a long period of time at 90 requests/minute, without issue, so I am even more confident at 60 requests/minute).

I was blocked quickly (but only for a very short period of time) whenever the scheduling was buggy and a burst of requests was submitted in a very short period of time.

I figured out that even 60 requests/minute was not supported when downloading taxonomic data, so a slower schedule is automatically applied in that case. Such taxonomic requests happen if the taxonomic cache is deleted (you might wish to delete it to get the common names in a different language). They also happen when the user defines a new filter, to generate an “Overview” taxonomy for the new taxon-based filter. I ran extensive tests to avoid ever being blocked.

Should the “HTTP Error 429, Too Many Requests” still happen, the tool would immediately suspend all requests for 5 minutes and display a message “Suspended…” in the status bar.
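The suspension behavior described above could look roughly like the following. This is a sketch only: the class and method names are hypothetical, and the 5-minute pause and the “Suspended…” message are taken from the description.

```python
import time

SUSPEND_SECONDS = 5 * 60  # the 5-minute pause described in the post


class Suspender:
    """Track a global pause that starts whenever a response has status 429."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._resume_at = clock()  # not suspended at startup

    def record_status(self, status: int) -> None:
        """Start (or restart) the pause on HTTP 429 Too Many Requests."""
        if status == 429:
            self._resume_at = self._clock() + SUSPEND_SECONDS

    def seconds_to_wait(self) -> float:
        """Remaining pause in seconds; 0.0 when requests may be sent."""
        return max(0.0, self._resume_at - self._clock())

    def status_text(self) -> str:
        """Text for the status bar."""
        return "Suspended…" if self.seconds_to_wait() > 0 else ""
```

Every worker would check `seconds_to_wait()` before sending a request and call `record_status()` on each response, so a single 429 pauses all requests at once.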

The user may also reduce the nominal frequency, in the settings file, if something bad happens:

[Image: the nominal request frequency option in the settings file]

(This setting anticipates a possible future change in the rules or in the server behavior, so that users are not blocked until a tool update is made available.)

This concern depends on your taxon-based filter(s). For instance, “Phylum Tracheophyta” is a very high rank filter and you get too many observations to review. On the contrary, I could review all unidentified “Subfamily Caesalpinioideae” (lower rank filter) observations in Benin, Bolivia, Philippines, Taiwan (there were not that many). Note that filtering at a low taxonomic rank is what motivated the development of this tool. The need for such a tool is lower if your interest is taxonomically broader.

How to reduce the number of observations downloaded by the tool?

  • You may select “Skip observations submitted after: 2017” at application startup. This considerably reduces the number of observations to download and/or update. This seems to be a good answer to this concern, and it encourages you to “purge” the oldest unknown observations. (I will add such an option to the settings file, so that you don’t need to select it at every startup.)
  • You may also change “MinDaysBeforeDisplay”, a mandatory option I added to “stabilize” the tool behavior (it became less important after reworking/optimizing the tool). For instance, switch it from the default value “2 days” to a relatively high value such as “150 days” to prevent downloading new observations for a long period of time. After 150 days, switch back to “2 days”, let the tool download all the new observations (only those not identified by other reviewers in the meantime), then switch back to “150 days”.
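To make the effect of “MinDaysBeforeDisplay” concrete, the following sketch computes the newest submission date that would still be eligible for download. The function name is hypothetical, and the exact rule the tool applies (for instance, whether the boundary day is included) is an assumption.

```python
from datetime import date, timedelta


def latest_eligible_date(today: date, min_days_before_display: int) -> date:
    """Newest submission date that is at least the given number of days old."""
    return today - timedelta(days=min_days_before_display)


today = date(2021, 6, 3)
print(latest_eligible_date(today, 2))    # 2021-06-01
print(latest_eligible_date(today, 150))  # 2021-01-04
```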

Another aspect (also relevant to the web application) is that you want to see only unknown observations that are still unknown. At startup, the tool has to request the pages of observations again, in order to remove from the local cache (and from the display) the observations that no longer match the search queries, that is, the observations that are no longer unknown.

Note that this could be optimized by a new API feature providing, in one request/response, all the observation IDs matching a search query (without providing any observation data at all).
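Once the IDs still matching a query are known (however they were obtained), the startup refresh boils down to a set difference against the local cache. A minimal sketch, with a hypothetical function name and a cache modeled as a plain dictionary keyed by observation ID:

```python
def prune_cache(cache: dict[int, dict], matching_ids: set[int]) -> list[int]:
    """Remove cached observations no longer returned by the query; return their IDs."""
    removed = sorted(oid for oid in cache if oid not in matching_ids)
    for oid in removed:
        del cache[oid]
    return removed


cache = {101: {"taxon": None}, 102: {"taxon": None}, 103: {"taxon": None}}
removed = prune_cache(cache, matching_ids={101, 103})
print(removed)        # [102]
print(sorted(cache))  # [101, 103]
```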

In short, the ability to define AI-based and custom (low-rank) taxon-based filters generally requires downloading many observations (once), as long as the API does not offer “AI-assisted occurrence searches” (this topic). Then, as with the web application, what is displayed must be kept up to date (to remove results that are no longer relevant). The tool presently offers options to reduce the number of requests performed.

New API features (an AI-based filter (this topic); a request to get all observation IDs matching a search query, without getting any data other than the IDs) could further reduce the usage of server resources.

BTW, another API feature that could help (at the margin) would be filtering API results by observation ID, instead of filtering by the date submitted. I mean: a request to get (pages of) observations with IDs lower than 60000000, for instance (approximately equivalent to a date submitted earlier than “Sep 18, 2020”). Because of time zones, these filters are not equivalent. At some point, if we don’t want to miss observations, overlapping (+/- 1 day) requests must be submitted, and this is what the tool does. (The reason is that a request is over after we have retrieved 50 pages of 200 observations/page. Another request, with another date filter, is required to get the next observations.)
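The overlapping-requests idea can be sketched as follows: split the date range into windows, widen each window by one day on both sides, and merge the results by observation ID so that observations returned twice are kept once. This is an illustration under assumptions, not the tool's code: the function names are hypothetical, and the window length would have to be chosen so that each window stays below the 50 × 200 = 10,000 results a single request can page through.

```python
from datetime import date, timedelta


def overlapping_windows(start: date, end: date, days: int, overlap_days: int = 1):
    """Split [start, end] into windows of `days` days, each widened on both sides."""
    windows = []
    cursor = start
    while cursor <= end:
        last = min(cursor + timedelta(days=days - 1), end)
        pad = timedelta(days=overlap_days)
        windows.append((cursor - pad, last + pad))
        cursor = last + timedelta(days=1)
    return windows


def merge_by_id(batches):
    """Merge batches of observations, keeping one record per observation ID."""
    merged = {}
    for batch in batches:
        for obs in batch:
            merged[obs["id"]] = obs
    return merged


for first, last in overlapping_windows(date(2020, 9, 1), date(2020, 9, 10), days=5):
    print(first, last)
# 2020-08-31 2020-09-06
# 2020-09-05 2020-09-11
```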

One may answer that, if needed, I could try to take the observation time zone (date submitted) into account, to end up with something equivalent to an ID-based filter. While trying, I soon found 2 observations at almost the same location in Florida that were registered with different time zones (with a gap of several hours). I didn’t investigate further.

Anyway, this would become pointless if there were “a request to get all observation IDs matching a search query, without getting any data other than the IDs”, as suggested above.

Artificial Intelligence trends

APPLE Podcast: https://apple.co/32cYgdV
Spotify: https://spoti.fi/37Tk5CJ
WEB: https://bit.ly/3mASQkC
https://en.wikipedia.org/wiki/GPT-3
https://www.stiebel-eltron.nl/content/dam/ste/nl/services/Downloads/Planningsdocumenten/Ontwerp%20en%20installatie%20warmtepompen.pdf
https://www.lente-akkoord.nl/wp-content/uploads/2011/12/dos-en-donts-warmtepompen.pdf

https://www.xiaomiproducts.nl/x96-mini-android-tv-box.htm
About all the artificial intelligence trends of the moment and those of the future.

https://www.youtube.com/watch?v=ovWaQlr6hxY Friesland Campus science lecture, Maarten van Lonen

https://www.youtube.com/watch?v=y4AH0dcUsZA Lecture on swifts by Johan Tinholt

https://www.youtube.com/watch?v=itU5aPcQX1A&feature=youtu.be The JKB climate candidates debate

https://www.youtube.com/watch?v=X7gVUrEDf_0 Community-powered biodiversity conservation: how your observations can guide coastal management

Wanted to invite those in the iNaturalist community who are interested in hearing more about the use of iNat data to a webinar we’re hosting next week - hope some of you will join us! Just a note that the webinar registration is full (or close to full), but it will be live streamed on YouTube at the same time and we’ll be monitoring the chat there to bring questions into the webinar.

The Community Science team at the California Academy of Sciences would like to invite you to join us for a webinar “Community-powered biodiversity conservation: how your observations can guide management on the California coast”.

Posted by ahospers ahospers, June 03, 2021 10:54 PM
