Fourth Street Photos

The Fourth Street photos show some promise, I think.  And so does the software, though it wasn’t until Sunday afternoon that I got it reliably running all the way through the chain without being kicked manually (does that sound any better than “kicked by hand”?).  Last night I dropped a batch in incoming and crashed, and it went through, and this morning I added a bunch more from the music session last night, and those went through, so I’m cautiously optimistic.

Friday Saturday Sunday

And fish fest yet to come (probably not a lot of photos from that).

Photo Blogging

I’m playing with the ability to post pictures quickly and semi-automatically; not all the pictures I take, just the ones I select. Photos posted this way will appear in snapshot gallery entries named “Photo Blog mm/dd”.

Vertical pictures are currently being presented on their side.  Other than that, the current collection of hacks seems to be hanging together a little bit.
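
Sideways vertical pictures are often a sign that the EXIF Orientation tag is being ignored somewhere in the chain. That is only a guess here, since nothing above confirms the cause. As a minimal sketch under that assumption, the tag values that are pure rotations map to a display rotation like this (the function and table names are made up for illustration):

```python
# EXIF Orientation tag (0x0112) values that are pure rotations, mapped to the
# clockwise rotation in degrees needed to show the image upright.
# Values 2, 4, 5 and 7 involve mirroring and are deliberately left out.
ROTATION_FOR_ORIENTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def clockwise_rotation(orientation):
    """Return degrees to rotate clockwise, or None for mirrored/unknown values."""
    return ROTATION_FOR_ORIENTATION.get(orientation)
```

Reading the tag out of the file and doing the actual rotation would need an imaging library, which is left out of this sketch.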

We’ll see.

The underlying tech: the cheapest Eye-Fi card; home, hotel (Fourth Street), or Sprint Overdrive Wi-Fi; FTP upload to a special user under my Dreamhost account; and a cron job to pick the photos up and post them to the snapshot album.
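
The pickup step in a chain like this can be sketched roughly as follows. This is not the actual script. The directory layout, the function name, and the "leave files alone until they stop changing" rule are all assumptions, the last one being a common way to avoid grabbing a photo that is still mid-upload:

```python
import os
import shutil
import time


def collect_settled_photos(incoming, staging, min_age_seconds=60, now=None):
    """Move JPEGs that have not changed for min_age_seconds into staging.

    Returns the list of file names that were moved, in sorted order.
    """
    now = time.time() if now is None else now
    os.makedirs(staging, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(incoming)):
        path = os.path.join(incoming, name)
        if not os.path.isfile(path):
            continue
        if not name.lower().endswith((".jpg", ".jpeg")):
            continue
        # A recently modified file may still be arriving over FTP; skip it
        # and let the next cron run pick it up.
        if now - os.path.getmtime(path) < min_age_seconds:
            continue
        shutil.move(path, os.path.join(staging, name))
        moved.append(name)
    return moved
```

A cron entry would call this every few minutes and hand the returned names to whatever posts them to the gallery.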

Other than bug fixes, planned changes include: RSS feed, latest photo thumbnail in main blog sidebar, possibly creating a blog entry when a photo blog entry is created (but not updated), web access to set photo caption and description (or delete some).

Server Upgrade Chronicles IIb

Recreated wrack, the USB backup drive with the oldest data on it, using my updated scripts, and started a full backup, also with the updated scripts.  Was running fine when I went to bed.

Seems to be hung as of this morning, dammit.  System and pools are responsive, but there’s been no progress and no disk I/O since I first checked when I got up. Haven’t tried to kill any processes yet; waiting to see if the zfs-discuss list has any data-gathering suggestions.

This older software version doesn’t support the -p option in zfs send, but that won’t be the cause of the hang; that will simply require me to recreate some key properties manually if I have to restore from backup.
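
One way to make that manual recreation less painful is to save the locally set properties alongside the backup and replay them after a restore. The sketch below assumes the properties were captured with `zfs get -H -o name,property,value -s local all` (tab-separated output) and turns that text into `zfs set` commands. The function name and the dataset names in the usage example are hypothetical, and this is not how my scripts currently do it:

```python
import shlex


def set_commands(zfs_get_output):
    """Turn tab-separated name/property/value lines into `zfs set` commands."""
    commands = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        # Split on the first two tabs only, in case a value contains a tab.
        name, prop, value = line.split("\t", 2)
        assignment = shlex.quote("%s=%s" % (prop, value))
        commands.append("zfs set %s %s" % (assignment, shlex.quote(name)))
    return commands
```

For example, the line `tank/home<TAB>compression<TAB>on` becomes `zfs set compression=on tank/home`. The commands are only printed or logged here; running them is a separate, deliberate step.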

ETA: My detailed report on the zfs-discuss mailing list.

It’s sitting at the same spot after work, after sitting all day.  Officially hung.

I wonder what it will take to stop the backup and export the pool?  Well, that’s nice; a straight “kill” terminated the processes, at least.

zpool status shows no errors. zfs list shows backup filesystems mounted.

zpool export -f is running…no disk I/O now…starting to look hung.

Ah, the zfs receive process is still in the process table.  kill -9 doesn’t help.

Kill and kill -9 won’t touch the zpool export process, either.

Pulling the USB cable on the drive doesn’t seem to be helping any either.

zfs list now hangs, but giving it a little longer just in case.

Kill -9 doesn’t touch any of the hung jobs.

Closing the ssh sessions doesn’t touch any of them either.

zfs list on pools other than bup-wrack works. zpool list works, and shows bup-wrack.

Attempting to set failmode=continue gives an I/O error.

Plugging the USB back in and then setting failmode gives the same I/O error.

cfgadm -al lists known disk drives and usb3/9 as “usb-storage connected”. I think that’s the USB disk that’s stuck.

cfgadm -cremove usb3/9 failed “configuration operation not supported”.

cfgadm -cdisconnect usb3/9 queried if I wanted to suspend activity, then failed with “cannot issue devctl to ap_id: /devices/pci@0,0/pci10de,cb84@2,1:9”

cfgadm -al still shows the same thing.

cfgadm -cunconfigure same error as disconnect.

I was able to list properties on bup-wrack:

bash-3.2$ zpool get all bup-wrack
NAME       PROPERTY       VALUE               SOURCE
bup-wrack  size           928G                -
bup-wrack  used           438G                -
bup-wrack  available      490G                -
bup-wrack  capacity       47%                 -
bup-wrack  altroot        /backups/bup-wrack  local
bup-wrack  health         UNAVAIL             -
bup-wrack  guid           2209605264342513453  default
bup-wrack  version        14                  default
bup-wrack  bootfs         -                   default
bup-wrack  delegation     on                  default
bup-wrack  autoreplace    off                 default
bup-wrack  cachefile      none                local
bup-wrack  failmode       wait                default
bup-wrack  listsnapshots  off                 default

It’s not healthy, all right. And the attempt to set failmode really did fail.
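
For a backup script that wants to check this before it starts writing, the `zpool get all` listing above is easy to parse. A minimal sketch, assuming the four-column layout shown above and values without embedded spaces (the function name is made up):

```python
def pool_properties(zpool_get_output):
    """Parse `zpool get all POOL` output into a {property: value} dict."""
    props = {}
    for line in zpool_get_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not
        # NAME PROPERTY VALUE SOURCE.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        props[fields[1]] = fields[2]
    return props
```

With the listing above, `pool_properties(text)["health"]` is `"UNAVAIL"` and `["failmode"]` is `"wait"`, which is enough to refuse to start a backup onto a pool in that state.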

ETA: So I had to reboot.  However, that worked fine: I recreated the pool and ran the same full backup script overnight, and it completed successfully.  It took 392:23 (minutes:seconds), about six and a half hours, but it completed. (Something broke the ssh connection, but luckily I had run the backup under screen, so it just got detached and I could reconnect and see what happened. And it was making a log file, anyway.)
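
For the record, here is the arithmetic on that elapsed time, as a tiny sketch that assumes the figure is minutes:seconds (the function name is made up):

```python
def minutes_seconds_to_hms(text):
    """Convert an 'MMM:SS' elapsed time into (hours, minutes, seconds)."""
    minutes, seconds = (int(part) for part in text.split(":"))
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds
```

So `minutes_seconds_to_hms("392:23")` gives `(6, 32, 23)`, i.e. 6 hours, 32 minutes, 23 seconds.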

There’s a ‘cut’ version error in some of my after-backup processing that I’ll need to fix.

Server Upgrade Chronicles IIa

Didn’t find any useful information.  Asked today online, and found t/3 of what I need at least, so I can revisit the install and upgrade tomorrow (got better things to do tonight). I need to resolve the uncertainties about update before considering doing it to the real server. Maybe I should just wait for the next stable release, due in March and expected in April.

Meanwhile, yesterday the new SAS disk controller arrived. I’ve now got all the hardware to install into the box, and I can do that without changing anything associated with the current boot or data disks, so I probably will. Try things out on the new hardware and new disks, before cutting over.

Meanwhile, not sure why WordPress is displaying the time in UTC.

ETA: Okay, install log is fairly clean.  There’s “device pciclass,030000@3(display#0) keeps up device sd@0,0(sd#0), but the latter is not power managed” and “/usr/lib/powerd: [ID387247 daemon.error] Able to open /dev/srn” and “SUNW_piclmemcfg init mc failed!”

Added Emacs.  Going to shut down, snapshot, and start the update from there.

ETA: On boot, several popup errors, and this in the logs:

Feb  5 07:47:08 osol-play-002 nwamd[37]: [ID 116842 daemon.error] sysevent_bind_handle: Permission denied
Feb  5 07:47:11 osol-play-002 genunix: [ID 127566 kern.info] device pciclass,030000@2(display#0) keeps up device sd@0,0(sd#0), but the former is not power managed
Feb  5 07:47:11 osol-play-002 /usr/lib/power/powerd: [ID 387247 daemon.error] Able to open /dev/srn
Feb  5 07:48:53 osol-play-002 inetd[6089]: [ID 702911 daemon.error] Failed to update state of instance svc:/application/x11/xfs:default in repository: entity not found
Feb  5 07:48:53 osol-play-002 inetd[6089]: [ID 702911 daemon.error] Failed to get instance for svc:/application/x11/xfs:default
Feb  5 07:48:53 osol-play-002 inetd[6089]: [ID 702911 daemon.error] Failed to update state of instance svc:/application/x11/xfs:default in repository: No such file or directory
Feb  5 07:48:53 osol-play-002 inetd[6089]: [ID 702911 daemon.error] Failed to update state of instance svc:/application/x11/xfs:default in repository: entity not found
Feb  5 07:48:53 osol-play-002 inetd[6089]: [ID 702911 daemon.error] Failed to get instance for svc:/application/x11/xfs:default
Feb  5 07:48:53 osol-play-002 inetd[6089]: [ID 702911 daemon.error] Failed to update state of instance svc:/application/x11/xfs:default in repository: No such file or directory
Feb  5 07:49:01 osol-play-002 ip: [ID 224711 kern.warning] WARNING: Memory pressure: TCP defensive mode on

But, unlike the previous attempt, the system did boot. I halted and snapshotted it immediately after the boot.

Server Upgrade Chronicles II

Good news in automatic email: a bug I filed is fixed in build 122. Now, it’s a duplicate of another bug that I apparently failed to find, and there’s been an easy workaround all this time (turns out it was a pointer problem in parsing file paths, triggered by not having a “/” at the end of a directory path). This was apparently what was blocking my ability to do incremental backups with ZFS send/receive.

This makes software update key, not that it wasn’t already.

So I have had to reinstall VirtualBox (because VMware Player won’t work with virtual machines on my network drive, whereas VirtualBox will), and reinstall Solaris. Then I will learn how to upgrade to various builds, because I’m ashamed to say I don’t know how to upgrade to anything except “current”, which may not be the place to be.

Huh; almost looks like there isn’t a way. In future, I can update more often, and keep the old snapshots around. Though that doesn’t give any way to reinstall if what I really need is an old version.

I’m updating a virtual system, to test techniques and such. It’s downloading very slowly, equally slowly in bridged or NAT mode. So it’s not going to be done tonight, which means the testing will be delayed and the actual upgrade thus even more delayed. Well, things take time.

So far, knock on wood, nothing has gone terribly wrong.

ETA: The update (switching to the dev branch) completed overnight, with a number of errors. The new Boot Environment doesn’t come all the way up. No time to check more this morning.